kyoka - Reinforcement Learning framework
What is Reinforcement Learning
Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. (wikipedia)
In reinforcement learning, the player that learns how to get good results in some task is called the agent.
The agent learns which actions are good or bad in each situation through a large number of simulations.
(The essential characteristic of reinforcement learning is learning from trial and error.)
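Roughly speaking, every reinforcement learning setup repeats the same interaction loop. The sketch below is not kyoka code; environment, agent and their methods are placeholder names used only to illustrate the trial-and-error idea.
# Generic sketch (placeholder names, not kyoka's API): the agent observes a state,
# tries an action, receives a reward, and updates its estimate of that action.
def run_episode(environment, agent):
    state = environment.reset()
    while not environment.is_finished(state):
        action = agent.choose_action(state)              # try something
        next_state, reward = environment.step(state, action)
        agent.learn(state, action, reward, next_state)   # learn from the outcome
        state = next_state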
Why kyoka is created
The steps to solve your learning problem (ex. playing Go) with reinforcement learning algorithms would be
- Define your learning problem in the reinforcement learning format.
- Select a learning algorithm (ex. QLearning) and implement it for your learning problem.
There are lots of things to do before the learning can start.
This library was created to make implementing these steps easier.
Sorry, I talked too much. Let's see the code with a simple example !!
Hello Reinforcement Learning
We will find the shortest path to escape from the maze below using QLearning.
S: start, G: goal, X: wall
-------XG
--X----X-
S-X----X-
--X------
-----X---
---------
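Throughout this example, cells are addressed as (row, col) with (0, 0) at the top-left corner. The snippet below is not part of kyoka; it just parses the maze string above so you can see the coordinates that appear later in MazeTask.
MAZE = [
    "-------XG",
    "--X----X-",
    "S-X----X-",
    "--X------",
    "-----X---",
    "---------",
]

# Collect the (row, col) coordinates of each special cell.
cells = [(r, c, ch) for r, line in enumerate(MAZE) for c, ch in enumerate(line)]
start = [(r, c) for r, c, ch in cells if ch == "S"]
goal = [(r, c) for r, c, ch in cells if ch == "G"]
walls = [(r, c) for r, c, ch in cells if ch == "X"]

print(start)  # [(2, 0)]
print(goal)   # [(0, 8)]
print(walls)  # [(0, 7), (1, 2), (1, 7), (2, 2), (2, 7), (3, 2), (4, 5)]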
Step1. Define Maze Task
First we define our learning problem as a reinforcement learning task.
kyoka provides the kyoka.task.BaseTask template class. This class has 5 abstract methods you need to implement.
- generate_initial_state: define the start state of our problem
- is_terminal_state: define when our problem is finished
- transit_state: define the rule of state transition in our problem
- generate_possible_actions: define which actions are possible in each state
- calculate_reward: define how good each state is
Here is the MazeTask class which represents our learning problem.
from kyoka.task import BaseTask

class MazeTask(BaseTask):

    ACTION_UP = 0
    ACTION_DOWN = 1
    ACTION_RIGHT = 2
    ACTION_LEFT = 3

    # We use the current position of the agent in the maze as "state".
    # So we return the start position of the maze (row=2, col=0).
    def generate_initial_state(self):
        return (2, 0)

    # The position of the goal is (row=0, col=8).
    def is_terminal_state(self, state):
        return (0, 8) == state

    # We can always move in 4 directions.
    def generate_possible_actions(self, state):
        return [self.ACTION_UP, self.ACTION_DOWN, self.ACTION_RIGHT, self.ACTION_LEFT]

    # The agent gets a reward of +1 only when it reaches the goal.
    def calculate_reward(self, state):
        return 1 if self.is_terminal_state(state) else 0

    # Returns the next state after moving in the direction of the passed action.
    # If the destination is outside the maze or a wall cell, do not move.
    def transit_state(self, state, action):
        row, col = state
        wall_position = [(1, 2), (2, 2), (3, 2), (4, 5), (0, 7), (1, 7), (2, 7)]
        height, width = 6, 9
        if self.ACTION_UP == action:
            row = max(0, row - 1)
        elif self.ACTION_DOWN == action:
            row = min(height - 1, row + 1)
        elif self.ACTION_RIGHT == action:
            col = min(width - 1, col + 1)
        elif self.ACTION_LEFT == action:
            col = max(0, col - 1)
        if (row, col) not in wall_position:
            return (row, col)
        else:
            return state  # Stay at the current position if the destination is not a path.
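Before wiring this into QLearning, it helps to sanity-check the task by hand. The calls below use only the methods defined above.
task = MazeTask()

state = task.generate_initial_state()
print(state)                                  # (2, 0)
print(task.generate_possible_actions(state))  # [0, 1, 2, 3]

# Moving right once is fine, but (2, 2) is a wall, so a second move right is blocked.
state = task.transit_state(state, MazeTask.ACTION_RIGHT)
print(state)                                              # (2, 1)
print(task.transit_state(state, MazeTask.ACTION_RIGHT))   # (2, 1)

# The reward is 0 everywhere except the goal.
print(task.calculate_reward(state))     # 0
print(task.calculate_reward((0, 8)))    # 1
print(task.is_terminal_state((0, 8)))   # True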
Step2. Setup QLearning for MazeTask
Next we implement the value function of our MazeTask for QLearning.
The value function receives a state-action pair and estimates how good it is for the agent to take that action in that state. So the value function would work like this:
value_of_action = value_function.predict_value(state=(1, 8), action=MazeTask.ACTION_UP)
# value_of_action should be 1 because moving up from (1, 8) reaches the goal (0, 8).
The most important part of reinforcement learning is learning the correct value function of the task.
Each algorithm in this library has a different base class of value function (ex. QLearningTabularActionValueFunction, DeepQLearningApproxActionValueFunction).
Now we need to implement the abstract methods of QLearningTabularActionValueFunction.
Here is the MazeTabularValueFunction class for QLearning.
from kyoka.algorithm.q_learning import QLearningTabularActionValueFunction  # import path may differ depending on your kyoka version

class MazeTabularValueFunction(QLearningTabularActionValueFunction):

    # We use a table (nested list) to store the value of each state-action pair.
    # Ex. the value of action=ACTION_RIGHT at state=(0, 3) is stored in table[0][3][2].
    def generate_initial_table(self):
        maze_height, maze_width, action_num = 6, 9, 4
        return [[[0 for a in range(action_num)] for col in range(maze_width)] for row in range(maze_height)]

    # Define how to fetch a value from the table which is
    # initialized by the "generate_initial_table" method.
    def fetch_value_from_table(self, table, state, action):
        row, col = state
        return table[row][col][action]

    # Define how to update a value in the table.
    def insert_value_into_table(self, table, state, action, new_value):
        row, col = state
        table[row][col][action] = new_value
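These three methods are everything QLearning needs from the table; during learning it calls them internally, so you normally never touch the table yourself. Still, a quick check using only the methods defined above looks like this.
value_function = MazeTabularValueFunction()

table = value_function.generate_initial_table()
print(len(table))        # 6 rows
print(len(table[0]))     # 9 columns
print(len(table[0][0]))  # 4 actions

# All values start at 0; updates go through insert_value_into_table.
print(value_function.fetch_value_from_table(table, (2, 0), MazeTask.ACTION_RIGHT))  # 0
value_function.insert_value_into_table(table, (2, 0), MazeTask.ACTION_RIGHT, 0.5)
print(value_function.fetch_value_from_table(table, (2, 0), MazeTask.ACTION_RIGHT))  # 0.5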
Final Step. Run QLearning and see its result
OK, we have prepared everything. The next code starts the learning.
from kyoka.algorithm.q_learning import QLearning  # import paths may differ depending on your kyoka version
from kyoka.policy import EpsilonGreedyPolicy

task = MazeTask()
policy = EpsilonGreedyPolicy(eps=0.1)
value_function = MazeTabularValueFunction()

algorithm = QLearning()
algorithm.setup(task, policy, value_function)  # setup before calling "run_gpi"
algorithm.run_gpi(nb_iteration=100)  # start the learning
That's all !! Now value_function stores how good each action is. Let's visualize what the agent learned.
(We prepared the helper method examples.maze.helper.visualize_policy.)
>>> print visualize_policy(task, value_function)
-------XG
--X-v-vX^
v-X-vvvX^
vvX>>>>>^
>>>>^-^^^
->^<^----
Great !! The agent found the shortest path to the goal. (14 steps is the minimum number of steps to reach the goal !!)
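If you prefer to check the result programmatically rather than by eye, you can also follow the learned values greedily by hand. The sketch below uses only predict_value (shown earlier) and the MazeTask methods, and it assumes the learning has converged; with a poorly learned value function the loop would not terminate.
# Follow the learned value function greedily from the start to the goal.
state = task.generate_initial_state()
path = [state]
while not task.is_terminal_state(state):
    actions = task.generate_possible_actions(state)
    best_action = max(actions, key=lambda a: value_function.predict_value(state, a))
    state = task.transit_state(state, best_action)
    path.append(state)

print(len(path) - 1)  # 14 moves if the shortest path was learned
print(path)           # the sequence of visited cells from (2, 0) to (0, 8)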
Installation
You can use pip like this.
pip install kyoka