
1 Hierarchical Reinforcement Learning Ersin Basaran 19/03/2005

2 Outline
Reinforcement Learning: RL Agent, Policy
Hierarchical Reinforcement Learning: The Need, Sub-Goal Detection, State Clusters, Border States, Continuous State and/or Action Spaces, Options, Macro Q-Learning with Parallel Option Discovery
Experimental Results

3 Reinforcement Learning
The agent observes the state and takes an action according to its policy
The policy is a function from the state space to the action space
The policy can be deterministic or non-deterministic
State and action spaces can be discrete, continuous, or hybrid

4 RL Agent
No model of the environment
The agent observes state s, takes action a, and moves to state s' while observing reward r
The agent tries to maximize the total expected reward (the return)
Finite state machine view: state s transitions to s' under action a with reward r
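To make this interaction loop concrete, here is a minimal tabular Q-learning sketch of such an agent; the `env` interface (reset, step, actions) and all parameter values are illustrative assumptions, not part of the original slides.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Learn expected return directly from observed (s, a, r, s') transitions."""
    Q = defaultdict(float)                      # Q[(state, action)] -> estimated return
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy choice between exploration and the current best action.
            if random.random() < epsilon:
                a = random.choice(env.actions(s))
            else:
                a = max(env.actions(s), key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)           # observe reward r and next state s'
            # One-step temporal-difference update toward r + gamma * max_a' Q(s', a').
            best_next = 0.0 if done else max(Q[(s2, a2)] for a2 in env.actions(s2))
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```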

5 Policy
In a flat RL model, the policy maps each state to a primitive action
Under the optimal policy, the action taken by the agent yields the highest expected return at each step
The policy can be kept in tabular form for small state and action spaces
Function approximators can be used for large or continuous state and action spaces
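As a hypothetical illustration of the tabular case, the helpers below derive a deterministic and a non-deterministic policy from the Q-table of the previous sketch; for large or continuous spaces the table would be replaced by a function approximator.

```python
import math

def greedy_policy(Q):
    """Deterministic tabular policy: pick the highest-valued action seen for each state."""
    policy = {}
    for (s, a), value in Q.items():
        if s not in policy or value > Q[(s, policy[s])]:
            policy[s] = a
    return policy

def softmax_policy(Q, actions, state, temperature=1.0):
    """Non-deterministic policy: a probability distribution over actions in one state."""
    prefs = [math.exp(Q[(state, a)] / temperature) for a in actions]
    total = sum(prefs)
    return {a: p / total for a, p in zip(actions, prefs)}
```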

6 The Need For Hierarchical RL
Increases performance
Makes applying RL to problems with large action and/or state spaces feasible
Detected sub-goals let the agent build abstract actions defined over the primitive actions
Sub-goals and abstract actions can be reused across tasks in the same domain, so knowledge is transferred between tasks
The policy of the agent can be translated into natural language

7 Sub-goal Detection
A sub-goal can be a single state, a subset of the state space, or a constraint on the state space
Reaching a sub-goal should help the agent reach the main goal (obtain the highest return)
Sub-goals must be discovered by the agent autonomously

8 State Clusters
The states in a cluster are strongly connected to each other
The number of state transitions between clusters is small
The states at the two ends of a transition between different clusters are sub-goal candidates
Clusters can be hierarchical: different clusters can belong to the same cluster at a higher level
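A minimal sketch of this candidate-selection step, assuming a state clustering has already been computed (the clustering method itself is not specified on the slide):

```python
def subgoal_candidates(transitions, cluster_of):
    """Collect sub-goal candidates from transitions that cross cluster boundaries.

    transitions: iterable of (s, s2) pairs observed by the agent.
    cluster_of:  dict mapping each state to its cluster label (clustering assumed given).
    """
    candidates = set()
    for s, s2 in transitions:
        if cluster_of[s] != cluster_of[s2]:
            candidates.add(s)   # state on the source side of the boundary
            candidates.add(s2)  # state on the destination side of the boundary
    return candidates
```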

9 Border States
Some actions cannot be applied in some states; these states are defined as border states
Border states are assumed to form a transition sequence: the agent can travel along the border states by taking some actions
Each end of this transition sequence is a candidate sub-goal, assuming the agent has explored the environment sufficiently

10 Border State Detection
For discrete action and state spaces:
F(s): the set of states that can be reached from state s in one time unit
G(s): the set of actions that cause no state transition when applied at state s
H(s): the set of actions that move the agent to a different state when applied at state s
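Under the assumption that the agent logs its experience as (s, a, s') triples, these sets could be accumulated as in the following sketch:

```python
from collections import defaultdict

def build_transition_sets(experience):
    """Build F, G and H from experienced (s, a, s') transitions.

    F[s]: states reachable from s in one time unit.
    G[s]: actions that cause no state transition when applied at s.
    H[s]: actions that move the agent to a different state when applied at s.
    """
    F, G, H = defaultdict(set), defaultdict(set), defaultdict(set)
    for s, a, s2 in experience:
        if s2 == s:
            G[s].add(a)
        else:
            F[s].add(s2)
            H[s].add(a)
    return F, G, H
```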

11 Border State Detection
Detect the longest state sequence s_0, s_1, s_2, …, s_{k-1}, s_k that satisfies the following constraints:
s_i ∈ F(s_{i+1}) or s_{i+1} ∈ F(s_i) for 0 ≤ i < k
G(s_i) ∩ G(s_{i+1}) ≠ ∅ for 0 < i < k-1
H(s_0) ∩ G(s_1) ≠ ∅
H(s_k) ∩ G(s_{k-1}) ≠ ∅
s_0 and s_k are candidate sub-goals
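The predicate below checks these constraints for one candidate sequence, reusing the F, G, H dictionaries from the previous sketch; searching for the longest sequence that satisfies it is left out.

```python
def is_border_sequence(seq, F, G, H):
    """Check whether the state sequence s_0 ... s_k satisfies the constraints above."""
    k = len(seq) - 1
    if k < 1:
        return False
    # s_i in F(s_{i+1}) or s_{i+1} in F(s_i) for 0 <= i < k
    for i in range(k):
        if seq[i] not in F[seq[i + 1]] and seq[i + 1] not in F[seq[i]]:
            return False
    # G(s_i) and G(s_{i+1}) must share an action for 0 < i < k-1
    for i in range(1, k - 1):
        if not (G[seq[i]] & G[seq[i + 1]]):
            return False
    # Sequence ends: H(s_0) meets G(s_1) and H(s_k) meets G(s_{k-1})
    if not (H[seq[0]] & G[seq[1]]) or not (H[seq[k]] & G[seq[k - 1]]):
        return False
    return True
```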

12 Border States on Continuous State and/or Action Spaces
The environment is assumed to be bounded
State and action vectors can include both continuous and discrete dimensions
The derivative of the state vector with respect to the action vector can be used
Border state regions must have small derivatives for some action vectors
A large change in these derivatives indicates a border state region
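A hedged numerical sketch of that derivative, assuming access to a deterministic step function (something the slides do not provide):

```python
import numpy as np

def state_action_jacobian(step_fn, state, action, eps=1e-3):
    """Finite-difference estimate of d(next state)/d(action).

    step_fn(state, action) is a hypothetical deterministic simulator returning
    the next state vector.  Columns with small norm correspond to action
    directions that barely change the state, the signature of a border-state
    region described above.
    """
    state = np.asarray(state, dtype=float)
    action = np.asarray(action, dtype=float)
    base = np.asarray(step_fn(state, action), dtype=float)
    jac = np.zeros((base.size, action.size))
    for j in range(action.size):
        perturbed = action.copy()
        perturbed[j] += eps
        jac[:, j] = (np.asarray(step_fn(state, perturbed), dtype=float) - base) / eps
    return jac
```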

13 Options
An option is a policy
It can be local (defined on a subset of the state space) or global
The option policy can use primitive actions or other options, so options are hierarchical
Options are used to reach sub-goals
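A minimal data-structure sketch of an option; the field names follow the standard options framework and are assumptions rather than notation taken from the talk:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

@dataclass
class Option:
    """An option as described above: a policy of its own, possibly local and hierarchical."""
    initiation_set: Set            # states where the option may be invoked (local if a subset)
    policy: Callable               # maps a state to a primitive action or to another option
    termination: Callable          # maps a state to the probability of terminating there
    q_values: Dict[Tuple, float] = field(default_factory=dict)  # the option's own Q-table
    subgoal: object = None         # the sub-goal this option is meant to reach
```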

14 Macro Q-Learning with Parallel Option Discovery
The agent starts with no sub-goals and no options
It detects the sub-goals and learns the option policies and the main policy simultaneously
Options are formed and removed from the model according to the sub-goal detection algorithm
When a possible sub-goal is detected, a new option is added to the model to hold the policy for reaching this sub-goal
All option policies are updated in parallel
The agent generates an internal reward when a sub-goal is reached
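One possible skeleton for this procedure is sketched below; make_option, update_fn, and the unit internal reward are assumptions about details the slide leaves open, and the per-transition update itself is sketched after the next slide.

```python
def discover_option(options, subgoal, make_option):
    """Add a new option when the sub-goal detection algorithm proposes a sub-goal."""
    if subgoal not in options:
        options[subgoal] = make_option(subgoal)

def parallel_update(main_Q, options, s, a, r, s2, update_fn):
    """After one primitive step, update the main policy and every option policy in parallel."""
    update_fn(main_Q, s, a, r, s2)
    for subgoal, opt in options.items():
        internal = 1.0 if s2 == subgoal else 0.0   # internal reward for reaching the sub-goal
        update_fn(opt.q_values, s, a, r + internal, s2)
```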

15 Macro Q-Learning with Parallel Option Discovery
An option is defined by O = (π_o, β_o, I_o, Q_o, r_o), where π_o is the option policy, β_o is its termination condition, I_o is its initiation set, Q_o holds the Q-values for the option, and r_o is the internal reward signal associated with the option
Intra-option learning is used to update the option Q-values
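A sketch of one intra-option Q-learning update consistent with this tuple; the termination handling and the exact form of the internal reward are assumptions based on standard intra-option learning, not details confirmed by the talk.

```python
def intra_option_update(option, main_Q, actions, s, a, r, s2, alpha=0.1, gamma=0.99):
    """One intra-option Q-learning step for a single option.

    Applied to every option whose policy would also have chosen action a in state s.
    The target mixes continuing the option (probability 1 - beta) with terminating
    it (probability beta) and falling back to the main policy's value of s'.
    """
    beta = option.termination(s2)
    continue_value = max(option.q_values.get((s2, a2), 0.0) for a2 in actions)
    exit_value = max(main_Q.get((s2, a2), 0.0) for a2 in actions)
    internal = 1.0 if s2 == option.subgoal else 0.0        # the option's internal reward r_o
    target = r + internal + gamma * ((1 - beta) * continue_value + beta * exit_value)
    old = option.q_values.get((s, a), 0.0)
    option.q_values[(s, a)] = old + alpha * (target - old)
```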

16 Experiments: Flat RL vs. Hierarchical RL

17 Options in HRL

18 Questions and Suggestions!!!

