
1 Reinforcement Learning with Multiple, Qualitatively Different State Representations
Harm van Seijen, Bram Bakker, Leon Kester - TNO / UvA - UvA
NIPS 2007 workshop

2 The Reinforcement Learning Problem
[Figure: agent-environment loop - the agent sends action a to the environment and receives state s and reward r]
Goal: maximize the cumulative discounted reward.
Question: what is the best way to represent the environment?
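The slides state the goal but contain no formulas or code; as a minimal sketch of the tabular Q-learning setup they describe (the environment interface env.reset(), env.step(), and env.actions are assumptions for illustration, not from the slides):

```python
import random
from collections import defaultdict

def q_learning(env, n_episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning: learn to maximize the cumulative discounted reward."""
    Q = defaultdict(float)                      # Q[(state, action)] -> value estimate
    for _ in range(n_episodes):
        s = env.reset()                         # assumed interface: reset() -> state
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda a_: Q[(s, a_)])
            s_next, r, done = env.step(a)       # assumed: step(a) -> (state, reward, done)
            # update toward r + gamma * max_a' Q(s', a')
            best_next = max(Q[(s_next, a_)] for a_ in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```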

5 Explanation of our Approach.

6 Suppose 3 agents work in the same environment and have the same action space, but different state spaces:
agent 1 : state space S1 = {s1,1, s1,2, s1,3, ..., s1,N1}, state space size = N1
agent 2 : state space S2 = {s2,1, s2,2, s2,3, ..., s2,N2}, state space size = N2
agent 3 : state space S3 = {s3,1, s3,2, s3,3, ..., s3,N3}, state space size = N3
(mutual) action space A = {a1, a2}, action space size = 2

7 Extension of the action space
External actions:
a_e1 : old a1
a_e2 : old a2
Switch actions:
a_s1 : 'switch to representation 1'
a_s2 : 'switch to representation 2'
a_s3 : 'switch to representation 3'
New action space (each new action combines an external action with a switch action, as sketched below):
a1 : a_e1 + a_s1
a2 : a_e1 + a_s2
a3 : a_e1 + a_s3
a4 : a_e2 + a_s1
a5 : a_e2 + a_s2
a6 : a_e2 + a_s3
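A minimal sketch of this action-space construction (the string names and the tuple encoding are assumptions for illustration, not from the slides):

```python
from itertools import product

# External actions of the original problem plus the switch actions added by the
# extension; each new action is an (external, switch) pair.
external_actions = ["a_e1", "a_e2"]
switch_actions = ["a_s1", "a_s2", "a_s3"]   # 'switch to representation i'

extended_actions = list(product(external_actions, switch_actions))
# -> 2 * 3 = 6 actions: ('a_e1', 'a_s1'), ('a_e1', 'a_s2'), ..., ('a_e2', 'a_s3')
print(len(extended_actions))  # 6
```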

8 Extension of the state space
agent 1 : state space S1 = {s1,1, s1,2, s1,3, ..., s1,N1}, state space size = N1
agent 2 : state space S2 = {s2,1, s2,2, s2,3, ..., s2,N2}, state space size = N2
agent 3 : state space S3 = {s3,1, s3,2, s3,3, ..., s3,N3}, state space size = N3
switch agent : state space S = {s1,1, ..., s1,N1, s2,1, ..., s2,N2, s3,1, ..., s3,N3}, state space size = N1 + N2 + N3
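As a sketch of the switch agent's combined space (the sizes N1, N2, N3 are illustrative, and tagging each state with its representation index to keep the union disjoint is an implementation assumption):

```python
from itertools import product

N1, N2, N3 = 4, 3, 5                      # illustrative sizes, not from the slides

# Tag each state with its representation index so the union is disjoint.
S1 = [(1, i) for i in range(N1)]
S2 = [(2, i) for i in range(N2)]
S3 = [(3, i) for i in range(N3)]

switch_states = S1 + S2 + S3              # size N1 + N2 + N3
extended_actions = list(product(["a_e1", "a_e2"], ["a_s1", "a_s2", "a_s3"]))

# The switch agent learns one Q-value per (state, action) pair over this space.
Q = {(s, a): 0.0 for s in switch_states for a in extended_actions}
print(len(switch_states), len(Q))         # 12 states, 12 * 6 = 72 state-action pairs
```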

9 Requirements and Advantages.

10 Requirements for Convergence
Theoretical requirement: if the individual representations obey the Markov property, then convergence to the optimal solution is guaranteed.
Empirical requirement: each representation should contain information that is useful for deciding which external action to take and for deciding when to switch.

11 State-Action Space Sizes Example
Representation    States                      Actions    State-Actions
Rep 1             100                         2          200
Rep 2             50                          2          100
Rep 3             100                         2          200
Switch (OR)       100 + 50 + 100 = 250        6          1,500
Union (AND)       100 x 50 x 100 = 500,000    2          1,000,000
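A small sketch of the arithmetic behind this comparison (the individual sizes are the illustrative ones from the table; treating the "AND" representation as the cross product of the individual state spaces is an interpretation, not something spelled out on the slide):

```python
state_sizes = [100, 50, 100]        # illustrative sizes of the three representations
n_external = 2                      # external actions
n_reps = len(state_sizes)

# Switch (OR): one representation active at a time, action set extended with switch actions.
switch_states = sum(state_sizes)                              # 250
switch_state_actions = switch_states * n_external * n_reps    # 250 * 6 = 1,500

# Joint "AND" representation: all representations observed simultaneously.
joint_states = 1
for n in state_sizes:
    joint_states *= n                                         # 100 * 50 * 100 = 500,000
joint_state_actions = joint_states * n_external               # 1,000,000

print(switch_state_actions, joint_state_actions)
```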

12 Switching is advantageous if:
The state space is very large, AND
The state space is heterogeneous.

13 Results.

14 Traffic Scenario
Situation: a crossroad of 2 one-way roads.
Task: the traffic agent has to decide at each time step whether the vertical lane or the horizontal lane gets the green light. Changing lights involves an orange phase of 5 time steps.
Reward: -1 * the total number of cars waiting in front of the traffic light (see the sketch below).
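A minimal sketch of this reward signal (the function name and the per-lane queue arguments are assumptions for illustration):

```python
def traffic_reward(n_waiting_vertical: int, n_waiting_horizontal: int) -> float:
    """Reward at each time step: -1 times the total number of waiting cars,
    so maximizing reward means minimizing the queues at the crossroad."""
    return -1.0 * (n_waiting_vertical + n_waiting_horizontal)

# Example: 3 cars waiting on the vertical lane, 5 on the horizontal lane.
print(traffic_reward(3, 5))  # -8.0
```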

15 Representation 1

16 Representation 2

17 Representations Compared
Representation    States    Actions    State-Actions
Rep 1             64        2          128
Rep 2             24        2          48
Switch            88        4          352
Rep 1+            256       2          512

18 On-line performance for Traffic Scenario

19 Demo.

20 Conclusions and Future Work.

21 Conclusions
We introduced an extension of the standard RL problem that allows the decision agent to dynamically switch between a number of qualitatively different representations.
This approach offers advantages in RL problems with large, heterogeneous state spaces.
Experiments with a (simulated) traffic control problem showed good results: the switching agent reached a higher end performance, while its convergence rate was similar to that of a single representation of comparable state-action space size.

22 Future Work
Use larger state spaces (~ a few hundred states per representation) and more than 2 different representations.
Explore the application domain of sensor management (for example, switching between radar settings).
Combine the switching approach with function approximation.
Examine in more detail the convergence properties of the switch representation.
Use representations that describe realistic sensor output.
Explore new methods for switching.

23 Thank you.

24 Switching Algorithm versus POMDP
POMDP approach: update an estimate of the hidden variable and base decisions on a probability distribution over all possible values of this hidden variable. It is not possible to choose between different representations.
Switch algorithm: hidden information is present but not taken into account; the price for this is a more stochastic action outcome. When the hidden information is very important for the decision-making process, the agent can decide to switch to a different representation that does take that information into account.


Download ppt "Harm van Seijen Bram Bakker Leon Kester - TNO / UvA - UvA"

Similar presentations


Ads by Google