
1 Distributed Q Learning Lars Blackmore and Steve Block

2 Contents
–What is Q-learning? (MDP framework, the Q-learning algorithm)
–Distributed Q-learning: how agents can cooperate
–Sharing Q-values, and why it is the most interesting approach
–Simple averaging (and why it fails)
–Expertness-based distributed Q-learning
–Expertness with specialised agents

3 Markov Decision Processes
Framework: MDP
–States S
–Actions A
–Rewards R(s,a)
–Transition function T(s,a,s’)
Goal: find the optimal policy π*(s)
[Slide figure: a grid world with a goal state G worth reward 100 and zero reward elsewhere]
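
As a concrete illustration of the framework, an MDP can be written down as plain lookup tables. This is a minimal sketch; the states, rewards, and transition probabilities below are hypothetical placeholders, not the grid world from the slides.

```python
# A toy MDP: states S, actions A, rewards R(s, a), transitions T(s, a, s').
STATES = ["s0", "s1", "GOAL"]
ACTIONS = ["left", "right"]

REWARDS = {("s1", "right"): 100.0}   # reaching the goal pays 100; every other (s, a) pays 0

TRANSITIONS = {                      # T(s, a, s'): probability of landing in s'
    ("s0", "right", "s1"): 1.0,
    ("s0", "left", "s0"): 1.0,
    ("s1", "right", "GOAL"): 1.0,
    ("s1", "left", "s0"): 1.0,
}

def R(s, a):
    """Immediate reward R(s, a)."""
    return REWARDS.get((s, a), 0.0)

def T(s, a, s_next):
    """Transition probability T(s, a, s')."""
    return TRANSITIONS.get((s, a, s_next), 0.0)
```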

4 Reinforcement Learning
Want to find π* through experience
–Reinforcement learning is intuitively similar to human/animal learning
–Use some policy π to move around the environment
–Converge to the optimal policy π*
An algorithm for reinforcement learning…

5 Q-Learning
Define Q*(s,a):
–“Total reward if the agent is in state s, takes action a, then acts optimally forever”
Optimal policy: π*(s) = argmax_a Q*(s,a)
Q(s,a) is an estimate of Q*(s,a)
Q-learning motion policy: π(s) = argmax_a Q(s,a)
Update Q recursively: Q(s,a) ← Q(s,a) + α [ r + γ max_a' Q(s',a') − Q(s,a) ]
Optimality theorem:
–“If each (s,a) pair is updated an infinite number of times, Q converges to Q* with probability 1”
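
A minimal sketch of the tabular update just described; the learning rate α and discount γ are arbitrary illustrative values.

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (illustrative value)
GAMMA = 0.9   # discount factor (illustrative value)

Q = defaultdict(float)   # Q[(s, a)] estimates Q*(s, a); unseen pairs default to 0

def motion_policy(s, actions):
    """pi(s) = argmax_a Q(s, a): act greedily with respect to the current estimate."""
    return max(actions, key=lambda a: Q[(s, a)])

def q_update(s, a, reward, s_next, actions):
    """One-step Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])
```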

6 Distributed Q-Learning
Problem formulation
Different approaches:
–Expanding state (share sensor information)
–Sharing experiences
–Sharing Q-values
Sharing Q-values is the most interesting approach, and the focus of the rest of this talk

7 Sharing Q-values
First approach: simple averaging
Learning framework:
–Individual learning for t_i trials
–Each trial starts from a random state and ends when the robot reaches the goal
–Next, all robots switch to cooperative learning
Result: simple averaging is worse in general!
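
A sketch of the cooperative step under simple averaging, assuming each agent's knowledge is a dict from (state, action) to its Q estimate; after the step every agent holds the same, averaged table.

```python
def simple_average(q_tables):
    """Replace each agent's Q-table with the element-wise mean over all agents.
    q_tables: list of dicts mapping (state, action) -> Q value, one per robot."""
    n = len(q_tables)
    all_keys = set().union(*(q.keys() for q in q_tables))
    averaged = {k: sum(q.get(k, 0.0) for q in q_tables) / n for k in all_keys}
    return [dict(averaged) for _ in q_tables]   # every robot ends up with the same table
```

This is exactly the dilution shown in the table on the next slide: one robot's Q(s,a) = 100 averaged with three robots' zeros becomes 25 for everyone.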

8 Why is Simple Averaging Worse?
Slower learning rate:
–Example: the first robot to find the goal (at time t) has its high Q-value diluted when it is averaged with the other robots’ uninformed values
Insensitive to environment changes:
–Similarly, the first robot to detect a change has its updated values washed out by the others’ stale values

Robot   Q(s,a) at t   Q(s,a) at t+1   Q*(s,a)
1       100           25              100
2       0             25              100
3       0             25              100
4       0             25              100

9 Expertness
Idea: pay more attention to agents who are ‘experts’
–Expertness-based cooperative Q-learning
New Q-sharing equation: Q_i_new(s,a) = Σ_j W_ij · Q_j(s,a)
–Agent i weights agent j’s Q-value based on their relative expertness e_i and e_j
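
A sketch of the weighted sharing step, assuming (as the slide text suggests) that each agent rebuilds its table as a weighted sum over every agent's table; the weight matrix W comes from one of the strategies on the following slides.

```python
def share_q_values(q_tables, W):
    """Expertness-based sharing: Q_i_new(s, a) = sum_j W[i][j] * Q_j(s, a).
    q_tables: list of dicts (state, action) -> Q value, one per agent.
    W: weight matrix where W[i][j] is how much agent i trusts agent j
       (each row should sum to 1)."""
    all_keys = set().union(*(q.keys() for q in q_tables))
    new_tables = []
    for i in range(len(q_tables)):
        new_tables.append({
            k: sum(W[i][j] * q_tables[j].get(k, 0.0) for j in range(len(q_tables)))
            for k in all_keys
        })
    return new_tables
```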

10 Expertness Measures
Need to define the expertness of agent j
–Based on the reinforcement agent j has encountered
Alternative definitions (sketched below):
–Simple Sum
–Abs
–Positive
–Negative
Different interpretations
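
The slide only names the four measures, so the formulas below are assumptions based on the usual readings of "sum", "absolute", "positive", and "negative" reinforcement.

```python
def expertness(rewards, measure="simple_sum"):
    """Expertness of an agent computed from the reinforcements it has received.
    The formulas are assumed interpretations of the named measures."""
    if measure == "simple_sum":   # algebraic sum: successes and failures cancel
        return sum(rewards)
    if measure == "abs":          # total amount of experience, good or bad
        return sum(abs(r) for r in rewards)
    if measure == "positive":     # only rewarding (successful) experiences count
        return sum(r for r in rewards if r > 0)
    if measure == "negative":     # magnitude of punishing experiences
        return sum(abs(r) for r in rewards if r < 0)
    raise ValueError(f"unknown expertness measure: {measure}")
```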

11 Weighting Strategies
How do we derive the weights W_ij from the agents’ expertness values?
Alternative strategies (one possible form of each is sketched below):
–‘Learn from all’
–‘Learn from experts’
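
Plausible instantiations of the two strategies, assuming 'learn from all' weights every agent by its share of the total expertness, and 'learn from experts' only gives weight to agents more expert than agent i. These are illustrative forms, not necessarily the exact formulas from the slides.

```python
def weights_learn_from_all(e, i):
    """'Learn from all': W[i][j] proportional to e_j (agent i included)."""
    total = sum(e)
    if total == 0:
        return [1.0 if j == i else 0.0 for j in range(len(e))]   # no experience yet: keep own values
    return [e_j / total for e_j in e]

def weights_learn_from_experts(e, i):
    """'Learn from experts': weight only agents more expert than agent i,
    in proportion to how much more expert they are."""
    gaps = [max(e_j - e[i], 0.0) for e_j in e]
    total = sum(gaps)
    if total == 0:
        return [1.0 if j == i else 0.0 for j in range(len(e))]   # nobody is more expert: keep own values
    return [g / total for g in gaps]
```

Either function produces one row of the W matrix consumed by the sharing step sketched on slide 9.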

12 Experimental Setup
Hunter-prey scenario
Individual trial phase as before
–Different number of trials for each agent
Then the cooperative phase

13 Results
Cooperative vs. individual learning
Different weighting strategies
Interpretation
Conclusion: expertness-based methods are good when the agents’ expertness levels differ significantly

14 Specialised Agents
Agent i may have explored area A a lot but area B very little
–What is agent i’s expertness?
–Agent i is an expert in area A but not in area B
Idea:
–Agents can be specialised, i.e. experts in certain areas of the world
–Pay more attention to Q-values from agents that are experts in the relevant area
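
One way this idea could be realised (the region partition and the Abs-style accumulation here are assumptions for illustration): track expertness per region of the state space, and compute the sharing weights for a particular state from the expertness values in that state's region.

```python
def regional_expertness(reward_log, region_of):
    """Accumulate expertness separately for each region of the state space.
    reward_log: list of (state, reward) pairs one agent has experienced.
    region_of: function mapping a state to a region label, e.g. 'A' or 'B'."""
    per_region = {}
    for state, r in reward_log:
        region = region_of(state)
        per_region[region] = per_region.get(region, 0.0) + abs(r)   # Abs-style measure
    return per_region

def weights_for_state(state, regional_e_per_agent, region_of):
    """Weight each agent by its expertness in the region containing `state`."""
    region = region_of(state)
    e = [agent_e.get(region, 0.0) for agent_e in regional_e_per_agent]
    total = sum(e)
    if total == 0:
        return [1.0 / len(e)] * len(e)   # no one is an expert here: weight everyone equally
    return [e_j / total for e_j in e]
```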

15 Specialised Agents Continued

