Presentation on theme: "Convergence Analysis of Reinforcement Learning Agents. Srinivas Turaga, MIT Brain and Cognitive Sciences, 9.912, 30th March 2004" - Presentation transcript:

1 MIT Brain and Cognitive Sciences. Convergence Analysis of Reinforcement Learning Agents. Srinivas Turaga, 9.912, 30th March 2004

2 The Learning Algorithm

The Assumptions:
- Players use stochastic strategies.
- Players observe only their own reward.
- Players attempt to estimate the value of choosing a particular action.

The Algorithm:
- Play action i with probability Pr(i).
- Observe reward r.
- Update the value function v.
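The play step of the loop above uses value-proportional action selection (as slide 3 states, Pr(i) is proportional to the value of action i). A minimal Python sketch, assuming positive values; the function name and the example values are illustrative, not from the slides:

```python
import random

def choose_action(v):
    """Sample an action index with probability proportional to its value.

    Assumes all entries of v are positive (value-proportional selection
    is only well defined then).
    """
    total = sum(v)
    r = random.random() * total
    cum = 0.0
    for i, vi in enumerate(v):
        cum += vi
        if r < cum:
            return i
    return len(v) - 1  # guard against floating-point round-off

# With values [3.0, 1.0], action 0 should be chosen about 75% of the time.
counts = [0, 0]
for _ in range(10000):
    counts[choose_action([3.0, 1.0])] += 1
```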

3 The Learning Algorithm

The Algorithm (the slide's payoff-matrix diagram indexes player 1's choice i against player 2's choice j, giving the value of action i):
- Play action i with probability Pr(i), proportional to the value of action i.
- Observe reward r, which also depends on the other player's choice j.
- Update the value function v using one of two simple schemes, each with a separate rule for when action i is chosen and when it is not: Algorithm 1 (forgetting) and Algorithm 2 (no forgetting).
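The slide's update equations did not survive transcription, so the forms below are assumptions consistent with the "forgetting" / "no forgetting" labels: a decay-toward-reward rule for Algorithm 1, and a pure increment of the chosen action for Algorithm 2.

```python
def update_forgetting(v, i, r, eps=0.1):
    """Algorithm 1 ('forgetting') -- hypothetical form: every value decays
    by a factor (1 - eps) each step, and the chosen action i additionally
    moves toward the observed reward r."""
    return [(1 - eps) * vj + (eps * r if j == i else 0.0)
            for j, vj in enumerate(v)]

def update_no_forgetting(v, i, r, eps=0.1):
    """Algorithm 2 ('no forgetting') -- hypothetical form: only the chosen
    action i is updated; unchosen values are left unchanged."""
    return [vj + eps * r if j == i else vj for j, vj in enumerate(v)]
```

With forgetting, unplayed actions lose value over time, which keeps the values bounded; without forgetting, values only accumulate, which is one way the two schemes can produce qualitatively different long-run behavior.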

4 Analysis Techniques

Analysis of the stochastic dynamics is hard, so approximate in two steps:
- Consider the average case: the random, discrete-time dynamics become deterministic, discrete-time dynamics.
- Consider continuous time: the deterministic, discrete-time dynamics become a deterministic differential equation.
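This two-step approximation can be illustrated by replacing the stochastic update with its expected change and integrating the result as an ODE with Euler steps. The mean-field form below is an assumption based on the hypothetical "forgetting" update (expected change of v_i per step: choose i with probability p_i, decay always), not the slides' actual equations; the payoff matrices shown are 0/1 matching pennies.

```python
def mean_field_step(v, w, A, B, dt=0.01):
    """One Euler step of the deterministic, continuous-time approximation.

    v, w : value vectors for players 1 and 2.
    A, B : payoff matrices (A[i][j] = player 1's reward for choices i, j;
           B[j][i] = player 2's reward for the same joint choice).
    Assumed mean-field dynamics of a 'forgetting' update:
        dv_i/dt = p_i * sum_j q_j * A[i][j] - v_i
    where p, q are the value-proportional mixed strategies.
    """
    sv, sw = sum(v), sum(w)
    p = [vi / sv for vi in v]          # player 1's mixed strategy
    q = [wj / sw for wj in w]          # player 2's mixed strategy
    dv = [p[i] * sum(q[j] * A[i][j] for j in range(len(w))) - v[i]
          for i in range(len(v))]
    dw = [q[j] * sum(p[i] * B[j][i] for i in range(len(v))) - w[j]
          for j in range(len(w))]
    v2 = [v[i] + dt * dv[i] for i in range(len(v))]
    w2 = [w[j] + dt * dw[j] for j in range(len(w))]
    return v2, w2

# Matching pennies with 0/1 payoffs: player 1 wins on a match.
A = [[1, 0], [0, 1]]
B = [[0, 1], [1, 0]]
v, w = [0.6, 0.4], [0.5, 0.5]
for _ in range(2000):
    v, w = mean_field_step(v, w, A, B)
```

Because the step replaces a random update by its mean, a trajectory of this ODE tracks the average behavior of the stochastic algorithm; fixed points and their linear stability can then be read off the deterministic system.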

5 Results - Matching Pennies Game

Under one update scheme, analysis shows a stable fixed point corresponding to matching behavior, and simulations of the stochastic algorithm and the deterministic dynamics converge as expected. Under the other, analysis shows a fixed point corresponding to the Nash equilibrium, but linear stability analysis shows only marginal stability, and simulations of the stochastic algorithm and the deterministic dynamics diverge to the corners.

6 Future Directions

- Validate the approximation technique.
- Analyze properties of more general reinforcement learners.
- Consider situations with asymmetric learning rates.
- Study the behavior of the algorithms for arbitrary payoff matrices.

