When playing two-person finitely repeated games, do people behave like they are adapting a policy directly (Gradient Ascent) or do they behave like they ...

When playing two-person finitely repeated games, do people behave like they are adapting a policy directly (Gradient Ascent), or do they behave like they are estimating values and choosing actions based on the estimates (Q-learning)?  Some research in neuroscience suggests that monkeys think like a Q-learner.

Compared different learners › Q-Learner with ε-greedy exploration › Gradient Ascent Learner with decreasing step size › Human Learner  [Figure: payoff matrix for the two players, Child and Parent; values as extracted: 1,20,30,12,0]

Assign arbitrary Q-values to each strategy, A and B. › These values are referred to as Q(A) and Q(B), respectively. › ε-greedy exploration: with probability ε, the Q-learner chooses a random action.
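The Q-learner described on this slide can be sketched as below. The slide only specifies the arbitrary initial Q-values and the ε-greedy rule; the greedy choice on non-exploring rounds, the incremental update rule, the learning rate `alpha`, and all class and parameter names are assumptions added for illustration.

```python
import random


class EpsilonGreedyQLearner:
    """Stateless Q-learner over two strategies, 'A' and 'B' (illustrative sketch)."""

    def __init__(self, epsilon=0.1, alpha=0.2, q_init=None, rng=None):
        self.epsilon = epsilon  # probability of choosing a random action
        self.alpha = alpha      # learning rate (assumed; not given on the slide)
        # Arbitrary starting values Q(A) and Q(B).
        self.q = dict(q_init) if q_init is not None else {"A": 0.0, "B": 0.0}
        self.rng = rng if rng is not None else random.Random()

    def choose(self):
        # With probability epsilon, explore by picking a random action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(["A", "B"])
        # Otherwise pick the action with the highest current estimate
        # (assumed greedy rule; ties go to the first key).
        return max(self.q, key=self.q.get)

    def update(self, action, reward):
        # Assumed update: move Q(action) a fraction alpha toward the reward.
        self.q[action] += self.alpha * (reward - self.q[action])
```

With `epsilon=0.0` the learner is purely greedy, which makes its behavior easy to inspect by hand before adding exploration.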

Gradient Ascent Learner [equations not preserved]: Reward Function › Gradient of Reward › Decreasing Step Size › Update Function
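The four components named on this slide (reward function, gradient of reward, decreasing step size, update function) might look like the following for a 2x2 game. The slide's own equations are not available, so every form here is an assumption: the expected-reward expression is the usual one for mixed strategies, the `eta0 / t` schedule is one common decreasing step size, and the payoff matrix in the usage note is hypothetical rather than taken from the presentation.

```python
def expected_reward(p, q, R):
    """Expected reward for the row player.

    p = probability the row player plays A, q = probability the column
    player plays A. R[i][j] is the row player's payoff when row plays i
    and column plays j (0 = A, 1 = B).
    """
    return (p * q * R[0][0] + p * (1 - q) * R[0][1]
            + (1 - p) * q * R[1][0] + (1 - p) * (1 - q) * R[1][1])


def reward_gradient(q, R):
    """Derivative of expected_reward with respect to p (it does not depend on p)."""
    return q * (R[0][0] - R[1][0]) + (1 - q) * (R[0][1] - R[1][1])


def step_size(t, eta0=1.0):
    """Decreasing step size for round t = 1, 2, ... (assumed schedule)."""
    return eta0 / t


def update(p, q, R, t):
    """One gradient-ascent step on p, clipped so p stays a valid probability."""
    p_new = p + step_size(t) * reward_gradient(q, R)
    return min(1.0, max(0.0, p_new))
```

For example, with the hypothetical payoffs `R = [[3, 0], [5, 1]]`, the gradient is negative for every `q`, so repeated calls to `update` push `p` toward 0 (always playing B).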

Quickly forgiving Tit-for-Tat player › Always play the opponent's last action. › If the last action pair was BB, play strategy A on the next round so as to quickly forgive the opponent for making a “poor” choice.
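The two rules of the quickly forgiving Tit-for-Tat player translate into a short function. This is a minimal sketch: the function name and the opening move (playing A when there is no history) are assumptions, since the slide does not say how the first round is played.

```python
def forgiving_tit_for_tat(last_own, last_opponent):
    """Return the next action ('A' or 'B') given the previous action pair."""
    # Assumed opening move: the slide does not specify the first round.
    if last_own is None or last_opponent is None:
        return "A"
    # Quick forgiveness: after mutual B, return to A immediately.
    if last_own == "B" and last_opponent == "B":
        return "A"
    # Otherwise copy the opponent's last action.
    return last_opponent
```

Against an opponent who always plays B, this player alternates between B and A after the opening round, because every BB pair triggers forgiveness.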

[Figure panels: Previous selection choice is known › Previous selection choice is unknown]

Intelligence order › Human > Q-Learner > Gradient Ascent Learner = Nash  Future Research › Q-Learner with history  It is assumed that the actions of a Q-Learner with history will better resemble human behavior.  How do people attempt to make “good” action choices? › Create a GUI so other people can play against the Q-Learner. › Payoff alteration?

