Published by Dora McBride. Modified over 10 years ago.
1
IN SEARCH OF VALUE EQUILIBRIA By Christopher Kleven & Dustin Richwine xkcd.com
2
Group Mentor: Dr. Michael L. Littman, Chair of the Computer Science Dept., specializing in AI and Reinforcement Learning
Grad Student Mentor: Michael Wunder, PhD student studying with Dr. Littman
3
Game Theory
Study of interactions of rational utility-maximizing agents and prediction of their behavior.
An action profile is a Nash Equilibrium of a game if every player’s action is a best response to the other players’ actions.
Normal Form Game
                Column
                A         B
Row     A       a, b      c, d
        B       e, f      g, h
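The best-response definition above can be checked mechanically for pure strategies. The following is a minimal sketch, assuming each cell holds (row player's payoff, column player's payoff); the function name and the small coordination game are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# payoffs[row][col] = (row player's payoff, column player's payoff)
Payoffs = List[List[Tuple[float, float]]]


def pure_nash_equilibria(payoffs: Payoffs) -> List[Tuple[int, int]]:
    """Return every (row, col) cell where neither player gains by deviating alone."""
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    equilibria = []
    for r in range(n_rows):
        for c in range(n_cols):
            row_payoff, col_payoff = payoffs[r][c]
            # Row player's action is a best response to column c.
            row_best = all(payoffs[r2][c][0] <= row_payoff for r2 in range(n_rows))
            # Column player's action is a best response to row r.
            col_best = all(payoffs[r][c2][1] <= col_payoff for c2 in range(n_cols))
            if row_best and col_best:
                equilibria.append((r, c))
    return equilibria


# Hypothetical 2x2 coordination game, not taken from the slides.
coordination = [[(2, 1), (0, 0)],
                [(0, 0), (1, 2)]]
print(pure_nash_equilibria(coordination))  # [(0, 0), (1, 1)]
```

A game may have no pure-strategy equilibrium at all, in which case the function returns an empty list and the equilibrium must be sought in mixed strategies.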
4
Example: Spoiled Child Game
                    Child
                    Behave    Misbehave
Parent    Spoil     1, 2      0, 3
          Punish    0, 1      2, 0
Analysis
Let Child be Reinforcement Learner
Parent’s intent to play towards Nash Equilibrium outcome: (1/2) Spoil & (1/2) Punish 1.5
Child’s intent to play towards Nash Equilibrium outcome: (2/3) Behave & (1/3) Misbehave 0.667
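The mixed strategies quoted above can be reproduced from the payoff table with the usual indifference conditions. This is a sketch under two assumptions: each cell is read as (Parent payoff, Child payoff), and the game has an interior mixed equilibrium so the denominators are nonzero. The function name is hypothetical.

```python
from fractions import Fraction

# Rows: Spoil, Punish. Columns: Behave, Misbehave.
# Each cell is (Parent payoff, Child payoff); this ordering is an assumption.
spoiled_child = [[(1, 2), (0, 3)],
                 [(0, 1), (2, 0)]]


def mixed_equilibrium_2x2(game):
    """Solve a 2x2 game for the fully mixed equilibrium via indifference."""
    (a, b), (c, d) = game[0]
    (e, f), (g, h) = game[1]
    # p = P(row 0) makes the column player indifferent:
    #   p*b + (1-p)*f = p*d + (1-p)*h
    p = Fraction(h - f, b - f - d + h)
    # q = P(col 0) makes the row player indifferent:
    #   q*a + (1-q)*c = q*e + (1-q)*g
    q = Fraction(g - c, a - c - e + g)
    row_value = q * a + (1 - q) * c  # row player's expected payoff
    col_value = p * b + (1 - p) * f  # column player's expected payoff
    return p, q, row_value, col_value


p, q, parent_value, child_value = mixed_equilibrium_2x2(spoiled_child)
print(p, q, parent_value, child_value)  # 1/2 2/3 2/3 3/2
```

Under this reading, the Parent spoils with probability 1/2 and the Child behaves with probability 2/3. The figures 1.5 and 0.667 on the slide match the Child's and the Parent's expected payoffs at that equilibrium, respectively.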
5
Reinforcement Learning
Def: Subarea of machine learning concerned with how an agent ought to take actions so as to maximize some notion of long-term reward.
Michael Wunder, Michael Littman, and Monica Babes. Classes of Multiagent Q-learning Dynamics with epsilon-greedy Exploration.
6
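Q-Learning
Assign arbitrary Q-values to each strategy A and B. We will refer to these values as Q(A) and Q(B) respectively.
After playing an action and receiving reward R, update with learning rate α: Q(action) = (1 - α) Q(action) + α R
ε-greedy exploration: With probability ε the Q-learner will choose a random action; otherwise it chooses the action with the highest Q-value.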
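The update rule and ε-greedy exploration described above can be sketched as a small stateless learner. The class name, the default values of alpha and epsilon, and the zero initial Q-values are illustrative assumptions, not values taken from the slides.

```python
import random


class EpsilonGreedyQLearner:
    """Stateless Q-learner over a fixed set of actions (illustrative sketch)."""

    def __init__(self, actions, alpha=0.1, epsilon=0.1, rng=None):
        # The slides allow arbitrary initial Q-values; 0.0 is an assumption.
        self.q = {action: 0.0 for action in actions}
        self.alpha = alpha      # learning rate
        self.epsilon = epsilon  # exploration probability
        self.rng = rng or random.Random()

    def choose(self):
        # With probability epsilon, explore by picking a uniformly random action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.q))
        # Otherwise exploit the action with the highest current Q-value.
        return max(self.q, key=self.q.get)

    def update(self, action, reward):
        # Q(action) = (1 - alpha) * Q(action) + alpha * R
        self.q[action] = (1 - self.alpha) * self.q[action] + self.alpha * reward


learner = EpsilonGreedyQLearner(["A", "B"], alpha=0.5, epsilon=0.0)
learner.update("A", 1.0)
print(learner.q["A"], learner.choose())  # 0.5 A
```

With epsilon set to 0 the learner is purely greedy, which makes the example deterministic; a positive epsilon keeps every action sampled occasionally so its Q-value continues to be updated.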
7
Goals
Understand the behavior of the Q-learning algorithm in games with more actions, more players, or more states.
Try to formalize the notion of "value based equilibria".
Develop new algorithms that learn effectively in a wide variety of games.
Find a machine learner that elicits different behavior from different learning agents for possible use in diagnosing how people and monkeys learn.
8
Importance
The internet serves as a place where learning robots can serve as a proxy for human interaction.
Their use could be effective in auctions, making online purchases, tracking goods, or even playing online poker.
Learning the state that results from interactions of AI can lead us to predict the long-term value of these interactions.
A successful algorithm may prove conducive to the understanding of the brain’s ability to learn.