IN SEARCH OF VALUE EQUILIBRIA By Christopher Kleven & Dustin Richwine xkcd.com.

Group
- Mentor: Dr. Michael L. Littman
  - Chair of the Computer Science Dept.
  - Specializing in AI and Reinforcement Learning
- Grad Student Mentor: Michael Wunder
  - PhD Student studying with Dr. Littman

Game Theory
- Study of interactions of rational utility-maximizing agents and prediction of their behavior
- An action profile is a Nash Equilibrium of a game if every player's action is a best response to the other players' actions.
Normal Form Game
                Column
                A        B
Row     A       a, b     c, d
        B       e, f     g, h
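The best-response definition above can be checked mechanically for pure strategies. The following is a minimal sketch, not part of the original slides: the function name `pure_nash_equilibria`, the split into one payoff matrix per player, and the coordination-game payoffs are all assumptions made for illustration. The second game uses the Spoiled Child payoffs from the example slide.

```python
from itertools import product


def pure_nash_equilibria(row_payoffs, col_payoffs):
    """Return every (row, col) index pair in which each player's action
    is a best response to the other player's action."""
    n_rows = len(row_payoffs)
    n_cols = len(row_payoffs[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        # Row cannot gain by switching rows while Column keeps playing c.
        row_best = all(row_payoffs[r][c] >= row_payoffs[r2][c]
                       for r2 in range(n_rows))
        # Column cannot gain by switching columns while Row keeps playing r.
        col_best = all(col_payoffs[r][c] >= col_payoffs[r][c2]
                       for c2 in range(n_cols))
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


# Hypothetical coordination game (illustrative payoffs, not from the slides).
coord_row = [[2, 0], [0, 1]]
coord_col = [[2, 0], [0, 1]]
print(pure_nash_equilibria(coord_row, coord_col))

# Spoiled Child game: Parent is the row player, Child the column player.
parent = [[1, 0], [0, 2]]
child = [[2, 3], [1, 0]]
print(pure_nash_equilibria(parent, child))
```

The Spoiled Child game returns an empty list, which is why its equilibrium has to be a mixed one.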

Example: Spoiled Child Game (each cell lists Parent's payoff, then Child's payoff)
                    Child
                    Behave     Misbehave
Parent    Spoil     1, 2       0, 3
          Punish    0, 1       2, 0
Analysis
- Let Child be a Reinforcement Learner
- Parent's intent is to play towards the Nash Equilibrium outcome: (1/2) Spoil & (1/2) Punish, which gives Child an expected payoff of 1.5
- Child's intent is to play towards the Nash Equilibrium outcome: (2/3) Behave & (1/3) Misbehave, which gives Parent an expected payoff of 0.667
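The mixed strategies and values in the analysis follow from the indifference condition: each player mixes so that the opponent earns the same expected payoff from either action. Below is a minimal sketch of that calculation, assuming a 2x2 game with a fully mixed equilibrium (the denominators are non-zero). The function name `mixed_equilibrium_2x2` is hypothetical and not from the slides.

```python
from fractions import Fraction


def mixed_equilibrium_2x2(row_payoffs, col_payoffs):
    """Mixed equilibrium of a 2x2 game via the indifference condition.

    Assumes a fully mixed equilibrium exists. Returns (p, q, row_value,
    col_value): p is the probability of the first row, q the probability
    of the first column.
    """
    (a, b), (c, d) = row_payoffs  # row player's payoffs
    (e, f), (g, h) = col_payoffs  # column player's payoffs
    # Row mixes so Column is indifferent: p*e + (1-p)*g == p*f + (1-p)*h
    p = Fraction(h - g, e - g - f + h)
    # Column mixes so Row is indifferent: q*a + (1-q)*b == q*c + (1-q)*d
    q = Fraction(d - b, a - b - c + d)
    row_value = q * a + (1 - q) * b
    col_value = p * e + (1 - p) * g
    return p, q, row_value, col_value


# Spoiled Child game: Parent is the row player (Spoil, Punish),
# Child is the column player (Behave, Misbehave).
parent = [[1, 0], [0, 2]]
child = [[2, 3], [1, 0]]
p, q, parent_value, child_value = mixed_equilibrium_2x2(parent, child)
print(p, q, parent_value, child_value)  # 1/2 2/3 2/3 3/2
```

Exact fractions are used so the results match the slide's 1/2, 2/3, 0.667 and 1.5 without rounding noise.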

Reinforcement Learning
- Def: Subarea of machine learning concerned with how an agent ought to take actions so as to maximize some notion of long-term reward.
- Michael Wunder, Michael Littman, and Monica Babes. Classes of Multiagent Q-learning Dynamics with epsilon-greedy Exploration.

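Q-Learning
- Assign arbitrary Q-values to each strategy A and B.
- We will refer to these values as Q(A) and Q(B) respectively.
- Update after each play: Q(action) = (1 - α) Q(action) + αR, where α is the learning rate and R is the reward received.
- ε-greedy exploration: with probability ε the Q-learner chooses a random action; otherwise it chooses the action with the highest Q-value.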
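The update rule and exploration scheme on this slide can be sketched in a few lines. This is an illustrative sketch only: the function names, the fixed reward table, and the parameter values (α = 0.1, ε = 0.1, 1000 steps) are assumptions, not taken from the slides or the cited paper.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


def q_update(q_values, action, reward, alpha):
    """Apply Q(action) = (1 - alpha) * Q(action) + alpha * R in place."""
    q_values[action] = (1 - alpha) * q_values[action] + alpha * reward


def run_demo(steps=1000, alpha=0.1, epsilon=0.1, seed=0):
    """Stateless Q-learner facing hypothetical fixed rewards for A and B."""
    rng = random.Random(seed)
    rewards = {"A": 1.0, "B": 0.0}  # assumed rewards, for illustration only
    q_values = {"A": 0.0, "B": 0.0}  # arbitrary initial Q-values
    for _ in range(steps):
        action = epsilon_greedy(q_values, epsilon, rng)
        q_update(q_values, action, rewards[action], alpha)
    return q_values


print(run_demo())
```

In a game setting the reward would come from the opponent's simultaneous action rather than a fixed table, which is what makes the multiagent dynamics non-trivial.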

Goals
- Understand the behavior of the Q-learning algorithm in games with more actions, more players, or more states.
- Try to formalize the notion of "value based equilibria".
- Develop new algorithms that learn effectively in a wide variety of games.
- Find a machine learner that elicits different behavior from different learning agents for possible use in diagnosing how people and monkeys learn.

Importance
- The internet is a place where learning robots can serve as a proxy for human interaction.
  - Their use could be effective in auctions, making online purchases, tracking goods, or even playing online poker.
- Learning the state that results from interactions of AI can lead us to predict the long-term value of these interactions.
- A successful algorithm may prove conducive to the understanding of the brain's ability to learn.
