Equilibrium Selection in Stochastic Games


Equilibrium Selection in Stochastic Games
By Marcin Kadluczka, Dec 2nd, 2002
CS 594 – Piotr Gmytrasiewicz

Agenda
- Definition of finite discounted stochastic games
- Stationary equilibrium
- Linear tracing procedure
- Stochastic tracing procedures
- Examples of different equilibria depending on the type of stochastic tracing

Finite discounted stochastic games
A finite discounted stochastic game consists of:
- N – the finite set of players (N = {1, 2, …, n})
- Ω – the state space, with a finite number of states
- for each state, a finite set of pure actions per player
- instantaneous payoff functions and transition probabilities π
- a discount factor δ ∈ (0, 1)
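These components can be collected in a small data structure. A minimal Python sketch; StochasticGame and its field names are illustrative assumptions, not notation from the paper:

```python
# Minimal sketch of the data of a finite discounted stochastic game.
# All names here are illustrative, not taken from Herings & Peeters.
from dataclasses import dataclass

@dataclass
class StochasticGame:
    n_players: int    # N = {0, ..., n_players - 1}
    states: list      # finite state space Omega
    actions: dict     # actions[(state, player)] -> list of pure actions
    payoff: dict      # payoff[(state, joint_action)] -> tuple of instantaneous rewards
    transition: dict  # transition[(state, joint_action)] -> {next_state: probability}
    delta: float      # discount factor, 0 <= delta < 1
```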

Rules of the game
[Diagram: in the current state at time t, each of players 1, …, n chooses an action and collects a reward; the game then moves to a state at time t+1 according to the transition probability.]

Other assumptions
- Perfect recall: at each stage, each player remembers all past actions chosen by all players and all past states that occurred.
Differences from normal-form games:
- The game does not consist of a single play; it jumps to the next state according to the probability measure π and continues dynamically.
- Rewards account for future states, not only immediate payoffs.

Pure & mixed strategies
- Pure strategy: in each state, the player picks a single action.
- Mixed strategy: in each state, the player randomizes over the available actions.
If a mixed strategy σ is played, the instantaneous expected payoff of player i is denoted by r_i(ω, σ) and the transition probability by π(ω' | ω, σ).
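A sketch of how r_i(ω, σ) and π(ω' | ω, σ) could be computed under independent mixing, assuming the StochasticGame structure sketched above and representing sigma[i][state] as a dict mapping actions to probabilities (all names illustrative):

```python
from itertools import product

def joint_profiles(game, state, sigma):
    """Yield (joint_action, probability) for every pure joint action in `state`."""
    per_player = [list(sigma[i][state].items()) for i in range(game.n_players)]
    for combo in product(*per_player):
        joint = tuple(a for a, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p                      # players randomize independently
        yield joint, prob

def expected_payoff(game, state, sigma, i):
    """Instantaneous expected payoff r_i(state, sigma)."""
    return sum(p * game.payoff[(state, joint)][i]
               for joint, p in joint_profiles(game, state, sigma))

def transition_prob(game, state, sigma, next_state):
    """Transition probability pi(next_state | state, sigma)."""
    return sum(p * game.transition[(state, joint)].get(next_state, 0.0)
               for joint, p in joint_profiles(game, state, sigma))
```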

Stationary strategy payoffs
- History: the set of possible histories up to stage k consists of all sequences of past states and the actions chosen in them.
- Behavior strategy: may condition the action in each state on the entire history.
- Stationary strategy: depends only on the current state.
- Payoffs: the total expected discounted payoff is the expected discounted sum of instantaneous payoffs (see the evaluation sketch below).
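For stationary strategies the discounted payoff solves the linear system v = r + δ·P·v, exactly as in MDP policy evaluation. A sketch building on the helpers above (numpy assumed):

```python
import numpy as np

def stationary_payoffs(game, sigma):
    """v[s, i] = total expected discounted payoff to player i from state s."""
    S = len(game.states)
    P = np.zeros((S, S))                 # state-to-state transition matrix under sigma
    r = np.zeros((S, game.n_players))    # instantaneous expected payoffs under sigma
    for a, s in enumerate(game.states):
        for b, s2 in enumerate(game.states):
            P[a, b] = transition_prob(game, s, sigma, s2)
        for i in range(game.n_players):
            r[a, i] = expected_payoff(game, s, sigma, i)
    return np.linalg.solve(np.eye(S) - game.delta * P, r)  # v = (I - delta*P)^-1 r
```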

Equilibrium
- General equilibrium: a strategy-tuple σ is an equilibrium if and only if σ_i is a best response to σ_-i for all i.
- Stationary equilibrium (Nash eq.): an equilibrium in stationary strategies, whose payoff can be evaluated state by state as above.
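For discounted games, the one-shot-deviation principle lets a stationary-equilibrium candidate be checked state by state. A sketch, again assuming the helpers above:

```python
def is_stationary_equilibrium(game, sigma, tol=1e-9):
    """True if no player gains from a one-shot deviation in any state."""
    v = stationary_payoffs(game, sigma)
    idx = {s: k for k, s in enumerate(game.states)}
    for i in range(game.n_players):
        for s in game.states:
            for a in game.actions[(s, i)]:
                dev = list(sigma)
                dev[i] = dict(sigma[i])
                dev[i][s] = {a: 1.0}          # deviate to pure action a in state s only
                q = expected_payoff(game, s, dev, i) + game.delta * sum(
                    transition_prob(game, s, dev, s2) * v[idx[s2], i]
                    for s2 in game.states)
                if q > v[idx[s], i] + tol:    # profitable deviation found
                    return False
    return True
```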

Comparison with other games
- Normal-form games: a stochastic game adds states and dynamics (see the previous slides).
- MDPs: more than one agent; if the other players' strategies are stationary, each player faces an ordinary MDP.
- Bayesian games: there is no discounting in Bayesian games; types correspond to states; the beliefs are carried inside the prior.

Linear tracing procedure
- Corresponding normal-form game: we fix the state ω.
- Prior probability distribution p: the expectation of each player about the other players' strategy choices over the pure strategies.
- Important assumption: each player holds the same expectation about the others.

Linear tracing procedure (cont'd)
- Family of one-parameter games Γ^t for t ∈ [0, 1].
- Payoff function: u_i^t(σ) = t·u_i(σ) + (1-t)·u_i(σ_i, p_-i).
- Γ^0 decomposes into a separate maximization problem for each player: everyone simply best-responds to the prior.
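In code, the payoff of Γ^t is a one-liner. A sketch for the fixed-state normal-form game, where u(profile, i) is player i's expected payoff for a profile of mixed strategies and prior is the common prior p (names illustrative):

```python
def traced_payoff(u, profile, prior, i, t):
    """Player i's payoff in Gamma^t: weight t on the real game, 1-t on the prior game."""
    vs_prior = tuple(profile[i] if j == i else prior[j]
                     for j in range(len(profile)))
    return t * u(profile, i) + (1.0 - t) * u(vs_prior, i)
```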

Linear tracing procedure (cont'd)
- E^t: the set of equilibrium points of Γ^t. It can be a collection of pieces of one-dimensional curves, though in degenerate cases it may contain isolated points and/or higher-dimensional curves.
- Feasible path: a path through these equilibrium sets connecting an equilibrium of Γ^0 (t = 0) to an equilibrium of Γ^1 (t = 1).
- The linear tracing procedure is well defined if exactly one feasible path exists.
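The exact feasible path is piecewise algebraic; a crude numerical stand-in is to discretize t and warm-start best responses from the previous step. A sketch, assuming a best_response(profile, i, t) oracle that maximizes traced_payoff for player i:

```python
def trace_path(best_response, prior, n_players, steps=100):
    """Follow (approximately) the feasible path from t = 0 to t = 1."""
    profile = list(prior)         # at t = 0, Gamma^0 is solved against the prior
    for k in range(steps + 1):
        t = k / steps
        profile = [best_response(profile, i, t) for i in range(n_players)]
    return profile                # approximation of the traced equilibrium of Gamma^1
```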

Stochastic tracing procedure
- Assumption: a stochastic game Γ and a prior p are given.
- Payoffs in Γ^t are total expected discounted payoffs.
- Stochastic tracing procedure: T(Γ, p).
- Is this enough?

Alternative ways of extending the payoff function to stochastic games
There are 4 ways to define a player's beliefs (contrasted in the sketch below):
- Correlation within states – C(S): all opponents play the same strategy.
- Absence of correlation within states – I(S): each opponent can play a different strategy.
- Correlation across time – C(T): each player plays the same strategy across time.
- Absence of correlation across time – I(T): over time, each player can change its strategy.
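The two within-state alternatives can be contrasted directly, following the verbal definitions above: under C(S) the opponents switch between the prior and the target profile jointly, under I(S) each opponent mixes independently. A sketch for a fixed state, with mixed actions as dicts and u(profile, i) as before (names illustrative):

```python
def replace_own(profile, i, own):
    """Substitute player i's own strategy into an opponent profile."""
    return tuple(own if j == i else s for j, s in enumerate(profile))

def mix(p, q, t):
    """Pointwise mixture (1-t)*p + t*q of two mixed actions given as dicts."""
    return {a: (1 - t) * p.get(a, 0.0) + t * q.get(a, 0.0)
            for a in set(p) | set(q)}

def payoff_CS(u, own, targets, priors, i, t):
    """C(S): all opponents jointly play the target (prob t) or the prior (prob 1-t)."""
    return (t * u(replace_own(targets, i, own), i)
            + (1 - t) * u(replace_own(priors, i, own), i))

def payoff_IS(u, own, targets, priors, i, t):
    """I(S): each opponent independently mixes prior and target with weights 1-t, t."""
    mixed = [mix(priors[j], targets[j], t) for j in range(len(targets))]
    return u(replace_own(mixed, i, own), i)
```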

Alternatives (cont'd)
- Alternative 1: C(S), I(T)
- Alternative 2: C(S), C(T)

Alternatives (cont'd)
- Alternative 3: I(S), I(T)
- Alternative 4: I(S), C(T)

Example 1 – C(S) versus I(S)
- Prior (as read off from the calculations below): p_1 = (1/6, 5/6), p_2 = (1/2, 1/2), p_3 = (2/3, 1/3)
- Equilibria: [figure]
- Starting point: [figure]

Ex1: C(S) solution [figure]

Ex1: C(S) calculations
(s1,s2,s3; 1): Player 1 expects player 2 to play (1/2(1-t)+t, 1/2(1-t)) and player 3 to play (2/3(1-t)+t, 1/3(1-t)). Expected payoff: (1/2(1-t)+t)(2/3(1-t)+t)·2 = 1/3(1+t)(2+t).
(s1,s2,s3; 2): Player 2 expects player 1 to play (1/6(1-t)+t, 5/6(1-t)) and player 3 to play (2/3(1-t)+t, 1/3(1-t)). Expected payoff: (1/6(1-t)+t)(2/3(1-t)+t)·2 = 1/9(1+5t)(2+t).
(s1,s2',s3; 1): Player 1 expects player 2 to play s2 with probability only 1/2(1-t). Expected payoff: (1/2(1-t))(2/3(1-t)+t)·2 = 1/3(1-t)(2+t).
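The three simplifications can be verified symbolically; a quick sympy check of this slide's algebra (note the last expression simplifies to 1/3(1-t)(2+t)):

```python
from sympy import symbols, simplify, Rational as R

t = symbols('t')
e1 = (R(1,2)*(1-t) + t) * (R(2,3)*(1-t) + t) * 2   # player 1 at (s1,s2,s3)
e2 = (R(1,6)*(1-t) + t) * (R(2,3)*(1-t) + t) * 2   # player 2 at (s1,s2,s3)
e3 = (R(1,2)*(1-t))     * (R(2,3)*(1-t) + t) * 2   # player 1 at (s1,s2',s3)
assert simplify(e1 - R(1,3)*(1+t)*(2+t)) == 0
assert simplify(e2 - R(1,9)*(1+5*t)*(2+t)) == 0
assert simplify(e3 - R(1,3)*(1-t)*(2+t)) == 0
```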

Ex1: C(S) trajectory [figure]

Ex1: I(S) solution [figure]

Ex1: I(S) calculations
(s1,s2,s3; 1): Player 1 expects players 2 & 3 to play s2 & s3 with probability t, and to play according to the prior, under which (s2,s3) has probability (1/2)(2/3), with probability (1-t). Expected payoff: ((1-t)(1/2)(2/3) + t)·2 = 2/3(1-t) + 2t.
(s1,s2,s3; 2): Player 2 expects players 1 & 3 to play s1 & s3 with probability t, and to play according to the prior, under which (s1,s3) has probability (1/6)(2/3), with probability (1-t). Expected payoff: ((1-t)(1/6)(2/3) + t)·2 = 2/9(1-t) + 2t.
(s1,s2',s3; 1): Player 1 expects players 2 & 3 to play s2' & s3 with probability t (but then the payoff is 0). Expected payoff: ((1-t)(1/2)(2/3))·2 = 2/3(1-t).
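The same kind of symbolic check for the I(S) expressions above:

```python
from sympy import symbols, simplify, Rational as R

t = symbols('t')
f1 = ((1-t)*R(1,2)*R(2,3) + t) * 2   # player 1 at (s1,s2,s3)
f2 = ((1-t)*R(1,6)*R(2,3) + t) * 2   # player 2 at (s1,s2,s3)
f3 = ((1-t)*R(1,2)*R(2,3)) * 2       # player 1 at (s1,s2',s3)
assert simplify(f1 - (R(2,3)*(1-t) + 2*t)) == 0
assert simplify(f2 - (R(2,9)*(1-t) + 2*t)) == 0
assert simplify(f3 - R(2,3)*(1-t)) == 0
```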

Ex1: I(S) trajectory [figure]

Example 2 – C(T) versus I(T)
[Figure: equilibria, prior, starting point, payoffs, and transition probabilities for the example.]

Ex2: C(T) solution 0 Transition probalilities for player 2 CS 594

Ex2: C(T) trajectory [figure]

Ex2: I(T) trajectory [figure]

Summary
- Finite discounted stochastic games were defined.
- The linear tracing procedure was presented.
- Several extensions to the stochastic setting were shown, with examples.
- C(S), I(T) is probably the best extension for computing strategies.

Reference
P. Jean-Jacques Herings and Ronald J. A. P. Peeters, "Equilibrium Selection in Stochastic Games".

Questions?