Equilibrium Selection in Stochastic Games


1 Equilibrium Selection in Stochastic Games
By Marcin Kadluczka, December 2nd, 2002. CS 594 – Piotr Gmytrasiewicz

2 Agenda
Definition of finite discounted stochastic games
Stationary equilibrium
Linear tracing procedure
Stochastic tracing procedures
Examples of different equilibria depending on the type of stochastic tracing

3 Finite discounted stochastic games
N is the finite set of players (N = {1, 2, …, n}); the state space contains a finite number of states.
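As a reading aid, a minimal Python container sketch for such a game; only N and the finite state space are named above, so the action sets, reward map, transition map, and discount factor shown here are the standard remaining ingredients, assumed rather than quoted from the slide:

```python
# Minimal sketch of a finite discounted stochastic game.
# Only N and the finite state space come from the slide; the other
# fields are the standard components, assumed here.
from dataclasses import dataclass
from typing import Dict, List, Tuple

Player = int
State = int
Action = int
JointAction = Tuple[Action, ...]          # one action per player

@dataclass
class StochasticGame:
    players: List[Player]                                  # N = {1, ..., n}
    states: List[State]                                    # finite state space
    actions: Dict[Tuple[State, Player], List[Action]]      # actions of each player in each state
    rewards: Dict[Tuple[State, JointAction], List[float]]  # immediate payoff of every player
    transitions: Dict[Tuple[State, JointAction], Dict[State, float]]  # next-state distribution
    discount: float                                        # delta, with 0 <= delta < 1
```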

4 Rules of the game
Diagram: in the current state at time t, each of players 1, …, n chooses an action; this yields immediate rewards and a probability of transition to the state at time t+1.

5 Other assumptions
Perfect recall: at each stage, each player remembers all past actions chosen by all players and all past states that occurred.
Difference from normal-form games: the game does not consist of a single play; it jumps to the next state according to the transition probability measure and continues dynamically, so the rewards account for future states and not only immediate payoffs.

6 Pure & mixed strategy
Pure strategy
Mixed strategy
If a mixed strategy is played, the instantaneous expected payoff of player i and the transition probability are given by expectations over the players' mixed actions.
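A sketch of the standard formulas behind this statement; the symbols r_i (pure-profile reward), A(ω) (joint action set in state ω), and φ (transition probability) are assumed notation, not taken from the slide:

```latex
% expected payoff and transition under a mixed-strategy profile sigma (assumed notation)
\pi_i(\omega,\sigma) \;=\; \sum_{a \in A(\omega)} \Bigl(\prod_{j \in N} \sigma_j(\omega)(a_j)\Bigr)\, r_i(\omega,a),
\qquad
\varphi(\omega' \mid \omega,\sigma) \;=\; \sum_{a \in A(\omega)} \Bigl(\prod_{j \in N} \sigma_j(\omega)(a_j)\Bigr)\, \varphi(\omega' \mid \omega,a).
```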

7 Stationary strategy payoffs
History: the set of possible histories up to stage k consists of all sequences of states and actions realized up to that stage.
Behavior strategy: may depend on the entire history.
Stationary strategy: depends only on the current state.
Payoffs: total expected discounted payoffs.
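For a stationary profile σ, the total expected discounted payoff admits the usual recursive form; a sketch in the same assumed notation, with discount factor δ:

```latex
% total expected discounted payoff of a stationary profile (assumed notation)
u_i(\omega,\sigma) \;=\; \pi_i(\omega,\sigma) \;+\; \delta \sum_{\omega'} \varphi(\omega' \mid \omega,\sigma)\, u_i(\omega',\sigma),
\qquad 0 \le \delta < 1 .
```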

8 Equilibrium
General equilibrium: a strategy tuple σ is an equilibrium if and only if σ_i is a best response to σ_-i for every player i.
Stationary equilibrium (Nash equilibrium): an equilibrium in stationary strategies.
Payoff for the stationary equilibrium.
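Written out, the best-response condition reads (a sketch; the slide's own formula was not preserved):

```latex
% sigma is an equilibrium iff no player gains by a unilateral deviation
u_i(\omega, \sigma_i, \sigma_{-i}) \;\ge\; u_i(\omega, \sigma_i', \sigma_{-i})
\quad \text{for all } \omega, \text{ all } i, \text{ and all strategies } \sigma_i' .
```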

9 Comparison with other games
Comparison to normal-form games.
Comparison to MDPs: there is more than one agent; if the strategies are stationary, each player faces an MDP and the two models coincide.
Comparison to Bayesian games: there is no discounting in Bayesian games; types correspond to states; the beliefs are contained in the prior.

10 Linear tracing procedure
Corresponding normal-form game: we fix the state.
Prior probability distribution (the "prior"): the expectation of each player about the other players' strategy choices over the pure strategies.
Each player has the same assumption about the others, which is an important assumption.

11 Linear tracing procedure (cont'd)
One-parameter family of games Γ^t, t ∈ [0, 1], with an interpolated payoff function.
Γ^0 decomposes into a separate maximization problem for each player.
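The interpolated payoff of the Harsanyi–Selten linear tracing construction, sketched in assumed notation with prior p:

```latex
% payoff in the one-parameter game Gamma^t (assumed notation)
u_i^{\,t}(\sigma) \;=\; (1-t)\, u_i(\sigma_i, p_{-i}) \;+\; t\, u_i(\sigma_i, \sigma_{-i}),
\qquad t \in [0,1].
```

At t = 0 each player simply best-responds to the fixed prior, which is why Γ^0 splits into one maximization problem per player; at t = 1 the original game is recovered.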

12 Linear tracing procedure (cont'd)
The set of equilibrium points of the games Γ^t for t ∈ [0, 1]. It is typically a collection of pieces of one-dimensional curves, though in degenerate cases it may contain isolated points and/or higher-dimensional pieces.
Feasible path: a path in this set from an equilibrium of Γ^0 (at t = 0) to an equilibrium of Γ^1 (at t = 1).
The linear tracing procedure follows such a path; it is well defined when it leads to a unique equilibrium at t = 1.

13 Stochastic tracing procedure
Assumption: and prior p is given Stochastic game Total expected discounted payoffs Stochastic tracing procedure T(,p) Is this enough? CS 594

14 Alternative ways of extending the payoff function to stochastic games
There are four ways to define the players' beliefs:
Correlation within states, C(S): all opponents play the same strategy.
Absence of correlation within states, I(S): each opponent can play a different strategy.
Correlation across time, C(T): each player plays the same strategy across time.
Absence of correlation across time, I(T): over time, each player can change its strategy.

15 Alternatives (cont'd)
Alternative 1: C(S), I(T)
Alternative 2: C(S), C(T)

16 Alternatives (cont'd)
Alternative 3: I(S), I(T)
Alternative 4: I(S), C(T)

17 Example 1 – C(S) versus I(S)
Prior: (1/6, 5/6) for player 1, (1/2, 1/2) for player 2, (2/3, 1/3) for player 3 (as used in the calculations below).
Equilibria:
Starting point:

18 Ex1: C(S) solution

19 Ex1: C(S) calculations
(s1, s2, s3; 1):
Player 1 expects player 2 to play (1/2(1-t)+t, 1/2(1-t)) and player 3 to play (2/3(1-t)+t, 1/3(1-t)).
Expected payoff: (1/2(1-t)+t)(2/3(1-t)+t)*2 = 1/3(1+t)(2+t)
(s1, s2, s3; 2):
Player 2 expects player 1 to play (1/6(1-t)+t, 5/6(1-t)) and player 3 to play (2/3(1-t)+t, 1/3(1-t)).
Expected payoff: (1/6(1-t)+t)(2/3(1-t)+t)*2 = 1/9(1+5t)(2+t)
(s1, s2', s3; 1):
Player 1 expects player 2 to play (1/2(1-t), 1/2(1-t)+t).
Expected payoff: (1/2(1-t))(2/3(1-t)+t)*2 = 1/3(1-t)(2+t)
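A small sympy sketch that reproduces these C(S) expressions symbolically; the priors (1/6, 5/6), (1/2, 1/2), (2/3, 1/3) for players 1-3 and the payoff of 2 are read off the slide's own calculations, not from the underlying game matrix of Example 1:

```python
# Symbolic check of the C(S) payoff expressions above.
# Priors and the payoff value are assumptions read off the slide.
from sympy import Rational, symbols, factor

t = symbols('t')

# Under the convention used on this slide, each opponent j is treated as
# playing the mixture (1 - t) * prior_j + t * (strategy being traced).
p1_s1 = Rational(1, 6) * (1 - t) + t   # prob. player 1 plays s1
p2_s2 = Rational(1, 2) * (1 - t) + t   # prob. player 2 plays s2
p2_s2_dev = Rational(1, 2) * (1 - t)   # prob. player 2 plays s2 when the profile puts player 2 on s2'
p3_s3 = Rational(2, 3) * (1 - t) + t   # prob. player 3 plays s3

print(factor(2 * p2_s2 * p3_s3))       # player 1 at (s1,s2,s3): (1+t)(2+t)/3
print(factor(2 * p1_s1 * p3_s3))       # player 2 at (s1,s2,s3): (1+5t)(2+t)/9
print(factor(2 * p2_s2_dev * p3_s3))   # player 1 at (s1,s2',s3): (1-t)(2+t)/3
```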

20 Ex1: C(S) trajectory

21 Ex1: I(S) solution

22 Ex1: I(S) calculations
(s1, s2, s3; 1):
Player 1 expects players 2 & 3 to play (s2, s3) with probability t and to play according to the prior with probability (1-t).
Expected payoff: ((1-t)(1/2)(2/3)+t)*2 = 2/3(1-t)+2t
(s1, s2, s3; 2):
Player 2 expects players 1 & 3 to play (s1, s3) with probability t and to play according to the prior with probability (1-t).
Expected payoff: ((1-t)(1/6)(2/3)+t)*2 = 2/9(1-t)+2t
(s1, s2', s3; 1):
Player 1 expects players 2 & 3 to play (s2', s3) with probability t (but this yields payoff 0).
Expected payoff: ((1-t)(1/2)(2/3))*2 = 2/3(1-t)
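A companion sympy sketch for the I(S) expressions, under the same assumed priors as above:

```python
# Symbolic check of the I(S) payoff expressions above.
# Under the convention on this slide, with probability t the opponents
# jointly play the named pure strategies and with probability 1 - t they
# jointly play according to the prior.
from sympy import Rational, symbols, expand

t = symbols('t')

prior_s1, prior_s2, prior_s3 = Rational(1, 6), Rational(1, 2), Rational(2, 3)

# player 1 at (s1, s2, s3): opponents (2, 3) play (s2, s3) with prob. t
print(expand(2 * ((1 - t) * prior_s2 * prior_s3 + t)))   # 2/3(1-t) + 2t
# player 2 at (s1, s2, s3): opponents (1, 3) play (s1, s3) with prob. t
print(expand(2 * ((1 - t) * prior_s1 * prior_s3 + t)))   # 2/9(1-t) + 2t
# player 1 at (s1, s2', s3): the traced profile (s2', s3) pays 0,
# so only the prior part contributes
print(expand(2 * ((1 - t) * prior_s2 * prior_s3)))       # 2/3(1-t)
```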

23 Ex1: I(S) trajectory

24 Example 2 – C(T) versus I(T)
Equilibria:
Prior:
Starting point:
Payoffs
Transition probabilities

25 Ex2: C(T) solution
Transition probabilities for player 2

26 Ex2: C(T) trajectory

27 Ex2: I(T) trajectory

28 Summary
Definition of stochastic games
The linear tracing procedure was presented.
Some extensions were shown, with examples.
C(S), I(T) is probably the best extension for computing strategies.

29 Reference
P. Jean-Jacques Herings and Ronald J. A. P. Peeters, "Equilibrium Selection in Stochastic Games."

30 Questions?

