Rational Learning Leads to Nash Equilibrium Ehud Kalai and Ehud Lehrer Econometrica, Vol. 61 No. 5 (Sep 1993), 1019-1045 Presented by Vincent Mak


Rational Learning Leads to Nash Equilibrium Ehud Kalai and Ehud Lehrer Econometrica, Vol. 61, No. 5 (Sep 1993), 1019-1045. Presented by Vincent Mak for Comp670O, Game Theoretic Applications in CS, Spring 2006, HKUST

Rational Learning 2: Introduction
How do players learn to reach Nash equilibrium in a repeated game, and do they at all? Experiments show that they sometimes do, but the aim is a general theory of learning.
The hope is to allow for a wide range of learning processes and to identify minimal conditions for convergence; see Fudenberg and Kreps (1988), Milgrom and Roberts (1991), etc.
The present paper is another attack on the problem.
Companion paper: Kalai and Lehrer (1993), Econometrica, Vol. 61.

Rational Learning 3: Model
n players, infinitely repeated game. The stage game (i.e. the game at each round) is in normal form and consists of:
1. n finite sets of actions, Σ_1, Σ_2, …, Σ_n, with Σ = Σ_1 × Σ_2 × … × Σ_n denoting the set of action combinations
2. n payoff functions u_i : Σ → ℝ
Perfect monitoring: players are fully informed about all realised past action combinations at each stage.

Rational Learning 4: Model
Denote by H_t the set of histories up to round t, and thus of length t, for t = 0, 1, 2, …; i.e. H_t = Σ^t, with Σ^0 = {∅}.
A behaviour strategy of player i is f_i : ∪_t H_t → Δ(Σ_i), i.e. a mapping from every possible finite history to a mixed stage-game strategy of i. Thus f_i(∅) is player i's first-round mixed strategy.
Denote by z_t = (z_t^1, z_t^2, …, z_t^n) the realised action combination at round t, giving payoff u_i(z_t) to player i at that round.
The infinite sequence z = (z_1, z_2, …) is the realised play path of the game.

Rational Learning 5: Model
A behaviour strategy vector f = (f_1, f_2, …, f_n) induces a probability distribution μ_f on the set of play paths, defined inductively for finite paths:
μ_f(∅) = 1, for ∅ denoting the null history
μ_f(ha) = μ_f(h) · ∏_i f_i(h)(a_i) = the probability of observing history h followed by the action combination a = (a_1, …, a_n), where a_i is the action selected by player i.
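As a minimal sketch of this inductive rule (my own illustration, not from the paper: the types and the uniform example strategy are invented), a behaviour strategy can be coded as a map from finite histories to mixed actions, and μ_f(h) computed by multiplying along the history:

```python
# Behaviour strategies as maps from histories to mixed actions, and the
# induced probability mu_f(h) of a finite history, computed by the inductive
# rule mu_f(ha) = mu_f(h) * prod_i f_i(h)(a_i).

from typing import Callable, Dict, Tuple

History = Tuple[Tuple[str, ...], ...]          # sequence of action combinations
MixedAction = Dict[str, float]                 # action -> probability
BehaviourStrategy = Callable[[History], MixedAction]

def mu_f(history: History, strategies: Tuple[BehaviourStrategy, ...]) -> float:
    """Probability that the strategy vector f generates the given finite history."""
    prob = 1.0                                  # mu_f(null history) = 1
    for t, action_combo in enumerate(history):
        prefix = history[:t]                    # what the players have observed so far
        for i, strategy in enumerate(strategies):
            prob *= strategy(prefix).get(action_combo[i], 0.0)
    return prob

# Example: both players mix 50/50 in every round of a two-action stage game.
uniform: BehaviourStrategy = lambda h: {"a": 0.5, "b": 0.5}
print(mu_f((("a", "b"), ("b", "b")), (uniform, uniform)))  # 0.0625
```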

Rational Learning 6: Model
To pass to the limit space Σ^∞, a finite play path h is replaced by the cylinder set C(h), consisting of all elements of the infinite play path set with initial segment h; f then induces μ_f(C(h)).
Let F_t denote the σ-algebra generated by the cylinder sets of histories of length t, and F the smallest σ-algebra containing all the F_t.
μ_f defined on (Σ^∞, F) is the unique extension of μ_f from the F_t to F.

Rational Learning 7: Model
Let λ_i ∈ (0,1) be the discount factor of player i, and let x_i^t denote i's payoff at round t. If the behaviour strategy vector f is played, the payoff of i in the repeated game is the normalized discounted expected sum
U_i(f) = (1 − λ_i) Σ_t λ_i^t E_{μ_f}(x_i^t).
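A hedged illustration of this payoff (my example, not the paper's: the matching-pennies stage game and the stationary 50/50 strategies are invented), estimating U_i(f) by Monte Carlo over simulated play paths:

```python
# Estimate the normalized discounted payoff (1 - lambda) * sum_t lambda^t x_t
# by sampling play paths of a simple repeated stage game.

import random

# Stage game: matching pennies; payoff to player 0 (player 1 gets the negative).
PAYOFF = {("H", "H"): 1, ("T", "T"): 1, ("H", "T"): -1, ("T", "H"): -1}

def play_round(history):
    # Both players mix 50/50 regardless of history (a stationary strategy).
    return (random.choice("HT"), random.choice("HT"))

def discounted_payoff(discount, horizon=200):
    """One sampled play path, truncated where lambda^t is negligible."""
    history, total = [], 0.0
    for t in range(horizon):
        combo = play_round(tuple(history))
        history.append(combo)
        total += (discount ** t) * PAYOFF[combo]
    return (1 - discount) * total

estimate = sum(discounted_payoff(0.9) for _ in range(10_000)) / 10_000
print(round(estimate, 3))   # close to 0: this stage game has value 0
```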

Rational Learning 8: Model
Each player i, in addition to her own behaviour strategy f_i, holds a belief f^i = (f^i_1, f^i_2, …, f^i_n) about the joint behaviour strategies of all players, with f^i_i = f_i (i.e. i knows her own strategy correctly).
f_i is an ε-best response to f^i_{-i} (the combination of behaviour strategies of all players other than i, as believed by i) if U_i(f^i_{-i}, b_i) − U_i(f^i_{-i}, f_i) ≤ ε for all behaviour strategies b_i of player i, where ε ≥ 0. ε = 0 corresponds to the usual notion of best response.

Rational Learning 9: Model
Consider behaviour strategy vectors f and g inducing probability measures μ_f and μ_g.
μ_f is absolutely continuous with respect to μ_g, denoted μ_f << μ_g, if μ_f(A) > 0 implies μ_g(A) > 0 for every measurable set A.
Call f << f^i if μ_f << μ_{f^i}.
Major assumption: if μ_f is the probability measure over realised play paths and μ_{f^i} is the probability measure over play paths as believed by player i, then μ_f << μ_{f^i}. In other words, any event that can actually occur is given positive probability by i's belief; the assumption fails, for example, if i's belief assigns probability zero to an opponent action that is in fact played with positive probability.
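A toy check of the definition on a finite outcome space (my construction, not from the paper):

```python
def absolutely_continuous(mu_f, mu_g):
    """mu_f << mu_g on a finite space: every point charged by mu_f is charged by mu_g."""
    return all(mu_g.get(x, 0.0) > 0 for x, p in mu_f.items() if p > 0)

# The grain-of-truth assumption fails here: the belief rules out outcome "b",
# which the true measure assigns positive probability.
print(absolutely_continuous({"a": 0.9, "b": 0.1}, {"a": 1.0}))  # False
```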

Rational Learning 10: Kuhn's Theorem
Player i may hold probabilistic beliefs about which behaviour strategies each j ≠ i may use (i assumes the other players choose their strategies independently).
Suppose i believes that j plays behaviour strategy f_{j,r} with probability p_r (r indexes the elements of the support of j's possible behaviour strategies according to i's belief). Kuhn's equivalent behaviour strategy f^i_j is
f^i_j(h) = Σ_r p(r | h) f_{j,r}(h),
where the conditional probability p(r | h) is calculated from i's prior beliefs, i.e. the p_r, over all the r in the support: a Bayesian updating process that is important throughout the paper.
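A minimal sketch of this Bayesian aggregation (my own illustration: the candidate strategies and the two-type example are invented, and for simplicity the history here consists of j's own past actions only):

```python
def posterior(prior, candidates, observed_actions):
    """prior: list of p_r; candidates: list of history -> {action: prob} maps;
    observed_actions: j's realised actions so far."""
    weights = list(prior)
    for t, a in enumerate(observed_actions):
        h = tuple(observed_actions[:t])
        weights = [w * cand(h).get(a, 0.0) for w, cand in zip(weights, candidates)]
    total = sum(weights)
    return [w / total for w in weights]         # Bayes' rule

def kuhn_strategy(prior, candidates, observed_actions):
    """Equivalent behaviour strategy f_j^i evaluated at the observed history:
    the posterior-weighted average of the candidates' mixed actions."""
    post = posterior(prior, candidates, observed_actions)
    h = tuple(observed_actions)
    prediction = {}
    for w, cand in zip(post, candidates):
        for action, prob in cand(h).items():
            prediction[action] = prediction.get(action, 0.0) + w * prob
    return prediction

# Example: is j the "always H" type or the 50/50 type? Seeing H's shifts belief.
always_H = lambda h: {"H": 1.0}
fifty    = lambda h: {"H": 0.5, "T": 0.5}
print(kuhn_strategy([0.5, 0.5], [always_H, fifty], ["H", "H", "H"]))
# posterior on always_H is 8/9, so the prediction is ~ {"H": 0.944, "T": 0.056}
```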

Rational Learning 11: Definitions
Definition 1: Let ε > 0 and let μ and μ̃ be two probability measures defined on the same space. μ̃ is ε-close to μ if there exists a measurable set Q such that:
1. μ(Q) and μ̃(Q) are both greater than 1 − ε
2. For every measurable subset A of Q, (1 − ε) μ(A) ≤ μ̃(A) ≤ (1 + ε) μ(A)
This is a stronger notion of closeness than |μ(A) − μ̃(A)| ≤ ε.
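On a finite outcome space the definition can be checked directly (a hedged helper of my own construction): the subset condition only needs to be verified on singletons, since the ratio bounds are preserved under summation, and the maximal set of points satisfying the singleton bounds contains any feasible Q.

```python
def is_eps_close(mu_tilde, mu, eps):
    """Is mu_tilde eps-close to mu? Both are dicts point -> probability
    over the same finite space."""
    points = set(mu) | set(mu_tilde)
    # Maximal candidate Q: points whose probability ratio is within bounds.
    Q = [x for x in points
         if (1 - eps) * mu.get(x, 0.0) <= mu_tilde.get(x, 0.0) <= (1 + eps) * mu.get(x, 0.0)]
    mass_mu = sum(mu.get(x, 0.0) for x in Q)
    mass_tilde = sum(mu_tilde.get(x, 0.0) for x in Q)
    return mass_mu > 1 - eps and mass_tilde > 1 - eps

print(is_eps_close({"a": 0.52, "b": 0.48}, {"a": 0.5, "b": 0.5}, 0.05))  # True
print(is_eps_close({"a": 0.9, "b": 0.1},  {"a": 0.5, "b": 0.5}, 0.05))  # False
```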

Rational Learning 12: Definitions
Definition 2: Let ε ≥ 0. The behaviour strategy vector f plays ε-like g if μ_f is ε-close to μ_g.
Definition 3: Let f be a behaviour strategy vector, t a time period, and h a history of length t. Denote by hh' the concatenation of h with h', a history of length r (say), to form a history of length t + r. The induced strategy f_h is defined by f_h(h') = f(hh').

Rational Learning 13: Main Results: Theorem 1
Theorem 1: Let f and f^i denote the real behaviour strategy vector and the one believed by i, respectively. Assume f << f^i. Then for every ε > 0 and almost every play path z according to μ_f, there is a time T (= T(z, ε)) such that for all t ≥ T, f_{z(t)} plays ε-like f^i_{z(t)}, where z(t) denotes the initial segment of z of length t.
Note that the induced measures for f_{z(t)} etc. are obtained by Bayesian updating.
"Almost every" means that convergence of belief and reality is guaranteed only on the play paths realisable under f.
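An illustrative simulation in the spirit of Theorem 1 (my example, not the paper's: the true process is i.i.d. coin flips with bias 0.7, and the belief is a prior over two candidate biases that contains the truth, so absolute continuity holds). The belief's predicted next-flip probability merges with the truth along almost every realised path:

```python
import random

TRUE_P, CANDIDATES = 0.7, [0.7, 0.4]
weights = [0.1, 0.9]                      # prior puts most mass on the wrong bias

random.seed(0)
for t in range(1, 201):
    flip = 1 if random.random() < TRUE_P else 0
    # Bayes update on the observed flip.
    weights = [w * (p if flip else 1 - p) for w, p in zip(weights, CANDIDATES)]
    total = sum(weights)
    weights = [w / total for w in weights]
    if t % 50 == 0:
        predicted = sum(w * p for w, p in zip(weights, CANDIDATES))
        print(t, round(predicted, 4))     # drifts toward the true 0.7
```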

Rational Learning 14: Subjective equilibrium
Definition 4: A behaviour strategy vector g is a subjective ε-equilibrium if there is a matrix of behaviour strategies (g^i_j), 1 ≤ i, j ≤ n, with g^i_i = g_i, such that:
i) g_i is a best response to g^i_{-i} for all i = 1, 2, …, n
ii) g plays ε-like g^i for all i = 1, 2, …, n
ε = 0 gives a subjective equilibrium; but μ_g is not necessarily identical to μ_{g^i} off the realisable play paths, so a subjective equilibrium is not necessarily a Nash equilibrium (e.g. the one-person multi-armed bandit game, where the player may settle on one arm while holding wrong beliefs about the arms never pulled).

Rational Learning 15: Main Results: Corollary 1
Corollary 1: Let f and {f^i} denote the real behaviour strategy vector and the ones believed by the players i = 1, 2, …, n, respectively. Suppose that, for every i:
i) f^i_i = f_i is a best response to f^i_{-i}
ii) f << f^i
Then for every ε > 0 and almost every play path z according to μ_f, there is a time T (= T(z, ε)) such that for all t ≥ T, the induced strategies f_{z(t)}, together with the induced beliefs {f^i_{z(t)}, i = 1, 2, …, n}, form a subjective ε-equilibrium.
This corollary is a direct result of Theorem 1.

Rational Learning 16: Main Results: Proposition 1
Proposition 1: For every ε > 0 there is an η > 0 such that if g is a subjective η-equilibrium, then there exists f such that:
i) g plays ε-like f
ii) f is an ε-Nash equilibrium
Proved in the companion paper, Kalai and Lehrer (1993).

Rational Learning 17: Main Results: Theorem 2
Theorem 2: Let f and {f^i} denote the real behaviour strategy vector and the ones believed by the players i = 1, 2, …, n, respectively. Suppose that, for every i:
i) f^i_i = f_i is a best response to f^i_{-i}
ii) f << f^i
Then for every ε > 0 and almost every play path z according to μ_f, there is a time T (= T(z, ε)) such that for all t ≥ T, there exists an ε-Nash equilibrium f̄ of the repeated game satisfying: f_{z(t)} plays ε-like f̄.
This theorem is a direct result of Corollary 1 and Proposition 1.

Rational Learning 18: Alternative to Theorem 2
An alternative, weaker definition of closeness: for ε > 0 and a positive integer l, μ is (ε,l)-close to μ̃ if for every history h of length l or less, |μ(h) − μ̃(h)| ≤ ε.
f plays (ε,l)-like g if μ_f is (ε,l)-close to μ_g: "playing ε the same up to a horizon of l periods."
With results from Kalai and Lehrer (1993), the last part of Theorem 2 can be replaced by: … Then for every ε > 0 and every positive integer l, there is a time T (= T(z, ε, l)) such that for all t ≥ T, there exists a Nash equilibrium f̄ of the repeated game satisfying: f_{z(t)} plays (ε,l)-like f̄.
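A small sketch of how this weaker test could be checked computationally (my construction; the function and its arguments are illustrative, not from the paper). It pairs naturally with the mu_f sketch given earlier:

```python
# Compare two induced measures only on histories of length at most l,
# up to an additive eps.

from itertools import product

def plays_eps_l_like(mu_f, mu_g, actions, eps, l):
    """mu_f, mu_g: functions mapping a finite history (tuple of action
    combinations) to its probability; actions: the finite set of stage-game
    action combinations."""
    for length in range(1, l + 1):
        for h in product(actions, repeat=length):
            if abs(mu_f(h) - mu_g(h)) > eps:
                return False
    return True
```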

Rational Learning 19: Theorem 3
Define an information partition sequence {P_t}_t as an increasing sequence (i.e. P_{t+1} refines P_t) of finite or countable partitions of a state space Ω (with elements ω); at time t the agent knows which partition element P_t(ω) ∈ P_t she is in, but not the exact state ω.
Assume Ω has a σ-algebra F that is the smallest one containing all elements of {P_t}_t; let F_t be the σ-algebra generated by P_t.
Theorem 3: Let μ and μ̃ be probability measures on (Ω, F) with μ << μ̃. Then for every ε > 0 there is a random time t(ε) such that for μ-almost every ω and for all r ≥ t(ε), μ̃(· | P_r(ω)) is ε-close to μ(· | P_r(ω)).
This is essentially Theorem 1, restated in the abstract context.

Rational Learning 20: Proposition 2
Proposition 2: Let μ << μ̃ and let φ = dμ/dμ̃ be the Radon-Nikodym derivative. For every ε > 0 there is a random time t(ε) such that for all s ≥ t ≥ t(ε), |E(φ | F_s)(ω) / E(φ | F_t)(ω) − 1| < ε for μ-almost every ω.
Proved by applying the Radon-Nikodym theorem and Lévy's theorem.
This proposition delivers part of the definition of closeness needed for Theorem 3.
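The convergence step can be sketched as follows (a standard martingale argument stated in my own notation, with conditional expectations taken under μ̃): Lévy's theorem gives almost-sure convergence of the conditional expectations of φ, and since φ > 0 on a set of full μ-measure, this yields convergence of the ratios.

```latex
\[
\varphi = \frac{d\mu}{d\tilde\mu}, \qquad
E_{\tilde\mu}\!\left(\varphi \mid \mathcal{F}_t\right)
\;\xrightarrow[t \to \infty]{\tilde\mu\text{-a.s.}}\; \varphi,
\]
\[
\text{and since } \mu\{\varphi = 0\} = 0, \qquad
\frac{E_{\tilde\mu}(\varphi \mid \mathcal{F}_s)}{E_{\tilde\mu}(\varphi \mid \mathcal{F}_t)}
\;\longrightarrow\; 1 \quad \mu\text{-a.s. as } s \ge t \to \infty.
\]
```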

Rational Learning21 Lemma 1 Let { W t } be an increasing sequence of events satisfying μ(W t )↑ 1. For every ε > 0 there is a random time t (ε) such that any random t ≥ t (ε) satisfies μ { ω; μ(W t | P t (ω)) ≥ 1- ε} = 1 With W t = {ω ; | E(φ| F s )(ω)/ E(φ| F t )(ω)-1|< ε for all s ≥ t }, Lemma 1 together with Proposition 2 imply Theorem 3