Published by Tayler Frome. Modified over 3 years ago.

1
Belief Learning in an Unstable Infinite Game
Paul J. Healy
CMU

2
Belief Learning in an Unstable Infinite Game
Belief Learning: Issue #3
Unstable: Issue #2
Infinite Game: Issue #1

3
Issue #1: Infinite Games
Typical Learning Model:
–Finite set of strategies
–Strategies get weight based on ‘fitness’
–Bells & Whistles: experimentation, spillovers…
Many important games have infinite strategies
–Duopoly, PG, bargaining, auctions, war of attrition…
Quality of fit sensitive to grid size?
Models don’t use strategy space structure

4
Previous Work
Grid size on fit quality:
–Arifovic & Ledyard
  Groves-Ledyard mechanisms
  Convergence failure of RL with |S| = 51
Strategy space structure:
–Roth & Erev AER ’99
Quality-of-fit/error measures
–What’s the right metric space?
  Closeness in probs. or closeness in strategies?

5
Issue #2: Unstable Game
Usually predicting convergence rates
–Example: p–beauty contests
Instability:
–Toughest test for learning models
–Most statistical power

6
Previous Work
Chen & Tang ’98
–Walker mechanism & unstable Groves-Ledyard
–Reinforcement > Fictitious Play > Equilibrium
Healy ’06
–5 PG mechanisms, predicting convergence or not
Feltovich ’00
–Unstable finite Bayesian game
–Fit varies by game, error measure

7
Issue #3: Belief Learning
If subjects are forming beliefs, measure them!
Method 1: Direct elicitation
–Incentivized guesses about s_{-i}
Method 2: Inferred from payoff table usage
–Tracking payoff ‘lookups’ may inform our models

8
Previous Work
Nyarko & Schotter ’02
–Subjects BR to stated beliefs
–Stated beliefs not too accurate
Costa-Gomes, Crawford & Broseta ’01
–Mouselab to identify types
–How players solve games, not learning

9
This Paper
Pick an unstable infinite game
Give subjects a calculator tool & track usage
Elicit beliefs in some sessions
Fit models to data in standard way
Study formation of “beliefs”
–“Beliefs” <= calculator tool
–“Beliefs” <= elicited beliefs

10
The Game
Walker’s PG mechanism for 3 players
Added a ‘punishment’ parameter

11
Parameters & Equilibrium
v_i(y) = b_i y – a_i y^2 + c_i

i    a_i   b_i   c_i
1    0.1   1.5   110
2    0.2   3.0   125
3    0.3   4.5   140

Pareto optimum: y = 7.5
Unique PSNE: s_i* = 2.5
Punishment γ = 0.1
–Purpose: Not too wild, payoffs rarely negative
Guessing Payoff: 10 – |g_L – s_L|/4 – |g_R – s_R|/4
Game Payoffs: Pr(<50) = 8.9%, Pr(>100) = 71%
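As a quick check on the numbers in this slide, the sketch below recomputes the Pareto optimum from the quadratic valuations and evaluates the guessing payoff. It assumes the efficient level maximizes the sum of the v_i(y) with no separate production cost term; that is an assumption, chosen because it is the reading under which the listed parameters give y = 7.5. The function names are illustrative and do not come from the slides.

```python
# Parameters from the slide: v_i(y) = b_i*y - a_i*y**2 + c_i
A = [0.1, 0.2, 0.3]
B = [1.5, 3.0, 4.5]
C = [110, 125, 140]

def valuation(i, y):
    """Valuation of player i (0-based index) for public good level y."""
    return B[i] * y - A[i] * y ** 2 + C[i]

def pareto_optimum():
    # Assumption: the optimum maximizes sum_i v_i(y) with no production cost,
    # so the first-order condition is sum(b) - 2*sum(a)*y = 0.
    return sum(B) / (2 * sum(A))

def guessing_payoff(g_left, g_right, s_left, s_right):
    """Payoff for guesses (g_left, g_right) about the neighbours' actual choices."""
    return 10 - abs(g_left - s_left) / 4 - abs(g_right - s_right) / 4

print(round(pareto_optimum(), 6))
print(guessing_payoff(2.0, 3.0, 2.5, 2.5))
```

Under the same assumption each player's marginal valuation b_i – 2 a_i y is zero at y = 7.5, so all three players agree on the efficient level.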

12
Choice of Grid Size
S = [-10,10]

Grid Width:      5    2    1    1/2   1/4   1/8
# Grid Points:   5    11   21   41    81    161
% on Grid: 59.7, 61.6, 88.7, 91.6, 91.9
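The grid-point counts follow directly from the interval S = [-10,10] and the grid width. A minimal sketch of that arithmetic, plus a helper for the share of choices that land on a grid, is given here. The list of choices is made up for illustration and is not experimental data.

```python
from fractions import Fraction

LOW, HIGH = -10, 10
WIDTHS = [Fraction(5), Fraction(2), Fraction(1),
          Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)]

def n_grid_points(width):
    """Number of evenly spaced points on [LOW, HIGH], endpoints included."""
    return int((HIGH - LOW) / width) + 1

def on_grid(choice, width):
    # A choice is on the grid if its offset from LOW is a whole multiple of the width.
    return (Fraction(str(choice)) - LOW) % width == 0

def share_on_grid(choices, width):
    """Percentage of choices that fall exactly on the grid."""
    return 100 * sum(on_grid(c, width) for c in choices) / len(choices)

counts = [n_grid_points(w) for w in WIDTHS]
hypothetical_choices = [2.5, 3, -10, 0.125, 1.3, 7.75]  # made-up data
for w, n in zip(WIDTHS, counts):
    print(w, n, round(share_on_grid(hypothetical_choices, w), 1))
```

Exact fractions are used so that widths such as 1/8 are not affected by floating-point rounding.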

13
Properties of the Game
Best response: [equation image]
BR Dynamics: unstable
–One eigenvalue is +2
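To illustrate what unstable best-response dynamics look like, here is a rough sketch using the textbook Walker mechanism, in which player i pays (s_R – s_L)·y with y = s_1 + s_2 + s_3. This is an assumption: it leaves out the punishment term γ and any production cost, so it is not the exact game in these slides and its eigenvalues will differ from the one quoted above. Strategies are also not clipped to [-10,10].

```python
A = [0.1, 0.2, 0.3]
B = [1.5, 3.0, 4.5]
S_STAR = 2.5  # equilibrium strategy reported in the slides

def best_response(i, s):
    # Assumed payoff: b_i*y - a_i*y**2 + c_i - (s_right - s_left)*y.
    right, left = s[(i + 1) % 3], s[(i - 1) % 3]
    # First-order condition in s_i pins down the y that player i wants.
    y_target = (B[i] - (right - left)) / (2 * A[i])
    return y_target - right - left

def br_step(s):
    """All three players simultaneously best respond to last period's profile."""
    return [best_response(i, s) for i in range(3)]

def distance_from_equilibrium(s):
    return max(abs(x - S_STAR) for x in s)

s = [2.6, 2.5, 2.5]  # small perturbation of the equilibrium
path = [s]
for _ in range(5):
    s = br_step(s)
    path.append(s)

for t, point in enumerate(path):
    print(t, round(distance_from_equilibrium(point), 3))
```

In this simplified version the profile (2.5, 2.5, 2.5) is a fixed point of the best-response map, and the small perturbation grows each period rather than dying out.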

14
Interface

15
Design
PEEL Lab, U. Pittsburgh
All Sessions
–3 player groups, 50 periods
–Same group, ID#s for all periods
–Payoffs etc. common information
–No explicit public good framing
–Calculator always available
–5 minute ‘warm-up’ with calculator
Sessions 1-6
–Guess s_L and s_R.
Sessions 7-13
–Baseline: no guesses.

16
Does Elicitation Affect Choice?
Total Variation:
–No significant difference (p=0.745)
No. of Strategy Switches:
–No significant difference (p=0.405)
Autocorrelation (predictability):
–Slightly more without elicitation
Total Earnings per Session:
–No significant difference (p=1)
Missed Periods:
–Elicited: 9/300 (3%) vs. Not: 3/350 (0.8%)

17
Does Play Converge?
[Figure: Average |s_i – s_i*| per Period]
[Figure: Average |y – y^o| per Period]

18
Does Play Converge, Part 2

19
Accuracy of Beliefs
Guesses get better in time
[Figure: Average ||s_{-i} – s_{-i}(t)|| per Period. Panels: Elicited guesses; Calculator inputs]

20
Model 1: Parametric EWA
δ: weight on forgone payoffs (strategies not actually played)
φ: decay rate of past attractions
ρ: decay rate of past experience
A(0): initial attractions
N(0): initial experience
λ: response sensitivity to attractions
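For readers who want the parameters in context, here is a minimal sketch of one EWA update in the standard Camerer and Ho form, with a logit response. The slide's own equations are not reproduced in this transcript, so the exact specification used in the talk is an assumption; function and argument names are illustrative.

```python
import math

def ewa_update(attractions, experience, payoffs, played, delta, phi, rho):
    """One EWA step on a finite grid of strategies.

    payoffs[j] is what strategy j would have earned against this period's
    opponent play; `played` is the index of the strategy actually chosen.
    """
    new_experience = rho * experience + 1
    new_attractions = []
    for j, (a, pi) in enumerate(zip(attractions, payoffs)):
        # The chosen strategy gets full weight, forgone payoffs get weight delta.
        weight = 1.0 if j == played else delta
        new_attractions.append((phi * experience * a + weight * pi) / new_experience)
    return new_attractions, new_experience

def logit_probs(attractions, lam):
    """Choice probabilities proportional to exp(lam * attraction)."""
    shift = max(lam * a for a in attractions)  # subtract the max for numerical stability
    weights = [math.exp(lam * a - shift) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]

atts, n = ewa_update([0.0, 0.0], 1.0, [1.0, 2.0], played=0,
                     delta=0.5, phi=0.9, rho=0.9)
print(atts, n, logit_probs(atts, lam=0.04))
```

Setting δ = 1 makes forgone and realized payoffs count equally, which is the belief-learning end of the model. Setting δ = 0 reinforces only the strategy that was played.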

21
Model 1’: Self-Tuning EWA
N(0) = 1
Replace δ and φ with deterministic functions: [equation image]
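The two deterministic functions are not reproduced in this transcript. As a hedged illustration, the sketch below follows the published self-tuning EWA model of Ho, Camerer and Chong: a change detector for φ based on a surprise index, and an attention function for δ. Whether the talk used exactly these forms is an assumption, and the single-opponent history is a simplification.

```python
from collections import Counter

def change_detector_phi(opponent_history):
    """phi(t) = 1 - S(t)/2, where S(t) = sum_k (h_k - r_k)**2.

    h_k is the cumulative frequency of opponent strategy k up to period t and
    r_k is 1 for the strategy observed in period t, 0 otherwise.
    """
    t = len(opponent_history)
    counts = Counter(opponent_history)
    latest = opponent_history[-1]
    surprise = sum((counts[k] / t - (1.0 if k == latest else 0.0)) ** 2
                   for k in counts)
    return 1 - surprise / 2

def attention_delta(forgone_payoffs, realized_payoff):
    """delta_j = 1 if strategy j would have done at least as well as the actual choice."""
    return [1.0 if pi >= realized_payoff else 0.0 for pi in forgone_payoffs]

print(change_detector_phi(["a", "a", "a"]))
print(change_detector_phi(["a", "b"]))
print(attention_delta([1, 3, 2], realized_payoff=2))
```

When the opponent keeps repeating the same strategy there is no surprise and φ stays at 1. A move that breaks with past play lowers φ, so old attractions decay faster.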

22
STEWA: Setup
Only remaining parameters: λ and A(0)
–λ will be estimated
–5 minutes of ‘Calculator Time’ gives A(0)
Average payoff from calculator trials: [equation image]

23
STEWA: Fit
Likelihoods are ‘zero’ for all λ
–Guess: Lots of near misses in predictions
Alternative Measure: Quad. Scoring Rule
–Best fit: λ = 0.04 (previous studies: λ>4)
–Suggests attractions are very concentrated
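A minimal sketch of a quadratic scoring rule on a 0-to-2 scale is given here. The normalization is an inference, not something stated on the slide: it is the one under which uniform choice over the 21 integer grid points gives 2 – QSR = 20/21 ≈ 0.952, the value the model comparison slide reports for Random Choice in sample.

```python
def qsr(probs, chosen):
    """Quadratic score: 2 - sum_j (p_j - 1{j == chosen})**2. Higher is better."""
    return 2 - sum((p - (1.0 if j == chosen else 0.0)) ** 2
                   for j, p in enumerate(probs))

def mean_qsr(predictions, choices):
    """Average score over observations (one probability vector per observation)."""
    return sum(qsr(p, c) for p, c in zip(predictions, choices)) / len(choices)

uniform = [1 / 21] * 21
print(round(2 - qsr(uniform, 10), 3))
```

Unlike the likelihood, this score stays finite when the model puts zero probability on the observed choice, which is why it remains usable when likelihoods are ‘zero’.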

26
STEWA: Adjustment Attempts
The problem: near misses in strategy space, not in time
Suggests: alter δ (weight on hypotheticals)
–original specification: QSR* = 1.193 @ λ*=0.04
–δ = 0.7 (p-beauty est.): QSR* = 1.056 @ λ*=0.03
–δ = 1 (belief model): QSR* = 1.082 @ λ*=0.175
–δ(k,t) = % of B.R. payoff: QSR* = 1.077 @ λ*=0.06
Altering φ:
–1/8 weight on surprises: QSR* = 1.228 @ λ*=0.04

27
STEWA: Other Modifications
Equal initial attractions: worse
Smoothing
–Takes advantage of strategy space structure
  λ spreads probability across strategies evenly
  Smoothing spreads probability to nearby strategies
–Smoothed Attractions
–Smoothed Probabilities
–But… No Improvement in QSR* or λ*!
Tentative Conclusion:
–STEWA: not broken, or can’t be fixed…
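The smoothing formulas are not preserved in this transcript. The sketch below shows one way such smoothing could work: each grid point shares its value with neighbours inside a bandwidth, using triangular weights. The kernel and the bandwidth are hypothetical choices made for illustration, not the talk's specification. The same function can be applied to attractions or to choice probabilities.

```python
def smooth(values, bandwidth):
    """Spread each grid point's value to neighbours within `bandwidth` steps.

    Weights fall off linearly with distance (triangular kernel) and are
    normalized per source point, so the total of `values` is preserved.
    """
    n = len(values)
    out = [0.0] * n
    for j, v in enumerate(values):
        neighbours = [k for k in range(n) if abs(k - j) <= bandwidth]
        weights = [bandwidth + 1 - abs(k - j) for k in neighbours]
        total = sum(weights)
        for k, w in zip(neighbours, weights):
            out[k] += v * w / total
    return out

print(smooth([0.0, 0.0, 1.0, 0.0, 0.0], bandwidth=1))
```

A point mass in the middle of the grid is spread over the adjacent strategies, so a near miss in strategy space still receives some predicted probability.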

28
Other Standard Models
Nash Equilibrium
Uniform Mixed Strategy (‘Random’)
Logistic Cournot BR
Deterministic Cournot BR
Logistic Fictitious Play
Deterministic Fictitious Play
k-Period BR
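These belief-based models differ mainly in how much of the history they use. A rough sketch of the belief rules follows, treating a belief as a single point (an average of an opponent's past choices). That is a simplification and an assumption; fictitious play is usually defined over the full empirical distribution. The weighted variant nests the others: δ = 0 gives Cournot and δ = 1 gives fictitious play.

```python
def cournot_belief(history):
    """Opponent repeats last period's choice."""
    return history[-1]

def fictitious_play_belief(history):
    """Average of all past choices."""
    return sum(history) / len(history)

def k_period_belief(history, k):
    """Average of the most recent k choices."""
    recent = history[-k:]
    return sum(recent) / len(recent)

def weighted_fp_belief(history, delta):
    """Geometrically discounted average: an observation `age` periods old gets delta**age."""
    weights = [delta ** age for age in range(len(history) - 1, -1, -1)]
    return sum(w * x for w, x in zip(weights, history)) / sum(weights)

past = [1.0, 2.0, 3.0, 4.0]
print(cournot_belief(past), fictitious_play_belief(past),
      k_period_belief(past, 2), weighted_fp_belief(past, 0.5))
```

The deterministic versions best respond to the belief directly. The logistic versions would pass the implied expected payoffs through a logit rule with sensitivity λ.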

29
“New” Models
Best respond to stated beliefs (S1-S6 only)
Best respond to calculator entries
–Issue: how to aggregate calculator usage?
–Decaying average of input
Reinforcement based on calculator payoffs
–Decaying average of payoffs
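One possible reading of the two decaying averages is sketched here. The recursive form, the zero starting attractions, and the snapping of calculator entries to the nearest grid point are all assumptions made for illustration; the slides do not spell out the aggregation rule.

```python
def decaying_average(inputs, delta):
    """Recursive average of calculator inputs: belief <- delta*belief + (1-delta)*x."""
    belief = inputs[0]  # assumption: the first entry initializes the belief
    for x in inputs[1:]:
        belief = delta * belief + (1 - delta) * x
    return belief

def calculator_reinforcement(trials, delta, grid):
    """Attractions built from payoffs the subject saw in the calculator.

    `trials` is a list of (own_strategy, displayed_payoff) pairs.
    """
    attractions = {s: 0.0 for s in grid}  # assumption: attractions start at zero
    for strategy, payoff in trials:
        cell = min(grid, key=lambda g: abs(g - strategy))  # snap to nearest grid point
        attractions[cell] = delta * attractions[cell] + (1 - delta) * payoff
    return attractions

print(decaying_average([4.0, 2.0, 0.0], delta=0.5))
print(calculator_reinforcement([(0.2, 100.0), (0.1, 50.0)], delta=0.5, grid=[-1, 0, 1]))
```

With δ = 1/2, the value used in the comparison table, each new calculator entry counts as much as everything that came before it.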

30
Model Comparisons

MODEL             PARAM  BIC (In)  2-QSR: In / Out (λ*)      MAD: In / Out (λ*)     MSD: In / Out (λ*)
Random Choice*    N/A    Infinite  0.952 / 0.878             7.439 / 7.816          82.866 / 85.558
Logistic STEWA*   λ      Infinite  0.807 / 0.665 (0.04)      3.818 / 3.180 (0.41)   34.172 / 22.853 (0.35)
Logistic Cournot* λ      Infinite  0.952 / 0.878 (0.00 (!))  4.222 / 3.557 (4.30)   38.186 / 25.478 (4.30)
Logistic F.P.*    λ      Infinite  0.955 / 0.878 (14.98)     4.265 / 3.891 (4.47)   31.062 / 22.133 (4.47)

* Estimates on the grid of integers {-10,-9,…,9,10}
In = periods 1-35
Out = periods 36-End
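The slides do not define MAD and MSD precisely. A minimal sketch, assuming they are the expected absolute and squared distance between the model's predicted strategy and the observed one, is given here. A deterministic model is the special case of a predicted distribution with all mass on one grid point.

```python
def expected_deviations(grid, probs, actual):
    """Expected |prediction - actual| and (prediction - actual)**2 under `probs`."""
    mad = sum(p * abs(g - actual) for g, p in zip(grid, probs))
    msd = sum(p * (g - actual) ** 2 for g, p in zip(grid, probs))
    return mad, msd

def average_deviations(grid, predictions, choices):
    """Average both measures over observations (one probability vector per observation)."""
    pairs = [expected_deviations(grid, p, c) for p, c in zip(predictions, choices)]
    n = len(pairs)
    return sum(m for m, _ in pairs) / n, sum(s for _, s in pairs) / n

grid = list(range(-10, 11))       # integer grid {-10, ..., 10}
uniform = [1 / len(grid)] * len(grid)
print(expected_deviations(grid, uniform, actual=0))
```

Both measures use distance in strategy space, so a near miss is penalized less than a prediction far from the observed choice. The quadratic score, by contrast, treats all misses alike.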

31
Model Comparisons 2

MODEL                          PARAM     MAD: In / Out (param*)       MSD: In / Out (param*)
BR(Guesses) (6 sessions only)  N/A       5.5924 / 3.3693              57.874 / 19.902
BR(Calculator Input)           δ (=1/2)  6.394 / 8.263                79.29 / 116.7
Calculator Reinforcement*      δ (=1/2)  7.389 / 7.815                82.407 / 85.495
k-Period BR                    k         4.2126 / 3.582 (k* = 4)      35.185 / 23.455 (k* = 4)
Cournot                        N/A       4.7974 / 3.857               45.283 / 29.058
Weighted F.P.                  δ         4.500 / 3.518 (δ* = 0.56)    38.290 / 22.426 (δ* = 0.65)

32
The “Take-Homes”
Methodological issues
–Infinite strategy space
–Convergence vs. Instability
–Right notion of error
Self-Tuning EWA fits best.
Guesses & calculator input don’t seem to offer any more predictive power… ?!?!
