
Published by Tayler Frome. Modified about 1 year ago.

1
Belief Learning in an Unstable Infinite Game
Paul J. Healy
CMU

2
Belief Learning in an Unstable Infinite Game
Issue #1: Infinite Game
Issue #2: Unstable
Issue #3: Belief Learning

3
Issue #1: Infinite Games
Typical Learning Model:
– Finite set of strategies
– Strategies get weight based on ‘fitness’
– Bells & Whistles: experimentation, spillovers…
Many important games have infinite strategies
– Duopoly, PG, bargaining, auctions, war of attrition…
Quality of fit sensitive to grid size?
Models don’t use strategy space structure

4
Previous Work
Grid size on fit quality:
– Arifovic & Ledyard
  Groves-Ledyard mechanisms
  Convergence failure of RL with |S| = 51
Strategy space structure:
– Roth & Erev AER ’99
  Quality-of-fit/error measures
– What’s the right metric space?
  Closeness in probs. or closeness in strategies?

5
Issue #2: Unstable Game
Usually predicting convergence rates
– Example: p-beauty contests
Instability:
– Toughest test for learning models
– Most statistical power

6
Previous Work
Chen & Tang ’98
– Walker mechanism & unstable Groves-Ledyard
– Reinforcement > Fictitious Play > Equilibrium
Healy ’06
– 5 PG mechanisms, predicting convergence or not
Feltovich ’00
– Unstable finite Bayesian game
– Fit varies by game, error measure

7
Issue #3: Belief Learning
If subjects are forming beliefs, measure them!
Method 1: Direct elicitation
– Incentivized guesses about s_{-i}
Method 2: Inferred from payoff table usage
– Tracking payoff ‘lookups’ may inform our models

8
Previous Work
Nyarko & Schotter ’02
– Subjects BR to stated beliefs
– Stated beliefs not too accurate
Costa-Gomes, Crawford & Broseta ’01
– Mouselab to identify types
– How players solve games, not learning

9
This Paper
Pick an unstable infinite game
Give subjects a calculator tool & track usage
Elicit beliefs in some sessions
Fit models to data in standard way
Study formation of “beliefs”
– “Beliefs” ⇐ calculator tool
– “Beliefs” ⇐ elicited beliefs

10
The Game
Walker’s PG mechanism for 3 players
Added a ‘punishment’ parameter

11
Parameters & Equilibrium
v_i(y) = b_i y - a_i y^2 + c_i
  i    a_i    b_i    c_i
  1    0.1    1.5    110
  2    0.2    3.0    125
  3    0.3    4.5    140
Pareto optimum: y = 7.5
Unique PSNE: s_i* = 2.5
Punishment γ = 0.1
– Purpose: Not too wild, payoffs rarely negative
Guessing Payoff: 10 - |g_L - s_L|/4 - |g_R - s_R|/4
Game Payoffs: Pr(<50) = 8.9%, Pr(>100) = 71%
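A minimal sketch of the numbers on this slide, assuming that the Pareto optimum is the level y that maximizes the sum of the three valuations v_i(y) (an assumption; the slide states only the result). The function names are hypothetical. Under that assumption the first-order condition sum(b_i) - 2 sum(a_i) y = 0 gives y = 9 / 1.2 = 7.5, matching the slide. The guessing payoff is coded directly from the slide's formula.

```python
# (a_i, b_i, c_i) for players 1-3, as listed on the slide.
PARAMS = [(0.1, 1.5, 110.0), (0.2, 3.0, 125.0), (0.3, 4.5, 140.0)]


def valuation(i, y):
    """v_i(y) = b_i*y - a_i*y^2 + c_i for player index i (0-based)."""
    a, b, c = PARAMS[i]
    return b * y - a * y ** 2 + c


def pareto_optimum():
    """Maximizer of sum_i v_i(y), from sum(b) - 2*sum(a)*y = 0 (assumed criterion)."""
    total_a = sum(a for a, _, _ in PARAMS)
    total_b = sum(b for _, b, _ in PARAMS)
    return total_b / (2 * total_a)


def guessing_payoff(g_left, g_right, s_left, s_right):
    """Payoff for guessing the left and right neighbours' strategies."""
    return 10 - abs(g_left - s_left) / 4 - abs(g_right - s_right) / 4


print(pareto_optimum())  # approximately 7.5
```

A perfect guess earns 10; each unit of total guessing error costs 1/4.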

12
Choice of Grid Size
S = [-10,10]
Grid Width: 5, 2, 1, 1/2, 1/4, 1/8
# Grid Points: 5, 11, 21, 41, 81, 161
% on Grid: 59.7, 61.6, 88.7, 91.6, 91.9
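The grid-point counts follow from S = [-10,10]: a grid of width w has 20/w + 1 points. A short sketch that reproduces the counts and checks whether a given choice lies on a grid; the helper names are hypothetical, and exact fractions are used to avoid floating-point issues with widths such as 1/8.

```python
from fractions import Fraction

S_MIN, S_MAX = -10, 10  # strategy space from the slide


def grid_points(width):
    """All grid points in [S_MIN, S_MAX] spaced `width` apart, starting at S_MIN."""
    width = Fraction(width)
    n = int((S_MAX - S_MIN) / width) + 1
    return [S_MIN + k * width for k in range(n)]


def on_grid(choice, width):
    """True if `choice` coincides with a grid point of the given width."""
    return (Fraction(choice) - S_MIN) % Fraction(width) == 0


widths = [5, 2, 1, Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)]
print([len(grid_points(w)) for w in widths])
```

The "% on Grid" row would then be the share of observed choices for which `on_grid` is true at each width.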

13
Properties of the Game
Best response:
BR Dynamics: unstable
– One eigenvalue is +2

14
Interface

15
Design
PEEL Lab, U. Pittsburgh
All Sessions
– 3 player groups, 50 periods
– Same group, ID#s for all periods
– Payoffs etc. common information
– No explicit public good framing
– Calculator always available
– 5 minute ‘warm-up’ with calculator
Sessions 1-6
– Guess s_L and s_R.
Sessions 7-13
– Baseline: no guesses.

16
Does Elicitation Affect Choice?
Total Variation:
– No significant difference (p=0.745)
No. of Strategy Switches:
– No significant difference (p=0.405)
Autocorrelation (predictability):
– Slightly more without elicitation
Total Earnings per Session:
– No significant difference (p=1)
Missed Periods:
– Elicited: 9/300 (3%) vs. Not: 3/350 (0.8%)

17
Does Play Converge?
Average |s_i - s_i*| per Period
Average |y - y^o| per Period

18
Does Play Converge, Part 2

19
Accuracy of Beliefs
Guesses get better in time
Average ||s_{-i} - s_{-i}(t)|| per Period
Elicited guesses
Calculator inputs

20
Model 1: Parametric EWA
δ: weight on foregone payoffs (strategies not actually played)
φ: decay rate of past attractions
ρ: decay rate of past experience
A(0): initial attractions
N(0): initial experience
λ: response sensitivity to attractions
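The slide lists the parameters but not the update rule. The sketch below assumes the standard experience-weighted attraction update of Camerer & Ho with a logit response; whether the talk used exactly this form is not confirmed by the slides. The function names and the payoff vector argument are hypothetical.

```python
import math


def ewa_update(attractions, experience, payoffs, chosen, delta, phi, rho):
    """One EWA step on a finite grid of strategies.

    payoffs[j] is the payoff strategy j would have earned this period given
    the others' actual play; `chosen` is the index of the strategy played.
    """
    new_experience = rho * experience + 1.0
    new_attractions = []
    for j, (a_prev, pay) in enumerate(zip(attractions, payoffs)):
        # The played strategy gets full weight; foregone payoffs get weight delta.
        weight = 1.0 if j == chosen else delta
        new_attractions.append(
            (phi * experience * a_prev + weight * pay) / new_experience
        )
    return new_attractions, new_experience


def logit_probs(attractions, lam):
    """Choice probabilities proportional to exp(lam * attraction)."""
    shift = max(lam * a for a in attractions)  # subtract max for numerical stability
    exps = [math.exp(lam * a - shift) for a in attractions]
    total = sum(exps)
    return [e / total for e in exps]


A, N = ewa_update([0.0, 0.0], 1.0, [10.0, 4.0], chosen=0,
                  delta=0.5, phi=1.0, rho=0.0)
print(A, N, logit_probs(A, lam=0.04))
```

With δ = 1 every strategy is reinforced by what it would have earned (a belief-style model); with δ = 0 only the played strategy is reinforced.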

21
Model 1’: Self-Tuning EWA
N(0) = 1
Replace δ and φ with deterministic functions:

22
STEWA: Setup
Only remaining parameters: λ and A(0)
– λ will be estimated
– 5 minutes of ‘Calculator Time’ gives A(0)
Average payoff from calculator trials:

23
STEWA: Fit
Likelihoods are ‘zero’ for all λ
– Guess: Lots of near misses in predictions
Alternative Measure: Quad. Scoring Rule
– Best fit: λ = 0.04 (previous studies: λ>4)
– Suggests attractions are very concentrated
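The slides do not define the quadratic scoring rule. One normalization consistent with the reported numbers is QSR = 2 minus the squared distance between the predicted probability vector and the indicator vector of the realized strategy; this is an inference, not the author's stated definition. Under it, a uniform prediction over the 21 integer grid points gives 2 - QSR = 0.952, which matches the in-sample Random Choice entry in the model comparison table.

```python
def qsr(probs, realized):
    """2 minus the squared distance from the forecast to the realized outcome.

    probs: predicted probabilities over grid strategies
    realized: index of the strategy actually played
    """
    return 2.0 - sum(
        (p - (1.0 if j == realized else 0.0)) ** 2
        for j, p in enumerate(probs)
    )


uniform = [1.0 / 21] * 21
print(round(2.0 - qsr(uniform, 10), 3))  # 0.952
```

Unlike the likelihood, this score stays finite when the realized strategy was assigned probability zero, so a near miss is penalized without driving the fit measure to 'zero'.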

24

25

26
STEWA: Adjustment Attempts
The problem: near misses in strategy space, not in time
Suggests: alter δ (weight on hypotheticals)
– original specification: QSR* = 1.193 @ λ* = 0.04
– δ = 0.7 (p-beauty est.): QSR* = 1.056 @ λ* = 0.03
– δ = 1 (belief model): QSR* = 1.082 @ λ* = 0.175
– δ(k,t) = % of B.R. payoff: QSR* = 1.077 @ λ* = 0.06
Altering φ:
– 1/8 weight on surprises: QSR* = 1.228 @ λ* = 0.04

27
STEWA: Other Modifications
Equal initial attractions: worse
Smoothing
– Takes advantage of strategy space structure
  λ spreads probability across strategies evenly
  Smoothing spreads probability to nearby strategies
– Smoothed Attractions
– Smoothed Probabilities
– But… No Improvement in QSR* or λ*!
Tentative Conclusion:
– STEWA: not broken, or can’t be fixed…
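The slides do not say which smoothing kernel was used. As an illustration only, the sketch below spreads each strategy's predicted probability to neighbouring grid points with a triangular kernel (an assumed choice) and keeps total probability equal to one. The function name and the `bandwidth` parameter are hypothetical.

```python
def smooth(probs, bandwidth=1):
    """Spread each strategy's probability to grid neighbours within `bandwidth` steps.

    Weights fall off linearly with distance (triangular kernel, assumed) and are
    renormalized at the edges so that no probability mass is lost.
    """
    n = len(probs)
    out = [0.0] * n
    for j, p in enumerate(probs):
        neighbours = [k for k in range(j - bandwidth, j + bandwidth + 1)
                      if 0 <= k < n]
        weights = [bandwidth + 1 - abs(k - j) for k in neighbours]
        total = sum(weights)
        for k, w in zip(neighbours, weights):
            out[k] += p * w / total
    return out


print(smooth([0.0, 1.0, 0.0]))  # [0.25, 0.5, 0.25]
```

The same function could be applied to attractions instead of probabilities, which corresponds to the "Smoothed Attractions" variant.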

28
Other Standard Models
Nash Equilibrium
Uniform Mixed Strategy (‘Random’)
Logistic Cournot BR
Deterministic Cournot BR
Logistic Fictitious Play
Deterministic Fictitious Play
k-Period BR

29
“New” Models
Best respond to stated beliefs (S1-S6 only)
Best respond to calculator entries
– Issue: how to aggregate calculator usage?
– Decaying average of input
Reinforcement based on calculator payoffs
– Decaying average of payoffs
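The exact aggregation rule is not given on the slide. One simple reading of "decaying average" is an exponentially weighted mean in which each new calculator entry (or calculator payoff) receives weight 1 - δ and the running value receives weight δ. The sketch assumes that form; the default δ = 1/2 is the value listed in the second model comparison table, and the function name is hypothetical.

```python
def decaying_average(inputs, delta=0.5):
    """Exponentially decaying average of a sequence of calculator entries.

    Starts from the first entry, then applies
    belief = delta * belief + (1 - delta) * new_entry.
    Returns None if no entries were made.
    """
    belief = None
    for x in inputs:
        belief = x if belief is None else delta * belief + (1 - delta) * x
    return belief


# Hypothetical entries a subject typed for a neighbour's strategy:
print(decaying_average([4.0, 2.0, 0.0]))  # 1.5
```

The resulting value would serve as the inferred "belief" to which the model best responds.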

30
Model Comparisons
MODEL | PARAM | BIC | 2-QSR | MAD | MSD
Random Choice* | N/A | In: Infinite | In: 0.952, Out: 0.878 | In: 7.439, Out: 7.816 | In: 82.866, Out: 85.558
Logistic STEWA* | λ | In: Infinite | In: 0.807, Out: 0.665, λ*=0.04 | In: 3.818, Out: 3.180, λ*=0.41 | In: 34.172, Out: 22.853, λ*=0.35
Logistic Cournot* | λ | In: Infinite | In: 0.952, Out: 0.878, λ*=0.00(!) | In: 4.222, Out: 3.557, λ*=4.30 | In: 38.186, Out: 25.478, λ*=4.30
Logistic F.P.* | λ | In: Infinite | In: 0.955, Out: 0.878, λ*=14.98 | In: 4.265, Out: 3.891, λ*=4.47 | In: 31.062, Out: 22.133, λ*=4.47
* Estimates on the grid of integers {-10,-9,…,9,10}
In = periods 1-35; Out = periods 36-End

31
Model Comparisons 2
MODEL | PARAM | MAD | MSD
BR(Guesses) (6 sessions only) | N/A | In: 5.5924, Out: 3.3693 | In: 57.874, Out: 19.902
BR(Calculator Input) | δ (=1/2) | In: 6.394, Out: 8.263 | In: 79.29, Out: 116.7
Calculator Reinforcement* | δ (=1/2) | In: 7.389, Out: 7.815 | In: 82.407, Out: 85.495
k-Period BR | k | In: 4.2126, Out: 3.582, k*=4 | In: 35.185, Out: 23.455, k*=4
Cournot | N/A | In: 4.7974, Out: 3.857 | In: 45.283, Out: 29.058
Weighted F.P. | δ | In: 4.500, Out: 3.518, δ*=0.56 | In: 38.290, Out: 22.426, δ*=0.65
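The error measures are not defined on the slides. The sketch assumes MAD is the mean absolute deviation and MSD the mean squared deviation between predicted and observed strategies, computed separately for the in-sample periods (1-35) and the out-of-sample periods (36 onward). All names and the example data are hypothetical.

```python
def mad(predicted, actual):
    """Mean absolute deviation between predicted and observed strategies."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def msd(predicted, actual):
    """Mean squared deviation between predicted and observed strategies."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def split_sample(series, cutoff=35):
    """Split a per-period series into in-sample (first `cutoff`) and out-of-sample parts."""
    return series[:cutoff], series[cutoff:]


# Made-up 50-period series, for illustration only.
observed = [float(t % 5) for t in range(50)]
forecast = [2.0] * 50
obs_in, obs_out = split_sample(observed)
fc_in, fc_out = split_sample(forecast)
print(mad(fc_in, obs_in), msd(fc_out, obs_out))
```

Because MSD penalizes large misses more heavily, the two measures can rank models differently, which is one reason the tables report both.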

32
The “Take-Homes”
Methodological issues
– Infinite strategy space
– Convergence vs. Instability
– Right notion of error
Self-Tuning EWA fits best.
Guesses & calculator input don’t seem to offer any more predictive power… ?!?!
