Published by Efren Baggett. Modified over 2 years ago.

1
Adaptive Regret Minimization in Bounded Memory Games
Jeremiah Blocki, Nicolas Christin, Anupam Datta, Arunesh Sinha
GameSec 2013 – Invited Paper

2
Outline: Motivation, Background, Bounded Memory Games, Adaptive Regret, Results

3
Motivating Example: Audit Game

4
Motivating Example: Cheating Game (Semester 1, Semester 2, Semester 3)

5
Motivating Example: Speeding Game (Week 1, Week 2, Week 3)

6
Motivating Example: Speeding Game
Actions: Speed, Behave (adversary); High Inspection, Low Inspection (defender)
Questions: What is an appropriate game model for this interaction? What are good defender strategies?

7
Elements of the Game Model
Two Players: Adversary (Tourists) and Defender (Policeman)
Actions: Adversary Actions: {Speed, Behave}; Defender Actions: {High Inspection, Low Inspection}
Repeated Interactions: each interaction has an outcome; the history of game play is a sequence of outcomes
Imperfect Information: the policeman does not always observe the actions of the tourist, and vice versa

8
Game Elements
o Repeated Interaction
o Two Players: Defender and Adversary
o Imperfect Information
  o Defender only observes outcome
o Short Term Adversaries
o Adversary Incentives Unknown to Defender
  o Last presentation! [JNTP 13]
  o Adversary may be uninformed/irrational

9
Additional Game Elements
o History-dependent Actions
  o Adversary adapts behavior following unknown strategy
  o How should defender respond?
o History-dependent Rewards
  o Point System
  o Reputation of defender depends both on its history and on the current outcome
Standard Regret Minimization Repeated Game Model?

10
Outline: Motivation, Background (Standard Definition of Regret, Regret Minimization Algorithms, Limitations), Our Contributions (Bounded Memory Games, Adaptive Regret, Results)

11
Speeding Game: Repeated Game Model
[Table: defender's (D) expected utility under High Inspection and Low Inspection; values not recovered]

13
Regret Minimization Example
Experts: one advises Low Inspection, the other High Inspection. Defender: "What should I do?"

14
Regret Minimization Example
[Table: Day 1 to Day 3 of play, with the adversary's action (Speed or Behave) and the High/Low Inspection choices and utilities of the defender and of the experts Aristotle and Plato; only one total, 2.2, was recovered]

15
Regret Minimization Example: Regret
[Figure: the defender's utility compared with the utilities of Aristotle and Plato]

16
Regret Minimization Example: Regret

17
Regret Minimization Example: Regret Minimization Algorithm (A)
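The regret comparison in this example can be sketched as follows. The utility values are hypothetical, since the slide's table did not survive extraction, and the names `UTILITY`, `total_utility`, and `average_regret` are illustrative only. The sketch uses the standard definition, in which the adversary's actions are held fixed when the expert's advice is evaluated.

```python
# Hypothetical defender utilities, indexed by (defender action, adversary action).
# These numbers are assumptions for illustration; they are not taken from the slides.
UTILITY = {
    ("High", "Speed"): 0.8,
    ("High", "Behave"): 0.2,
    ("Low", "Speed"): 0.0,
    ("Low", "Behave"): 1.0,
}


def total_utility(defender_actions, adversary_actions):
    """Sum the defender's utility over all rounds of play."""
    return sum(UTILITY[(d, a)] for d, a in zip(defender_actions, adversary_actions))


def average_regret(defender_actions, expert_actions, adversary_actions):
    """Average per-round regret of the defender relative to one expert.

    The adversary's actions are kept the same for both plays, which is
    the standard (non-adaptive) notion of regret.
    """
    rounds = len(adversary_actions)
    expert_total = total_utility(expert_actions, adversary_actions)
    defender_total = total_utility(defender_actions, adversary_actions)
    return (expert_total - defender_total) / rounds


# Three days of play against one adversary sequence.
adversary = ["Speed", "Behave", "Behave"]
defender = ["Low", "High", "Low"]
always_low = ["Low"] * 3
always_high = ["High"] * 3
```

A regret minimization algorithm aims to drive `average_regret` to zero or below against every expert as the number of rounds grows.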

18
Regret Minimization: Basic Idea
Weights: one per action (Low Inspection, High Inspection), shown as 1.0
Choose action probabilistically based on weights

19
Regret Minimization: Basic Idea
Updated weights for Low Inspection and High Inspection
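One common way to realize this weight-based idea is a multiplicative weights (Hedge) update. The slides do not name the update rule, so the rule, the learning rate `eta`, and the function names below are assumptions. The sketch also assumes the defender learns the utility of every action each round, which is simpler than the imperfect information setting of the talk.

```python
import math
import random


def choose(weights, rng):
    """Pick an action with probability proportional to its weight."""
    actions = list(weights)
    return rng.choices(actions, weights=[weights[a] for a in actions], k=1)[0]


def update(weights, utilities, eta=0.5):
    """Multiplicative update: actions that did well gain weight.

    `utilities` maps each action to the utility it would have earned
    this round; `eta` is a hypothetical learning rate.
    """
    return {a: w * math.exp(eta * utilities[a]) for a, w in weights.items()}


# Both actions start with weight 1.0.
weights = {"High Inspection": 1.0, "Low Inspection": 1.0}
rng = random.Random(0)
action = choose(weights, rng)
# Suppose Low Inspection would have earned 1.0 this round and High Inspection 0.0.
weights = update(weights, {"High Inspection": 0.0, "Low Inspection": 1.0})
```

After the update, Low Inspection carries more weight and is therefore chosen more often in later rounds.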

20
Speeding Game Example
[Table: defender's utility under High Inspection and Low Inspection; values not recovered]
Defender's Strategy: Nash Equilibrium: Low Inspection; Regret Minimization: Low Inspection; Dominant Strategy: Low Inspection

23
Prior Work: Regret Minimization
Regret minimization is well studied in repeated games with imperfect information (bandit model) [AK04, McMahanB04, K05, FKM05, DH06, …]
Regret Minimizing Audits [BCDS11]

24
Philosophical Argument
"See! My advice was better!" "We need a better game model!"

25
Speeding Game Example
[Table: defender's utility and adversary's utility under High Inspection and Low Inspection, marking the Dominant Strategy / Best Response; values not recovered]

26
Speeding Game: Stackelberg Model
[Table: defender's utility and adversary's utility under High Inspection and Low Inspection, marking the defender's Stackelberg Strategy and the adversary's Best Response; values not recovered]

27
Prior Work: Stackelberg Games and Security
Security Games [Tam12], [JPQ+], [JNTP13], …: LAX, Air Marshals
Audit Games [BCD+13]

28
Philosophical Argument
"Your Stackelberg game model is still flawed!" "See! My advice was really better!"

29
Unmodeled Game Elements
o Adversary Incentives Unknown to Defender
  o Last presentation! [JNTP 13]
  o Adversary may be uninformed/irrational
o History-dependent Rewards
  o Point System
  o Reputation of defender depends both on its history and on the current outcome
o History-dependent Actions
  o Adversary adapts behavior following unknown strategy
  o How should defender respond?

30
Outline: Motivation, Background, Our Contributions (Bounded Memory Games, Adaptive Regret, Results)

31
Stochastic Games
States (s_0, s_1, s_2): capture the dependence of rewards on history
Thm: No algorithm can minimize regret for the general class of stochastic games.

32
Bounded Memory Games
State s: encodes the last m outcomes
States can capture history-dependent rewards

33
Bounded Memory Games

34
Bounded Memory Games - Experts
Expert advice may depend on the last m outcomes (a map from State to Action)
Example: If no violations have been detected in the last m rounds then play High Inspection, otherwise Low Inspection
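A minimal sketch of a bounded memory state and the example expert, assuming outcomes are recorded as the strings "violation" and "clean" (hypothetical labels) and a memory length of m = 3 (also hypothetical):

```python
from collections import deque

M = 3  # hypothetical memory length m


def new_state(m=M):
    """The state holds at most the last m outcomes; older ones fall off."""
    return deque(maxlen=m)


def expert(state):
    """Example expert from the slide: High Inspection if no violation
    appears among the remembered outcomes, otherwise Low Inspection."""
    if "violation" in state:
        return "Low Inspection"
    return "High Inspection"


state = new_state()
first_advice = expert(state)         # nothing remembered yet
state.append("violation")
advice_after_violation = expert(state)
for _ in range(M):
    state.append("clean")            # the violation is pushed out of memory
advice_after_clean_run = expert(state)
```

Because the expert reads only the state, its advice is a fixed function of the last m outcomes, which is what makes it a valid expert in a bounded memory game.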

35
Outline: Motivation, Background, Our Contributions (Bounded Memory Games, Adaptive Regret, Results)

36
k-Adaptive Strategy
Decision tree for the next k rounds (Day 1, Day 2, Day 3; branches labelled Speed and Behave)

37
k-Adaptive Strategy
Decision tree for the next k rounds (Week 1, Week 2, Week 3). Example adversaries:
"I will never speed while I am on vacation."
"I will speed until I get caught. If I ever get a ticket then I will stop."
"I will keep speeding until I get two tickets. If I ever get two tickets then I will stop."
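The three example adversaries can be sketched as functions of the number of tickets received so far within the current k rounds. As a simplifying assumption (not from the slides), a ticket is issued whenever the tourist speeds during a High Inspection round; in the talk's model outcomes are only imperfectly observed. All function names are illustrative.

```python
def never_speed(tickets):
    """'I will never speed while I am on vacation.'"""
    return "Behave"


def stop_after_one_ticket(tickets):
    """'I will speed until I get caught.'"""
    return "Speed" if tickets < 1 else "Behave"


def stop_after_two_tickets(tickets):
    """'I will keep speeding until I get two tickets.'"""
    return "Speed" if tickets < 2 else "Behave"


def play(strategy, inspections):
    """Run one adversary for len(inspections) rounds.

    Assumption: speeding under High Inspection always yields a ticket.
    Returns the adversary's action in each round.
    """
    tickets = 0
    actions = []
    for inspection in inspections:
        action = strategy(tickets)
        actions.append(action)
        if action == "Speed" and inspection == "High Inspection":
            tickets += 1
    return actions


week = ["Low Inspection", "High Inspection", "High Inspection"]
cautious = play(stop_after_one_ticket, week)
```

Each strategy adapts only to what happened during its own k rounds, which is the sense in which it is k-adaptive.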

38
k-Adaptive Regret
Defender and expert are compared from the same initial state (…, O_{-1}, O_0):
Defender: actions (a_1, d_1), (a_2, d_2), …, (a_{k+1}, d_{k+1}), …; outcomes O_1, O_2, …, O_{k+1}, …; rewards r_1, r_2, …, r_{k+1}
Expert: actions (a_1, d_1), (a_2, d_2), …, (a_{k+1}, d_{k+1}), …; outcomes O_1, O_2, …, O_{k+1}, …; rewards r_1, r_2, …, r_{k+1}

39
k-Adaptive Regret Minimization

40
Outline: Motivation, Background, Bounded Memory Games, Adaptive Regret, Results

41
k-Adaptive Regret Minimization

42
Inefficient Regret Minimization Algorithm
Reduce the Bounded Memory-m Game to a Repeated Game whose actions are the fixed strategies …, f_1, f_2, …
Use a standard regret minimization algorithm for repeated games of imperfect information [AK04, McMahanB04, K05, FKM05]

43
Inefficient Regret Minimization Algorithm
Reduce the Bounded Memory-m Game to a Repeated Game whose actions are the fixed strategies …, f_1, f_2, …
Reward for f_2 in the repeated game: expected reward in the original game given that
1. Defender follows fixed strategy f_2 for the next m*k*t rounds of the original game
2. Defender sees the sequence of k-adaptive adversaries below
Use a standard regret minimization algorithm for repeated games of imperfect information [AK04, McMahanB04, K05, FKM05]

44
Inefficient Regret Minimization Algorithm
Timeline: Start State, then Stage i (m*k*t rounds)
Real Game: … O_1 … O_m …
Repeated Game: … O_1 … O_m …

46
Inefficient Regret Minimization Algorithm
Reduce the Bounded Memory-m Game to a Repeated Game whose actions are the fixed strategies …, f_1, f_2, …
Use a standard regret minimization algorithm for repeated games of imperfect information [AK04, McMahanB04, K05, FKM05]
Standard regret minimization algorithms maintain a weight for each expert.
Inefficient: exponentially many fixed strategies!
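A small calculation shows why explicit weights become infeasible. It assumes, in line with the State-to-Action view of experts earlier in the deck, that a fixed strategy assigns one defender action to every possible state, and that a state is a sequence of the last m outcomes. The function names are illustrative.

```python
def num_states(num_outcomes, m):
    """Number of distinct sequences of the last m outcomes."""
    return num_outcomes ** m


def num_fixed_strategies(num_actions, num_outcomes, m):
    """Number of maps from states to defender actions."""
    return num_actions ** num_states(num_outcomes, m)


# Two defender actions and two possible outcomes per round.
small = num_fixed_strategies(2, 2, 3)   # 2 ** 8
larger = num_fixed_strategies(2, 2, 5)  # 2 ** 32
```

Even in this tiny setting, moving from m = 3 to m = 5 takes the number of weights from 256 to more than four billion.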

47
Summary of Technical Results
Columns: Imperfect Information, Perfect Information (row labels not recovered; the rows are ordered by an arrow labelled "Easier")
Entries: Hard (Theorem 1); APX (Theorem 5); APX (Theorem 4); Hard (Theorem 1); Hard (Remark 2); X (Theorem 6)
Legend: X = no regret minimization algorithm exists; APX = efficient approximate regret minimization algorithm

48
Summary of Technical Results
Columns: Imperfect Information, Perfect Information (row labels not recovered; the rows are ordered by an arrow labelled "Easier")
Entries: Hard (Theorem 1); APX (Theorem 5); APX (Theorem 4); Hard (Theorem 1), APX (New!); Hard (Remark 2), APX (New!); X (Theorem 6)
Legend: X = no regret minimization algorithm exists; APX = efficient approximate regret minimization algorithm in n, k

49
Summary of Technical Results
Columns: Imperfect Information, Perfect Information; X (Theorem 6); arrow labelled "Easier"
Ideas: implicit weight representation + dynamic programming
Warning! f(k) is a very large constant!

50
Implicit Weights: Outcome Tree
[Figure: outcome tree with branches labelled Behave and Speed]
How often is edge (s,t) relevant?

51
Implicit Weights: Outcome Tree
[Figure: outcome tree for expert E, with branches labelled Behave and Speed]

52
Open Questions
Columns: Imperfect Information, Perfect Information (row labels not recovered)
Entries: Hard (Theorem 1); APX (Theorem 5); APX (Theorem 4); Hard (Theorem 1), APX; Hard (Remark 2), APX; X (Theorem 6)
Thanks for Listening!

54
Theorem 3
Unless RP = NP, there is no efficient regret minimization algorithm for bounded memory games, even against an oblivious adversary.
Reduction from MAX 3-SAT (7/8 + ε) [Hastad01]
Similar to the reduction in [EKM05] for MDPs

55
Theorem 3: Setup
Defender Actions A: {0,1} × {0,1}
m = O(log n)
States: two states for each variable: S_0 = {s_1, …, s_n}, S_1 = {s_1, …, s_n}
Intuition: a fixed strategy corresponds to a variable assignment

56
Theorem 3: Overview
The adversary picks a clause uniformly at random for the next n rounds.
The defender can earn reward 1 by satisfying this unknown clause in the next n rounds.
The game remembers whether a reward has already been given, so the defender cannot earn a reward multiple times during the n rounds.

57
Theorem 3: State Transitions
Adversary Actions B: {0,1} × {0,1,2,3}, with b = (b_1, b_2)
g(a,b) = b_1
f(a,b) = S_1 if a_2 = 1 or b_2 = a_1 (reward already given); S_0 otherwise (no reward given)

58
Theorem 3: Rewards
b = (b_1, b_2)
r(a,b,s) = 1 if s ∈ S_0 and a = b_2; −5 if s ∈ S_1 and f(a,b) = S_0 and b_2 ≠ 3; 0 otherwise
No reward whenever B plays b_2 = 2
No reward whenever s ∈ S_1

59
Theorem 3: Oblivious Adversary
(d_1, …, d_n): binary De Bruijn sequence of order n
1. Pick a clause C uniformly at random
2. For i = 1, …, n: play b = (d_i, b_2)
3. Repeat Step 1
b_2 = 1 if x_i ∈ C; 0 if ¬x_i ∈ C; 3 if i = n; 2 otherwise
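For readers unfamiliar with De Bruijn sequences, here is a sketch of the standard Lyndon-word construction. The slides do not say which construction the proof uses, so this choice and the function name `de_bruijn` are assumptions. A binary De Bruijn sequence of order n has length 2^n, and every length-n bit string appears exactly once as a cyclic window.

```python
def de_bruijn(k, n):
    """Return a De Bruijn sequence over the alphabet {0, ..., k-1} of order n,
    built by concatenating Lyndon words whose length divides n."""
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            # a[1..p] is a Lyndon word; keep it only if p divides n.
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def cyclic_windows(seq, n):
    """All length-n windows of seq, wrapping around at the end."""
    doubled = seq + seq[:n - 1]
    return [tuple(doubled[i:i + n]) for i in range(len(seq))]


bits = de_bruijn(2, 3)
windows = cyclic_windows(bits, 3)
```

The uniqueness of the windows is what lets a short memory of recent bits identify the current position in the sequence.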

60
Analysis
The defender can never be rewarded from s ∈ S_1
Get reward => transition to s ∈ S_1
The defender is punished for leaving S_1, unless the adversary plays b_2 = 3 (i.e. when i = n)
f(a,b) = S_1 if a_2 = 1 or b_2 = a_1; S_0 otherwise
r(a,b,s) = 1 if s ∈ S_0 and a = b_2; −5 if s ∈ S_1 and f(a,b) = S_0 and b_2 ≠ 3; 0 otherwise

61
Theorem 3: Analysis
φ*: assignment satisfying a ρ* fraction of the clauses
f_φ*: average score ρ*/n
Claim: no strategy (fixed or adaptive) can obtain an average expected score better than ρ*/n
Regret minimization algorithm: run until expected average regret < ε/n, which gives expected average score > (ρ* − ε)/n
