1 Adaptive Regret Minimization in Bounded Memory Games. Jeremiah Blocki, Nicolas Christin, Anupam Datta, Arunesh Sinha. GameSec 2013, Invited Paper.

2 Outline: Motivation, Background, Bounded Memory Games, Adaptive Regret, Results.

3 Motivating Example: Audit Game.

4 Motivating Example: Cheating Game (Semester 1, Semester 2, Semester 3).

5 Motivating Example: Speeding Game (Week 1, Week 2, Week 3).

6 Motivating Example: Speeding Game. Actions: the defender plays High Inspection or Low Inspection; the adversary plays Speed or Behave; each pair of actions produces an outcome. Questions: What is an appropriate game model for this interaction? What are good defender strategies?

7 Elements of the Game Model. Two players: Adversary (tourists) and Defender (policeman). Actions: Adversary {Speed, Behave}; Defender {High Inspection, Low Inspection}. Repeated interactions: each interaction has an outcome, and the history of game play is a sequence of outcomes. Imperfect information: the policeman doesn't always observe the actions of the tourist, and vice versa.

8 Game Elements 8 o Repeated Interaction o Two Players: Defender and Adversary o Imperfect Information o Defender only observes outcome o Short Term Adversaries o Adversary Incentives Unknown to Defender o Last presentation! [JNTP 13] o Adversary may be uninformed/irrational

9 Additional Game Elements 9 o History-dependent Actions o Adversary adapts behavior following unknown strategy o How should defender respond? o History-dependent Rewards: o Point System o Reputation of defender depends both on its history and on the current outcome. Can the standard repeated-game regret minimization model capture these elements?

10 Outline: Motivation; Background (Standard Definition of Regret, Regret Minimization Algorithms, Limitations); Our Contributions (Bounded Memory Games, Adaptive Regret, Results).

11 Speeding Game: Repeated Game Model. Defender's (D) expected utility: High Inspection earns 0.19 against Speed and 0.7 against Behave; Low Inspection earns 0.2 against Speed and 1 against Behave.

12 Speeding Game: Repeated Game Model (same utility matrix: High Inspection earns 0.19 against Speed and 0.7 against Behave; Low Inspection earns 0.2 against Speed and 1 against Behave).

13 Regret Minimization Example. Two experts give advice: one recommends Low Inspection, the other High Inspection. What should the defender do?

14 Regret Minimization Example. Over three days the adversary plays Behave, Behave, Speed; the defender plays Low, High, High Inspection. Expert Aristotle always recommends Low Inspection; expert Plato always recommends High Inspection. Using the defender's utility matrix above, the defender earns 1 + 0.7 + 0.19 = 1.89, Aristotle's advice would have earned 1 + 1 + 0.2 = 2.2, and Plato's advice would have earned 0.7 + 0.7 + 0.19 = 1.59.

15 Regret Minimization Example. Utilities: Defender 1.89, Aristotle 2.2, Plato 1.59. Regret is measured against the best expert in hindsight (here Aristotle): 2.2 - 1.89 = 0.31.
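
The computation on slides 14-15 can be written out directly. The following Python sketch uses the utility matrix and the expert totals from the slides; the specific day-by-day defender and adversary sequences are one reconstruction consistent with the totals, not verbatim from the presentation.

    # Defender's utility for (defender action, adversary action), as on the slides.
    utility = {
        ("High", "Speed"): 0.19, ("High", "Behave"): 0.7,
        ("Low",  "Speed"): 0.2,  ("Low",  "Behave"): 1.0,
    }

    adversary_plays = ["Behave", "Behave", "Speed"]            # the three days
    defender_plays  = ["Low", "High", "High"]                  # one sequence earning 1.89
    experts = {"Aristotle": ["Low"] * 3, "Plato": ["High"] * 3}

    def total(plays, adv):
        return sum(utility[(d, a)] for d, a in zip(plays, adv))

    defender_total = total(defender_plays, adversary_plays)    # 1.89
    expert_totals = {name: total(p, adversary_plays) for name, p in experts.items()}
    # Regret = best expert's hindsight utility minus the defender's utility.
    regret = max(expert_totals.values()) - defender_total      # 2.2 - 1.89 = 0.31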

16 Regret Minimization Example. Regret: how much better the defender could have done by following the single best expert in hindsight (defender's utility matrix as above).

17 Regret Minimization Example. A regret minimization algorithm (A) chooses the defender's actions so that its average regret against the best expert in hindsight vanishes as the number of rounds grows (defender's utility matrix as above).

18 Regret Minimization: Basic Idea. Maintain a weight for each choice (Low Inspection and High Inspection both start at 1.0) and choose an action probabilistically based on the weights (defender's utility matrix as above).

19 Regret Minimization: Basic Idea. After observing the outcome, update the weights (now 0.5 and 1.5), shifting probability toward the choice that has been performing better (defender's utility matrix as above).
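
The weight-then-sample idea on slides 18-19 is the standard multiplicative (exponential) weights scheme. A minimal Python sketch, assuming full-information feedback and rewards in [0, 1]; the update rule and the learning rate eta are illustrative choices, not taken from the slides, and the bandit variants cited on slide 23 would additionally have to estimate unobserved rewards.

    import math, random

    actions = ["Low Inspection", "High Inspection"]
    weights = {a: 1.0 for a in actions}   # slide 18: both weights start at 1.0
    eta = 0.5                             # illustrative learning rate

    def choose_action():
        # Sample an action with probability proportional to its weight.
        total = sum(weights.values())
        r = random.uniform(0, total)
        for a in actions:
            r -= weights[a]
            if r <= 0:
                return a
        return actions[-1]

    def update(rewards):
        # rewards[a]: utility the defender would have earned by playing a this round.
        for a in actions:
            weights[a] *= math.exp(eta * rewards[a])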

20 Speeding Game. Defender's strategy: Low Inspection is the dominant strategy; the Nash equilibrium plays Low Inspection, and regret minimization also converges to Low Inspection (weights now 0.5 and 1.5).

21 Speeding Game. One step later: the weights continue to shift toward Low Inspection (0.3 and 1.7).

22 Speeding Game. The weights keep concentrating on the dominant strategy, Low Inspection (0.1 and 1.9).

23 Prior Work: Regret Minimization. Regret minimization is well studied in repeated games with imperfect information (the bandit model) [AK04, McMahanB04, K05, FKM05, DH06, …]; see also Regret Minimizing Audits [BCDS11].

24 Philosophical Argument 24 See! My advice was better! We need a better game model!

25 Speeding Game. Alongside the defender's utility matrix (as above), the adversary has its own utility for each outcome (the values shown include 0, 0.8, and 1); the slide marks the adversary's dominant strategy / best response.

26 Speeding Game: Stackelberg Model. The defender commits to a Stackelberg strategy and the adversary best-responds; the slide marks the defender's Stackelberg strategy and the adversary's best response on the two utility matrices.

27 Prior Work: Stackelberg Games and Security. Security games [Tam12], [JPQ+], [JNTP13], … (e.g., LAX, Air Marshals); Audit Games [BCD+13].

28 Philosophical Argument 28 Your Stackelberg game model is still flawed! See! My advice was really better!

29 Unmodeled Game Elements 29 o Adversary Incentives Unknown to Defender o Last presentation! [JNTP 13] o Adversary may be uninformed/irrational o History-dependent Rewards: o Point System o Reputation of defender depends both on its history and on the current outcome o History-dependent Actions o Adversary adapts behavior following unknown strategy o How should defender respond?

30 Outline: Motivation, Background, Our Contributions (Bounded Memory Games, Adaptive Regret, Results).

31 Stochastic Games. States (s0, s1, s2, …) capture the dependence of rewards on history. Theorem: no algorithm can minimize regret for the general class of stochastic games.

32 Bounded Memory Games. A state s encodes the last m outcomes; states can capture history-dependent rewards.

33 Bounded Memory Games 33

34 Bounded Memory Games - Experts. Expert advice may depend on the last m outcomes, i.e., an expert maps states to actions. Example: if no violations have been detected in the last m rounds then play High Inspection, otherwise play Low Inspection.
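
A minimal Python sketch of how such an expert can be represented: the state is just the last m outcomes, and the expert is a function from that state to an action. The outcome encoding and data types below are illustrative assumptions, not from the paper.

    from collections import deque

    m = 3  # memory: number of past outcomes the state encodes

    def expert_action(last_m_outcomes):
        # Slide 34's example expert: play High Inspection only if no violation
        # was detected in any of the last m outcomes.
        if any(o == "violation detected" for o in last_m_outcomes):
            return "Low Inspection"
        return "High Inspection"

    state = deque(maxlen=m)   # the game state: the last m outcomes
    state.extend(["no violation", "violation detected", "no violation"])
    print(expert_action(state))   # -> "Low Inspection"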

35 Outline: Motivation, Background, Our Contributions (Bounded Memory Games, Adaptive Regret, Results).

36 k-Adaptive Strategy: a decision tree for the adversary's next k rounds, branching on what happens each round (Day 1, Day 2, Day 3) with Speed or Behave at the nodes.

37 k-Adaptive Strategy: a decision tree for the next k rounds (Week 1, Week 2, Week 3). Example tourist strategies: "I will never speed while I am on vacation." "I will speed until I get caught; if I ever get a ticket then I will stop." "I will keep speeding until I get two tickets; if I ever get two tickets then I will stop."
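
Each quoted strategy is a small decision tree over the next k rounds, i.e., the next action depends on the outcomes observed so far. A sketch of two of them in Python; the outcome name "ticket" and the list representation are illustrative assumptions.

    def speed_until_caught(observed_outcomes):
        # "I will speed until I get caught; if I ever get a ticket then I will stop."
        if "ticket" in observed_outcomes:
            return "Behave"
        return "Speed"

    def speed_until_two_tickets(observed_outcomes):
        # "I will keep speeding until I get two tickets."
        if observed_outcomes.count("ticket") >= 2:
            return "Behave"
        return "Speed"

    print(speed_until_caught([]))           # Week 1 -> "Speed"
    print(speed_until_caught(["ticket"]))   # Week 2 -> "Behave"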

38 k-Adaptive Regret. Compare two play sequences that start from the same initial state (…, O_-1, O_0): the defender's actual play, with actions (a_1, d_1), (a_2, d_2), …, (a_{k+1}, d_{k+1}), outcomes O_1, O_2, …, O_{k+1}, and rewards r_1, r_2, …, r_{k+1}; and the play that would have resulted had the defender followed an expert against the same k-adaptive adversary, with its own actions, outcomes, and rewards.
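
As a rough sketch only (the paper's definition is more careful about how the expert's counterfactual play is simulated against the k-adaptive adversary), the quantity being minimized has the flavor of

    \[
    \mathrm{Regret}_k(T) \;=\; \max_{E \in \mathcal{E}} \frac{1}{T}\left( \sum_{t=1}^{T} r'_t(E) \;-\; \sum_{t=1}^{T} r_t \right),
    \]

where r_t is the defender's realized reward in round t and r'_t(E) is the reward expert E would have received against the same k-adaptive adversary started from the same state.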

39 k-Adaptive Regret Minimization 39

40 Outline: Motivation, Background, Bounded Memory Games, Adaptive Regret, Results.

41 k-Adaptive Regret Minimization 41

42 Inefficient Regret Minimization Algorithm. View the bounded memory-m game as a repeated game whose "actions" are the fixed strategies f_1, f_2, …, and run a standard regret minimization algorithm for repeated games of imperfect information [AK04, McMahanB04, K05, FKM05] over them.

43 Inefficient Regret Minimization Algorithm. The reward of fixed strategy f_2 in the constructed repeated game is its expected reward in the original game, given that (1) the defender follows f_2 for the next m*k*t rounds of the original game and (2) the defender faces the corresponding sequence of k-adaptive adversaries.

44 Inefficient Regret Minimization Algorithm. Stage i lasts m*k*t rounds of the real game; the diagram lines up the real game's outcomes (…, O_1, …, O_m, …) with the outcomes of the simulated repeated game, starting from the stage's start state.

45 Inefficient Regret Minimization Algorithm. (Same diagram as the previous slide: the real game's outcomes during stage i are aligned with the simulated repeated game's outcomes, starting from the stage's start state.)

46 Inefficient Regret Minimization Algorithm. Standard regret minimization algorithms maintain a weight for each expert; here the experts are the fixed strategies f_1, f_2, … of the bounded memory-m game, and there are exponentially many of them, so this reduction is inefficient.
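
To see the blow-up concretely: a deterministic fixed strategy of the bounded memory-m game assigns one defender action to every possible state, i.e., to every length-m outcome history, so the reduction needs one weight per such assignment. The counting below is an illustrative Python sketch; the number of outcomes per round is an assumption for the example, not a value from the paper.

    num_outcomes = 4           # |O|: possible outcomes per round (illustrative)
    num_defender_actions = 2   # {High Inspection, Low Inspection}
    m = 3                      # memory

    num_states = num_outcomes ** m                        # one state per length-m history
    num_fixed_strategies = num_defender_actions ** num_states

    print(num_states)             # 64
    print(num_fixed_strategies)   # 2**64: one weight per expert is already infeasible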

47 Summary of Technical Results: a table over the imperfect-information and perfect-information settings, ordered from harder to easier; entries: Hard (Theorem 1), APX (Theorem 5), APX (Theorem 4), Hard (Theorem 1), Hard (Remark 2), X (Theorem 6). Legend: X means no regret minimization algorithm exists; APX means an efficient approximate regret minimization algorithm exists.

48 Summary of Technical Results (updated): Hard (Theorem 1), APX (Theorem 5), APX (Theorem 4), Hard (Theorem 1), APX (New!), Hard (Remark 2), APX (New!), X (Theorem 6). Legend: X means no regret minimization algorithm exists; APX means an efficient approximate regret minimization algorithm, efficient in n, k.

49 Summary of Technical Results (same table as above). Ideas behind the APX results: implicit weight representation + dynamic programming. Warning! f(k) is a very large constant!

50 Implicit Weights: Outcome Tree. Build a tree over outcomes (branching on Behave / Speed); the key question is how often a given edge (s, t) is relevant.

51 Implicit Weights: Outcome Tree, for a particular expert E (branching on Behave / Speed).

52 Open Questions: the remaining cells of the summary table (imperfect vs. perfect information): Hard (Theorem 1), APX (Theorem 5), APX (Theorem 4), Hard (Theorem 1), APX, Hard (Remark 2), APX, X (Theorem 6). Thanks for Listening!

53 53

54 Theorem 3. Unless RP = NP, there is no efficient regret minimization algorithm for bounded memory games, even against an oblivious adversary. Proof by reduction from the hardness of approximating MAX 3-SAT beyond 7/8 + ε [Hastad01]; similar to the reduction in [EKM05] for MDPs.

55 Theorem 3: Setup. Defender actions A: {0,1}x{0,1}; memory m = O(log n). States: two states for each variable, a copy S_0 = {s_1, …, s_n} (no reward given yet) and a parallel copy S_1 (reward already given). Intuition: a fixed strategy corresponds to a variable assignment.

56 Theorem 3: Overview. The adversary picks a clause uniformly at random for the next n rounds. The defender can earn reward 1 by satisfying this unknown clause during those n rounds. The game remembers whether a reward has already been given, so the defender cannot earn the reward multiple times during the n rounds.

57 Theorem 3: State Transitions. Adversary actions B: {0,1}x{0,1,2,3}, with b = (b_1, b_2). g(a,b) = b_1. f(a,b) = S_1 if a_2 = 1 or b_2 = a_1 (reward already given), and S_0 otherwise (no reward given).

58 Theorem 3: Rewards. With b = (b_1, b_2): r(a,b,s) = 1 if s ∈ S_0 and a_1 = b_2; r(a,b,s) = -5 if s ∈ S_1 and f(a,b) = S_0 and b_2 ≠ 3; r(a,b,s) = 0 otherwise. In particular, there is no reward whenever the adversary plays b_2 = 2, and no reward whenever s ∈ S_1.
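
A direct Python transcription of the transition and reward rules on slides 57-58, under the reconstruction above (in particular, reading the reward condition as a_1 = b_2 and the punishment condition as b_2 != 3); the string state labels are illustrative names.

    S0, S1 = "no reward given", "reward already given"

    def f(a, b):
        # a = (a1, a2) in {0,1}x{0,1}; b = (b1, b2) in {0,1}x{0,1,2,3}
        a1, a2 = a
        b1, b2 = b
        return S1 if (a2 == 1 or b2 == a1) else S0

    def g(a, b):
        return b[0]   # g(a, b) = b_1

    def r(a, b, s):
        a1, _ = a
        _, b2 = b
        if s == S0 and a1 == b2:
            return 1
        if s == S1 and f(a, b) == S0 and b2 != 3:
            return -5
        return 0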

59 Theorem 3: Oblivious Adversary. Let (d_1, …, d_n) be a binary de Bruijn sequence of order n. (1) Pick a clause C uniformly at random. (2) For i = 1, …, n, play b = (d_i, b_2), where b_2 = 1 if x_i appears positively in C, 0 if x_i appears negated in C, 3 if i = n, and 2 otherwise. (3) Repeat from step 1.
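
A Python sketch of this oblivious adversary. The de Bruijn sequence d and the clause representation (a dict mapping variable index to True for a positive literal, False for a negated one) are assumed inputs, and the "0 if x_i appears negated" case follows the reconstruction above rather than the transcript verbatim.

    import random

    def adversary_block(clauses, d, n):
        # One n-round block of the oblivious adversary's strategy.
        C = random.choice(clauses)          # step 1: clause chosen uniformly at random
        actions = []
        for i in range(1, n + 1):           # step 2: rounds i = 1, ..., n
            if i == n:
                b2 = 3                       # b_2 = 3 exactly when i = n (see slide 60)
            elif i in C:
                b2 = 1 if C[i] else 0        # sign of the literal of x_i in C
            else:
                b2 = 2                       # x_i does not appear in C: no reward possible
            actions.append((d[i - 1], b2))   # play b = (d_i, b_2)
        return actions                       # step 3: repeat for the next block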

60 Analysis. The defender can never be rewarded from a state s ∈ S_1; getting a reward forces a transition to s ∈ S_1; and the defender is punished for leaving S_1 unless the adversary plays b_2 = 3 (i.e., when i = n). Recall: f(a,b) = S_1 if a_2 = 1 or b_2 = a_1, and S_0 otherwise; r(a,b,s) = 1 if s ∈ S_0 and a_1 = b_2, -5 if s ∈ S_1 and f(a,b) = S_0 and b_2 ≠ 3, and 0 otherwise.

61 Theorem 3: Analysis. For an assignment φ satisfying a ρ fraction of clauses, the corresponding fixed strategy f_φ has average score ρ/n. Claim: no strategy (fixed or adaptive) can obtain an average expected score better than ρ*/n, where ρ* is the fraction satisfied by an optimal assignment. Run a regret minimization algorithm until its expected average regret is below ε/n; its expected average score is then above (ρ* - ε)/n, so an efficient regret minimization algorithm would approximate MAX 3-SAT beyond the 7/8 + ε threshold.

