
Slide 1: Keep the Adversary Guessing: Agent Security by Policy Randomization
Praveen Paruchuri, University of Southern California

Slide 2: Motivation: The Prediction Game
A police vehicle patrols 4 regions (Region 1, Region 2, Region 3, Region 4). Can you predict the patrol pattern? (Pattern 1 vs. Pattern 2.)
Randomization decreases predictability and increases security.

Slide 3: Domains
- Police patrolling groups of houses
- Scheduled activities at airports, e.g. security checks, refueling
- The adversary monitors these activities, motivating randomized policies

Slide 4: Problem Definition
Problem: security for agents in uncertain adversarial domains.
Assumptions for the agent/agent team (variable information about the adversary):
- Adversary cannot be modeled (Part 1): action/payoff structure unavailable
- Adversary is partially modeled (Part 2): probability distribution over adversaries
Assumptions for the adversary:
- Knows the agents' plan/policy
- Exploits action predictability

Slide 5: Outline
Security via randomization:
- No adversary model: randomization + quality constraints (MDP/Dec-POMDP)
- Partial adversary model: mixed strategies via Bayesian Stackelberg games
Contributions: new, efficient algorithms.

Slide 6: No Adversary Model: Solution Technique
Intentional policy randomization for security:
- Information minimization game
- MDP/POMDP (Partially Observable Markov Decision Process): sequential decision making under uncertainty
Maintain quality constraints:
- Resource constraints (time, fuel, etc.)
- Frequency constraints (likelihood of crime, property value)

Slide 7: Randomization with Quality Constraints
Example constraint: fuel used < threshold.

Slide 8: No Adversary Model: Contributions
Two main contributions:
- Single-agent case:
  - Nonlinear program with an entropy-based metric: hard to solve (exponential)
  - Converted to a linear program: BRLP (binary search for randomization)
- Multi-agent case: RDR (Rolling Down Randomization)
  - Randomized policies for decentralized POMDPs

Slide 9: MDP-Based Single-Agent Case
An MDP is the tuple <S, A, P, R>:
- S: set of states
- A: set of actions
- P: transition function
- R: reward function
Basic terms used:
- x(s,a): expected number of times action a is taken in state s
- Policy as a function of the MDP flows: pi(a|s) = x(s,a) / sum_a' x(s,a')
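The flow-to-policy relationship above can be sketched directly: normalize the flows x(s,a) within each state to get a stochastic policy. The flow values below are illustrative, not from the talk.

```python
def policy_from_flows(x):
    """Derive pi(a|s) = x(s,a) / sum_a' x(s,a') from MDP flow variables.

    x: dict mapping state -> dict mapping action -> expected flow.
    """
    policy = {}
    for s, flows in x.items():
        total = sum(flows.values())
        # Normalizing per state turns expected visit counts into a
        # probability distribution over actions at that state.
        policy[s] = {a: f / total for a, f in flows.items()}
    return policy

# Hypothetical flows for a 2-state, 2-action patrolling MDP:
x = {"s0": {"patrol_r1": 3.0, "patrol_r2": 1.0},
     "s1": {"patrol_r1": 2.0, "patrol_r2": 2.0}}
pi = policy_from_flows(x)
# pi["s0"]["patrol_r1"] == 0.75; pi["s1"] is uniform
```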

Slide 10: Entropy: A Measure of Randomness
Randomness (information content) is quantified using entropy (Shannon 1948). Entropy for an MDP:
- Additive entropy: add the entropies of each state
- Weighted entropy: weigh each state's entropy by its contribution to the total flow
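The two metrics can be sketched as follows; the policy and flows below are made-up toy values, and the formulas are a straightforward reading of the slide's definitions.

```python
import math

def state_entropy(dist):
    """Shannon entropy (bits) of an action distribution at one state."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def additive_entropy(policy):
    """Additive entropy: sum of per-state entropies."""
    return sum(state_entropy(d) for d in policy.values())

def weighted_entropy(policy, flows):
    """Weighted entropy: each state's entropy weighted by its share
    of the total flow sum_a x(s,a)."""
    total = sum(sum(f.values()) for f in flows.values())
    return sum((sum(flows[s].values()) / total) * state_entropy(d)
               for s, d in policy.items())

pi = {"s0": {"a1": 0.5, "a2": 0.5}, "s1": {"a1": 1.0}}
flows = {"s0": {"a1": 2.0, "a2": 2.0}, "s1": {"a1": 1.0}}
# additive: 1.0 + 0.0 = 1.0; weighted: (4/5)*1.0 + (1/5)*0.0 = 0.8
```

The weighted variant discounts states the agent rarely visits, which is why it tends to buy more useful randomness per unit of sacrificed reward.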

Slide 11: Randomized Policy Generation
Nonlinear program: maximize entropy subject to reward above a threshold.
- Exponential algorithm
Linearize to obtain a poly-time algorithm:
- BRLP (Binary Search for Randomization LP)
- Entropy expressed as a function of the flows

Slide 12: BRLP: Efficient Randomized Policy
Inputs: a reference policy x-hat and a target reward; x-hat can be any high-entropy policy (e.g. the uniform policy).
The LP for BRLP controls entropy through a parameter beta that pulls the flows toward x-hat.

Slide 13: BRLP in Action
- beta = 1: maximum entropy (the reference policy)
- beta = 0: deterministic maximum-reward policy
Searching over an increasing scale of beta finds the policy meeting the target reward, e.g. at beta = .5.
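The beta scale above suggests the search loop: since raising beta pushes the LP toward the high-entropy reference policy (lowering the achievable reward), binary search finds the largest beta whose LP still meets the target reward. A minimal sketch, where `max_reward_given_beta` is a hypothetical stand-in for the actual LP solve:

```python
def brlp_search(max_reward_given_beta, target_reward, tol=1e-6):
    """Binary search for the largest beta whose constrained LP
    still attains the target reward (reward decreases as beta grows)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if max_reward_given_beta(mid) >= target_reward:
            lo = mid   # target still reachable: push entropy higher
        else:
            hi = mid   # too constrained: back off toward reward
    return lo

# Toy stand-in: reward falls linearly from 10 (deterministic, beta=0)
# to 4 (uniform, beta=1); real BRLP would solve an LP here instead.
def toy_reward(beta):
    return 10 - 6 * beta

beta = brlp_search(toy_reward, target_reward=7.0)  # converges near 0.5
```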

Slide 14: Results (Averaged over 10 MDPs)
- For a given reward threshold, highest entropy: weighted entropy, a 10% average gain over BRLP
- Fastest: BRLP, a 7-fold average speedup over expected entropy

Slide 15: Multi-Agent Case: Problem
Maximize entropy for agent teams subject to a reward threshold.
For the agent team:
- Decentralized POMDP framework
- No communication between agents
For the adversary:
- Knows the agents' policy
- Exploits action predictability

Slide 16: Policy Trees: Deterministic vs. Randomized
[Figure: two policy trees branching on observations O1, O2. The deterministic tree assigns a single action (A1 or A2) at each node; the randomized tree assigns a probability distribution over A1 and A2 at each node.]

Slide 17: RDR: Rolling Down Randomization
Inputs:
- Best (local or global) deterministic policy
- Percent of reward loss allowed
- d parameter: sets the number of turns each agent gets
  - Example: d = .5 => number of steps = 1/d = 2, so each agent gets one turn (in the 2-agent case)
A single-agent MDP problem is solved at each step.

Slide 18: RDR with d = .5
Let M = maximum joint reward. The reward target rolls down from M to 90% of M to 80% of M:
- Agent 1's turn: fix Agent 2's policy; maximize joint entropy subject to joint reward > 90% of M
- Agent 2's turn: fix Agent 1's policy; maximize joint entropy subject to joint reward > 80% of M

Slide 19: RDR Details
To derive the single-agent MDP, new transition, observation, and belief-update rules are needed.
- Original belief update rule (standard POMDP): b'(s') proportional to O(o | s', a) * sum_s P(s' | s, a) * b(s)
- New belief update rule: extends the above by summing over the other agent's fixed (stochastic) policy and observations

Slide 20: Experimental Results: Reward Threshold vs. Weighted Entropy (Averaged over 10 Instances)

Slide 21: Security with a Partially Modeled Adversary
A police agent patrols a region with many adversaries (robbers) having different motivations, times, and places.
- The model (actions and payoffs) of each adversary is known
- A probability distribution over the adversaries is known
Modeled as a Bayesian Stackelberg game.

Slide 22: Bayesian Game
A Bayesian game contains:
- A set of agents N (police and robbers)
- A set of types theta_m (police and robber types)
- A set of strategies sigma_i for each agent i
- A probability distribution over types, Pi_j: theta_j -> [0, 1]
- Utility functions U_i: theta_1 x theta_2 x sigma_1 x sigma_2 -> R

Slide 23: Stackelberg Game
The agent is the leader: it commits to a strategy first (the patrol policy). The adversaries are followers: they optimize against the leader's fixed strategy, observing patrol patterns to leverage that information.
Payoff matrix (agent rows, adversary columns; entries are agent, adversary payoffs):

          a      b
    a    2,1    4,0
    b    1,0    3,2

Nash equilibrium (a, a): payoffs [2, 1]. If the leader commits to the uniform random strategy {.5, .5}, the follower plays b, yielding [3.5, 1].
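The slide's numbers can be checked directly: against any committed mixed strategy the follower picks its best-responding column, and against the uniform commitment the leader's expected payoff rises from the Nash value of 2 to 3.5. A quick check:

```python
# Payoffs from the 2x2 game on the slide: rows = leader (a, b),
# columns = follower (a, b).
R = [[2, 4], [1, 3]]   # leader payoffs
C = [[1, 0], [0, 2]]   # follower payoffs

def follower_best_response(p):
    """Follower picks the column maximizing its expected payoff
    against the leader's committed mixed strategy p over rows."""
    expected = [sum(p[i] * C[i][j] for i in range(2)) for j in range(2)]
    return max(range(2), key=lambda j: expected[j])

p = [0.5, 0.5]                    # leader commits to uniform random
j = follower_best_response(p)     # follower's reply: index 1, action b
leader_value = sum(p[i] * R[i][j] for i in range(2))  # 0.5*4 + 0.5*3 = 3.5
```

Commitment helps precisely because the follower's observed best response shifts in the leader's favor.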

Slide 24: Previous Work: Conitzer & Sandholm (AAAI'05, EC'06)
- MIP-Nash (AAAI'05): efficient procedure for finding the best Nash equilibrium
- Multiple LPs method (EC'06): given a normal-form game, finds the optimal leader strategy to commit to
Bayesian to normal-form game:
- The Harsanyi transformation yields exponentially many adversary strategies: NP-hard
The Multiple LPs method solves one LP for every joint pure strategy j of the adversary (R, C: agent and adversary payoff matrices): maximize sum_i p_i R_ij subject to sum_i p_i C_ij >= sum_i p_i C_ij' for all j', with p a probability distribution over agent actions.
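The Multiple LPs method can be sketched on the 2x2 example game from the earlier slide (not the patrolling domain): for each follower pure strategy j, solve an LP for the leader mixture that maximizes leader reward while keeping j a follower best response, then keep the best of the LPs. A sketch using `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

R = np.array([[2, 4], [1, 3]], dtype=float)  # leader payoffs
C = np.array([[1, 0], [0, 2]], dtype=float)  # follower payoffs

def optimal_commitment(R, C):
    """Multiple LPs method: one LP per follower pure strategy j."""
    n, m = R.shape
    best_val, best_p = -np.inf, None
    for j in range(m):
        # maximize p @ R[:, j]  <=>  minimize -p @ R[:, j]
        # s.t. p @ C[:, j] >= p @ C[:, j'] for all j' (j is a best response)
        A_ub = np.array([C[:, jp] - C[:, j] for jp in range(m) if jp != j])
        b_ub = np.zeros(A_ub.shape[0])
        res = linprog(-R[:, j], A_ub=A_ub, b_ub=b_ub,
                      A_eq=np.ones((1, n)), b_eq=[1.0],
                      bounds=[(0, 1)] * n)
        if res.success and -res.fun > best_val:
            best_val, best_p = -res.fun, res.x
    return best_val, best_p

val, p = optimal_commitment(R, C)
# Optimal commitment p = (2/3, 1/3): follower plays b, leader gets 11/3
```

For a Bayesian game, this approach requires the Harsanyi transformation first, so the number of follower pure strategies (and hence of LPs and constraints) blows up exponentially in the number of adversary types, which is the cost DOBSS avoids.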

Slide 25: Bayesian Stackelberg Game: Approach
Two approaches:
1. Heuristic solution: ASAP (Agent Security via Approximate Policies)
2. Exact solution: DOBSS (Decomposed Optimal Bayesian Stackelberg Solver)
Exponential savings:
- No Harsanyi transformation
- No exponential number of LPs: one MILP (mixed-integer linear program)

Slide 26: ASAP vs. DOBSS
ASAP (heuristic):
- Controls the probability of each strategy over a discrete probability space
- Generates k-uniform policies; e.g. k = 3 => probabilities in {0, 1/3, 2/3, 1}
- Simple and easy to implement
DOBSS (exact):
- Modifies the ASAP algorithm from a discrete to a continuous probability space
- Focus of the rest of the talk

Slide 27: DOBSS Details
Previous work: fix an adversary (joint) pure strategy, then solve an LP to find the best agent strategy.
My approach: for each agent mixed strategy, find the adversary's best response.
Advantages:
- Decomposition: given the agent's strategy, each adversary type can find its best response independently
- A mathematical technique yields a single MILP
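The decomposition idea can be illustrated concretely: given a leader mixed strategy, each adversary type best-responds on its own small payoff matrix, and the leader's value is the type-probability-weighted sum, with no Harsanyi product space in sight. All payoffs and probabilities below are made up for illustration.

```python
# Hypothetical Bayesian Stackelberg instance with two adversary types.
R = {"t1": [[2, 4], [1, 3]],   # leader payoffs vs. each type
     "t2": [[3, 1], [0, 2]]}
C = {"t1": [[1, 0], [0, 2]],   # adversary payoffs per type
     "t2": [[0, 1], [1, 0]]}
prob = {"t1": 0.6, "t2": 0.4}  # distribution over adversary types

def leader_value(p):
    """Expected leader payoff for commitment p: each type
    best-responds independently (the DOBSS decomposition idea)."""
    total = 0.0
    for t, pt in prob.items():
        exp_c = [sum(p[i] * C[t][i][j] for i in range(2))
                 for j in range(2)]
        j = max(range(2), key=lambda k: exp_c[k])  # type t's best response
        total += pt * sum(p[i] * R[t][i][j] for i in range(2))
    return total

# Crude grid search over commitments (DOBSS instead solves one MILP):
best = max((leader_value([q, 1 - q]), q)
           for q in (i / 100 for i in range(101)))
```

The real DOBSS formulation makes this search exact by encoding each type's best response with integer variables and linearizing the leader-follower probability products into a single MILP.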

Slide 28: Obtaining the MILP
Decompose the Bayesian game by adversary type (avoiding the Harsanyi transformation), then substitute new variables for the products of leader and follower strategy variables to linearize them, yielding a single MILP.

Slide 29: Experiments: Domain
Patrolling domain: a security agent and robbers.
- The security agent patrols houses (e.g. visit house a, observing that house and its neighbor), planning for patrol length 2: 6 strategies for 3 houses, 12 for 4 houses
- Robbers can attack any house: 3 possible choices with 3 houses; rewards depend on the house and the agent's position
- The joint strategy space of the robbers is exponential: 3^10 strategies for 3 houses and 10 robbers
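The strategy counts above follow from the domain description: a length-2 patrol over distinct houses gives n*(n-1) ordered routes, and each of 10 robbers independently picks one of 3 houses. A quick check:

```python
from itertools import permutations

def patrol_strategies(n_houses, length=2):
    """Ordered patrol routes visiting `length` distinct houses."""
    return list(permutations(range(n_houses), length))

three = len(patrol_strategies(3))   # 6 agent strategies for 3 houses
four = len(patrol_strategies(4))    # 12 agent strategies for 4 houses
joint = 3 ** 10                     # joint robber space: 59049 strategies
```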

Slide 30: Sample Patrolling Domain: 3 and 4 Houses
[Results chart: 3 houses — multiple LPs: 7 followers, DOBSS: 20; 4 houses — LPs: 6 followers, DOBSS: 12.]

Slide 31: Conclusion
- When the agent cannot model the adversary: intentional randomization algorithms for MDPs/Dec-POMDPs
- When the agent has a partial model of the adversary: an efficient MILP solution for Bayesian Stackelberg games

Slide 32: Vision
- Incorporating machine learning for dynamic environments
- Resource-constrained agents, where constraints might be unknown in advance
- Developing real-world applications: police patrolling, airport security

Slide 33: Thank You
Any comments/questions?

