
1 Distributed Model Shaping for Scaling to Decentralized POMDPs with Hundreds of Agents. Prasanna Velagapudi, Pradeep Varakantham, Paul Scerri, Katia Sycara. D-TREMOR - AAMAS 2011.

2 Motivation: 100s to 1000s of robots, agents, and people; complex, collaborative tasks; dynamic, uncertain environments; offline planning.

3 Motivation: Exploit three characteristics of these domains. 1. Explicit interactions: specific combinations of states and actions where effects depend on more than one agent. 2. Sparsity of interactions: many potential interactions could occur between agents, but only a few will occur in any given solution. 3. Distributed computation: each agent has access to local computation; a centralized algorithm has access to 1 unit of computation, while a distributed algorithm has access to N units.

4 Review: Dec-POMDP. A Dec-POMDP is specified by a joint transition function, a joint reward function, and a joint observation function (symbols shown on slide).
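The Dec-POMDP tuple described on this slide can be sketched as a small container; this is a minimal illustration, and the field names are mine rather than the paper's notation:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Minimal sketch of the Dec-POMDP tuple: joint transition T, joint reward R,
# and joint observation O over joint states, joint actions, and observations.
@dataclass
class DecPOMDP:
    states: Sequence          # joint states s
    joint_actions: Sequence   # tuples of per-agent actions a = (a_1, ..., a_n)
    observations: Sequence    # joint observations w
    T: Callable               # T(s, a, s') -> probability of reaching s'
    R: Callable               # R(s, a) -> joint reward
    O: Callable               # O(s', a, w) -> probability of observing w
```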

5 Distributed POMDP with Coordination Locales [Varakantham et al., 2009]. A coordination locale (CL) specifies the relevant region of joint state-action space and a time constraint; the time constraint captures the nature of the interaction (e.g., affects only same-time effects, or affects any future time).
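A coordination locale, as described on this slide, could be represented roughly as follows; the structure and names are an assumption for illustration, not the paper's definition:

```python
from typing import NamedTuple, FrozenSet, Tuple

# Hypothetical sketch of a coordination locale (CL): the region of joint
# state-action space where the interaction fires, plus a time constraint
# (same-time vs. any-future-time), as the slide describes.
class CoordinationLocale(NamedTuple):
    region: FrozenSet[Tuple]   # (state, action) pairs involved in the interaction
    same_time: bool            # True: affects only same-time; False: any future time

# Example: a narrow-corridor collision CL over a 7-step horizon.
corridor_cl = CoordinationLocale(
    region=frozenset((("corridor", t), "move") for t in range(7)),
    same_time=True,  # collisions need both agents in the corridor simultaneously
)
```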

6 Distributed POMDP with Coordination Locales [Varakantham et al., 2009] (example CL shown on slide).

7 D-TREMOR (extending TREMOR [Varakantham et al., 2009]). Four stages: Task Allocation (decentralized auction), Local Planning (EVA POMDP solver), Interaction Exchange (policy sub-sampling and Coordination Locale (CL) messages), and Model Shaping (prioritized/randomized reward and transition shaping).

8 D-TREMOR: Task Allocation. Assign "tasks" using a decentralized auction (greedy, nearest allocation). Create a local, independent sub-problem for each agent.
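The greedy, nearest allocation could look roughly like the sketch below; the distance metric and function names are illustrative assumptions, not the paper's auction mechanism:

```python
# Sketch of a greedy, nearest-first allocation: each task goes to the closest
# still-unassigned agent (Manhattan distance, chosen here for illustration).
def greedy_nearest_allocation(agent_pos, task_pos):
    """Return {task: agent} assigning each task to its nearest free agent."""
    free = dict(agent_pos)
    assignment = {}
    for task, (tx, ty) in task_pos.items():
        if not free:
            break  # more tasks than agents: leave the rest unassigned
        agent = min(free, key=lambda a: abs(free[a][0] - tx) + abs(free[a][1] - ty))
        assignment[task] = agent
        del free[agent]  # each agent takes at most one task
    return assignment
```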

9 D-TREMOR: Local Planning. Solve each local sub-problem using an off-the-shelf algorithm (EVA). Result: locally-optimal policies.

10 D-TREMOR: Interaction Exchange. Find Pr_CLi and Val_CLi, then send CL messages to teammates [Kearns 2002]. Example: the agent entered the corridor in 95 of 100 sampled runs, so Pr_CLi = 0.95; comparing collision vs. no-collision outcomes gives Val_CLi = -7.
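The sampling estimates on this slide can be sketched as simple Monte Carlo averages; the rollout and value functions below are hypothetical stand-ins for evaluating the agent's local policy:

```python
# Sketch of the CL estimates: Pr_CL is the fraction of policy rollouts that
# enter the CL's state-action region; Val_CL is the mean change in local
# policy value when the interaction occurs vs. when it does not.
def estimate_pr_cl(rollout, entered_cl, n=100):
    """Fraction of n sampled rollouts in which the CL region is entered."""
    return sum(entered_cl(rollout()) for _ in range(n)) / n

def estimate_val_cl(value_with, value_without, n=100):
    """Mean change in sampled local policy value if the interaction occurs."""
    return sum(value_with() - value_without() for _ in range(n)) / n
```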

11 D-TREMOR: Model Shaping. Shape local model rewards/transitions based on interactions: mix the interaction model functions with the independent model functions, weighted by the probability of interaction.
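The probability-weighted mix described on this slide can be sketched as below; this is a minimal illustration of the weighting idea, with hypothetical function names:

```python
# Sketch of probability-weighted model shaping: the shaped model function is a
# mix of the interaction and independent model functions, weighted by the
# teammate-reported interaction probability p.
def shape(p_interact, f_interaction, f_independent):
    """Return f'(x) = p * f_int(x) + (1 - p) * f_ind(x)."""
    def shaped(*args):
        return (p_interact * f_interaction(*args)
                + (1.0 - p_interact) * f_independent(*args))
    return shaped
```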

12 D-TREMOR: Local Planning (again). Re-solve the shaped local models to get new policies. Result: new locally-optimal policies → new interactions.

13 D-TREMOR: Adv. Model Shaping. In practice, we run into three common issues faced by concurrent optimization algorithms: slow convergence, oscillation, and local optima. We can alter our model shaping to mitigate these by reasoning about the types of interactions we have.

14 D-TREMOR: Adv. Model Shaping. Slow convergence → prioritization: assign priorities to agents and only model-shape collision interactions for higher-priority agents. This can quickly resolve purely negative interactions (a negative interaction is one where every agent is guaranteed to have a lower-valued local policy if the interaction occurs).
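The prioritization rule above can be sketched as a simple filter over incoming CL messages; the message format and the convention that lower numbers mean higher priority are illustrative assumptions:

```python
# Sketch of priority-based shaping for purely negative interactions (e.g.
# collisions): an agent only shapes its model for collision CLs reported by
# higher-priority agents, so lower-priority agents yield first.
def cls_to_shape(my_priority, cl_messages):
    """Keep only collision CLs whose sender outranks this agent."""
    return [m for m in cl_messages
            if m["kind"] == "collision" and m["sender_priority"] < my_priority]
```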

15 D-TREMOR: Adv. Model Shaping. Oscillation → probabilistic shaping. Oscillation is often caused by time dynamics between agents: Agent 1 shapes based on Agent 2's old policy while Agent 2 shapes based on Agent 1's old policy. Instead, each agent only applies model shaping with probability δ [Zhang 2005], which breaks cycles between agent policies.
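The probabilistic rule is small enough to sketch directly; the model representation (a plain mapping) is a stand-in for illustration:

```python
import random

# Sketch of probabilistic shaping: each agent applies its pending model
# changes only with probability delta, breaking update cycles in which two
# agents keep reacting to each other's stale policies.
def maybe_shape(model, pending_changes, delta, rng=random):
    """Apply pending_changes to model with probability delta; report outcome."""
    if rng.random() < delta:
        model.update(pending_changes)
        return True
    return False
```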

16 D-TREMOR: Adv. Model Shaping. Local optima → optimistic initialization. Agents cannot detect mixed interactions (e.g., debris): rescue agent policies can only improve if debris is cleared, while cleaner agent policies can only worsen if they clear debris. (Slide callouts: "I'm not going near the debris" / "If no one is going through debris, I won't clear it" / "I'm not clearing the debris.")

17 D-TREMOR: Adv. Model Shaping. Local optima → optimistic initialization (continued): let each agent solve an initial model that uses an optimistic assumption about the interaction condition.
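One way to picture optimistic initialization, under the assumption that the model is a mapping of named parameters (the parameter name below is hypothetical): the rescue agent's first model assumes debris never blocks movement, so its first policy will plan through debris cells and surface the interaction for the cleaner agent.

```python
# Sketch of optimistic initialization for a mixed interaction such as debris:
# build an initial model in which the favorable interaction condition is
# assumed to hold (debris already cleared).
def optimistic_initial_model(model):
    """Return a copy of the model with the interaction condition assumed true."""
    init = dict(model)
    init["p_debris_blocks"] = 0.0  # optimistic: debris never blocks movement
    return init
```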

18 Experimental Setup. D-TREMOR policies: max-joint-value, last iteration. Comparison policies: independent, optimistic, do-nothing, random. Scaling: 10 to 100 agents on random maps. Density: 100 agents on concentric ring maps. 3 problems per condition, 20 planning iterations, 7-time-step horizon, 1 CPU per agent. Headline: D-TREMOR produces reasonable policies for 100-agent planning problems in under 6 hrs. (with some caveats).

19 Experimental Datasets: scaling dataset and density dataset (maps shown on slide).

20 Experimental Results: Scaling. Naïve policies vs. D-TREMOR policies (plots shown on slide).

21 Experimental Results: Density. D-TREMOR rescues the most victims, but does not resolve every collision.

22 Experimental Results: Time. The increase in planning time is related to the number of CLs active, not the number of agents.

23 Conclusions. D-TREMOR: decentralized planning for sparse Dec-POMDPs with many agents. Demonstrated complete distributability, fast heuristic interaction detection, and local message exchange to achieve high scalability. Empirical results in a simulated search-and-rescue domain.

24 Future Work. A generalized framework for distributed planning under uncertainty through iterative message exchange; optimality/convergence bounds; reducing necessary communication; better search over task allocations; scaling to larger team sizes.


27 Motivation. Scaling planning to large teams is hard: we need to plan (with uncertainty) for each agent in the team, agents must consider the actions of a growing number of teammates, and the full joint problem has NEXP complexity [Bernstein 2002]. Optimality is going to be infeasible, so we find and exploit structure in the problem to make good plans in a reasonable amount of time.


29 Experimental Results: Density. Do-nothing does the best? Ignoring interactions = poor performance.

30 Experimental Results: Time. Why is this increasing? (Plot shown on slide.)

31 Related Work (scalability vs. generality). Structured Dec-(PO)MDP planners: JESP [Nair 2003], TD-Dec-POMDP [Witwicki 2010], EDI-CR [Mostafa 2009], SPIDER [Marecki 2009]. These restrict generality slightly to get scalability; high optimality.

32 Related Work (scalability vs. generality). Heuristic Dec-(PO)MDP planners: TREMOR [Varakantham 2009], OC-Dec-MDP [Beynier 2005]. These sacrifice optimality for scalability; high generality.

33 Related Work (scalability vs. generality). Structured multiagent path planners: DPC [Bhattacharya 2010], Optimal Decoupling [Van den Berg 2009]. These sacrifice generality further to get scalability; high optimality.

34 Related Work (scalability vs. generality). Heuristic multiagent path planners: Dynamic Networks [Clark 2003], Prioritized Planning [Van den Berg 2005]. These sacrifice optimality to get scalability.

35 Related Work (scalability vs. generality). Our approach: fix high scalability and generality, and explore what level of optimality is possible.

36 A Simple Rescue Domain. Map elements (shown on slide): rescue agent, cleaner agent, narrow corridor, victim, unsafe cell, clearable debris.

37 A Simple (Large) Rescue Domain (map shown on slide).

38 Distributed POMDP with Coordination Locales (DPCL) [Varakantham et al., 2009]. Often, interactions between agents are sparse (e.g., a corridor only fits one agent; debris is passable only if cleaned).

39 Distributed, Iterative Planning. Inspiration: TREMOR [Varakantham 2009] and JESP [Nair 2003]. Reduce the full joint problem into a set of smaller, independent sub-problems; solve the independent sub-problems with a local algorithm; and modify the sub-problems to push locally optimal solutions towards a high-quality joint solution.

40 Distributed Team REshaping of MOdels for Rapid execution (D-TREMOR). Reduce the full joint problem into a set of smaller, independent sub-problems (one for each agent); solve the independent sub-problems with existing state-of-the-art algorithms; and modify the sub-problems such that locally optimal solutions approach a high-quality joint solution. Stages: Task Allocation, Local Planning, Interaction Exchange, Model Shaping.


44 D-TREMOR: Interaction Exchange. Finding Pr_CLi: evaluate the local policy and compute the frequency of the associated s_i, a_i [Kearns 2002]. Example: entered the corridor in 95 of 100 runs, so Pr_CLi = 0.95.

45 D-TREMOR: Interaction Exchange. Finding Val_CLi: sample the local policy value with and without interactions (testing interactions independently) and compute the change in value if the interaction occurred [Kearns 2002]. Example: comparing collision vs. no-collision outcomes gives Val_CLi = -7.

46 D-TREMOR: Interaction Exchange. Send CL messages to teammates. Sparsity → a relatively small number of messages.



50 D-TREMOR: Adv. Model Shaping. Slow convergence → prioritization. The majority of interactions are collisions, so assign priorities to agents and only model-shape collision interactions for higher-priority agents. From DPP, prioritization can quickly resolve collision interactions, and similar properties hold for any purely negative interaction (one where every agent is guaranteed to have a lower-valued local policy if the interaction occurs).


52 D-TREMOR: Adv. Model Shaping. Local optima → optimistic initialization. Agents cannot detect mixed interactions (e.g., debris): rescue agent policies can only improve if debris is cleared, while cleaner agent policies can only worsen if they clear debris. With Pr_CL and Val_CL both low on each side, the locally optimal policy degenerates to doing nothing.



62 Conclusions. D-TREMOR produces reasonable policies for 100-agent planning problems in under 6 hrs., in a partially observable, uncertain world with multiple types of interactions and agents. It improves over independent planning and resolves interactions in large problems, though some convergence/efficiency issues remain.

63 DPCL vs. other models. EDI/EDI-CR: adds complex transition functions. TD-Dec-MDP: allows simultaneous interaction (within an epoch). Factored MDP/POMDP: adds interactions that span epochs.


66 D-TREMOR: Reward functions. P_Debris = 0.9 (probability that debris will not allow a robot to enter the cell); P_ActionFailure = 0.2 (probability of action failure); P_ObsSuccessOnSuccess = 0.8 (probability that success is observed if the action succeeded); P_ObsSuccessOnFailure = 0.2 (probability that success is observed if the action failed); P_ReboundAfterCollision = 0.5 (probability that a robot will return to the same cell after a collision); R_Victim = 10.0 (reward for saving a victim); R_Cleaning = 0.25 (reward for cleaning debris); R_Move = -0.5 (reward for moving); R_Observe = -0.25 (reward for observing); R_Collision = -5.0 (reward for a collision); R_Unsafe = -1 (reward for landing in an unsafe cell).
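For reference, the parameters listed on this slide can be collected into a single mapping (the values are taken directly from the slide; only the container is mine):

```python
# Model parameters of the rescue domain, as listed on the slide.
RESCUE_DOMAIN_PARAMS = {
    "P_Debris": 0.9,                 # debris blocks entry into the cell
    "P_ActionFailure": 0.2,          # action fails
    "P_ObsSuccessOnSuccess": 0.8,    # observe success given success
    "P_ObsSuccessOnFailure": 0.2,    # observe success given failure
    "P_ReboundAfterCollision": 0.5,  # robot returns to same cell after collision
    "R_Victim": 10.0,                # reward for saving a victim
    "R_Cleaning": 0.25,              # reward for cleaning debris
    "R_Move": -0.5,                  # cost of moving
    "R_Observe": -0.25,              # cost of observing
    "R_Collision": -5.0,             # penalty for a collision
    "R_Unsafe": -1.0,                # penalty for landing in an unsafe cell
}
```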

67 Review: POMDP. A POMDP is defined by a set of states, a set of actions, a set of observations, a transition function, a reward function, and an observation function.
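As a worked companion to the tuple on this slide, here is the standard POMDP belief update, b'(s') ∝ O(s', a, w) · Σ_s T(s, a, s') · b(s); this is textbook material rather than anything specific to the paper, and the dictionary representation is chosen for illustration:

```python
# Standard POMDP belief update after taking action a and observing w.
def belief_update(b, a, w, states, T, O):
    """Return the normalized posterior belief over states."""
    new_b = {sp: O(sp, a, w) * sum(T(s, a, sp) * b[s] for s in states)
             for sp in states}
    z = sum(new_b.values())
    # If the observation has zero likelihood, fall back to a uniform belief.
    return {sp: (p / z if z > 0 else 1.0 / len(states)) for sp, p in new_b.items()}
```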

68 Distributed POMDP with Coordination Locales (DPCL) [Varakantham et al., 2009]. An extension of the Dec-POMDP in which coordination locales (CLs) represent interactions; each CL carries an explicit time constraint and implicitly constructs the interaction functions.

69 Proposed Approach: DIMS (Distributed Iterative Model Shaping). Task Allocation: assign tasks to agents, reducing the search space considered by each agent and defining a local sub-problem for each robot.

70 Proposed Approach: DIMS. Task Allocation reduces the full SI-Dec-POMDP to a local (independent) POMDP for each robot.

71 Proposed Approach: DIMS. Local Planning: solve local sub-problems using an off-the-shelf centralized solver. Result: a locally-optimal policy.

72 Proposed Approach: DIMS. Interaction Exchange: given the local policy, estimate the local probability and value of interactions, then communicate those for relevant interactions to team members. Sparsity → a relatively small number of messages.

73 Proposed Approach: DIMS. Model Shaping: modify local sub-problems to account for the presence of interactions.

74 Proposed Approach: DIMS. Reallocate tasks or re-plan using the modified local sub-problems, and iterate.

75 Proposed Approach: DIMS. Each stage is pluggable: Task Allocation can use any decentralized allocation mechanism (e.g., auctions); Local Planning can use a stock graph, MDP, or POMDP solver; Interaction Exchange uses lightweight local evaluation and low-bandwidth messaging; Model Shaping uses methods to alter the local problem to incorporate non-local effects.

76 Example: Interactions. (Figure shown on slide: rescue robot, cleaner robot, debris, victim.)

77 Example: Sparsity (figure shown on slide).

