
1 Pradeep Varakantham, Singapore Management University. Joint work with J. Y. Kwak, M. Taylor, J. Marecki, P. Scerri, and M. Tambe

2 Motivating Domains: disaster rescue, sensor networks. Characteristics of these domains: uncertainty, coordination among multiple agents, sequential decision making.

3 Meeting the challenges. Problem: multiple agents coordinating to perform multiple tasks in the presence of uncertainty. Solution: represent the problem as a Distributed POMDP; since computing an optimal solution is NEXP-complete, use an approximate algorithm that dynamically exploits structure in the interactions. Result: vast improvement in performance over existing algorithms.

4 Outline: Illustrative Domain; Model; Approach (exploit dynamic structure in interactions); Results.

5 Illustrative Domain: multiple types of robots; uncertainty in movements; reward components for saving victims, collisions, and clearing debris; objective: maximize expected joint reward.

6 Model: Distributed POMDPs with Coordination Locales (DPCL). Joint model: the global state represents completion of tasks; agents are independent except in coordination locales (CLs). Two types of CLs: same-time CLs (e.g., agents colliding with each other) and future-time CLs (e.g., a cleaner robot clearing debris assists a rescue robot in reaching its goal). Individual observability.
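To make the ingredients listed on this slide concrete, here is a minimal sketch of how a DPCL instance might be organized in code; the class and field names (DPCL, CoordinationLocale, P, O, R, locales) are illustrative assumptions, not taken from the original implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class CoordinationLocale:
    """A locale where agents interact; outside CLs they evolve independently."""
    kind: str                 # "same_time" or "future_time"
    agents: Tuple[int, ...]   # indices of the agents involved
    states: List[Tuple]       # (global, local) state combinations where it applies

@dataclass
class DPCL:
    """Distributed POMDP with Coordination Locales (illustrative structure)."""
    global_states: List        # task-completion status shared by all agents
    local_states: List[List]   # per-agent local state spaces
    actions: List[List]        # per-agent action sets
    # Per-agent models, valid outside coordination locales:
    P: List[Callable]          # P_i((s_g, s_i), a_i, (s_g', s_i')) -> probability
    O: List[Callable]          # O_i(omega_i, a_i, (s_g', s_i')) -> probability
    R: List[Callable]          # R_i((s_g, s_i), a_i, (s_g', s_i')) -> reward
    locales: List[CoordinationLocale] = field(default_factory=list)
```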

7 Solving DPCLs with TREMOR (Teams REshaping of MOdels for Rapid execution). Two steps: (1) Branch and Bound search using MDP-based heuristics; (2) task assignment evaluation by computing policies for every agent, performing joint policy computation only at CLs.

8 1. Branch and Bound search
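Slide 7 states that this search uses MDP-based heuristics. The sketch below shows one way a branch-and-bound over task assignments could be structured, assuming the caller supplies an optimistic MDP-based bound for partial assignments and an evaluator for complete ones; both callables (upper_bound, evaluate) are hypothetical interfaces, not the paper's API.

```python
def branch_and_bound(agents, tasks, upper_bound, evaluate):
    """Search over task assignments, pruning with optimistic MDP-based bounds.

    upper_bound(partial) -> optimistic value of the best completion of `partial`
    evaluate(full)       -> actual joint value of a complete assignment
    (both supplied by the caller; hypothetical interfaces)
    """
    best_value, best_assignment = float("-inf"), None
    stack = [{}]  # partial assignments: agent -> task
    while stack:
        partial = stack.pop()
        if upper_bound(partial) <= best_value:
            continue                      # prune: cannot beat the incumbent
        if len(partial) == len(agents):   # complete assignment: evaluate it
            value = evaluate(partial)
            if value > best_value:
                best_value, best_assignment = value, partial
            continue
        agent = agents[len(partial)]      # next unassigned agent
        for task in tasks:
            stack.append({**partial, agent: task})
    return best_assignment, best_value
```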

9 2. Task Assignment Evaluation. Until the policies converge or a maximum number of iterations is reached: 1) solve the individual POMDPs; 2) identify potential coordination locales; 3) based on the type and value of each coordination locale, shape the transition (P) and reward (R) functions of the relevant individual agents, capturing interactions and encouraging or discouraging them; 4) go to step 1.
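A minimal sketch of the loop above, with solve_pomdp, find_locales and shape_models as hypothetical stand-ins for the single-agent solver, CL detection and model-shaping steps described on the slide.

```python
def evaluate_assignment(models, solve_pomdp, find_locales, shape_models, max_iters=10):
    """TREMOR-style task-assignment evaluation (illustrative sketch).

    models:       per-agent POMDP models under a fixed task assignment
    solve_pomdp:  callable returning a policy for one agent's model
    find_locales: callable returning CLs likely to occur under given policies
    shape_models: callable adjusting P/R of the agents involved in a CL
    All callables are hypothetical interfaces, not the paper's API.
    """
    policies = None
    for _ in range(max_iters):
        # 1) Solve each agent's individual POMDP under its current (shaped) model.
        new_policies = [solve_pomdp(m) for m in models]

        # 2) Identify coordination locales that are likely under these policies.
        locales = find_locales(models, new_policies)

        # 3) Shape the transition and reward models of the agents involved in
        #    each CL, encouraging beneficial interactions and discouraging harmful ones.
        for cl in locales:
            shape_models(models, cl, new_policies)

        # 4) Repeat until the policies stop changing.
        if new_policies == policies:
            break
        policies = new_policies
    return policies
```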

10 Identifying potential CLs. The probability of a CL occurring at a time step T is computed from the starting belief via the standard belief update along the policy (a policy over belief states): at each step, the probability of observing ω in belief state b is computed and b is updated accordingly.
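The "standard belief update" referred to on this slide is the usual POMDP belief update, b'(s') ∝ O(s', a, ω) · Σ_s P(s, a, s') · b(s). A plain-Python sketch follows; the dictionary representation of beliefs and the P and O callables are assumptions for illustration.

```python
def belief_update(b, a, omega, states, P, O):
    """Standard POMDP belief update.

    b is a dict mapping state -> probability; P(s, a, s_next) and
    O(s_next, a, omega) are the transition and observation models
    (hypothetical callables). Returns (new_belief, prob_of_omega).
    """
    unnormalized = {}
    for s_next in states:
        weight = sum(P(s, a, s_next) * b[s] for s in b)
        unnormalized[s_next] = O(s_next, a, omega) * weight
    prob_omega = sum(unnormalized.values())          # Pr(omega | b, a)
    if prob_omega == 0.0:
        return None, 0.0                             # omega impossible under b, a
    new_belief = {s: p / prob_omega for s, p in unnormalized.items()}
    return new_belief, prob_omega
```

The probability of a CL at time step T can then be estimated by propagating the belief forward along the policy and summing the probability mass of the states belonging to the locale.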

11 Type of CL. STCL: there exist s and a for which the transition or reward function is not decomposable, i.e., P(s, a, s') ≠ Π_{1≤i≤N} P_i((s_g, s_i), a_i, (s_g', s_i')) OR R(s, a, s') ≠ Σ_{1≤i≤N} R_i((s_g, s_i), a_i, (s_g', s_i')). FTCL: completion of a task (a change in the global state) by an agent at time t' affects the transitions or rewards of other agents at time t.

12 Shaping Model (STCL): shaping the transition function and the reward function, i.e., converting the joint transition probability when the CL occurs into a new transition probability for agent i.
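The slide's shaping equations are not reproduced in this transcript, so the sketch below only illustrates the general idea under an assumed form: the shaped transition mixes the agent's original transition with the outcome implied by the joint interaction, weighted by the estimated probability of the CL, and the reward is shifted by the expected interaction reward. This is an assumption for illustration, not necessarily the paper's exact update.

```python
def shape_transition(P_i, P_joint_marginal, p_cl):
    """Illustrative shaping of agent i's transition at a same-time CL.

    Assumed form (not necessarily the paper's exact update): with probability
    p_cl the CL occurs and the joint-interaction outcome applies; otherwise
    the original individual transition applies.
    """
    def shaped(s, a, s_next):
        return (1.0 - p_cl) * P_i(s, a, s_next) + p_cl * P_joint_marginal(s, a, s_next)
    return shaped

def shape_reward(R_i, expected_cl_reward, p_cl):
    """Illustrative reward shaping: add the expected reward or penalty from
    the interaction, weighted by how likely the CL is to occur."""
    def shaped(s, a, s_next):
        return R_i(s, a, s_next) + p_cl * expected_cl_reward(s, a, s_next)
    return shaped
```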

13 Results. Benchmark algorithms: independent POMDPs; Memory Bounded Dynamic Programming (MBDP). Criteria: decision quality and run-time. Parameters varied: (i) agents; (ii) CLs; (iii) states; (iv) horizon.

14 State space

15 Agents

16 Coordination Locales

17 Time Horizon

18 Related work. Existing research: DEC-MDPs (assuming individual or collective full observability; task allocation and dependencies given as input); DEC-POMDPs (JESP, MBDP; exploiting independence in transition/reward/observation); model shaping (Guestrin and Gordon, 2002).

19 Conclusion. DPCL is a specialization of Distributed POMDPs; TREMOR exploits the presence of a small number of CLs in such domains and relies on single-agent POMDP solvers. Results: TREMOR outperformed existing DisPOMDP algorithms, except on tightly coupled small problems.

20 Questions?

21 Same Time CL (STCL). There is an STCL if the transition function is not decomposable, P(s, a, s') ≠ Π_{1≤i≤N} P_i((s_g, s_i), a_i, (s_g', s_i')), OR the observation function is not decomposable, O(s', a, ω) ≠ Π_{1≤i≤N} O_i(ω_i, a_i, (s_g', s_i')), OR the reward function is not decomposable, R(s, a, s') ≠ Σ_{1≤i≤N} R_i((s_g, s_i), a_i, (s_g', s_i')). Example: two robots colliding in a narrow corridor.

22 Future Time CL (FTCL). Actions of one agent at time t' can affect the transitions, observations, or rewards of other agents at a later time t: P((s_g^t, s_i^t), a_i^t, (s_g'^t, s_i'^t) | a_j^{t'}) ≠ P((s_g^t, s_i^t), a_i^t, (s_g'^t, s_i'^t)), or R((s_g^t, s_i^t), a_i^t, (s_g'^t, s_i'^t) | a_j^{t'}) ≠ R((s_g^t, s_i^t), a_i^t, (s_g'^t, s_i'^t)), or O(ω_i^t, a_i^t, (s_g'^t, s_i'^t) | a_j^{t'}) ≠ O(ω_i^t, a_i^t, (s_g'^t, s_i'^t)), for some t' < t. Example: clearing debris assists rescue robots in getting to victims faster.

