Published by Mikel Merrett. Modified about 1 year ago.

1
Towards a Theoretic Understanding of DCEE
Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe
Lafayette College

2
Forward Pointer
When Should There be a “Me” in “Team”? Distributed Multi-Agent Optimization Under Uncertainty
Matthew E. Taylor, Manish Jain, Yanquin Jin, Makoto Yokoo, & Milind Tambe
Wednesday, 8:30 – 10:30, Coordination and Cooperation 1

3
Teamwork: Foundational MAS Concept
Joint actions improve outcome
But they increase communication & computation
Over two decades of work
This paper: increased teamwork can harm the team
– Even without considering communication & computation
– Only considering team reward
– Multiple algorithms, multiple settings
– But why?

4
DCOPs: Distributed Constraint Optimization Problems
Multiple domains
– Meeting scheduling
– Traffic light coordination
– RoboCup soccer
– Multi-agent plan coordination
– Sensor networks
Distributed
– Robust to failure
– Scalable
(In)Complete
– Quality bounds

5
DCOP Framework
[Figure: agents a1, a2, a3; reward tables for the (a1, a2) and (a2, a3) constraints]

7
DCOP Framework
[Figure: agents a1, a2, a3; reward tables for the (a1, a2) and (a2, a3) constraints]
Different “levels” of teamwork possible
Complete solution is NP-hard
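To make the framework concrete, here is a minimal Python sketch of a three-agent chain DCOP shaped like the figure, with constraints on (a1, a2) and (a2, a3). The binary domain and every reward-table entry are invented for illustration and do not come from the slides. The sketch scores an assignment as the sum of its constraint rewards, finds the optimum by exhaustive enumeration (exponential in the number of agents), and shows one way “levels” of teamwork differ: an assignment that no single agent can improve, but that a pair of agents acting jointly can.

```python
from itertools import product

# Hypothetical chain DCOP shaped like the figure: constraints (a1, a2) and (a2, a3).
# The binary domain and all reward-table entries are invented for illustration.
AGENTS = ["a1", "a2", "a3"]
DOMAIN = [0, 1]
REWARDS = {
    ("a1", "a2"): {(0, 0): 7, (0, 1): 2, (1, 0): 2, (1, 1): 3},
    ("a2", "a3"): {(0, 0): 15, (0, 1): 2, (1, 0): 2, (1, 1): 8},
}


def team_reward(assignment):
    """Team reward: sum of each constraint's table entry for the chosen values."""
    return sum(table[(assignment[u], assignment[v])] for (u, v), table in REWARDS.items())


def brute_force_optimum():
    """Complete search over all |DOMAIN|**len(AGENTS) assignments (exponential)."""
    best = None
    for values in product(DOMAIN, repeat=len(AGENTS)):
        assignment = dict(zip(AGENTS, values))
        if best is None or team_reward(assignment) > team_reward(best):
            best = assignment
    return best, team_reward(best)


def improves_by_changing(assignment, group):
    """True if some joint change of the agents in `group` raises the team reward."""
    base = team_reward(assignment)
    for values in product(DOMAIN, repeat=len(group)):
        trial = {**assignment, **dict(zip(group, values))}
        if team_reward(trial) > base:
            return True
    return False


local = {"a1": 1, "a2": 1, "a3": 1}  # team reward 3 + 8 = 11
print(brute_force_optimum())                                  # ({'a1': 0, 'a2': 0, 'a3': 0}, 22)
print(any(improves_by_changing(local, [a]) for a in AGENTS))  # False: no single agent can improve
print(improves_by_changing(local, ["a2", "a3"]))              # True: the pair (a2, a3) can
```

With these invented tables the all-1 assignment is stuck for any single agent, while a2 and a3 changing together raise the team reward from 11 to 17; this is the kind of gain that more teamwork (a larger coordinating group) can unlock.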

8
DCEE: Distributed Coordination of Exploration and Exploitation
Environment may be unknown
Maximize on-line reward over some number of rounds
– Exploration vs. exploitation
Demonstrated on a mobile ad-hoc network
– Simulation [Released] & Robots [Released Soon]

9
DCOP: Distributed Constraint Optimization Problem

10
DCOP → DCEE: Distributed Coordination of Exploration and Exploitation

11
DCEE Algorithm: SE-Optimistic (will build upon later)
[Figure: agents a1, a2, a3, a4]
Rewards on [1, 200]
“If I move, I’d get R = 200”

12
DCEE Algorithm: SE-Optimistic (will build upon later)
[Figure: agents a1, a2, a3, a4]
Rewards on [1, 200]
Agents’ estimates: “If I move, I’d gain 101”, “I’d gain 251”, “I’d gain 275”, “I’d gain 125”
Explore or exploit?
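The gains on this slide can be reproduced with a small Python sketch, under assumptions that are not on the slide: the agents form a chain a1 - a2 - a3 - a4, the four gains are listed in agent order, and the current constraint rewards are the hypothetical values 99, 50, and 75. Under the optimistic estimate, each agent assumes every one of its constraints would reach the maximum reward of 200 if it moved. The move rule shown (an agent moves only if its gain strictly beats every neighbor's gain) is an MGM-style k = 1 rule assumed for illustration.

```python
MAX_REWARD = 200  # rewards lie in [1, 200]

# Hypothetical chain a1 - a2 - a3 - a4; the current constraint rewards are assumed
# values chosen so that the optimistic gains match the slide.
EDGES = {("a1", "a2"): 99, ("a2", "a3"): 50, ("a3", "a4"): 75}
AGENTS = ["a1", "a2", "a3", "a4"]


def neighbors(agent):
    """Agents that share a constraint with `agent`."""
    return [v if u == agent else u for (u, v) in EDGES if agent in (u, v)]


def optimistic_gain(agent):
    """SE-Optimistic estimate: every constraint of `agent` would reach MAX_REWARD if it moved."""
    return sum(MAX_REWARD - r for (u, v), r in EDGES.items() if agent in (u, v))


gains = {a: optimistic_gain(a) for a in AGENTS}
# Assumed k = 1 rule: move only if the gain strictly beats every neighbor's gain.
movers = [a for a in AGENTS if all(gains[a] > gains[n] for n in neighbors(a))]
print(gains)   # {'a1': 101, 'a2': 251, 'a3': 275, 'a4': 125}
print(movers)  # ['a3']
```

Only a3 moves: a1 and a2 each have a neighbor with a larger estimated gain, and a4 is dominated by a3. Interior agents can exceed 200 because their gain sums over two constraints.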

13
Balanced Exploration Techniques
BE-Rebid
– Decision-theoretic calculation of exploration
– Track previous best location R_b: can backtrack
– Reason about exploring for some number of steps (t_e)
Expected reward of exploring = reward while exploring + reward while exploiting × P(improve reward) + reward while exploiting × P(NOT improve reward)
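A minimal Python sketch of this kind of decision-theoretic calculation, under assumptions that are not stated on the slide: every explored value yields a reward drawn i.i.d. from Uniform(1, 200), backtracking to R_b is free, and the agent explores for t_e rounds and then exploits the best reward seen for the rest of a fixed horizon. The three returned terms correspond to the three labelled quantities above. The function names are hypothetical, and this illustrates the trade-off rather than reproducing the published BE-Rebid formula.

```python
def expected_utility_explore(t_e, r_b, horizon, lo=1.0, hi=200.0):
    """Expected total reward over `horizon` rounds if an agent explores for t_e
    rounds, then exploits the best reward found (backtracking to R_b if needed).

    Hypothetical model: each explored value yields a reward drawn i.i.d. from
    Uniform(lo, hi), lo <= r_b <= hi, and backtracking costs nothing.
    """
    if not 0 <= t_e <= horizon:
        raise ValueError("t_e must lie in [0, horizon]")
    if t_e == 0:
        return horizon * r_b                      # pure exploitation of R_b
    reward_exploring = t_e * (lo + hi) / 2        # expected reward while exploring
    p = (r_b - lo) / (hi - lo)                    # P(one explored reward <= R_b)
    p_not_improve = p ** t_e                      # no explored reward beats R_b
    # P(improve) * E[best explored reward | it beats R_b], closed form for uniforms
    improve_term = hi - r_b * p_not_improve - (hi - lo) / (t_e + 1) * (1 - p ** (t_e + 1))
    exploit_rounds = horizon - t_e
    return (reward_exploring
            + exploit_rounds * improve_term          # reward while exploiting x P(improve)
            + exploit_rounds * r_b * p_not_improve)  # reward while exploiting x P(NOT improve)


def best_exploration_length(r_b, horizon):
    """Pick the t_e that maximizes expected reward (ties go to the smallest t_e)."""
    return max(range(horizon + 1), key=lambda t: expected_utility_explore(t, r_b, horizon))
```

For example, an agent whose previous best R_b already equals the maximum reward never benefits from exploring, so `best_exploration_length(200, 10)` returns 0, while an agent with R_b at the bottom of the range prefers to explore for at least one round.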

14
Success! [ATSN-09][IJCAI-09]
Both classes of (incomplete) algorithms
Simulation and on robots
– Ad hoc wireless network
(Improvement if performance > 0)

15
k-Optimality
Increased coordination (originally a DCOP formulation)
– In DCOP, increased k = increased team reward
Find groups of agents to change variables
– Joint actions
– Neighbors of a moving group cannot move
Defines amount of teamwork
(Higher communication & computation overheads)

16
“k-Optimality” in DCEE
k = 1, 2, ...
o Groups of size k form; those with the most to gain move (change the value of their variable)
o A group can only move if no other agents in its neighborhood move

17
Example: SE-Optimistic-2
[Figure: agents a1, a2, a3, a4]
Rewards on [1, 200]
Agents’ estimates: “If I move, I’d gain 101”, “I’d gain 251”, “I’d gain 275”, “I’d gain 125”
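A minimal Python sketch of the k = 2 rule under stated assumptions: a chain a1 - a2 - a3 - a4 with hypothetical current constraint rewards 99, 50, and 75 (values that reproduce the individual gains 101, 251, 275, and 125 shown above). Candidate groups are adjacent pairs, a pair's optimistic gain counts every constraint touching either member, and a pair moves only if it strictly beats every other pair in its neighborhood. This centralized check is a simplification; a distributed algorithm would exchange these gains by message.

```python
MAX_REWARD = 200  # rewards lie in [1, 200]
# Hypothetical chain a1 - a2 - a3 - a4; the current constraint rewards are assumed.
EDGES = {("a1", "a2"): 99, ("a2", "a3"): 50, ("a3", "a4"): 75}


def neighbors(agent):
    """Agents that share a constraint with `agent`."""
    return {v if u == agent else u for (u, v) in EDGES if agent in (u, v)}


def pair_gain(pair):
    """Optimistic joint gain: every constraint touching the pair jumps to MAX_REWARD."""
    return sum(MAX_REWARD - r for (u, v), r in EDGES.items() if u in pair or v in pair)


def neighborhood(pair):
    """The pair's members plus all of their neighbors."""
    return set(pair).union(*(neighbors(a) for a in pair))


pairs = list(EDGES)  # k = 2: each candidate group is a pair of constrained agents
pair_gains = {p: pair_gain(p) for p in pairs}
# A pair moves only if it strictly beats every other pair overlapping its
# neighborhood, so no two moving groups are neighbors.
pair_movers = [p for p in pairs
               if all(pair_gains[p] > pair_gains[q]
                      for q in pairs if q != p and set(q) & neighborhood(p))]
print(pair_gains)   # {('a1', 'a2'): 251, ('a2', 'a3'): 376, ('a3', 'a4'): 275}
print(pair_movers)  # [('a2', 'a3')]
```

Here the pair (a2, a3) wins with an optimistic joint gain of 376, so a1 and a4, which lie in its neighborhood, stay put.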

18
Sample coordination results
[Figure panels: Complete Graph, Chain Graph]
Artificially supplied rewards (DCOP)
Omniscient: confirms DCOP result, as expected

19
Physical Implementation
Create robots
Mobile ad-hoc wireless network

20
Confirms Team Uncertainty Penalty
Averaged over 10 trials each
Trend confirmed! (Huge standard error)
[Figure: Total Gain; panels Chain and Complete]

21
Problem with “k-Optimal”
Unknown rewards
– Cannot know whether moving will increase reward!
Define new term: L-Movement
– # of agents that can change variables per round
– Independent of exploration algorithm
– Graph dependent
– Alternate measure of teamwork

22
L-Movement
Example: k = 1 algorithms
– L is the size of the largest maximal independent set (the maximum independent set) of the graph
– NP-hard to calculate for a general graph
– Harder for higher k
Consider ring & complete graphs, both with 5 vertices
– Ring graph: maximum independent set has size 2
– Complete graph: maximum independent set has size 1
For k = 1
– L = 1 for a complete graph
– For a ring graph with n vertices, the maximum independent set has size ⌊n/2⌋

23
Configuration Hypercube
No (partial) assignment is believed to be better than another
– Without loss of generality, agents can select the next value when exploring
Define configuration hypercube: C
– Each agent is a dimension
– C[v_1, ..., v_n] is the total reward when each agent a_i takes value v_i
– C cannot be calculated without exploration
– Values are drawn from a known reward distribution
Moving along an axis in the hypercube → an agent changing its value
Example: 3 agents (C is 3-dimensional)
– Changing from C[a, b, c] to C[a, b, c′]: agent a3 changes from c to c′

24
How many agents can move? (1/2)
In a ring graph with 5 nodes
o k = 1: L = 2
o k = 2: L = 3
In a complete graph with 5 nodes
o k = 1: L = 1
o k = 2: L = 2
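These L values can be checked by brute force. In this Python sketch (an assumed formalization, not taken from the slides), a set of moving agents is feasible for a given k when every connected component of the subgraph it induces has at most k agents, since moving groups may not be neighbors; for k = 1 this reduces to an independent set, so L is the maximum independent set size. The helper names are hypothetical, and the search is exponential, in line with the NP-hardness noted earlier, so it is only meant for small graphs like these.

```python
from itertools import combinations


def ring(n):
    """Edges of a ring (cycle) graph on vertices 0..n-1, n >= 3."""
    return {frozenset((i, (i + 1) % n)) for i in range(n)}


def complete(n):
    """Edges of a complete graph on vertices 0..n-1."""
    return {frozenset(p) for p in combinations(range(n), 2)}


def largest_component(nodes, edges):
    """Size of the largest connected component of the subgraph induced by `nodes`."""
    remaining, best = set(nodes), 0
    while remaining:
        stack, size = [remaining.pop()], 1
        while stack:
            u = stack.pop()
            for v in list(remaining):
                if frozenset((u, v)) in edges:
                    remaining.remove(v)
                    stack.append(v)
                    size += 1
        best = max(best, size)
    return best


def movement(n, edges, k):
    """L: the most agents that can move in one round if moving groups have at most
    k agents and no two moving groups are adjacent (assumed rule; brute force)."""
    for size in range(n, 0, -1):
        if any(largest_component(s, edges) <= k for s in combinations(range(n), size)):
            return size
    return 0


for name, edges in [("ring", ring(5)), ("complete", complete(5))]:
    print(name, [movement(5, edges, k) for k in (1, 2)])
# ring [2, 3]
# complete [1, 2]
```

For k = 1 on ring graphs with n vertices the result equals n // 2, which grows with n, while the complete graph stays at 1 regardless of size.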

25
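How many agents can move? (2/2)
A configuration C[x_1, ..., x_n] is reachable by an algorithm with movement L in s steps if and only if x_1 + ... + x_n ≤ L·s and x_i ≤ s for every i
(each agent starts at index 0 and advances one index per move)
C[2,2] is reachable for L = 1 if s ≥ 4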
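A reachability condition of this kind can be sanity-checked with a short Python sketch. Assumptions (not stated on the slide): every agent starts at index 0 of its dimension, each move advances an agent by exactly one index, and at most L agents move per step. Under those assumptions, a candidate closed-form test (index sum at most L·s, no single index above s) can be compared against an exhaustive breadth-first enumeration on small hypercubes. The function names are hypothetical.

```python
from itertools import combinations, product


def reachable(config, L, s):
    """Closed-form test: at most L agents move per step, each advancing one index."""
    return sum(config) <= L * s and max(config) <= s


def reachable_by_search(config, L, s):
    """Brute-force check: enumerate every configuration reachable within s steps."""
    n = len(config)
    frontier = {(0,) * n}                       # all agents start at index 0
    for _ in range(s):
        step = set()
        for c in frontier:
            for m in range(min(L, n) + 1):      # m = 0 lets everyone stay put
                for movers in combinations(range(n), m):
                    step.add(tuple(x + (i in movers) for i, x in enumerate(c)))
        frontier = step
    return tuple(config) in frontier


print(reachable((2, 2), L=1, s=4), reachable((2, 2), L=1, s=3))  # True False
```

The example prints True False, matching the slide's claim that C[2,2] needs s ≥ 4 when L = 1; exhaustive comparison of both functions over small 2D and 3D grids is a cheap way to gain confidence in the closed form.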

26
L-Movement Experiments
For various DCEE problems, distributions, and L:
For each number of steps s:
1. Construct a hypercube with s values per dimension
2. Find M, the max achievable reward in s steps, given L
3. Return the average of 50 runs
Example: 2D hypercube
o Only half reachable if L = 1
o All locations reachable if L = 2
[Figure: 2D hypercube with s values along each axis]
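The experimental procedure above can be sketched in Python as follows. Assumptions not on the slide: each hypercube cell's reward is drawn independently (uniform on [1, 200] by default, echoing the reward range used earlier), agents start at index 0, and a configuration counts as reachable in s steps when its index sum is at most L·s and no index exceeds s. The function and parameter names, including the `draw` hook, are hypothetical.

```python
import random
from itertools import product


def reachable(config, L, s):
    """Assumed rule: at most L agents move per step, each advancing one index."""
    return sum(config) <= L * s and max(config) <= s


def max_reward_found(n_agents, L, s, rng, draw):
    """Steps 1-2: build a hypercube with s values per dimension and return the best
    reward among configurations reachable in s steps with movement L."""
    best = float("-inf")
    for config in product(range(s), repeat=n_agents):
        reward = draw(rng)              # every cell is drawn, reachable or not
        if reachable(config, L, s):
            best = max(best, reward)
    return best


def experiment(n_agents, L, s, runs=50, seed=0, draw=lambda rng: rng.uniform(1, 200)):
    """Step 3: average the maximum discovered reward over `runs` random hypercubes."""
    rng = random.Random(seed)
    return sum(max_reward_found(n_agents, L, s, rng, draw) for _ in range(runs)) / runs


for L in (1, 2):
    print(L, round(experiment(n_agents=2, L=L, s=5), 1))
```

Because every cell is drawn whether or not it is reachable, runs with the same seed see identical hypercubes, so the L = 2 average can never fall below the L = 1 average; passing `draw=lambda rng: rng.gauss(100, 20)` swaps in a normal distribution.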

27
Restricting to L-Movement: Complete, L = 1 → 2
Complete graph
o k = 1: L = 1
o k = 2: L = 2
[Figure: Average Maximum Reward Discovered]

28
Restricting to L-Movement: Ring, L = 2 → 3
Ring graph
o k = 1: L = 2
o k = 2: L = 3
[Figure: Average Maximum Reward Discovered]

29
1. Uniform distribution of rewards
2. 4 agents
3. Different normal distribution
[Figure panels: Complete, Ring]

30
k and L: 5-agent graphs
k value | Ring graph, L value | Complete graph, L value
1 | 2 | 1
2 | 3 | 2
Increasing k changes L less in ring than complete
Configuration hypercube is an upper bound
Posit a consistent negative effect
Suggests why increasing k has different effects:
– Larger improvement in complete than ring for increasing k

31
L-movement May Help Explain Team Uncertainty Penalty
An algorithm with L = 2 will be able to explore more of C than an algorithm with L = 1
– Independent of exploration algorithm!
– Determined by k and graph structure
– C is an upper bound; posit a constant negative effect
Any algorithm experiences diminishing returns as k increases
– Consistent with DCOP results
L-movement difference between k = 1 and k = 2 algorithms
– Larger difference in graphs with more agents
– For k = 1, L = 1 for a complete graph
– For k = 1, L increases with the number of vertices in a ring graph
