Presentation on theme: "Towards a Theoretic Understanding of DCEE" (Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe). Presentation transcript:
1. Towards a Theoretic Understanding of DCEE
Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe
Lafayette College
2. Forward Pointer
"When Should There be a 'Me' in 'Team'? Distributed Multi-Agent Optimization Under Uncertainty"
Matthew E. Taylor, Manish Jain, Yanqin Jin, Makoto Yokoo, and Milind Tambe
Wednesday, 8:30 - 10:30, Coordination and Cooperation 1
3. Teamwork: Foundational MAS Concept
Joint actions improve the outcome, but increase communication and computation.
Over two decades of work.
This paper: increased teamwork can harm the team,
even without considering communication and computation,
and only considering team reward.
Multiple algorithms, multiple settings. But why?
4. DCOPs: Distributed Constraint Optimization Problems
Multiple domains: meeting scheduling, traffic light coordination, RoboCup soccer, multi-agent plan coordination, sensor networks.
Distributed: robust to failure, scalable.
(In)Complete algorithms, with quality bounds.
6. DCOP Framework
Agents a1, a2, a3 are connected by pairwise constraints (note: this is not graph coloring).
(Figure: reward tables for the a1-a2 and a2-a3 constraints, with rewards of 10 and 6 for different joint assignments.)

7. DCOP Framework
Different "levels" of teamwork are possible, from 1-opt up to fully centralized (k-opt).
Finding the complete solution is NP-hard.
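The reward-summing structure of a DCOP can be sketched in a few lines of Python. The reward tables and binary value domains below are illustrative, not the ones from the slide's figure:

```python
# Minimal DCOP sketch: each agent picks a value; the team reward is the sum
# of pairwise constraint rewards. Tables and domains here are illustrative.
from itertools import product

agents = ["a1", "a2", "a3"]
constraints = {
    ("a1", "a2"): {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6},
    ("a2", "a3"): {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6},
}

def total_reward(assignment):
    """Sum constraint rewards under a full assignment {agent: value}."""
    return sum(table[(assignment[i], assignment[j])]
               for (i, j), table in constraints.items())

# Exhaustive search over all joint assignments -- exponential in the number
# of agents, which is why complete DCOP solving is NP-hard.
best = max(product([0, 1], repeat=len(agents)),
           key=lambda vals: total_reward(dict(zip(agents, vals))))
print(best, total_reward(dict(zip(agents, best))))  # (0, 0, 0) with reward 20
```

Even this tiny example makes the complete/incomplete trade-off visible: the exhaustive loop enumerates 2^n joint assignments.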
8. DCEE: Distributed Coordination of Exploration and Exploitation
The environment may be unknown.
Maximize on-line reward over some number of rounds: exploration vs. exploitation.
Demonstrated on a mobile ad-hoc network: simulation [released] and robots [released soon].
10. DCOP → DCEE: Distributed Coordination of Exploration and Exploitation
11. DCEE Algorithm: SE-Optimistic (will build upon later)
Rewards on [1, 200].
Chain of agents a1-a2-a3-a4 with current link rewards 99 (a1-a2), 50 (a2-a3), 75 (a3-a4).
Each agent reasons optimistically: "If I move, I'd get R = 200" on each incident link.
12. DCEE Algorithm: SE-Optimistic (will build upon later)
Rewards on [1, 200]; same chain with link rewards 99, 50, 75.
Optimistic gains if each agent moves: a1: 101, a2: 251, a3: 275, a4: 125.
Explore or exploit?
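The per-agent bid above can be sketched directly (chain topology and link rewards are taken from the slide; the helper name `optimistic_gain` is mine):

```python
# SE-Optimistic sketch: each agent assumes an unexplored value yields the
# maximum reward (200) on every incident link, and bids
# gain = optimistic reward - current reward.
MAX_REWARD = 200
links = {("a1", "a2"): 99, ("a2", "a3"): 50, ("a3", "a4"): 75}

def optimistic_gain(agent):
    incident = [r for pair, r in links.items() if agent in pair]
    return MAX_REWARD * len(incident) - sum(incident)

for a in ["a1", "a2", "a3", "a4"]:
    print(a, optimistic_gain(a))  # a1: 101, a2: 251, a3: 275, a4: 125
```

This reproduces the slide's numbers: the middle agents bid more because they sit on two links.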
13. Balanced Exploration Techniques
BE-Rebid: a decision-theoretic calculation of exploration.
Track the previous best location Rb; agents can backtrack to a previously visited location.
Reason about exploring for some number of steps (te); an agent can commit to an action for more than one round (in SE methods, agents re-evaluate every time step).
BE techniques assume knowledge of the reward distribution, and use the current reward, the time left, and the distribution to estimate the utility of exploration. They are more complicated than SE methods: more computation, and harder to implement.
Each round compares two actions: backtrack or explore.
E.U.(explore) is the sum of three terms:
the reward while exploring,
the reward while exploiting × P(improve on Rb),
and the reward while exploiting × P(NOT improve on Rb).
After agents explore and then backtrack, they cannot have reduced the overall reward.
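The three-term sum can be sketched numerically. This is my own crude formulation under stated assumptions (rewards i.i.d. uniform on [1, 200], and a midpoint estimate for the expected improved reward), not the paper's exact equations:

```python
# Balanced-exploration sketch (my formulation, not the paper's exact math):
# expected utility of exploring for t_e rounds, then exploiting (the new best,
# or backtracking to r_b) for the remaining rounds. Rewards are assumed
# i.i.d. uniform on [1, 200].
R_MAX = 200

def expected_utility_explore(r_b, t_e, t_left):
    """Three terms: reward while exploring, exploiting after improving on r_b,
    and exploiting after failing to improve (backtrack to r_b)."""
    explore_reward = t_e * (R_MAX + 1) / 2      # mean reward per exploring round
    p_fail = (r_b / R_MAX) ** t_e               # all t_e draws fall at or below r_b
    # Expected best reward given some draw beats r_b -- crude midpoint
    # estimate; the paper computes this expectation exactly.
    improved = (r_b + R_MAX) / 2
    exploit_rounds = t_left - t_e
    return (explore_reward
            + (1 - p_fail) * exploit_rounds * improved
            + p_fail * exploit_rounds * r_b)
```

For example, with a mediocre `r_b` and plenty of time left, exploring scores higher than immediately backtracking, which matches the intuition behind BE-Rebid.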
14. Success! [ATSN-09] [IJCAI-09]
Both classes of (incomplete) algorithms, in simulation and on robots, on an ad-hoc wireless network (improvement if performance > 0).
ATSN-09: Third International Workshop on Agent Technology for Sensor Networks (at AAMAS-09).
15. k-Optimality
Increased coordination; originally a DCOP formulation.
In DCOP, increased k means increased team reward.
Find groups of agents to change their variables via joint actions; neighbors of a moving group cannot move.
k defines the amount of teamwork (with higher communication and computation overheads).
16. "k-Optimality" in DCEE
Groups of size k form; those with the most to gain move (change the values of their variables).
A group can only move if no other agents in its neighborhood move.
17. Example: SE-Optimistic-2
Rewards on [1, 200]; same chain a1-a2-a3-a4 with link rewards 99, 50, 75.
Single-agent optimistic gains: a1: 101, a2: 251, a3: 275, a4: 125.
Now pairs of neighboring agents compute joint optimistic gains (e.g., 200 - 99 for the a1-a2 link).
20. Confirms Team Uncertainty Penalty
Averaged over 10 trials each. Trend confirmed! (Huge standard error.)
(Plots: total gain on chain and complete graphs.)
21. Problem with "k-Optimal"
With unknown rewards, an agent cannot know whether it can increase the reward by moving!
Define a new term, L-Movement: the number of agents that can change variables per round.
Independent of the exploration algorithm; graph dependent.
An alternate measure of teamwork.
22. General DCOP Analysis Tool? L-Movement
Example for k = 1 algorithms: L is the size of the largest maximal independent set of the graph.
This is NP-hard to calculate for a general graph, and harder for higher k.
Consider ring and complete graphs, both with 5 vertices:
ring graph: the largest maximal independent set has size 2;
complete graph: the largest maximal independent set has size 1.
So for k = 1, L = 1 for a complete graph, while for a ring graph with n vertices L = ⌊n/2⌋.
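The 5-vertex examples can be checked by brute force. Exhaustive search is fine at this size, even though the problem is NP-hard in general:

```python
# Brute-force check of L for k = 1: L equals the size of a maximum
# independent set (NP-hard in general; trivial for 5-node examples).
from itertools import combinations

def max_independent_set_size(n, edges):
    edge_set = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):            # try largest sets first
        for subset in combinations(range(n), size):
            if all(frozenset(p) not in edge_set
                   for p in combinations(subset, 2)):
                return size                 # no two members are adjacent
    return 0

ring5 = [(i, (i + 1) % 5) for i in range(5)]
complete5 = list(combinations(range(5), 2))
print(max_independent_set_size(5, ring5))      # 2
print(max_independent_set_size(5, complete5))  # 1
```

This reproduces the slide's claim: a 5-ring allows two simultaneous movers under k = 1, a 5-clique only one.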
23. Configuration Hypercube
No (partial) assignment is believed to be better than another; wlog, agents can select their next value when exploring.
Define the configuration hypercube C: each agent is a dimension, and C[v1, ..., vn] is the total reward when agent i takes value vi.
C cannot be calculated without exploration; its values are drawn from the known reward distribution.
Moving along an axis of the hypercube corresponds to one agent changing its value.
Example with 3 agents (C is 3-dimensional): changing from C[a, b, c] to C[a, b, c'] means agent A3 changes from c to c'.
24. How many agents can move? (1/2)
In a ring graph with 5 nodes: k = 1 gives L = 2; k = 2 gives L = 3.
In a complete graph with 5 nodes: k = 1 gives L = 1; k = 2 gives L = 2.
25. How many agents can move? (2/2)
Configuration C[i1, ..., in] is reachable by an algorithm with movement L in s steps if and only if i1 + ... + in ≤ s·L and each ij ≤ s (reading each index ij as the number of moves agent j must make).
Example: C[2, 2] is reachable for L = 1 only if s ≥ 4.
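One reading of this condition (my reconstruction: the total number of moves must fit into s rounds of at most L movers each, and no single agent can move more than s times) can be checked directly:

```python
# Reachability sketch. indices[j] = number of values agent j must step
# through to reach the configuration. With at most L movers per round and
# s rounds total, the configuration is reachable iff the total work fits
# and no single agent needs more than s moves. My reconstruction of the slide.
def reachable(indices, L, s):
    return sum(indices) <= s * L and max(indices) <= s

print(reachable([2, 2], L=1, s=4))  # True: matches the slide's C[2,2] example
print(reachable([2, 2], L=1, s=3))  # False: only 3 single moves available
print(reachable([2, 2], L=2, s=2))  # True: both agents move every round
```

Under this reading, roughly half of a 2D hypercube satisfies the bound when L = 1, consistent with the next slide.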
26. L-Movement Experiments
For various DCEE problems, distributions, and L:
for each number of steps s, construct a hypercube with s values per dimension,
find M, the maximum reward achievable in s steps given L,
and return the average of 50 runs.
Example, 2D hypercube: only half of the configurations are reachable if L = 1; all are reachable if L = 2.
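A single run of this experiment might be sketched as follows. The uniform reward range and the sum-of-indices reachability condition are my assumptions (the condition is my reading of the slides, and the paper averages such runs over many seeds and distributions):

```python
# Sketch of one L-movement experiment run: build a random reward hypercube
# with s values per agent dimension, then find the best reward among
# configurations reachable in s steps under movement L.
import itertools
import random

def max_reachable_reward(n_agents, s, L, seed=0):
    """Rewards C[idx] ~ Uniform(1, 200), i.i.d. (an assumed distribution).
    Reachability uses the sum-of-indices condition (my reconstruction)."""
    rng = random.Random(seed)
    cube = {idx: rng.uniform(1, 200)
            for idx in itertools.product(range(s), repeat=n_agents)}
    reachable_rewards = [r for idx, r in cube.items() if sum(idx) <= s * L]
    return max(reachable_rewards)
```

With the same seed (hence the same cube), raising L can only enlarge the reachable set, so the discovered maximum is monotone in L; averaging such runs produces curves like the ones on the next slides.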
27. Restricting to L-Movement: Complete
Complete graph: k = 1 gives L = 1; k = 2 gives L = 2.
(Plot: average maximum reward discovered, L = 1 → 2.)
28. Restricting to L-Movement: Ring
Ring graph: k = 1 gives L = 2; k = 2 gives L = 3.
(Plot: average maximum reward discovered.)
29. Uniform distribution of rewards
(Plots: ring and complete graphs, 4 agents, uniform distribution of rewards; also a different normal distribution.)
30. k and L: 5-agent graphs
(Table: L value for each k from 1 to 5, on the ring graph and the complete graph.)
Increasing k changes L less in the ring than in the complete graph.
The configuration hypercube is an upper bound; posit a consistent negative effect.
This suggests why increasing k has different effects: a larger improvement in the complete graph than in the ring as k increases.
31. L-Movement May Help Explain the Team Uncertainty Penalty
An algorithm with L = 2 can explore more of C than one with L = 1, independent of the exploration algorithm! L is determined by k and the graph structure.
C is an upper bound; posit a constant negative effect.
Any algorithm experiences diminishing returns as k increases, consistent with DCOP results.
The L-movement difference between k = 1 and k = 2 algorithms is larger in graphs with more agents:
for k = 1, L = 1 for a complete graph,
while for k = 1, L increases with the number of vertices in a ring graph.
32. Thank you
Towards a Theoretic Understanding of DCEE
Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe