Towards a Theoretic Understanding of DCEE
Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe
Lafayette College

Forward Pointer
When Should There be a “Me” in “Team”? Distributed Multi-Agent Optimization Under Uncertainty
Matthew E. Taylor, Manish Jain, Yanqing Jin, Makoto Yokoo, & Milind Tambe
Wednesday, 8:30 – 10:30, Coordination and Cooperation 1

Teamwork: Foundational MAS Concept
Joint actions improve outcome
  But increase communication & computation
Over two decades of work
This paper: increased teamwork can harm team
  Even without considering communication & computation
  Only considering team reward
  Multiple algorithms, multiple settings
But why?

DCOPs: Distributed Constraint Optimization Problems
Multiple domains
  Meeting scheduling
  Traffic light coordination
  RoboCup soccer
  Multi-agent plan coordination
  Sensor networks
Distributed
  Robust to failure
  Scalable
(In)Complete
  Quality bounds

DCOP Framework
[Figure: constraint graph over agents a1, a2, a3, with a reward table (entries 10 and 6) for each of the constraints (a1, a2) and (a2, a3)]
Different “levels” of teamwork possible
Complete solution is NP-hard
TODO: not graph coloring. k-opt: more detail (?): 1-opt up to centralized

DCEE: Distributed Coordination of Exploration and Exploitation
Environment may be unknown
Maximize on-line reward over some number of rounds
  Exploration vs. exploitation
Demonstrated on a mobile ad hoc network
  Simulation [Released] & Robots [Released Soon]

DCOP: Distributed Constraint Optimization Problem

DCOP → DCEE Distributed Coordination of Exploration and Exploitation

DCEE Algorithm: SE-Optimistic (Will build upon later)
Rewards on [1,200]
[Figure: chain a1 - a2 - a3 - a4 with current constraint rewards 99 (a1, a2), 50 (a2, a3), and 75 (a3, a4)]
Each agent reasons optimistically: “If I move, I’d get R=200” on each of its constraints
  a1: “If I move, I’d gain 101”
  a2: “If I move, I’d gain 251”
  a3: “If I move, I’d gain 275”
  a4: “If I move, I’d gain 125”
Explore or Exploit?
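The gains quoted on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' implementation: the names `current`, `optimistic_gain`, and `movers` are hypothetical, and the rule that an agent moves only if its gain strictly exceeds every neighbor's gain is an assumption about how "those with the most to gain move" is resolved for single agents.

```python
# Chain a1 - a2 - a3 - a4 with the current constraint rewards from the slide.
MAX_REWARD = 200  # upper end of the reward range [1, 200]
current = {("a1", "a2"): 99, ("a2", "a3"): 50, ("a3", "a4"): 75}
agents = ["a1", "a2", "a3", "a4"]

def optimistic_gain(agent):
    """Gain if every constraint touching `agent` jumped to MAX_REWARD."""
    return sum(MAX_REWARD - r for pair, r in current.items() if agent in pair)

def neighbors(agent):
    """Agents that share a constraint with `agent`."""
    return {b for pair in current if agent in pair for b in pair if b != agent}

gains = {a: optimistic_gain(a) for a in agents}

# Assumed rule: an agent moves only if it out-bids every neighbor.
movers = [a for a in agents if all(gains[a] > gains[n] for n in neighbors(a))]

print(gains)   # {'a1': 101, 'a2': 251, 'a3': 275, 'a4': 125}
print(movers)  # ['a3']
```

Under these assumptions only a3 moves, since a2's bid of 251 loses to a3's 275 and both end agents lose to their single neighbor.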

Balanced Exploration Techniques
BE-Rebid: Balanced Exploration with Backtracking
  Decision-theoretic calculation of exploration
  Track previous best location Rb: agents can backtrack
  Reason about exploring for some number of steps (te)
Assume knowledge of the reward distribution. BE techniques use the current reward, the time left, and distribution information to estimate the utility of exploration.
BE techniques are more complicated: they require more computation and are harder to implement.
Agents can backtrack to a previously visited location, so after exploring and then backtracking they cannot have reduced the overall reward.
Each agent compares two actions: Backtrack or Explore.
E.U.(explore) is the sum of three terms:
  Utility of exploring: reward while exploring
  Utility of finding a better reward than the current Rb: reward while exploiting × P(improve reward)
  Utility of failing to find a better reward than the current Rb: reward while exploiting × P(NOT improve reward)
In SE methods, the agents evaluate in each time step and then proceed. Here an agent may commit to an action for more than 1 round.
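The three-term expected utility can be sketched for a deliberately simplified setting. The following is an illustration only, not the BE-Rebid algorithm itself: it assumes a single reward stream with i.i.d. draws that are uniform on [1, 200], and the names `eu_explore`, `eu_backtrack`, `best_action`, `r_b`, `t_e`, and `rounds_left` are hypothetical.

```python
def eu_backtrack(r_b, rounds_left):
    """Exploit the best known reward R_b for every remaining round."""
    return r_b * rounds_left

def eu_explore(r_b, rounds_left, t_e, lo=1.0, hi=200.0):
    """Expected utility of exploring for t_e rounds, then exploiting.

    Assumption (illustration only): rewards are i.i.d. uniform on [lo, hi]
    and lo <= r_b <= hi.
    """
    mean = (lo + hi) / 2.0
    b = (r_b - lo) / (hi - lo)   # P(a single draw <= R_b)
    p_fail = b ** t_e            # P(no draw beats R_b)
    # E[best draw * 1{best draw > R_b}] for the max of t_e uniform draws.
    improved = (lo * (1.0 - p_fail)
                + (hi - lo) * t_e / (t_e + 1.0) * (1.0 - b ** (t_e + 1)))
    exploit_rounds = rounds_left - t_e
    return (t_e * mean                        # reward while exploring
            + exploit_rounds * improved       # exploiting x P(improve)
            + exploit_rounds * r_b * p_fail)  # exploiting x P(NOT improve)

def best_action(r_b, rounds_left, lo=1.0, hi=200.0):
    """Compare Backtrack against Explore for every feasible t_e."""
    options = {("explore", t): eu_explore(r_b, rounds_left, t, lo, hi)
               for t in range(1, rounds_left + 1)}
    options[("backtrack", 0)] = eu_backtrack(r_b, rounds_left)
    return max(options, key=options.get)

print(best_action(200.0, 10))  # ('backtrack', 0): nothing can beat R_b = 200
print(best_action(1.0, 10)[0])  # explore: any draw is at least as good
```

The second and third terms together give the expected per-round reward after exploration stops, because the agent falls back to Rb whenever no draw improves on it.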

Success! [ATSN-09][IJCAI-09]
Both classes of (incomplete) algorithms
Simulation and on robots
Ad hoc wireless network
(Improvement if performance > 0)
Third International Workshop on Agent Technology for Sensor Networks (at AAMAS-09)

k-Optimality
Increased coordination: originally a DCOP formulation
In DCOP, increased k = increased team reward
Find groups of agents to change variables
  Joint actions
  Neighbors of moving group cannot move
Defines amount of teamwork
  (Higher communication & computation overheads)

“k-Optimality” in DCEE
Groups of size k form; those with the most to gain move (change the value of their variable)
A group can only move if no other agents in its neighborhood move

Example: SE-Optimistic-2
Rewards on [1,200]
[Figure: chain a1 - a2 - a3 - a4 with constraint rewards 99, 50, and 75; single-agent gains of 101, 251, 275, and 125 (for example, 200-99 = 101 for a1); agents grouped into the pairs (a1, a2), (a2, a3), and (a3, a4)]
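One way to picture the k = 2 case is to let single agents and constrained pairs bid against each other. The sketch below rests on two assumptions that the slides do not spell out: a group's optimistic gain is the sum of (200 - current reward) over every constraint touching the group, and a group moves only if it strictly out-bids every group it overlaps or neighbors. All function names are hypothetical.

```python
MAX_REWARD = 200  # upper end of the reward range [1, 200]
current = {("a1", "a2"): 99, ("a2", "a3"): 50, ("a3", "a4"): 75}
agents = ["a1", "a2", "a3", "a4"]

def group_gain(group):
    """Assumed optimistic gain: each constraint touching the group reaches MAX_REWARD."""
    return sum(MAX_REWARD - r for pair, r in current.items()
               if set(pair) & set(group))

def conflicts(g1, g2):
    """Groups conflict if they share an agent or are joined by a constraint."""
    if set(g1) & set(g2):
        return True
    return any((x in g1 and y in g2) or (x in g2 and y in g1)
               for x, y in current)

# Candidate groups for k = 2: single agents and pairs linked by a constraint.
groups = [(a,) for a in agents] + list(current)
gains = {g: group_gain(g) for g in groups}

# Assumed rule: a group moves only if it out-bids every conflicting group.
movers = [g for g in groups
          if all(gains[g] > gains[h]
                 for h in groups if h != g and conflicts(g, h))]

print(gains[("a1",)])  # 101, the 200-99 shown on the slide
print(movers)          # [('a2', 'a3')]
```

With these assumptions the pair (a2, a3) bids 376 and blocks every other group, so two agents move in this round where SE-Optimistic with k = 1 would move one.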

Sample coordination results
Omniscient: confirms DCOP result, as expected
[Figure: results with artificially supplied rewards (DCOP); panels: Complete Graph, Chain Graph]

Physical Implementation
Create robots
Mobile ad hoc wireless network

Confirms Team Uncertainty Penalty
Averaged over 10 trials each
Trend confirmed! (Huge standard error)
[Figure: Total Gain for Chain and Complete graphs]

Problem with “k-Optimal”
Unknown rewards: agents cannot know whether they can increase reward by moving!
Define new term: L-Movement
  Number of agents that can change variables per round
  Independent of exploration algorithm
  Graph dependent
  Alternate measure of teamwork

General DCOP Analysis Tool?
L-Movement
Example: k = 1 algorithms
  L is the size of the maximum independent set of the graph
  NP-hard to calculate for a general graph, and harder for higher k
Consider ring & complete graphs, both with 5 vertices
  Ring graph: maximum independent set has size 2
  Complete graph: maximum independent set has size 1
For k = 1, L = 1 for a complete graph
For k = 1, the size of the maximum independent set of a ring graph with n vertices is ⌊n/2⌋
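For graphs as small as the 5-vertex examples, the k = 1 value of L can be checked by brute force. This is a sketch with hypothetical helper names (`ring`, `complete`, `max_independent_set_size`); it enumerates vertex subsets, which is exponential in the number of vertices and only suitable for tiny graphs.

```python
from itertools import combinations

def ring(n):
    """Edges of a ring (cycle) graph on vertices 0..n-1."""
    return {(i, (i + 1) % n) for i in range(n)}

def complete(n):
    """Edges of a complete graph on vertices 0..n-1."""
    return set(combinations(range(n), 2))

def max_independent_set_size(n, edges):
    """Brute force: size of the largest vertex subset with no edge inside it."""
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            if not any(u in chosen and v in chosen for u, v in edges):
                return size
    return 0

print(max_independent_set_size(5, ring(5)))      # 2
print(max_independent_set_size(5, complete(5)))  # 1
```

The ring value agrees with the closed form ⌊n/2⌋, which is the standard independence number of a cycle.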

Configuration Hypercube
No (partial-)assignment is believed to be better than another
  wlog, agents can select the next value when exploring
Define configuration hypercube: C
  Each agent is a dimension
  C[x1, ..., xn] is the total reward when agent i takes value xi
  Cannot be calculated without exploration
  Values drawn from known reward distribution
Moving along an axis in the hypercube → agent changing value
Example: 3 agents (C is 3-dimensional)
  Changing from C[a, b, c] to C[a, b, c’]: agent A3 changes from c to c’

How many agents can move? (1/2)
In a ring graph with 5 nodes
  k = 1 : L = 2
  k = 2 : L = 3
In a complete graph with 5 nodes
  k = 1 : L = 1
  k = 2 : L = 2
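The four values above are consistent with one possible reading of L for general k: the largest set of agents whose induced subgraph splits into connected groups of at most k agents, so that no moving group has a moving neighbor. That definition is an assumption drawn from the k-optimality slides, not a statement from the authors, and the helper names below are hypothetical. The search is brute force and only meant for tiny graphs.

```python
from itertools import combinations

def ring(n):
    """Edges of a ring (cycle) graph on vertices 0..n-1."""
    return {(i, (i + 1) % n) for i in range(n)}

def complete(n):
    """Edges of a complete graph on vertices 0..n-1."""
    return set(combinations(range(n), 2))

def component_sizes(vertices, edges):
    """Sizes of the connected components of the subgraph induced by `vertices`."""
    remaining, sizes = set(vertices), []
    while remaining:
        stack, size = [remaining.pop()], 1
        while stack:
            u = stack.pop()
            for a, b in edges:
                v = b if a == u else a if b == u else None
                if v is not None and v in remaining:
                    remaining.remove(v)
                    stack.append(v)
                    size += 1
        sizes.append(size)
    return sizes

def l_movement(n, edges, k):
    """Assumed definition: most agents that can move in groups of size <= k."""
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if max(component_sizes(subset, edges)) <= k:
                return size
    return 0

for k in (1, 2):
    print(k, l_movement(5, ring(5), k), l_movement(5, complete(5), k))
# 1 2 1
# 2 3 2
```

For k = 1 the definition reduces to the maximum independent set, matching the earlier slide.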

How many agents can move? (2/2)
A configuration is reachable by an algorithm with movement L in s steps if and only if [equation] and [equation]
Example: C[2,2] is reachable for L=1 if s ≥ 4
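The two conditions did not survive in this transcript, so the sketch below uses a reconstruction that is only an assumption: every agent starts at index 0, exploring advances an agent by one index, no agent moves more than once per step, and at most L agents move per step. Under that reading a configuration is reachable when its largest index is at most s and its index sum is at most L·s. The names `reachable` and `reachable_fraction` are hypothetical.

```python
from itertools import product

def reachable(config, L, s):
    """Assumed test: each agent moves at most s times, at most L agents per step."""
    return max(config) <= s and sum(config) <= L * s

def reachable_fraction(n_agents, n_values, L, s):
    """Share of hypercube cells (indices 0..n_values-1 per agent) that are reachable."""
    cells = list(product(range(n_values), repeat=n_agents))
    return sum(reachable(c, L, s) for c in cells) / len(cells)

print(reachable((2, 2), L=1, s=4))  # True, as in the slide's example
print(reachable((2, 2), L=1, s=3))  # False: four moves are needed
print(reachable_fraction(2, 5, L=1, s=4))  # 0.6
print(reachable_fraction(2, 5, L=2, s=4))  # 1.0
```

This reading agrees with the C[2,2] example: reaching index 2 in both dimensions takes four single-agent moves, so L = 1 needs s ≥ 4.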

L-Movement Experiments
For various DCEE problems, distributions, and L:
  For each number of steps s:
    Construct hypercube with s values per dimension
    Find M, the max achievable reward in s steps, given L
  Return average of 50 runs
Example: 2D hypercube
  Only half reachable if L=1
  All locations reachable if L=2
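The experimental loop can be sketched as follows. Several choices here are assumptions rather than the authors' setup: rewards are uniform on [1, 200], reachability means that the largest index is at most s and the index sum is at most L·s, and the function names are hypothetical. The same seed regenerates the same hypercubes, so different values of L are compared on identical reward draws.

```python
import random
from itertools import product

def max_reachable_reward(n_agents, s, L, rng):
    """Best reward among cells reachable in s steps with movement L."""
    best = float("-inf")
    for config in product(range(s), repeat=n_agents):
        # Draw a reward for every cell so the cube is identical for every L.
        reward = rng.uniform(1, 200)  # assumed reward distribution
        if max(config) <= s and sum(config) <= L * s:  # assumed reachability test
            best = max(best, reward)
    return best

def average_max(n_agents, s, L, runs=50, seed=0):
    """Average of M over `runs` freshly drawn hypercubes."""
    rng = random.Random(seed)
    total = sum(max_reachable_reward(n_agents, s, L, rng) for _ in range(runs))
    return total / runs

for L in (1, 2):
    print(L, round(average_max(n_agents=2, s=4, L=L), 1))
```

Because a larger L never removes reachable cells, the average for L = 2 is at least the average for L = 1 on the same draws.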

Restricting to L-Movement: Complete
Complete graph
  k = 1 : L = 1
  k = 2 : L = 2
[Figure: Average Maximum Reward Discovered, L = 1 → 2]

Restricting to L-Movement: Ring
Ring graph
  k = 1 : L = 2
  k = 2 : L = 3
[Figure: Average Maximum Reward Discovered]

Uniform distribution of rewards
[Figure: Ring and Complete graph panels; settings shown: uniform distribution of rewards, 4 agents, different normal distribution]

k and L: 5-agent graphs
[Table: L value of the ring graph and of the complete graph for k = 1, 2, 3, 4, 5]
Increasing k changes L less in ring than complete
Configuration hypercube is an upper bound
  Posit a consistent negative effect
Suggests why increasing k has different effects:
  Larger improvement in complete than ring for increasing k

L-movement May Help Explain Team Uncertainty Penalty
An algorithm with L = 2 will be able to explore more of C than an algorithm with L = 1
  Independent of exploration algorithm!
  Determined by k and graph structure
C is an upper bound: posit a constant negative effect
Any algorithm experiences diminishing returns as k increases
  Consistent with DCOP results
L-movement difference between k = 1 algorithms and k = 2
  Larger difference in graphs with more agents
  For k = 1, L = 1 for a complete graph
  For k = 1, L increases with the number of vertices in a ring graph

Thank you
Towards a Theoretic Understanding of DCEE
Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe
