
**Towards a Theoretic Understanding of DCEE**

Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe. Lafayette College.

**Forward Pointer**

When Should There be a "Me" in "Team"? Distributed Multi-Agent Optimization Under Uncertainty. Matthew E. Taylor, Manish Jain, Yanquin Jin, Makoto Yokoo, and Milind Tambe. Wednesday, 8:30 – 10:30, Coordination and Cooperation 1.

**Teamwork: Foundational MAS Concept**

- Joint actions improve outcomes, but increase communication and computation costs
- Over two decades of work
- This paper: increased teamwork can harm the team, even without considering communication and computation, and counting team reward alone
- Observed across multiple algorithms and multiple settings. But why?

**DCOPs: Distributed Constraint Optimization Problems**

- Multiple domains: meeting scheduling, traffic light coordination, RoboCup soccer, multi-agent plan coordination, sensor networks
- Distributed: robust to failure, scalable
- Both complete and incomplete algorithms, with quality bounds

**DCOP Framework**

[Figure: three agents a1, a2, a3 in a chain, with a pairwise reward table on each link (rewards of 10 or 6 depending on the pair of values chosen). Note: this is not graph coloring.]

**DCOP Framework: Levels of Teamwork**

- Different "levels" of teamwork are possible, from 1-opt up to fully centralized
- Finding the complete (optimal) solution is NP-hard
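The DCOP setting above can be sketched in a few lines of Python. All names and reward values here are illustrative, not from the paper; the chain and the 10/6 reward tables mirror the figure:

```python
def total_reward(assignment, constraints):
    """Sum the rewards of all binary constraints under a full assignment."""
    return sum(table[(assignment[i], assignment[j])]
               for (i, j), table in constraints.items())

# Chain a1 - a2 - a3 with one reward table per link (values are illustrative).
constraints = {
    ("a1", "a2"): {(0, 0): 10, (0, 1): 6, (1, 0): 6, (1, 1): 10},
    ("a2", "a3"): {(0, 0): 10, (0, 1): 6, (1, 0): 6, (1, 1): 10},
}

# A complete solver would search all joint assignments (NP-hard in general):
best = max(
    ({"a1": x, "a2": y, "a3": z} for x in (0, 1) for y in (0, 1) for z in (0, 1)),
    key=lambda a: total_reward(a, constraints),
)
print(total_reward(best, constraints))  # 20
```

The exhaustive `max` over joint assignments is exactly the centralized extreme of the teamwork spectrum; 1-opt algorithms instead let single agents improve locally.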

**DCEE: Distributed Coordination of Exploration and Exploitation**

- The environment may be unknown: agents must explore to learn constraint rewards
- Goal: maximize on-line reward over some number of rounds
- Exploration vs. exploitation trade-off
- Demonstrated on a mobile ad hoc wireless network: simulation [released] and robots [released soon]

**DCOP → DCEE**

From the Distributed Constraint Optimization Problem (DCOP) to Distributed Coordination of Exploration and Exploitation (DCEE).

**DCEE Algorithm: SE-Optimistic (Will build upon later)**

[Figure: chain of four agents a1 - a2 - a3 - a4 with current link rewards 99, 50, and 75. Rewards are drawn from [1, 200]; each agent optimistically assumes "if I move, I'd get R = 200" on each of its links.]

**DCEE Algorithm: SE-Optimistic (continued)**

[Figure: the same chain, with rewards on [1, 200]. The optimistic gains are 101 for a1, 251 for a2, 275 for a3, and 125 for a4.] Explore or exploit?
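The optimistic gains on the slide follow directly from the rule "assume the maximum reward of 200 on every link I touch." A minimal sketch, using the slide's chain and link rewards (function names are mine, not the paper's):

```python
# SE-Optimistic sketch: each agent assumes that moving would yield the maximum
# possible reward (200) on every constraint it participates in. Link rewards
# match the slide's chain: a1-a2 = 99, a2-a3 = 50, a3-a4 = 75.

R_MAX = 200
links = {("a1", "a2"): 99, ("a2", "a3"): 50, ("a3", "a4"): 75}

def optimistic_gain(agent, links, r_max=R_MAX):
    """Sum of (r_max - current reward) over the agent's constraints."""
    return sum(r_max - r for pair, r in links.items() if agent in pair)

gains = {a: optimistic_gain(a, links) for a in ("a1", "a2", "a3", "a4")}
print(gains)  # {'a1': 101, 'a2': 251, 'a3': 275, 'a4': 125}
```

The interior agents a2 and a3 touch two links each, which is why their optimistic gains dwarf those of the endpoints.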

**Balanced Exploration Techniques**

- BE techniques assume knowledge of the reward distribution; they use the current reward, the time left, and distribution information to estimate the utility of exploration
- More complicated than SE methods: more computation, harder to implement
- BE-Rebid (Balanced Exploration with Backtracking): track the previous best location with reward Rb; agents can backtrack to a previously visited location
- Each agent compares two actions: Backtrack or Explore
- Unlike SE methods, where agents re-evaluate every time step, an agent may commit to exploring for some number of steps (te)
- E.U.(explore) is the sum of three terms:
  - reward while exploring
  - reward while exploiting × P(improve reward): the utility of finding a better reward than the current Rb
  - reward while exploiting × P(NOT improve reward): the utility of failing to find a better reward than Rb
- Because agents can backtrack after exploring, exploration cannot reduce the overall reward below what Rb would have provided
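The Backtrack-vs-Explore comparison can be illustrated with a Monte Carlo sketch. This is an assumption-laden simplification, not the paper's calculation: it folds the three expected-utility terms into a single sampled estimate, assumes a uniform reward distribution on [1, 200], and uses hypothetical function names:

```python
import random

def eu_backtrack(rb, time_left):
    # Backtracking locks in the best known reward rb for every remaining round.
    return rb * time_left

def eu_explore(rb, time_left, te, sample, trials=10000):
    """Monte Carlo sketch: explore for te rounds, then exploit the best reward
    found, backtracking to rb if exploration failed to beat it. The max()
    implicitly combines the P(improve) and P(NOT improve) terms."""
    total = 0.0
    for _ in range(trials):
        draws = [sample() for _ in range(te)]
        total += sum(draws) + max(draws + [rb]) * (time_left - te)
    return total / trials

random.seed(0)
uniform = lambda: random.uniform(1, 200)  # assumed known reward distribution
print(eu_backtrack(150, 10))  # 1500
print(eu_explore(150, 10, te=3, sample=uniform))
```

As intuition suggests, exploring is attractive when Rb is low relative to the distribution and plenty of rounds remain, and unattractive when Rb is already near the maximum.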

**Success! [ATSN-09] [IJCAI-09]**

- Both classes of (incomplete) algorithms evaluated, in simulation and on robots
- Domain: ad hoc wireless network (improvement if performance > 0)
- ATSN-09: Third International Workshop on Agent Technology for Sensor Networks (at AAMAS-09)

**k-Optimality: Increased Coordination (Original DCOP Formulation)**

- In DCOP, increased k means increased team reward
- Find groups of agents to jointly change their variables
- Neighbors of a moving group cannot move
- k defines the amount of teamwork (with higher communication and computation overheads)

**"k-Optimality" in DCEE**

- Groups of size k form; those with the most to gain move (change the values of their variables)
- A group can only move if no other agents in its neighborhood move
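The neighborhood rule above can be sketched as a greedy selection: one plausible reading (a hypothetical helper, not the paper's algorithm) is to visit groups in order of estimated gain and let a group move only if none of its agents is in, or adjacent to, a group that is already moving:

```python
def select_moving_groups(scored_groups, neighbors):
    """Greedy sketch: blocked holds moving agents plus their neighbors; a
    candidate group may move only if it shares no agent with that set."""
    blocked, movers = set(), []
    for gain, group in sorted(scored_groups, reverse=True):
        if not set(group) & blocked:
            movers.append(group)
            blocked |= set(group)
            for agent in group:
                blocked |= neighbors[agent]
    return movers

# Chain a1 - a2 - a3 - a4: the middle pair has the most to gain, so it moves
# and freezes its whole neighborhood (gain values are illustrative).
neighbors = {"a1": {"a2"}, "a2": {"a1", "a3"}, "a3": {"a2", "a4"}, "a4": {"a3"}}
groups = [(300, ("a2", "a3")), (200, ("a1", "a2")), (100, ("a3", "a4"))]
print(select_moving_groups(groups, neighbors))  # [('a2', 'a3')]
```

Because neighbors of movers are blocked, one high-gain group in a dense graph can freeze nearly everyone else, which is exactly the tension the L-movement analysis later formalizes.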

**Example: SE-Optimistic-2**

[Figure: the same chain a1 - a2 - a3 - a4 with link rewards 99, 50, and 75 and rewards on [1, 200]. Pairs of agents now bid joint optimistic gains (e.g. 200 - 99 on the a1-a2 link), and the pair with the most to gain moves.]

**Sample Coordination Results**

[Figure: results on a complete graph and a chain graph with artificially supplied rewards (DCOP). The omniscient setting confirms the DCOP result, as expected; the uncertain settings do not.]

**Physical Implementation**

- iRobot Create robots
- Mobile ad hoc wireless network

**Confirms Team Uncertainty Penalty**

- Averaged over 10 trials each
- Trend confirmed! (though with huge standard error)
- [Figure: total gain on chain and complete graphs.]

**Problem with "k-Optimal"**

- With unknown rewards, an agent cannot know whether moving will increase reward!
- Define a new term, L-Movement: the number of agents that can change variables per round
- Independent of the exploration algorithm; graph dependent
- An alternative measure of teamwork

**General DCOP Analysis Tool?**

- L-Movement example for k = 1 algorithms: L is the size of the largest maximal independent set of the graph
- NP-hard to calculate for a general graph, and harder for higher k
- Consider a ring graph and a complete graph, both with 5 vertices:
  - ring graph: the largest independent set has size 2
  - complete graph: the largest independent set has size 1
- For k = 1, L = 1 for a complete graph of any size, while for a ring graph with n vertices the largest independent set has size ⌊n/2⌋
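The 5-vertex numbers above are easy to check by brute force. A small sketch (fine for toy graphs; the slide's point is that this is NP-hard in general):

```python
from itertools import combinations

def max_independent_set_size(n, edges):
    """Brute force: the largest set of vertices with no edge between any two
    of them. Exponential, so only suitable for small n."""
    for size in range(n, 0, -1):
        for cand in combinations(range(n), size):
            chosen = set(cand)
            if not any(u in chosen and v in chosen for u, v in edges):
                return size
    return 0

ring5 = [(i, (i + 1) % 5) for i in range(5)]
complete5 = [(i, j) for i in range(5) for j in range(i + 1, 5)]
print(max_independent_set_size(5, ring5))      # 2
print(max_independent_set_size(5, complete5))  # 1
```

Running the same function on larger rings reproduces the ⌊n/2⌋ pattern, while any complete graph stays at 1.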

**Configuration Hypercube**

- Before exploration, no (partial) assignment is believed to be better than another; wlog, agents can select their next value when exploring
- Define the configuration hypercube C: each agent is a dimension, and C[v1, ..., vn] is the total reward when each agent i takes value vi
- C cannot be calculated without exploration; its values are drawn from the known reward distribution
- Moving along an axis in the hypercube corresponds to an agent changing its value
- Example: 3 agents, so C is 3-dimensional; changing from C[a, b, c] to C[a, b, c′] means agent A3 changes its value from c to c′
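The hypercube can be sketched as a dictionary keyed by value tuples. This is a simplified stand-in: a real DCEE hypercube sums constraint rewards over the graph, whereas here each agent contributes an independent sampled reward, which is enough to illustrate the data structure and the axis-move idea:

```python
import itertools
import random

random.seed(1)
s, n = 4, 3  # s candidate values per agent; C is n-dimensional

# Per-(agent, value) contribution drawn from the assumed known distribution.
value_reward = [[random.uniform(1, 200) for _ in range(s)] for _ in range(n)]

C = {idx: sum(value_reward[a][v] for a, v in enumerate(idx))
     for idx in itertools.product(range(s), repeat=n)}

# Moving along one axis = one agent changing its value:
delta = C[(0, 0, 1)] - C[(0, 0, 0)]  # only agent 3's contribution changes
```

The key property is that adjacent cells along an axis differ only in one agent's choice, which is what makes "steps through C" the right abstraction for rounds of movement.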

**How Many Agents Can Move? (1/2)**

- In a ring graph with 5 nodes: k = 1 gives L = 2; k = 2 gives L = 3
- In a complete graph with 5 nodes: k = 1 gives L = 1; k = 2 gives L = 2

**How Many Agents Can Move? (2/2)**

- A configuration C[x1, ..., xn] is reachable by an algorithm with movement L in s steps if and only if no single agent needs more than s moves (max xi ≤ s) and the total number of moves fits in the budget (Σ xi ≤ L · s)
- Example: C[2, 2] is reachable for L = 1 only if s ≥ 4, since 2 + 2 = 4 moves are needed at one move per round
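The reachability test is one line of code. The precise condition is stated here as an assumption reconstructed from the C[2, 2] example (each axis step explores one new value, at most L agents move per round):

```python
def reachable(coords, L, s):
    """Assumed reachability condition: no agent takes more than s steps, and
    at most L agents move per round, so at most L * s moves happen in total."""
    return max(coords) <= s and sum(coords) <= L * s

print(reachable((2, 2), L=1, s=4))  # True: C[2,2] needs 4 moves, one per round
print(reachable((2, 2), L=1, s=3))  # False: only 3 rounds of single moves
```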

**L-Movement Experiments**

- For various DCEE problems, reward distributions, and values of L:
  - for a range of step counts s: construct a hypercube with s values per dimension, find M, the maximum achievable reward in s steps given L, and return the average over 50 runs
- Example: in a 2-D hypercube, only half the configurations are reachable if L = 1, but all are reachable if L = 2
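One run of that experiment can be sketched as follows, reusing the reachability condition stated above (which is itself a reconstruction, so treat this as an illustration of the procedure rather than the paper's exact setup):

```python
import itertools
import random

def max_reachable_reward(C, L, s):
    """M: the best reward among configurations reachable in s steps under
    movement L, per the assumed condition max(idx) <= s and sum(idx) <= L*s."""
    return max(r for idx, r in C.items()
               if max(idx) <= s and sum(idx) <= L * s)

random.seed(2)
n, s = 2, 4  # 2-D hypercube with s values per dimension
C = {idx: random.uniform(1, 200)
     for idx in itertools.product(range(s), repeat=n)}

for L in (1, 2):
    print(L, max_reachable_reward(C, L, s))
```

With L = 2 every cell of this 2-D cube is reachable, so M equals the global maximum; with L = 1 the high-sum corner is cut off, which is the "only half reachable" effect on the slide.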

**Restricting to L-Movement: Complete Graph**

- Complete graph: k = 1 gives L = 1; k = 2 gives L = 2
- [Figure: average maximum reward discovered, showing the jump from L = 1 to L = 2.]

**Restricting to L-Movement: Ring Graph**

- Ring graph: k = 1 gives L = 2; k = 2 gives L = 3
- [Figure: average maximum reward discovered.]

**Uniform Distribution of Rewards**

[Figure: results for ring and complete graphs under a uniform distribution of rewards, with 4 agents, and under a different normal distribution.]

**k and L: 5-Agent Graphs**

- [Table: for each k from 1 to 5, the corresponding L for the ring graph and for the complete graph. From the earlier slides: ring graph L = 2, 3 for k = 1, 2; complete graph L = 1, 2 for k = 1, 2.]
- Increasing k changes L less in a ring graph than in a complete graph
- The configuration hypercube is an upper bound; posit a consistent negative effect of teamwork
- This suggests why increasing k has different effects: a larger improvement in the complete graph than in the ring for increasing k

**L-Movement May Help Explain the Team Uncertainty Penalty**

- An algorithm with L = 2 can explore more of C than one with L = 1
- Independent of the exploration algorithm! Determined only by k and the graph structure
- C is an upper bound; posit a constant negative effect of teamwork
- Any algorithm experiences diminishing returns as k increases, consistent with DCOP results
- The L-movement difference between k = 1 and k = 2 algorithms is larger in graphs with more agents: for k = 1, L = 1 for a complete graph of any size, while L increases with the number of vertices in a ring graph

**Thank You**

Towards a Theoretic Understanding of DCEE. Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe.
