Presentation transcript:

1 University of Southern California
Towards A Formalization Of Teamwork With Resource Constraints
Praveen Paruchuri, Milind Tambe, Fernando Ordonez (University of Southern California)
Sarit Kraus (Bar-Ilan University, Israel; University of Maryland, College Park)
December 2003

2 University of Southern California
Motivation: Teamwork with Resource Constraints
Agent teams: agents maximize team reward while also keeping resource consumption limited, e.g., limited communication bandwidth, limited battery power, etc.
Example domains:
- Sensor network agents: limited, replenishable energy
- Mars rovers: limited energy for each daily activity

3 University of Southern California
Framework & Context
A framework for agent teams with resource constraints in complex and dynamic environments.
Resource constraints are soft, not "hard":
- It is okay for a sensor to exceed its energy threshold when needed.
- It is okay for a Mars rover to occasionally exceed the energy allocated to a regular activity.
Context:

                                 Single Agent     Multi-Agent
  Without resource constraints   MDP, POMDP       MTDP
  With resource constraints      CMDP             ???

4 University of Southern California
Our Contributions
- Extended MTDP (EMTDP): a distributed MDP framework.
- EMTDP ≠ CMDP with many agents: policy randomization in a CMDP causes miscoordination in teams.
- Algorithm for transforming a conjoined EMTDP (initial formulation over joint actions) into an actual EMTDP (reasoning about individual actions).
- Proof of equivalence between different transformations.
- Solution algorithm for the actual EMTDP.
Goal: maximize expected team reward while bounding expected resource consumption.

5 University of Southern California
E-MTDP: Formally Defined
An E-MTDP (for the 2-agent case) is a tuple ⟨S, A, P, R, C1, T1, C2, T2, N, Q⟩ where:
- S, A, P, R: as defined in MTDP.
- C1 = [c1_k(i,a)]: vector of costs of resource k for joint action a in state i (for agent 1); C2 is defined analogously for agent 2.
- T1 = [t1_k]: thresholds on expected consumption of resource k (for agent 1); similarly T2.
- N = [n(i,a)]: vector of joint communication costs for joint action a in state i.
- Q: threshold on communication costs.
Simplifying assumptions:
- Individual observability (no POMDPs)
- Two agents
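The thresholds above bound expectations, not worst cases. A minimal restatement of what T1 and Q constrain, assuming the usual expected-total-cost reading; the trajectory notation (s_t, a_t) is illustrative and not taken from the slides:

    % Expected resource-consumption bound for agent 1, resource k
    \mathbb{E}\Big[\textstyle\sum_{t} c^{1}_{k}(s_t, a_t)\Big] \le t^{1}_{k} \quad \forall k,
    % and the analogous bound on expected communication cost
    \mathbb{E}\Big[\textstyle\sum_{t} n(s_t, a_t)\Big] \le Q .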

6 University of Southern California
Conjoined EMTDP: Simple Example (two-agent case)
[Figure: a state-transition diagram over states S1 through S7, with edges labeled by joint actions and transition probabilities (e.g., a1b2 = 0.9, a2b1 = 0.3, a2b2 = 1) and the annotations R(S1, a2b2) = 9, C1(S1, a2b2) = 7, C2(S1, a2b2) = 7.]

7 University of Southern California
Linear Program: Solving the Conjoined EMTDP
Start from the LP for solving an MDP (maximizing expected reward), then handle the resource constraints: the expected cost of resource k, summed over all states and actions, must be less than t1.
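A sketch of the standard occupancy-measure LP for a constrained MDP, matching the description above (maximize expected reward, bound expected resource cost); the visitation variables x(i,a) and the initial-state distribution alpha_j are assumed notation, not copied from the slides:

    \max_{x \ge 0} \;\; \sum_{i,a} R(i,a)\, x(i,a)
    \text{s.t.} \quad \sum_{a} x(j,a) \;-\; \sum_{i,a} P(i,a,j)\, x(i,a) \;=\; \alpha_j \quad \forall j \in S,
    \qquad\;\; \sum_{i,a} c^{1}_{k}(i,a)\, x(i,a) \;\le\; t^{1}_{k} \quad \forall k .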

8 University of Southern California
Sample LP Solution
Joint-action visitation frequencies:
- VISITED(X11), a1b1: executed 0% of the time
- VISITED(X12), a1b2: 36% = 9/25
- VISITED(X13), a2b1: 64% = 16/25
- VISITED(X14), a2b2: 0%
Per-agent marginals and the joint probabilities they induce when the agents randomize independently:

                  b1 (16/25)       b2 (9/25)
  a1 (9/25)       144/625 = .23    81/625 = .13
  a2 (16/25)      256/625 = .4     144/625 = .23

The a1b1 entry should have been 0 (miscoordination).
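To spell out the miscoordination arithmetic: if each agent randomizes independently according to its marginals, the probability of joint action a1b1 is the product of the marginals, which is nonzero even though the LP solution never executes a1b1:

    P(a_1) = \tfrac{9}{25}, \qquad P(b_1) = \tfrac{16}{25}
    \;\Rightarrow\; P(a_1 b_1) = \tfrac{9}{25}\cdot\tfrac{16}{25} = \tfrac{144}{625} \approx 0.23 \;\neq\; 0 .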

9 University of Southern California
Conjoined to Actual EMTDP: Transformation
[Figure: transformation of an original state S1 via intermediate states reached by communication and no-communication variants of agent 1's individual actions.]
For each state, for each joint action:
- Introduce transitions between the original state and the new states.
- Introduce transitions between the new states and the original target states.
- Introduce a communication and a no-communication action for each distinct individual action, and add the corresponding new states (see the sketch below).
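A minimal sketch of the state expansion described above, assuming agent 1's individual actions are each split into a "communicate" (C) and a "no-communication" (NC) variant leading to new intermediate states; the function and names here are illustrative, not the paper's code:

    # Illustrative sketch of the conjoined-to-actual EMTDP state expansion.
    # Names (expand_states, the "C"/"NC" tags) are hypothetical.
    def expand_states(original_states, agent1_actions):
        """For each original state and each individual action of agent 1,
        add one intermediate state reached via the communication variant
        (observable to agent 2) and one reached via the no-communication
        variant (unobservable to agent 2)."""
        new_states = []
        for s in original_states:
            for a in agent1_actions:
                new_states.append((s, a, "C"))   # reached via comm action
                new_states.append((s, a, "NC"))  # reached via no-comm action
        return new_states

    # Transitions are then added from each original state to its new states,
    # and from the new states to the original target states.
    print(expand_states(["S1"], ["a1", "a2"]))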

10 University of Southern California
Non-linear Constraints
We need to introduce non-linear constraints. For each original state, and for each new state introduced by a no-communication action, set the conditional probabilities of the corresponding actions equal.
Example (agent B's actions b1, ..., bn over the no-communication states s_{a1,NC}, ..., s_{am,NC} reached from one original state):
P(b1 | s_{a1,NC}) = P(b1 | s_{a2,NC}) = ... = P(b1 | s_{am,NC}), and ..., and P(bn | s_{a1,NC}) = P(bn | s_{a2,NC}) = ... = P(bn | s_{am,NC}).
States reached by a communication action are observable to agent B; states reached by a no-communication action are unobservable.

11 University of Southern California
Reason for the Non-linear Constraints
Agent B has no hint of the current state after a no-communication (NC) action, so its action choice must be independent of the source state: the probability of action b1 from one NC state should equal the probability of the same action b1 from any other NC state. With actions independent of the (unobserved) state, miscoordination is avoided.
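One way to see why these constraints are non-linear in the LP, assuming the occupancy variables x(s,a) from the LP sketch above: agent B's conditional action probability in a state is a ratio of occupancy variables, and requiring such ratios to agree across the no-communication states yields non-linear constraints:

    \frac{x(s'_{NC},\, b_j)}{\sum_{b} x(s'_{NC},\, b)}
    \;=\;
    \frac{x(s''_{NC},\, b_j)}{\sum_{b} x(s''_{NC},\, b)}
    \qquad \text{for all no-communication states } s'_{NC}, s''_{NC} \text{ and all actions } b_j .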

12 University of Southern California
Experimental Results
[Figure 1: experimental results over states S1 through S9.]

13 University of Southern California
Experiments: Example Domain 2
Domain 1: comparing expected rewards under a communication threshold for the conjoined, deterministic, and EMTDP policies. The conjoined policy suffered miscoordination, which resulted in violating the resource constraints (no reward); the deterministic and EMTDP policies showed no miscoordination, and EMTDP obtained an expected reward of 10.55.
Domain 2: a team of two rovers and several scientists using them. Each scientist has a daily routine of observations, and a rover can use only a limited amount of energy in serving a scientist.
Experiment conducted: observe Martian rocks; the rovers maximize observation output within the energy budget provided.
- Soft constraint: exceeding the energy budget on a given day is not catastrophic, but overusing energy frequently affects the other scientists' work.
- Uncertainty: only a 0.75 chance of succeeding in an observation.
The EMTDP had about 180 states, 1,500 variables, and 40 non-linear constraints; problems of this order could be handled in under 20 seconds.

14 University of Southern California
Summary and Future Work
- Novel formalization of teamwork with resource constraints: maximize expected team reward while bounding expected resource consumption.
- Provided an EMTDP formulation in which agents avoid miscoordination even with randomized policies.
- Proved equivalence of different EMTDP transformation strategies (see the paper for details).
- Introduced non-linear constraints.
Future work:
- Pin down the complexity of the problem.
- Experiment on the n-agent case.
- Extend the work to partially observable domains.

15 University of Southern California
Thank You
Any questions?