Presentation transcript:

Influence Diagrams for Robust Decision Making in Multiagent Settings

Prashant Doshi, University of Georgia, USA

Yingke Chen, Postdoctoral student
Yifeng Zeng, Reader, Teesside Univ. (previously Assoc. Prof., Aalborg Univ.)
Muthu Chandrasekaran, Doctoral student

Influence diagram

ID for decision making where the state may be partially observable. [Figure: influence diagram with chance node S (state), observation node O_i, decision node A_i, and utility node R_i]

How do we generalize IDs to multiagent settings?

Adversarial tiger problem

Multiagent influence diagram (MAID) (Koller & Milch 01)
MAIDs offer a richer representation for a game and may be transformed into a normal- or extensive-form game.
A strategy of an agent is an assignment of a decision rule to every decision node of that agent.
[Figure: MAID for the adversarial tiger problem with chance nodes Tiger loc, Growl_i, Growl_j, decision nodes Open or Listen_i, Open or Listen_j, and utility nodes R_i, R_j]

The expected utility of a strategy profile to agent i is the sum of the expected utilities at each of i's decision nodes.
A strategy profile is in Nash equilibrium if each agent's strategy in the profile is optimal given the others' strategies.
[Figure: the same MAID for the adversarial tiger problem]

Strategic relevance: Consider two strategy profiles which differ in the decision rule at D' only. A decision node D strategically relies on another, D', if D's decision rule does not remain optimal in both profiles.

Is there a way of finding all decision nodes that are strategically relevant to D using the graphical structure? Yes: s-reachability, analogous to d-separation for determining conditional independence in BNs.

Evaluating whether a decision rule at D is optimal in a given strategy profile involves removing the decision nodes that are not s-relevant to D and transforming the decision and utility nodes into chance nodes. [Figure: the same MAID for the adversarial tiger problem]

What if the agents are using differing models of the same game to make decisions, or are uncertain about the mental models others are using?

Let agent i believe with probability p that j will listen, and with probability 1 - p that j will follow the best-response decision. Analogously, j believes that i will open a door with probability q and otherwise play the best response. [Figure: the same MAID for the adversarial tiger problem]

Network of IDs (NID) (Gal & Pfeffer 08)
Let agent i believe with probability p that j will likely listen, and with probability 1 - p that j will follow the best-response decision. Analogously, j believes that i will mostly open a door with probability q and otherwise play the best response.
[Figure: NID with a top-level block and lower-level blocks L (Listen) and O (Open), reached with probabilities p and q]

Let agent i believe with probability p that j will likely listen, and with probability 1 - p that j will follow the best-response decision. Analogously, j believes that i will mostly open a door with probability q and otherwise play the best response. [Figure: the top-level block of the NID, which is a MAID for the adversarial tiger problem]

MAID representation for the NID. [Figure: MAID with chance nodes Tiger loc^TL, Growl_i^TL, Growl_j^TL, decision nodes Open or Listen_i^TL, Open or Listen_j^TL, utility nodes R_i^TL, R_j^TL, best-response nodes BR[i]^TL, BR[j]^TL, and mode nodes Mod[j; D_i] (values Listen L, Open O) and Mod[i; D_j]]

MAIDs and NIDs: rich languages for games based on IDs that model problem structure by exploiting conditional independence.

MAIDs and NIDs
- Focus is on computing equilibrium, which does not allow for a best response to a distribution of non-equilibrium behaviors
- Do not model dynamic games

Generalize IDs to dynamic interactions in multiagent settings

Challenge: Other agents could be updating beliefs and changing strategies

Level l I-ID
- Model node M_{j,l-1}: models of agent j at level l-1
- Policy link (dashed arrow): distribution over the other agent's actions given its models
- Belief on M_{j,l-1}: Pr(M_{j,l-1} | s)
[Figure: level l I-ID with nodes Tiger loc_i, Growl_i, Open or Listen_i, R_i, Open or Listen_j, and model node M_{j,l-1}]

Members of the model node
- Different chance nodes are solutions of the models m_{j,l-1}
- Mod[M_j] represents the different models of agent j
- m^1_{j,l-1}, m^2_{j,l-1} could be I-IDs, IDs, or simple distributions
[Figure: model node M_{j,l-1} containing Mod[M_j], action nodes A_j^1 and A_j^2, models m^1_{j,l-1} and m^2_{j,l-1}, state node S, and the chance node Open or Listen_j]

The CPT of the chance node A_j is a multiplexer: it assumes the distribution of each of the action nodes (A_j^1, A_j^2) depending on the value of Mod[M_j]. [Figure: the same model node structure with the multiplexer chance node A_j]
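To make the multiplexer concrete, here is a minimal Python sketch (not from the slides; the model names, action set, and placeholder solution distributions are hypothetical): the CPT of A_j simply copies the action distribution of whichever model Mod[M_j] selects.

```python
import numpy as np

# Hypothetical setup: two candidate models of agent j and two of j's actions.
actions_j = ["listen", "open_left"]
solutions = {                               # placeholder solutions of m_j^1 and m_j^2,
    "m1": np.array([0.9, 0.1]),             # i.e., the distributions of A_j^1 and A_j^2
    "m2": np.array([0.2, 0.8]),
}

# CPT of the multiplexer node A_j, conditioned on Mod[M_j]:
# row k is exactly the action distribution prescribed by the k-th model.
cpt_Aj = np.stack([solutions[m] for m in solutions])     # shape |Mod[M_j]| x |A_j|

# Marginal prediction of j's action given i's belief over j's models.
belief_over_models = np.array([0.7, 0.3])                # Pr(Mod[M_j])
prediction = belief_over_models @ cpt_Aj
print(dict(zip(actions_j, prediction)))                  # {'listen': 0.69, 'open_left': 0.31}
```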

Could I-IDs be extended over time? We must address the challenge

Model update link. [Figure: two time slices with nodes A_i^t, O_i^t, S^t, A_j^t, M_{j,l-1}^t, R_i at time t and their counterparts at time t+1; the model update link connects M_{j,l-1}^t to M_{j,l-1}^{t+1}]

Interactive dynamic influence diagram (I-DID)

How do we implement the model update link?

[Figure: implementation of the model update link. At time t the model node M_{j,l-1}^t holds models m_{j,l-1}^{t,1} and m_{j,l-1}^{t,2} with action nodes A_j^1, A_j^2 and observation nodes O_j^1, O_j^2; at time t+1 the model node M_{j,l-1}^{t+1} holds the updated models m_{j,l-1}^{t+1,1} through m_{j,l-1}^{t+1,4} with action nodes A_j^1 through A_j^4]

These updated models differ in their initial beliefs, each of which is the result of j updating its beliefs due to its actions and possible observations. [Figure: the same model update expansion as above]
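A minimal sketch of how the model update link can be implemented (hypothetical data structures; `belief_update` stands in for agent j's own POMDP/DID belief update and is assumed given). Each model of j at time t spawns one updated model per observation j may receive, for each action its policy prescribes, so the models at t+1 differ only in their updated beliefs.

```python
def expand_model_node(models_t, observations_j, prescribed_actions, belief_update):
    """Implement the model update link: every model of j at time t yields one
    updated model per (prescribed action, observation) pair of agent j."""
    models_t1 = []
    for m in models_t:
        for a_j in prescribed_actions(m):        # action(s) that m's solution prescribes
            for o_j in observations_j:           # any of |Omega_j| possible observations
                updated = {"belief": belief_update(m["belief"], a_j, o_j),
                           "frame": m["frame"]}
                models_t1.append(updated)
    return models_t1

# With a single prescribed action per model, the space grows by a factor of
# |Omega_j| per time step, matching the exponential growth discussed later.
```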

Recap

Prashant Doshi, Yifeng Zeng and Qiongyu Chen, "Graphical Models for Interactive POMDPs: Representations and Solutions", Journal of AAMAS, 18(3), 2009.
Daphne Koller and Brian Milch, "Multi-Agent Influence Diagrams for Representing and Solving Games", Games and Economic Behavior, 45(1), 2003.
Ya'akov Gal and Avi Pfeffer, "Networks of Influence Diagrams: A Formalism for Representing Agents' Beliefs and Decision-Making Processes", Journal of AI Research, 33, 2008.

How large is the behavioral model space?

General definition: a mapping from the agent's history of observations to its actions.

How large is the behavioral model space? The space of mappings from observation histories H to distributions over actions, H to Δ(A_j), is uncountably infinite (Δ(A_j) is a continuum).

How large is the behavioral model space? If we assume computable models, the space is countable, but a very large portion of the model space is not computable!

Daniel Dennett, philosopher and cognitive scientist: the intentional stance. Ascribe beliefs, preferences and intent to explain others' actions (analogous to theory of mind, ToM).

Organize the mental models
- Intentional models
- Subintentional models

Organize the mental models
- Intentional models (may give rise to recursive modeling): e.g., POMDP θ_j = ⟨b_j, A_j, T_j, Ω_j, O_j, R_j, OC_j⟩ (using DIDs); BDI, ToM
- Subintentional models: frame

Organize the mental models
- Intentional models: e.g., POMDP θ_j = ⟨b_j, A_j, T_j, Ω_j, O_j, R_j, OC_j⟩ (using DIDs); BDI, ToM
- Subintentional models: e.g., Δ(A_j), finite state controller, plan; frame

Finite model space grows as the interaction progresses

Growth in the model space: the other agent may receive any one of |Ω_j| observations, so the number of models grows as |M_j|, |M_j||Ω_j|, |M_j||Ω_j|^2, ..., |M_j||Ω_j|^t over steps 0, 1, 2, ..., t.

Growth in the model space: exponential.

General model space is large and grows exponentially as the interaction progresses

It would be great if we could compress this space!
- Lossless: no loss in value to the modeler
- Lossy: flexible loss in value for greater compression

Model space compression is useful in many areas:
1. Sequential decision making in multiagent settings using I-DIDs
2. Bayesian plan recognition
3. Games of imperfect information

General and domain-independent approach for compression: establish equivalence relations that partition the model space and retain representative models from each equivalence class.

Approach #1: Behavioral equivalence (Rathnasabapathy et al. 06, Pynadath & Marsella 07). Intentional models whose complete solutions are identical are considered equivalent.
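A minimal sketch of behavioral-equivalence pruning (hypothetical interface: `solve` maps a model to a hashable, canonical encoding of its complete policy tree): models with identical solutions collapse to a single representative.

```python
def prune_behaviorally_equivalent(models, solve):
    """Retain one representative per distinct complete solution (policy tree)."""
    representatives = {}
    for m in models:
        key = solve(m)                 # identical policy trees produce identical keys
        representatives.setdefault(key, m)
    return list(representatives.values())
```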

Approach #1: Behavioral equivalence: behaviorally minimal set of models.

Approach #1: Behavioral equivalence is lossless and works when intentional models have differing frames.

Approach #1: Behavioral equivalence. Impact on I-DIDs in multiagent settings. [Plots: multiagent tiger and multiagent MM domains]

Approach #1: Behavioral equivalence. Utilize model solutions (policy trees) for mitigating model growth: model representatives that are not BE may become BE from the next step onwards; preemptively identify such models and do not update all of them.

Thank you for your time

Approach #2: Revisit BE (Zeng et al. 11, 12). Intentional models whose partial depth-d solutions are identical and whose vectors of updated beliefs at the leaves of the partial trees are identical are considered equivalent. Sufficient but not necessary; lossless if frames are identical.

Approach #2: (ε, d)-Behavioral equivalence. Two models are (ε, d)-BE if their partial depth-d solutions are identical and the vectors of updated beliefs at the leaves of the partial trees differ by at most ε. Lossy. [Figure: example models that are (0.33, 1)-BE]

Approach #2: ε-Behavioral equivalence. Lemma (Boyen & Koller 98): the KL divergence between two distributions in a discrete Markov stochastic process reduces or remains the same after a transition, with the mixing rate acting as a discount factor. The mixing rate represents the minimal amount by which the posterior distributions agree with each other after one transition; it is a property of the problem and may be pre-computed.

Approach #2: ε-Behavioral equivalence. Given the mixing rate and a bound, ε, on the divergence between two belief vectors, the lemma allows computing the depth, d, at which the bound is reached. Compare two solutions up to depth d for equality.
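As a sketch of the reasoning (not spelled out on the slide), assume the lemma gives a per-step contraction of the divergence by the factor (1 - γ_F) and let D_0 denote a hypothetical bound on the initial divergence between the two belief vectors; then

```latex
D_{t+1} \le (1 - \gamma_F)\, D_t
\quad\Longrightarrow\quad
D_d \le (1 - \gamma_F)^d D_0 ,
\qquad
(1 - \gamma_F)^d D_0 \le \epsilon
\;\Longrightarrow\;
d \ge \frac{\ln(\epsilon / D_0)}{\ln(1 - \gamma_F)} .
```

So d is the smallest integer satisfying the last inequality, and solutions are compared only up to that depth.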

Approach #2: ε-Behavioral equivalence. Impact on dt-planning in multiagent settings. [Plots: multiagent Concert domain; discount factor γ_F = 0.5] On a UAV reconnaissance problem in a 5x5 grid, the approach allows the solution to scale to a 10-step lookahead in 20 minutes.

Approach #2: ε-Behavioral equivalence. What is the value of d when some problems exhibit a γ_F of 0 or 1? γ_F = 1 implies that the KL divergence is 0 after one step: set d = 1. γ_F = 0 implies that the KL divergence does not reduce: arbitrarily set d to the horizon.

Approach #3: Action equivalence (Zeng et al. 09, 12). Intentional or subintentional models whose predictions at time step t (action distributions) are identical are considered equivalent at t.
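A minimal sketch of action-equivalence clustering (hypothetical interface: `predict` returns a model's action distribution at the current step): models with the same prediction fall into one class, so the number of classes is bounded by the number of distinct predictions.

```python
def cluster_by_action_equivalence(models, predict, precision=6):
    """Group models whose predicted action distributions at this step coincide."""
    clusters = {}
    for m in models:
        key = tuple(round(float(p), precision) for p in predict(m))
        clusters.setdefault(key, []).append(m)
    return clusters          # keep one representative per cluster downstream
```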

Approach #3: Action equivalence

Approach #3: Action equivalence is lossy and works when intentional models have differing frames.

Impact on dt-planning in multiagent settings. [Plots: multiagent tiger] AE bounds the model space at each time step to the number of distinct actions.

Approach #4: Influence equivalence (related to Witwicki & Durfee 11). Intentional or subintentional models whose predictions at time step t influence the subject agent's plan identically are considered equivalent at t. Example: regardless of whether the other agent opened the left or right door, the tiger resets, thereby affecting the agent's plan identically.

Approach #4: Influence equivalence. Influence may be measured as the change in the subject agent's belief due to the action. Groups more models at time step t compared to AE. Lossy.

Compression due to approximate equivalence may violate the absolute continuity condition (ACC). Regain ACC by appending a covering model to the compressed set of representatives.

Open questions

N > 2 agents: under what conditions could equivalent models belonging to different agents be grouped together into an equivalence class?

Can we avoid solving models by using heuristics for identifying approximately equivalent models?

Modeling Strategic Human Intent

Yifeng Zeng, Reader, Teesside Univ. (previously Assoc. Prof., Aalborg Univ.)
Yingke Chen, Doctoral student
Hua Mao, Doctoral student
Muthu Chandrasekaran, Doctoral student
Xia Qu, Doctoral student
Roi Ceren, Doctoral student
Matthew Meisel, Doctoral student
Adam Goodie, Professor of Psychology, UGA

- Computational modeling of human recursive thinking in sequential games
- Computational modeling of probability judgment in stochastic games

Human strategic reasoning ("I think what you think that I think...") is generally hobbled by low levels of recursive thinking (Stahl & Wilson 95, Hedden & Zhang 02, Camerer et al. 04, Ficici & Pfeffer 08).

You are Player I and Player II is human. Will you move or stay? [Figure: sequential game tree showing Move/Stay choices at alternating nodes, the payoffs for Players I and II, and the player to move at each stage]

Fewer than 40% of the sample population performed the rational action!

Thinking about how others think (...) is hard in general contexts

[Figure: a simpler, strictly competitive game tree with Move/Stay choices and the player to move at each stage; Player II's payoff is 1 minus Player I's payoff]

About 70% of the sample population performed the rational action in this simpler and strictly competitive game

Simplicity, competitiveness and embedding the task in intuitive representations seem to facilitate human reasoning (Flobbe et al.08, Meijering et al.11, Goodie et al.12)

3-stage game: myopic opponents default to staying (level 0), while predictive opponents think about the player's decision (level 1).

Can we computationally model these strategic behaviors using process models?

Yes! Using a parameterized Interactive POMDP framework

Replace I-POMDP's normative Bayesian belief update with Bayesian learning that underweights evidence, parameterized by an underweighting factor. Notice that the achievement score increases as more games are played, indicating learning of the opponent models. Learning is slow and partial.
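One common way to realize evidence underweighting, offered here only as an illustrative sketch (the exponent `gamma` is a hypothetical stand-in for the slide's unnamed parameter, not necessarily the exact form used in the study), is to raise the likelihood to a power below 1 before normalizing:

```python
import numpy as np

def underweighted_update(prior, likelihood, gamma):
    """Bayesian-style update that underweights evidence.
    gamma = 1 recovers the normative Bayes update; gamma < 1 slows learning."""
    posterior = np.asarray(prior) * np.power(likelihood, gamma)
    return posterior / posterior.sum()

prior = np.array([0.5, 0.5])            # belief over two opponent models
likelihood = np.array([0.2, 0.8])       # evidence favoring the second model
print(underweighted_update(prior, likelihood, gamma=1.0))  # full Bayesian update
print(underweighted_update(prior, likelihood, gamma=0.3))  # slow, partial learning
```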

Replace I-POMDP's normative expected utility maximization with a quantal response model that selects actions proportionally to their utilities, parameterized by a precision parameter. Notice the presence of rationality errors in the participants' choices (an action inconsistent with the prediction). Errors appear to reduce with time.
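A minimal sketch of a quantal response choice rule in its common logit form, with a hypothetical precision parameter `lam` (the slide does not name its parameter): higher utilities receive higher choice probabilities, and rationality errors shrink as `lam` grows, recovering strict maximization in the limit.

```python
import numpy as np

def quantal_response(utilities, lam):
    """Logit quantal response: Pr(a) is proportional to exp(lam * U(a))."""
    z = lam * np.asarray(utilities, dtype=float)
    z -= z.max()                         # shift for numerical stability
    probs = np.exp(z)
    return probs / probs.sum()

print(quantal_response([1.0, 0.5, 0.0], lam=2.0))   # soft preference for the best action
print(quantal_response([1.0, 0.5, 0.0], lam=50.0))  # nearly deterministic maximization
```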

Underweighting evidence during learning and quantal response for choice have prior psychological support

Use participants' predictions of the other's action to learn the belief-update (underweighting) parameter, and participants' actions to learn the quantal response parameter.

Use participants' actions to learn both parameters, letting the quantal response parameter vary linearly.

Insights revealed by process modeling:
1. Much evidence that participants did not make rote use of backward induction (BI), and instead engaged in recursive thinking
2. Rationality errors cannot be ignored when modeling human decision making, and they may vary
3. Evidence that participants could be attributing surprising observations of others' actions to their rationality errors

Open questions:
1. What is the impact on strategic thinking if action outcomes are uncertain?
2. Is there a damping effect on reasoning levels if participants need to concomitantly think ahead in time?

- Suite of general and domain-independent approaches for compressing agent model spaces based on equivalence
- Computational modeling of human behavioral data pertaining to strategic thinking

2. Bayesian plan recognition under uncertainty: the plan recognition literature has paid scant attention to finding general ways of reducing the set of feasible plans (Carberry 01).

3. Games of imperfect information (Bayesian games): real-world applications often involve many player types. Examples: ad hoc coordination in a spontaneous team; an automated poker-playing agent.

3. Games of imperfect information (Bayesian games): real-world applications often involve many player types. Model space compression facilitates equilibrium computation.