On the Difficulty of Achieving Equilibrium in Interactive POMDPs Prashant Doshi Dept. of Computer Science University of Georgia Athens, GA 30602 Twenty.


1 On the Difficulty of Achieving Equilibrium in Interactive POMDPs
Prashant Doshi, Dept. of Computer Science, University of Georgia, Athens, GA 30602
Piotr J. Gmytrasiewicz, Dept. of Computer Science, University of Illinois at Chicago, Chicago, IL 60607
Twenty-First National Conference on AI (AAAI 2006)

2 Outline
Background on Interactive POMDPs
Subjective Equilibrium in I-POMDPs and Sufficient Conditions
Difficulty in Satisfying the Conditions

3 Interactive POMDPs
Background
–Well-known framework for decision-making in single-agent, partially observable settings: POMDP
–Traditional analysis of multiagent interactions: game theory
Problem: “... there is currently no good way to combine game theoretic and POMDP control strategies.” - Russell and Norvig, AI: A Modern Approach, 2nd Ed.

4 Interactive POMDPs
General Problem Setting: Optimize an agent’s preferences given beliefs
[Figure: two agents, each holding beliefs, send actions to and receive observations from a shared environment state]

5 Interactive POMDPs
Key ideas: Integrate game-theoretic concepts into a decision-theoretic framework
–Include possible models of other agents in your decision making → intentional (types) and subintentional models
–Address uncertainty by maintaining beliefs over the state and models of other agents → Bayesian learning
–Beliefs over intentional models give rise to interactive belief systems → interactive epistemology, recursive modeling
–Computable approximation of the interactive belief system → finitely nested belief system
–Compute best responses to your beliefs → subjective rationality

6 Interactive POMDPs
Interactive state space
–Include models of other agents in the state space
Beliefs in I-POMDPs (computable)

7 Interactive POMDPs: Formal Definition and Relevant Properties
Belief Update: The belief update function for I-POMDP i involves:
–Use the other agent’s model to predict its action(s)
–Anticipate the other agent’s observations and how it updates its model
–Use your own observations to correct your beliefs
The update has a prediction step and a correction step.
Policy Computation
–Analogous to POMDPs (given the new belief update)
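The prediction and correction steps can be sketched in Python for the simplest case. This is a minimal illustration under stated assumptions, not the authors' implementation: the other agent j is represented by subintentional models (fixed action distributions), so the step that anticipates j's observations and model update is omitted. The function names, the two-state tiger dynamics, and the accuracy values 0.85 and 0.9 are all assumptions chosen for the example.

```python
from itertools import product

STATES = ("TL", "TR")                          # tiger behind left / right door
CREAK_FOR = {"L": "S", "OL": "CL", "OR": "CR"}  # creak that j's action tends to cause


def transition(s_next, s, a_i, a_j):
    """Assumed dynamics: the tiger stays put unless some agent opens a door,
    in which case its location is reset uniformly at random."""
    if a_i == "L" and a_j == "L":
        return 1.0 if s_next == s else 0.0
    return 0.5


def observation(o_i, s_next, a_i, a_j, growl_acc=0.85, creak_acc=0.9):
    """Assumed sensor model for agent i; o_i is a (growl, creak) pair."""
    growl, creak = o_i
    if a_i == "L":
        correct = "GL" if s_next == "TL" else "GR"
        p_growl = growl_acc if growl == correct else 1.0 - growl_acc
    else:
        p_growl = 0.5  # growls are uninformative unless i listens
    p_creak = creak_acc if creak == CREAK_FOR[a_j] else (1.0 - creak_acc) / 2.0
    return p_growl * p_creak


def update_belief(belief, models, a_i, o_i):
    """Belief over interactive states (s, model of j) after i acts and observes.

    Prediction: average over j's actions as predicted by each model.
    Correction: weight by the likelihood of i's own observation, then normalize.
    """
    new_belief = {}
    for s_next, m in product(STATES, models):
        total = 0.0
        for s in STATES:
            prior = belief.get((s, m), 0.0)
            for a_j, p_aj in models[m].items():
                total += (prior * p_aj
                          * transition(s_next, s, a_i, a_j)
                          * observation(o_i, s_next, a_i, a_j))
        new_belief[(s_next, m)] = total
    norm = sum(new_belief.values())
    if norm == 0.0:
        raise ValueError("observation has zero probability under the belief")
    return {k: v / norm for k, v in new_belief.items()}


# Hypothetical models of j: one always listens, one opens a random door.
models = {"listener": {"L": 1.0}, "opener": {"OL": 0.5, "OR": 0.5}}
belief = {(s, m): 0.25 for s in STATES for m in models}
posterior = update_belief(belief, models, "L", ("GL", "S"))
```

After i listens and hears a growl from the left with silence from j, the posterior shifts toward the tiger being on the left and toward the "listener" model of j. With intentional models, each model would additionally carry j's own belief, which would be updated inside the prediction step.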

8 Example: Multiagent Tiger Problem
Agents i & j
Task: Maximize collection of gold over a finite or infinite number of steps while avoiding the tiger
Each agent hears growls (GL or GR) as well as creaks (S, CL, or CR)
Each agent may open doors or listen (OL, OR, or L)
Each agent is unable to perceive the other’s observation

9 Subjective Equilibrium and Conditions for Achieving It

10 Subjective Equilibrium in I-POMDPs
Theoretical Analysis: Joint observation histories (paths of play) in the multiagent tiger problem

11 True distribution over obs. histories: agents i and j’s joint policies induce a true distribution over the future observation sequences
Subjective distribution over obs. histories: agent i’s beliefs over j’s models, together with its own policy, induce a subjective distribution over the future observation sequences

12 Subjective Equilibrium in I-POMDPs
Absolute Continuity Condition (ACC)
–Subjective distribution should not rule out the observation histories considered possible by the true distribution
–Cautious beliefs → “grain of truth” assumption
–“Grain of truth” is sufficient but not necessary to satisfy the ACC
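For finite sets of observation histories, the ACC reduces to a support check, which the following Python sketch illustrates. The function names, the model labels "A", "B", "C", and the history probabilities are all hypothetical; the sketch only assumes that each candidate model of j induces a distribution over histories and that i's subjective distribution is the belief-weighted mixture of those.

```python
def satisfies_acc(true_dist, subjective_dist):
    """ACC on a finite space: every history with positive true probability
    must also have positive subjective probability."""
    return all(subjective_dist.get(h, 0.0) > 0.0
               for h, p in true_dist.items() if p > 0.0)


def subjective_distribution(belief, induced):
    """Mix the history distributions induced by each model, weighted by belief."""
    mixed = {}
    for model, weight in belief.items():
        for history, p in induced[model].items():
            mixed[history] = mixed.get(history, 0.0) + weight * p
    return mixed


# Hypothetical distributions over two-step growl histories induced by three models of j.
induced = {
    "A": {("GL", "GL"): 0.7, ("GL", "GR"): 0.3},  # j's actual model in this example
    "B": {("GR", "GR"): 1.0},
    "C": {("GL", "GL"): 0.5, ("GL", "GR"): 0.5},  # differs from A, same support
}
true_dist = induced["A"]

grain_of_truth = subjective_distribution({"A": 0.1, "B": 0.9}, induced)
no_grain_wrong = subjective_distribution({"B": 1.0}, induced)
no_grain_ok = subjective_distribution({"C": 1.0}, induced)

acc_with_grain = satisfies_acc(true_dist, grain_of_truth)
acc_without_grain_b = satisfies_acc(true_dist, no_grain_wrong)
acc_without_grain_c = satisfies_acc(true_dist, no_grain_ok)
```

The belief that puts weight 0.1 on the actual model "A" has a grain of truth and therefore satisfies the ACC. The belief concentrated on "C" has no grain of truth yet still satisfies the ACC, which is the sense in which the grain of truth is sufficient but not necessary.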

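13 Subjective Equilibrium in I-POMDPs
Proposition 1 (Convergence): Under ACC, an agent’s belief over the other’s models, updated using the I-POMDP belief update, converges with probability 1
–Proof sketch: Show that Bayesian learning in I-POMDPs is a martingale, then apply the Martingale Convergence Theorem (Doob53)
ε-closeness of distributions: $\mu$ is ε-close to $\tilde{\mu}$ if there is a measurable set $Q$ with $\mu(Q) > 1-\epsilon$ and $\tilde{\mu}(Q) > 1-\epsilon$ such that, for every measurable $A \subseteq Q$, $(1-\epsilon)\,\tilde{\mu}(A) \le \mu(A) \le (1+\epsilon)\,\tilde{\mu}(A)$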
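The martingale property used in the proof sketch can be checked numerically in a toy setting. The Python sketch below is an illustration only: it uses a plain Bayesian update over a finite set of hypothetical models with fixed observation likelihoods (not the full I-POMDP update), and verifies that the expected posterior, taken under the agent's own predictive distribution, equals the current belief. All names and numbers are assumptions.

```python
def posterior(belief, likelihoods, obs):
    """Bayes rule over models given one observation."""
    unnorm = {m: belief[m] * likelihoods[m][obs] for m in belief}
    norm = sum(unnorm.values())
    return {m: v / norm for m, v in unnorm.items()}


def predictive(belief, likelihoods):
    """The agent's subjective probability of each next observation."""
    observations = next(iter(likelihoods.values())).keys()
    return {o: sum(belief[m] * likelihoods[m][o] for m in belief)
            for o in observations}


def expected_next_belief(belief, likelihoods):
    """Average the posterior over next observations, weighted by the predictive."""
    expected = {m: 0.0 for m in belief}
    for obs, p_obs in predictive(belief, likelihoods).items():
        if p_obs == 0.0:
            continue  # impossible observations contribute nothing
        post = posterior(belief, likelihoods, obs)
        for m in belief:
            expected[m] += p_obs * post[m]
    return expected


# Hypothetical models of j, each fixing the probability of the creak i hears.
likelihoods = {
    "listener": {"S": 0.90, "CL": 0.05, "CR": 0.05},
    "left_opener": {"S": 0.05, "CL": 0.90, "CR": 0.05},
    "random_opener": {"S": 0.05, "CL": 0.475, "CR": 0.475},
}
belief = {"listener": 0.6, "left_opener": 0.3, "random_opener": 0.1}
expected = expected_next_belief(belief, likelihoods)
```

Here `expected` matches `belief` up to floating-point error, which is the one-step martingale identity. The convergence claim itself then follows from the Martingale Convergence Theorem rather than from this check.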

14 Lemma (Blackwell&Dubins62): For all agents, if their initial beliefs satisfy ACC, then after finite time T(ε), each of their beliefs is ε-close to the true distribution over the future observation paths
Subjective ε-Equilibrium (Kalai&Lehrer93): A profile of agents’ strategies, each of which is an exact best response to a belief that is ε-close to the true distribution over the observation history
–Subjective equilibrium is stable under learning and optimization
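On a finite set of observation histories, ε-closeness can be tested directly. The sketch below assumes the standard two-sided definition from Kalai and Lehrer: there is a set Q, with probability greater than 1 − ε under both distributions, on which (1 − ε)·μ̃(A) ≤ μ(A) ≤ (1 + ε)·μ̃(A) for every A ⊆ Q. On a finite space it is enough to take Q as the set of single histories that satisfy the bound, since the bound is preserved under sums. The function name and example numbers are hypothetical.

```python
def is_eps_close(mu, mu_tilde, eps):
    """Check whether mu is eps-close to mu_tilde on a finite outcome space."""
    outcomes = set(mu) | set(mu_tilde)
    # Largest candidate Q: all outcomes whose probabilities satisfy the bound.
    q = [h for h in outcomes
         if (1.0 - eps) * mu_tilde.get(h, 0.0)
         <= mu.get(h, 0.0)
         <= (1.0 + eps) * mu_tilde.get(h, 0.0)]
    mass_mu = sum(mu.get(h, 0.0) for h in q)
    mass_mu_tilde = sum(mu_tilde.get(h, 0.0) for h in q)
    return mass_mu > 1.0 - eps and mass_mu_tilde > 1.0 - eps


# Hypothetical distributions over one-step growl observations.
dist_a = {"GL": 0.50, "GR": 0.50}
dist_b = {"GL": 0.52, "GR": 0.48}

close_at_10_percent = is_eps_close(dist_a, dist_b, 0.10)
close_at_1_percent = is_eps_close(dist_a, dist_b, 0.01)
```

The two example distributions are close at ε = 0.10 but not at ε = 0.01, so a learner whose predictions differ this much from the truth would need further observations before the tighter bound holds.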

15 Subjective Equilibrium in I-POMDPs
Main Result
Proposition 2: If agents’ beliefs within the I-POMDP framework satisfy the ACC, then after finite time T, their strategies are in subjective ε-equilibrium, where ε is a function of T
–When ε = 0, subjective equilibrium obtains
–Proof follows from the convergence of the I-POMDP belief update and (Blackwell&Dubins62)
–ACC is a sufficient condition, but not a necessary one

16 Difficulty in Practically Satisfying the Conditions

17 Computational Difficulties in Achieving Equilibrium
There exist computable strategies that admit no computable exact best responses (Nachbar&Zame96)
If the possible strategies are assumed computable, then i’s best response may not be computable. Therefore, j’s cautious beliefs do not imply a grain of truth
–Subtle tension between prediction and optimization
–Strictness of ACC

18 Computational Difficulties in Achieving Equilibrium
Proposition 3 (Impossibility): Within the finitely nested I-POMDP framework, the agents’ beliefs can never all satisfy the grain of truth assumption simultaneously
Difficult to realize the equilibrium!

19 Summary
Absolute Continuity Condition (ACC)
–More realistic: “grain of truth” condition
–Grain of truth condition is stronger than ACC
Equilibria in I-POMDPs
–Theoretical convergence to subjective equilibrium given ACC
Strictness of ACC
–Impossible to simultaneously satisfy grain of truth
–Computational obstacles to satisfying ACC
Future Work: Investigate the connection between subjective equilibrium and Nash equilibrium

20 Thank You
Questions?

21 Introduction
Significance: Real-world applications
1. Robotics
Planetary exploration
–Surface mapping by rovers
–Coordinate to explore a pre-defined region optimally
–Uncertainty due to sensors
Robot soccer
–Coordinate with teammates and deceive opponents
–Anticipate and track others’ actions
[Image labels: RoboCup Competition, Spirit, Opportunity]

22 Interactive POMDPs
Limitations of Nash Equilibrium
–Not suitable for general control
–Incomplete: Does not say what to do off-equilibria
–Non-unique: Multiple solutions, no way to choose
“…game theory has been used primarily to analyze environments that are at equilibrium, rather than to control agents within an environment.” - Russell and Norvig, AI: A Modern Approach, 2nd Ed.

