Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs
Rosemary Emery-Montemerlo
joint work with Geoff Gordon, Jeff Schneider and Sebastian Thrun
July 21, 2004, AAMAS 2004

Robot Teams
- With limited communication, existing paradigms for decentralized robot control are not sufficient
- Game-theoretic methods are necessary for multi-robot coordination under these conditions

Decentralized Decision Making
- A robot cannot choose actions based only on joint observations consistent with its own sensor readings
- It must consider all joint observations that are consistent with its possible sensor readings

Relationship Between Decision Theoretic Models
[Diagram: MDP (state space), POMDP (belief space), ? (distribution over belief space)]

Models of Multi-Agent Systems
- Partially observable stochastic games
  - Generalization of stochastic games to partially observable worlds
- Related models
  - DEC-POMDP [Bernstein et al., 2000]
  - MTDP [Pynadath and Tambe, 2002]
  - I-POMDP [Gmytrasiewicz and Doshi, 2004]
  - POIPSG [Peshkin et al., 2000]

Partially Observable Stochastic Games
POSG = $\{I, S, A, Z, T, R, O\}$
- $I$ is the set of agents, $I = \{1, \dots, n\}$
- $S$ is the set of states
- $A$ is the set of actions, $A = A_1 \times \dots \times A_n$
- $Z$ is the set of observations, $Z = Z_1 \times \dots \times Z_n$
- $T$ is the transition function, $T: S \times A \to S$
- $R$ is the reward function, $R: S \times A \to \mathbb{R}$
- $O$ are the observation emission probabilities, $O: S \times Z \times A \to [0,1]$
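The tuple above can be mirrored directly in code. The following is a minimal Python sketch, not the authors' implementation: the class name `POSG`, its field names, and the two-robot toy instance are all hypothetical, and the transition function is assumed to return a probability distribution over next states.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

JointAction = Tuple[str, ...]
JointObs = Tuple[str, ...]


@dataclass
class POSG:
    """Container for the tuple {I, S, A, Z, T, R, O} (hypothetical field names)."""
    agents: List[int]                                            # I
    states: List[str]                                            # S
    actions: List[List[str]]                                     # A_i for each agent
    observations: List[List[str]]                                # Z_i for each agent
    transition: Callable[[str, JointAction], Dict[str, float]]   # T(s, a) -> P(s')
    reward: Callable[[str, JointAction], float]                  # R(s, a), common payoff
    emission: Callable[[str, JointObs, JointAction], float]      # O(s, z, a) in [0, 1]

    def joint_actions(self) -> List[JointAction]:
        """A = A_1 x ... x A_n."""
        return list(product(*self.actions))

    def joint_observations(self) -> List[JointObs]:
        """Z = Z_1 x ... x Z_n."""
        return list(product(*self.observations))


# Toy instance (assumed for illustration): two robots, an intruder that is
# either "far" or "near", and noiseless observations of the intruder.
toy = POSG(
    agents=[1, 2],
    states=["far", "near"],
    actions=[["wait", "tag"], ["wait", "tag"]],
    observations=[["nothing", "intruder"], ["nothing", "intruder"]],
    transition=lambda s, a: {"far": 0.5, "near": 0.5},
    reward=lambda s, a: 1.0 if s == "near" and "tag" in a else 0.0,
    emission=lambda s, z, a: 1.0 if all(
        o == ("intruder" if s == "near" else "nothing") for o in z
    ) else 0.0,
)

print(len(toy.joint_actions()), len(toy.joint_observations()))
```

Keeping per-agent action and observation sets separate, and building the joint sets on demand, makes the exponential growth of $A$ and $Z$ in the number of agents explicit.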

Solving POSGs
- POSGs are computationally infeasible to solve
- We can approximate a POSG as a series of smaller Bayesian games
[Diagram: the full POSG alongside the one-step lookahead game at time t (a Bayesian game)]

Bayesian Games
- Private information relevant to game
  - Uncertainty in utility
- Type
  - Encapsulates private information
  - We limit ourselves to games with a finite number of types
- In robot example
  - Type 1: Robot doesn't see anything
  - Type 2: Robot sees intruder at location x

Bayesian Games
BG = $\{I, \Theta, A, p(\Theta), u\}$
- $\Theta$ is the joint type space, $\Theta = \Theta_1 \times \dots \times \Theta_n$
- $\theta$ is a specific joint type, $\theta = \{\theta_1, \dots, \theta_n\}$
- $p(\Theta)$ is the common prior on the distribution over $\Theta$
- $u$ is the utility function, $u = \{u_1, \dots, u_n\}$
  - $u_i(a_i, a_{-i}, (\theta_i, \theta_{-i}))$
- $\sigma_i$ is a strategy for player $i$
  - Defines what player $i$ does for each of its possible types
  - Actions are individual actions, not joint actions
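To make the roles of the prior, the strategies, and the utility concrete, here is a small Python sketch that evaluates the expected utility of a joint strategy under a common prior. The function name `expected_utility`, the toy prior, and the toy payoff are assumptions chosen for illustration; they do not come from the talk.

```python
from typing import Callable, Dict, List, Tuple

JointType = Tuple[int, ...]
Strategy = Dict[int, str]  # maps an agent's own type to its own (individual) action


def expected_utility(
    prior: Dict[JointType, float],
    strategies: List[Strategy],
    utility: Callable[[Tuple[str, ...], JointType], float],
) -> float:
    """Sum p(theta) * u(sigma(theta), theta) over all joint types theta."""
    total = 0.0
    for theta, p in prior.items():
        # Each agent looks only at its own component of the joint type.
        joint_action = tuple(s[theta[i]] for i, s in enumerate(strategies))
        total += p * utility(joint_action, theta)
    return total


# Assumed toy prior: each robot's type is 0 (sees nothing) or 1 (sees intruder),
# and the two robots' types agree 80% of the time.
prior = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def same_action(joint_action, theta):
    """Assumed common payoff: the team scores 1 only when both robots act alike."""
    return 1.0 if joint_action[0] == joint_action[1] else 0.0


sigma = [{0: "patrol", 1: "tag"}, {0: "patrol", 1: "tag"}]
value = expected_utility(prior, sigma, same_action)
print(value)
```

Note that each strategy is indexed by an individual type and returns an individual action, which is the point of the last two bullets above.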

Bayesian-Nash Equilibrium
- Set of best response strategies
- Each agent tries to maximize its expected utility conditioned on its probability distribution over the other agents' types, $p(\Theta)$
- Each agent has a policy $\sigma_i$ that, given $\sigma_{-i}$, maximizes $u_i(\sigma_i, \sigma_{-i}, \theta_{-i})$
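One common way to search for such a set of mutual best responses when payoffs are shared is alternating maximization: hold every strategy but one fixed, replace that one with a best response, and repeat until no agent changes. The sketch below is a hedged illustration, not necessarily the solver used in the talk; the function names, the toy prior, and the toy utility are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

JointType = Tuple[int, ...]
Strategy = Dict[int, str]  # own type -> own action


def best_response(i, strategies, prior, utility, actions):
    """Best response of agent i (0-based) to the other agents' fixed strategies."""
    response = {}
    for own_type in sorted({theta[i] for theta in prior}):

        def value(action):
            # Expected utility, restricted to joint types consistent with own_type.
            total = 0.0
            for theta, p in prior.items():
                if theta[i] != own_type:
                    continue
                joint = tuple(
                    action if k == i else strategies[k][theta[k]]
                    for k in range(len(strategies))
                )
                total += p * utility(joint, theta)
            return total

        response[own_type] = max(actions[i], key=value)
    return response


def alternating_maximization(strategies, prior, utility, actions, max_sweeps=100):
    """Sweep over agents until no strategy changes (a local optimum)."""
    strategies = [dict(s) for s in strategies]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(strategies)):
            response = best_response(i, strategies, prior, utility, actions)
            if response != strategies[i]:
                strategies[i], changed = response, True
        if not changed:
            break
    return strategies


# Assumed toy game: agent 0's type reveals the intruder's side (0 = left, 1 = right),
# agent 1's type is a noisy copy; the shared payoff counts robots on the correct side.
prior = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
actions = [["L", "R"], ["L", "R"]]


def utility(joint, theta):
    target = "L" if theta[0] == 0 else "R"
    return float((joint[0] == target) + (joint[1] == target))


start = [{0: "L", 1: "L"}, {0: "L", 1: "L"}]
equilibrium = alternating_maximization(start, prior, utility, actions)
print(equilibrium)
```

Because all agents share one payoff, a switch either raises the shared expected utility or is a tie-break toward an earlier-listed action, so the sweeps stop after finitely many changes. The result is only a local optimum and can depend on the starting strategies.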

POSG to Bayesian Game Approximation
$\{I, S, A, Z, T, R, O\}$ to $\{I, \Theta, A, p(\Theta), u\}^t$
- $I = I$
- $A = A$
- Type space $\Theta_i^t$ = all possible histories of agent $i$'s actions and observations up to time $t$
- $p(\Theta)^t$ calculated from $S_0, A, T, Z, O, \sigma^{t-1}$
  - Prune low probability types
- Each joint type $\theta$ maps to a joint belief
- $u$ given by heuristic, with $u_i = u_j$
  - QMDP
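The QMDP heuristic named above scores a joint action under a belief by weighting the Q-values of the underlying fully observable MDP by that belief. The following Python sketch illustrates that idea under stated assumptions: the discount factor, the number of value-iteration sweeps, and the two-state toy MDP are not from the talk, and the function names are hypothetical.

```python
from typing import List


def qmdp_values(T: List[List[List[float]]], R: List[List[float]],
                gamma: float = 0.95, sweeps: int = 200) -> List[List[float]]:
    """Value iteration on the underlying MDP.

    T[s][a][s2] is the probability of moving from s to s2 under joint action a,
    R[s][a] is the shared reward. Returns Q[s][a].
    """
    n_states, n_actions = len(R), len(R[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        V = [max(row) for row in Q]
        Q = [
            [
                R[s][a] + gamma * sum(T[s][a][s2] * V[s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
    return Q


def qmdp_utility(belief: List[float], Q: List[List[float]], a: int) -> float:
    """u(a, theta) = sum_s b_theta(s) * Q(s, a), with b_theta the joint belief of theta."""
    return sum(b * Q[s][a] for s, b in enumerate(belief))


# Assumed toy MDP: joint action 0 keeps the state, joint action 1 flips it,
# and the team earns reward 1 whenever it is in state 1.
T = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0],
     [1.0, 1.0]]

Q = qmdp_values(T, R, gamma=0.5)
belief = [0.25, 0.75]  # joint belief attached to some joint type
utilities = [qmdp_utility(belief, Q, a) for a in range(2)]
print(utilities)
```

Because the same belief-weighted value is handed to every agent, the resulting game has common payoffs, matching $u_i = u_j$ above.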

Algorithm
Agent i:
1. Initialize: $t = 0$, $h_i = \{\}$, $p(\Theta^0)$; $\sigma^0 = \text{solveGame}(\Theta^0, p(\Theta^0))$
2. Make Observation: $h_i = obs_i^t \cup a_i^{t-1} \cup h_i$
3. Determine Type: $\theta_i^t = \text{bestMatch}(h_i, \Theta_i^t)$
4. Execute Action: $a_i^t = \sigma_i^t(\theta_i^t)$
5. Propagate Forward: $\Theta^{t+1}, p(\Theta^{t+1})$
6. Find Policy for $t+1$: $\sigma^{t+1} = \text{solveGame}(\Theta^{t+1}, p(\Theta^{t+1}))$
7. $t = t + 1$; repeat from step 2
Agent j runs the same steps in parallel, with index j in place of i.
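The per-agent loop can be sketched as a short Python function. This is only a skeleton under stated assumptions: `run_agent` and its parameters are hypothetical names, and the `solve_game`, `propagate`, `best_match`, and `observe` stand-ins below are deliberately trivial placeholders, not the procedures used in the talk (for instance, types here are single observations instead of full histories).

```python
def run_agent(i, horizon, types0, prior0, solve_game, propagate, best_match, observe):
    """Run the decentralized loop for agent i (0-based) and return its actions."""
    t = 0
    history = []                        # h_i
    types, prior = types0, prior0       # Theta^0, p(Theta^0)
    policy = solve_game(types, prior)   # sigma^0: one strategy per agent
    last_action = None
    executed = []
    while t < horizon:
        # Make observation: extend h_i with the previous action and the new observation.
        history.append((last_action, observe(i, t)))
        # Determine type: closest type in agent i's current type space.
        own_type = best_match(history, types[i])
        # Execute action prescribed by agent i's own strategy.
        action = policy[i][own_type]
        executed.append(action)
        # Propagate the type space forward and solve the next Bayesian game.
        types, prior = propagate(types, prior, policy)
        policy = solve_game(types, prior)
        last_action = action
        t += 1
    return executed


# --- Hypothetical stand-ins so the skeleton runs end to end ---
def solve_game(types, prior):
    # Every agent patrols unless its type says it sees the intruder.
    return [{"clear": "patrol", "intruder": "tag"} for _ in types]


def propagate(types, prior, policy):
    # Keep the type space and prior unchanged.
    return types, prior


def best_match(history, candidate_types):
    # Match on the most recent observation only.
    latest = history[-1][1]
    return latest if latest in candidate_types else candidate_types[0]


script = ["clear", "intruder", "clear"]


def observe(i, t):
    return script[t]


types0 = [["clear", "intruder"], ["clear", "intruder"]]
prior0 = {("clear", "clear"): 1.0}
actions_taken = run_agent(0, 3, types0, prior0, solve_game, propagate, best_match, observe)
print(actions_taken)
```

Each agent runs this loop on its own. Coordination relies on every agent computing the same type spaces, priors, and policies from the same common knowledge, so no run-time communication appears in the loop.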

Robotic Team Tag
- Version of Team Tag
  - Environment is portion of Gates Hall
  - Full teammate observability
  - Opponent can be captured by a single robot in any state
- QMDP used as heuristic
- Two Pioneer-class robots

Robot Policies

Lady And The Tiger [Nair et al. 2003]

Contributions
- Algorithm for finding approximate solutions to POSGs with common payoffs
- Tractability achieved by modeling the POSG as a sequence of Bayesian games
- Performs comparably to the full POSG for a small finite-horizon problem
- Improved performance over 'blind' application of utility heuristic in more complex problems
- Successful real-time game-theoretic controller for indoor robots

Questions?
remery@cs.cmu.edu
www.cs.cmu.edu/~remery

Back-Up Slides

Lady And The Tiger [Nair et al. 2003]

Robotic Team Tag
- $I = \{1, 2\}$
- $S = S_1 \times S_2 \times S_{opponent}$
  - $S_i = \{s_0, \dots, s_{28}\}$, $S_{opponent} = \{s_0, \dots, s_{28}, s_{tagged}\}$
  - $|S| = 25230$
- $A_i = \{N, S, E, W, Tag\}$
- $Z_i = [\{s_i, -1\}, s_{-i}, a_{-i}]$
- $T$: adjacent cells
- $O$: see opponent if on same cell
- $R$: minimize capture time
Modified from [Pineau et al. 2003]
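The state count follows directly from the product structure of $S$. The short Python check below reproduces it; the variable names are illustrative only.

```python
# Each robot occupies one of 29 cells (s_0 .. s_28).
robot_cells = 29
# The opponent has the same 29 cells plus the absorbing "tagged" state.
opponent_states = robot_cells + 1

# |S| = |S_1| * |S_2| * |S_opponent|
n_states = robot_cells * robot_cells * opponent_states

# Each robot has 5 actions (N, S, E, W, Tag), so there are 5 * 5 joint actions.
n_joint_actions = 5 * 5

print(n_states, n_joint_actions)
```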

Environment

Robotic Team Tag Results
