# Concurrent Reachability Games (Peter Bro Miltersen, Aarhus University, CTW 2009)

My apologies for not getting the slides ready in time for inclusion in the booklet! Slides are available at http://www.daimi.au.dk/~bromille

Concurrent reachability games

- A class of two-player zero-sum games generalizing simple stochastic games (Uri's talk yesterday).
- Studied mainly by the formal methods ("Eurotheory") community (but sometimes at such venues as FOCS and SODA).
- Very interesting and challenging algorithmic problems!

Simple stochastic games (SSGs), reachability version [Condon (1992)]

- Two players: MAX and min.
- Objective: MAX/min the probability of getting to the MAX-sink.

[Figure: an SSG with MAX, min and RAND (probability 1/2) vertices, a MAX-sink and a min-sink.]

Slide stolen from Uri. (ZP'96)

Simple stochastic games (SSGs): strategies

- A general strategy may be randomized and history dependent.
- A positional strategy is deterministic and history independent.
- A positional strategy for MAX is a choice of an outgoing edge from each MAX vertex.

Another slide stolen from Uri.

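Simple stochastic games (SSGs): values

- Both players have positional optimal strategies.
- Every vertex $i$ in the game has a value $v_i$, where $v_i(\sigma,\tau)$ denotes the probability of reaching the MAX-sink from $i$ when MAX plays $\sigma$ and min plays $\tau$:

$$v_i = \max_{\sigma\ \text{positional}}\ \min_{\tau\ \text{general}} v_i(\sigma,\tau) = \min_{\tau\ \text{positional}}\ \max_{\sigma\ \text{general}} v_i(\sigma,\tau)$$

- There are strategies that are optimal for every starting position.

Last slide stolen from Uri (I promise!)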

(Simple) concurrent reachability game

Arena:

- Finite directed graph.
- One Max sink ("goal") node.
- Each non-sink node is assigned a 2x2 matrix of outgoing arcs.

Play:

- A pebble moves from node to node as in a simple stochastic game.
- In each step, Max chooses a row and Min simultaneously chooses a column of the matrix.
- The pebble moves along the appropriate arc.
- If Max reaches the goal node, he wins.
- If this never happens, Min wins.

Simulation

[Figures: gadgets showing how a CRG simulates the MAX, min and RAND (probability 1/2) vertices of an SSG.]

For the RAND vertex, it is somewhat more subtle that this works!

"Proof" of correctness

- We want values in the CRG to be the same as in the SSG. In particular, the value of the node simulating a coin toss should be the average of the values of the two nodes it points to.
- If these two values are the same, this is "clearly" the case.
- If they have different values $v_1$, $v_2$, the simulated coin toss node is a game of Matching Pennies with payoffs $v_1$, $v_2$. This game has value $(v_1+v_2)/2$.
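The Matching Pennies claim can be checked in one line. The gadget's figure did not survive, so the matrix $A$ below is an assumption about how the two arcs are arranged: rows are Max's choices, columns are Min's. If either player mixes uniformly, the expected payoff is $(v_1+v_2)/2$ whatever the opponent does, so neither player can do better than that value.

$$
A = \begin{pmatrix} v_1 & v_2 \\ v_2 & v_1 \end{pmatrix},
\qquad
\left(\tfrac12,\ \tfrac12\right) A = \left(\tfrac{v_1+v_2}{2},\ \tfrac{v_1+v_2}{2}\right),
\qquad
A \begin{pmatrix} \tfrac12 \\ \tfrac12 \end{pmatrix} = \begin{pmatrix} \tfrac{v_1+v_2}{2} \\ \tfrac{v_1+v_2}{2} \end{pmatrix}.
$$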

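Concurrent reachability games (CRGs): values

The SSG statement carries over with "positional" replaced by "stationary" and max/min replaced by sup/inf.

- Both players have stationary near-optimal strategies.
- Every vertex $i$ in the game has a value $v_i$, where $v_i(\sigma,\tau)$ denotes the probability of reaching the MAX-sink from $i$ when MAX plays $\sigma$ and min plays $\tau$:

$$v_i = \sup_{\sigma\ \text{stationary}}\ \inf_{\tau\ \text{general}} v_i(\sigma,\tau) = \inf_{\tau\ \text{stationary}}\ \sup_{\sigma\ \text{general}} v_i(\sigma,\tau)$$

- There are strategies that are near-optimal for every starting position.

Stationary: as positional, except that we allow randomization.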

Why randomized strategies?

0-1 matrix games can be immediately simulated.

[Figure: a node whose matrix arcs lead to the MAX-sink and the min-sink.]

Why sup/inf instead of max/min?

[Figure: an example game with a MAX-sink and a min-sink.]

"Conditionally repeated matching pennies":

- Min hides a penny.
- Max tries to guess if it is heads up or tails up.
- If Max guesses correctly, he gets the penny.
- If Max incorrectly guesses tails, he loses (goes into the min-sink/trap).
- If Max incorrectly guesses heads, the game repeats.

What is the value of this game?

Almost optimal strategy for Max

- Guess "heads" with probability 1-ε and "tails" with probability ε (every time).
- This is guaranteed to win with probability 1-ε.
- But no strategy of Max wins with probability 1.
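The 1-ε guarantee can be checked with a little exact arithmetic. The sketch below is illustrative only: the function name is made up, and Min is restricted to stationary strategies (hide heads with a fixed probability `q` each round). In each round the game ends with a win, ends with a loss, or repeats, so Max's overall winning probability is the chance of winning conditioned on the round ending.

```python
from fractions import Fraction

def max_win_probability(eps, q):
    """Probability that Max eventually wins conditionally repeated matching pennies.

    eps: probability that Max guesses tails in each round (stationary strategy).
    q:   probability that Min hides heads in each round (assumed stationary).
    """
    win = q * (1 - eps) + (1 - q) * eps   # Max guesses correctly
    lose = q * eps                        # Max wrongly guesses tails
    stop = win + lose                     # the game ends in this round
    if stop == 0:
        # Max always says heads and Min always hides tails: the game repeats
        # forever, Max never reaches the goal, so Min wins.
        return Fraction(0)
    return win / stop

eps = Fraction(1, 10)
grid = [Fraction(k, 100) for k in range(101)]
worst_case = min(max_win_probability(eps, q) for q in grid)
print(worst_case)                                      # 9/10, attained at q = 1
print(max_win_probability(Fraction(0), Fraction(0)))   # 0
```

The second call shows why ε cannot simply be set to 0: against "always heads", Min hides tails forever and the goal is never reached.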

Values and near-optimal strategies

- Each position in a concurrent reachability game has a value.
- For any ε>0, each player has a stationary strategy guaranteeing the value within ε (an ε-optimal strategy).
- Shown in Everett, "Recursive games", 1957.

Algorithmic problems

- Qualitatively solving a CRG: determining which nodes have value 1.
- Quantitatively solving a CRG: approximately computing the values of the nodes.
- Strategically solving a CRG: computing an ε-optimal stationary strategy for a given ε.

Qualitatively solving CRGs

De Alfaro, Henzinger, Kupferman, FOCS 1998:

- Beautiful algorithm!
- Formal methods community type algorithm!
- Fixed point computation inside a fixed point computation inside a fixed point computation…
- Runs in time $O(n^2)$.

Open (I think): Can this time bound be improved? (For SSGs the corresponding time is linear.)

Quantitatively solving CRGs

- We want to approximate the values of the positions.
- Why not compute them exactly?

The value of a CRG may be irrational!

[Figure: an example game from Ferguson, Game Theory.]

- Positive payoffs different from 1 can be simulated with scaling and coin toss gadgets.
- Negative payoffs are harder to simulate, but in this game we can do it by adding a constant to all payoffs.

Quantitatively solving CRGs

- We want to approximate the values of the positions.
- Why not compute them exactly?
- Maybe we want to look at the decision problem consisting of comparing the value to a given rational?

SUM-OF-SQRT hardness

- SUM-OF-SQRT: Given an expression E which is a weighted (by integers) sum of square roots (of integers), does E evaluate to a positive number?
- Not known to be in P or NP or even the polynomial hierarchy (open at least since Garey and Johnson).
- Etessami and Yannakakis, 2005: Comparing the value of a CRG to a rational number is hard for SUM-OF-SQRT.
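To make the SUM-OF-SQRT problem concrete, here is a minimal sketch of the obvious numerical attack: evaluate the expression with a fixed number of decimal digits and report the sign only if it is clear. The function name and the default precision are illustrative choices. This is a heuristic, not a decision procedure; the difficulty is that no polynomial bound is known on how many digits are needed before a nonzero sum can be told apart from zero.

```python
from decimal import Decimal, localcontext

def sum_of_sqrt_sign(terms, digits=50):
    """Heuristic sign test for sum(w * sqrt(r)) over (w, r) in terms.

    terms:  list of (integer weight, non-negative integer radicand) pairs.
    Returns 1 or -1 when the sign is clear at the chosen precision,
    and None when the sum is too close to zero to decide.
    """
    with localcontext() as ctx:
        ctx.prec = digits + 10  # a few guard digits beyond the decision threshold
        total = sum((Decimal(w) * Decimal(r).sqrt() for w, r in terms), Decimal(0))
        threshold = Decimal(10) ** (-digits)
        if total > threshold:
            return 1
        if total < -threshold:
            return -1
        return None

print(sum_of_sqrt_sign([(1, 2), (1, 3), (-1, 10)]))  # -1: sqrt(2) + sqrt(3) < sqrt(10)
print(sum_of_sqrt_sign([(1, 8), (-2, 2)]))           # None: sqrt(8) - 2*sqrt(2) is exactly 0
```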

Sketch of proof

- We already saw how to make games whose values are the solution to certain quadratic equations, i.e., square roots + rationals.
- Once we have a bunch of such games, we can easily make a game whose value is the average by a "coin toss gadget".

Quantitatively solving CRGs

- We want to approximate the values of the positions.
- Why not compute them exactly?
- Maybe we want to compare the value to a given rational?
- Given ε, we want to compute an approximation within ε.

Value iteration

1. Assign all nodes the "value approximation" 0.
2. Replace pointers with value approximations. Each node is now a matrix game.
3. Solve the matrix games and replace the approximations with the resulting values.
4. Repeat from step 2.

Theorem: Value approximations converge to values (from below).

Proof sketch: The value approximations are the exact values of a time limited version of the game.

How long does it take to get within 0.01 of the actual values? Even for SSGs this takes exponential time (Condon'93). For CRGs, an open problem until recently (see later).
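The value iteration steps can be sketched for simple CRGs, where every node carries a 2x2 matrix. Everything named here is an illustrative assumption, not the talk's code: the `GOAL`/`TRAP` sink markers, the dictionary encoding of the arena, and the use of the textbook closed form for 2x2 zero-sum games in place of a general LP solver. The example arena is the one-node "conditionally repeated matching pennies" game.

```python
from fractions import Fraction

GOAL, TRAP = "goal", "trap"  # hypothetical markers for the Max sink and a Min trap

def matrix_game_value(a, b, c, d):
    """Value of the 2x2 zero-sum game [[a, b], [c, d]]; the row player maximizes."""
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:                     # saddle point in pure strategies
        return maximin
    return (a * d - b * c) / (a + d - b - c)   # otherwise both players mix

def value_iteration(game, iterations):
    """game maps each node to ((t00, t01), (t10, t11)), the targets of its matrix."""
    values = {node: Fraction(0) for node in game}   # step 1: start from 0
    for _ in range(iterations):
        def approx(target):
            # step 2: replace a pointer by the current approximation of its target
            if target == GOAL:
                return Fraction(1)
            if target == TRAP:
                return Fraction(0)
            return values[target]
        # step 3: solve every node's matrix game; all nodes are updated together
        values = {
            node: matrix_game_value(approx(r0[0]), approx(r0[1]),
                                    approx(r1[0]), approx(r1[1]))
            for node, (r0, r1) in game.items()
        }
    return values

# Rows: Max guesses heads / tails. Columns: Min hides heads / tails.
pennies = {"pennies": ((GOAL, "pennies"), (TRAP, GOAL))}
print(value_iteration(pennies, 1)["pennies"])    # 1/2
print(value_iteration(pennies, 99)["pennies"])   # 99/100
```

On this game the approximation after k rounds is k/(k+1): it climbs towards the value 1 from below, and 99 rounds are needed to get within 0.01.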

Another algorithm for approximating values

- The property of being a number larger or smaller than the value of a CRG can be expressed by a polynomial length formula in the existential first order theory of the reals: "There exists a stationary strategy such that…"
- As a corollary to Renegar'89, approximating the value is in PSPACE.
- This is the best known "complexity class" upper bound!
- It is also the best known concrete "big-O" complexity bound (using Basu et al. instead of Renegar).

Why no NP ∩ coNP upper bound?

- Guess a strategy and verify that it works?
- Chatterjee, Majumdar, Jurdzinski, "On Nash equilibria in stochastic games", CSL'04, claims such a result.
- In 2007, Kousha Etessami found a technical issue in the proof and the authors retracted the claim.

Computing values vs. finding strategies

- It is not obvious that computing the values gives any information about the strategies.
- In contrast, for SSGs, optimal strategies can be computed from values in linear time (Andersson and M., ISAAC'09).

[Figure: an example game with a MAX-sink.]

Algorithms strategically solving concurrent reachability games

- Chatterjee, de Alfaro, Henzinger. Strategy improvement for concurrent reachability games. QEST'06.
- Chatterjee, de Alfaro, Henzinger. Termination criteria for solving concurrent safety and reachability games. SODA'09.

Policy improvement! No time bounds given…

"Hardness" of solving CRGs

Theorem [Hansen, Koucky and M., LICS'09]:

- Any algorithm that manipulates ε-optimal strategies of concurrent reachability games must use exponential space (so no NP ∩ coNP algorithm comes from guessing strategies).
- Value iteration requires worst case doubly exponential time to come within non-trivial distance of actual values (in contrast, value iteration on SSGs converges in only exponential time).

Dante in Purgatory

[Figure: Purgatory drawn as seven terraces, numbered 1 to 7, with Dante moving between them.]

- Dante enters Purgatory at terrace 1. Purgatory has 7 terraces.
- While in Purgatory, once a second, Dante must play Matching Pennies with Lucifer.
- If Dante wins, he proceeds to the next terrace.
- If Dante wins Matching Pennies at terrace 7, he wins the game of Purgatory.
- If Dante loses Matching Pennies guessing Heads, he goes back to terrace 1.
- If Dante loses Matching Pennies guessing Tails, he loses the game of Purgatory!

- Is there a strategy for Dante so that he is guaranteed to win the game of Purgatory with probability at least 90%? Yes. (Apply the algorithm of de Alfaro, Henzinger and Kupferman.)
- How long can Lucifer confine Dante to Purgatory if Dante plays by such a strategy? $10^{55}$ years.

A bit surprising: when Dante wins, he has guessed correctly which hand seven times in a row!

Purgatory is a game of doubly exponential patience

- The patience of a mixed strategy is $1/p$ where $p$ is the smallest non-zero probability used by the strategy (Everett, 1957).
- To win with probability 1-ε, Dante must choose "Heads" at terrace $i$ with probability greater than (approximately) $1-\varepsilon^{2^{7-i}}$.
- On the other hand, choosing "Heads" with probability 1 is no good!
- To win with probability 9/10, he must choose "Heads" at terrace 1 with probability greater than $1-(1/10)^{64} = 0.\underbrace{99\ldots9}_{64\ \text{nines}}$. But then Lucifer can respond by always choosing "Tails" at terrace 1.
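The per-terrace thresholds are easy to tabulate exactly. The sketch below takes the approximate bound from the slide at face value, so the probability of "Tails" at terrace i is at most about ε raised to 2^(7-i); the function name is made up for illustration. It prints how many nines the matching "Heads" probability needs for ε = 1/10.

```python
from fractions import Fraction

def tails_bound(eps, terrace, terraces=7):
    """Approximate upper bound on the probability of "Tails" at a terrace.

    This is eps ** (2 ** (terraces - terrace)), i.e. one minus the slide's
    approximate lower bound on the probability of "Heads".
    """
    return eps ** (2 ** (terraces - terrace))

eps = Fraction(1, 10)
for terrace in range(1, 8):
    bound = tails_bound(eps, terrace)
    # For eps = 1/10 the denominator is a power of ten; its digit count minus
    # one is the number of nines in the decimal expansion of 1 - bound.
    nines = len(str(bound.denominator)) - 1
    print(terrace, nines)   # 1 64, 2 32, 3 16, 4 8, 5 4, 6 2, 7 1
```

The number of nines doubles with every terrace further from the exit. A strategy using the terrace 1 bound has patience at least $10^{64}$.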

"Hardness" of solving CRGs

Theorem [Hansen, Koucky and M.]: Any algorithm that manipulates ε-optimal strategies of concurrent reachability games must use exponential space.

Proof: Storing 0.99…9 (64 nines) takes up a lot of space!

Time of play and value iteration

- To win Purgatory with probability 1-ε, almost all probability mass has to be assigned to strategies leading to plays of length at least $(1/\varepsilon)^{2^{n-1}}$.
- On the other hand, $(1/\varepsilon)^{2^{116n}}$ is the worst possible expected time of play for any game with $n$ nodes.
- Corollary: To solve Purgatory quantitatively using value iteration, $(1/\varepsilon)^{2^{n-1}}$ iterations are needed to get anywhere near the correct values. But $(1/\varepsilon)^{2^{116n}}$ iterations are enough to get ε-close for any $n$-node game.
- The upper bounds are shown (again) by appealing to the first order theory of the reals (semi-algebraic geometry), in particular Basu et al.

Patience of Purgatory with $n$ terraces and ε < 1/2

- Upper bound: $(1/\varepsilon)^{2^{n-1}}$
- Lower bound: $((1-\varepsilon)/\varepsilon^2)^{2^{n-2}}$
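To see how close the two patience bounds are, they can be evaluated exactly for the running example (7 terraces, ε = 1/10). This is plain arithmetic on the two formulas; the function name is an illustrative choice and the sketch assumes n ≥ 2.

```python
from fractions import Fraction

def patience_bounds(eps, n):
    """Return (lower, upper) bounds on the patience of Purgatory with n terraces."""
    upper = (1 / eps) ** (2 ** (n - 1))
    lower = ((1 - eps) / eps ** 2) ** (2 ** (n - 2))
    return lower, upper

lower, upper = patience_bounds(Fraction(1, 10), 7)
print(upper == 10 ** 64)      # True
print(lower == 90 ** 32)      # True
print(float(lower / upper))   # the factor (1 - eps) ** (2 ** (n - 2)), about 0.034
```

The bounds differ only by the factor $(1-\varepsilon)^{2^{n-2}}$, so both are doubly exponential in $n$.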

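Proof of lower bound

[Slide content largely lost in extraction. The surviving fragment is an inequality between δ terms ("δ > δ 2", with subscripts or superscripts lost) and the note: WLOG, take the first place from above where this happens…]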

Open problems

- What is the exact patience of Purgatory? Probably not a closed expression.
- Is Purgatory extremal with respect to patience among $n$-node CRGs? If yes, this gives a better upper bound on the number of iterations of value iteration for CRGs, replacing 116 with 1!

Compare Condon’s example. Extremal with respect to, e.g., expected absorption time.

Open problem

The fact that the values can be approximated in PSPACE strongly suggests that PSPACE should be enough for "understanding" CRGs. Is there a "natural" representation of probabilities so that

- ε-optimal strategies of CRGs can be represented succinctly, and
- ε-optimal strategies of CRGs can be computed using polynomial space?

De Alfaro, Henzinger, Kupferman, FOCS'98: Yes, for the restricted case of CRGs where the values of all positions are 0 or 1.

CRGs seem much harder to analyze than SSGs. Are there any formal arguments for this (beyond SUM-OF-SQRT hardness)?

Thank you!
