Concurrent Reachability Games
Peter Bro Miltersen, Aarhus University
CTW 2009


My apologies…
For not getting slides ready in time for inclusion in the booklet! Slides available at …

Concurrent reachability games
– Class of two-player zero-sum games generalizing simple stochastic games (Uri's talk yesterday).
– Studied mainly by the formal methods ("Eurotheory") community (but sometimes at such venues as FOCS and SODA).
– Very interesting and challenging algorithmic problems!

Simple stochastic games (SSGs), reachability version [Condon 1992]
– Two players: MAX and min.
– Objective: MAX/min the probability of getting to the MAX sink.
[Figure: example SSG with MAX, min and random (R, probability 1/2) vertices, a MAX sink and a min sink (ZP'96)]
Slide stolen from Uri…

Simple stochastic games (SSGs): strategies
– A general strategy may be randomized and history dependent.
– A positional strategy is deterministic and history independent.
– Positional strategy for MAX: a choice of an outgoing edge from each MAX vertex.
Another slide stolen from Uri…

Simple stochastic games (SSGs): values
Every vertex i in the game has a value v_i:
v_i = max_{positional σ} min_{general τ} Pr[MAX sink reached] = min_{positional τ} max_{general σ} Pr[MAX sink reached]
– Both players have positional optimal strategies.
– There are strategies that are optimal for every starting position.
Last slide stolen from Uri (I promise!)

Concurrent Reachability Games
[Recap of the SSG figure from above, as the starting point for the concurrent generalization]

(Simple) concurrent reachability game
Arena:
– Finite directed graph.
– One Max sink ("goal") node.
– Each non-sink node has an assigned 2x2 matrix of outgoing arcs.
Play:
– A pebble moves from node to node as in a simple stochastic game.
– In each step, Max chooses a row and Min simultaneously chooses a column of the matrix.
– The pebble moves along the appropriate arc.
– If Max reaches the goal node, he wins.
– If this never happens, Min wins.

Simulation
[Figure: simulating a MAX vertex of an SSG by a matrix node]

Simulation
[Figure: simulating a min vertex of an SSG by a matrix node]

Simulation
[Figure: simulating a random vertex (coin toss, probability 1/2) of an SSG by a matrix node]
It is somewhat more subtle that this works!

"Proof" of correctness
– We want values in the CRG to be the same as in the SSG.
– In particular, the value of the node simulating a coin toss should be the average of the values of the two nodes it points to.
– If these two values are the same, this is "clearly" the case.
– If they have different values v_1, v_2, the simulated coin toss node is a game of Matching Pennies with payoffs v_1, v_2. This game has value (v_1 + v_2)/2.
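This calculation is easy to check with a small solver for 2x2 zero-sum matrix games (a sketch; the function name is mine, and the closed-form value formula is the standard one for 2x2 games without a pure saddle point):

```python
def value_2x2(a, b, c, d):
    """Value of the zero-sum matrix game [[a, b], [c, d]] for the row (max) player."""
    lower = max(min(a, b), min(c, d))   # row player's pure security level
    upper = min(max(a, c), max(b, d))   # column player's pure security level
    if lower == upper:                  # pure saddle point
        return lower
    # Fully mixed equilibrium: standard closed form for 2x2 games.
    return (a * d - b * c) / (a + d - b - c)

# Coin-toss node simulated as Matching Pennies with payoffs v1, v2:
v1, v2 = 0.3, 0.9
assert abs(value_2x2(v1, v2, v2, v1) - (v1 + v2) / 2) < 1e-12
```

For the Matching Pennies matrix [[v1, v2], [v2, v1]] the formula gives (v1^2 - v2^2)/(2 v1 - 2 v2) = (v1 + v2)/2, as the slide claims.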

Concurrent Reachability Games (CRGs)
[The SSG values slide again: which of these properties carry over to CRGs?]

Concurrent reachability games (CRGs): values
Every vertex i in the game has a value v_i (note sup/inf rather than max/min):
v_i = sup_{stationary σ} inf_{general τ} Pr[goal reached] = inf_{stationary τ} sup_{general σ} Pr[goal reached]
– Stationary: as positional, except that we allow randomization.
– There are stationary strategies that are (near-)optimal for every starting position.

Why randomized strategies?
[Figure: a matrix node with arcs to a MAX sink and a min sink]
0-1 matrix games can be immediately simulated.

Why sup/inf instead of max/min?
[Figure: a game with arcs to a MAX sink and a min sink]

Why sup/inf instead of max/min?
"Conditionally repeated matching pennies":
– Min hides a penny.
– Max tries to guess if it is heads up or tails up.
– If Max guesses correctly, he gets the penny.
– If Max incorrectly guesses tails, he loses (goes into the min sink/trap).
– If Max incorrectly guesses heads, the game repeats.
What is the value of this game?

Almost optimal strategy for Max
– Guess "heads" with probability 1−ε and "tails" with probability ε (every time).
– Guaranteed to win with probability 1−ε.
– But no strategy of Max wins with probability 1.
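A quick sanity check of this slide (a sketch; the function name is mine): against a Min who hides heads with probability q, the winning probability p of the strategy above satisfies p = q(1−ε) + (1−q)ε + (1−q)(1−ε)p, where the last term is the repeat after an incorrect "heads" guess. Solving for p and minimizing over q:

```python
def win_prob(eps, q):
    """Winning probability of 'guess heads with prob 1-eps' against
    Min hiding heads with probability q, solving
    p = q*(1-eps) + (1-q)*eps + (1-q)*(1-eps)*p."""
    per_round_win = q * (1 - eps) + (1 - q) * eps
    repeat = (1 - q) * (1 - eps)
    return per_round_win / (1 - repeat)

eps = 0.01
worst = min(win_prob(eps, q / 100) for q in range(101))
# Min's best response is to always hide heads (q = 1): Max then wins w.p. 1 - eps.
assert abs(worst - (1 - eps)) < 1e-9
```

Note that q = 0 gives winning probability 1 (the game repeats until Max eventually guesses tails correctly), which is why the value is 1 even though no Max strategy attains it.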

Values and near-optimal strategies
– Each position in a concurrent reachability game has a value.
– For any ε > 0, each player has a stationary strategy guaranteeing the value within ε (an ε-optimal strategy).
Shown in Everett, "Recursive games", 1957.

Algorithmic problems
– Qualitatively solving a CRG: determining which nodes have value 1.
– Quantitatively solving a CRG: approximately computing the values of the nodes.
– Strategically solving a CRG: computing an ε-optimal stationary strategy for a given ε.

Qualitatively solving CRGs
De Alfaro, Henzinger, Kupferman, FOCS'98:
– Beautiful algorithm!
– Formal methods community type algorithm!
– A fixed point computation inside a fixed point computation inside a fixed point computation…
– Runs in time O(n^2).
Open (I think): Can this time bound be improved? (For SSGs the corresponding time is linear.)

Quantitatively solving CRGs
We want to approximate the values of the positions. Why not compute them exactly?

The value of a CRG may be irrational!
[Figure: an example game from Ferguson, Game Theory]
– Positive payoffs different from 1 can be simulated with scaling and coin toss gadgets.
– Negative payoffs are harder to simulate, but in this game we can do it by adding a constant to all payoffs.

Quantitatively solving CRGs
We want to approximate the values of the positions. Why not compute them exactly? Maybe we want to look at the decision problem consisting of comparing the value to a given rational?

SUM-OF-SQRT hardness
– SUM-OF-SQRT: Given an expression E which is a weighted (by integers) sum of square roots (of integers), does E evaluate to a positive number?
– Not known to be in P or NP or even the polynomial hierarchy (open at least since Garey and Johnson).
– Etessami and Yannakakis, 2005: Comparing the value of a CRG to a rational number is hard for SUM-OF-SQRT.

Sketch of proof
– We already saw how to make games whose values are the solutions to certain quadratic equations, i.e., square roots + rationals.
– Once we have a bunch of such games, we can easily make a game whose value is their average, by a "coin toss gadget".

Quantitatively solving CRGs
We want to approximate the values of the positions. Why not compute them exactly? Maybe we want to compare the value to a given rational? Given ε, we want to compute an approximation within ε.

Value iteration
– Assign all nodes "value approximation" 0.
– Replace pointers with value approximations. Each node is now a matrix game. Solve and replace approximations.
Theorem: The value approximations converge to the values (from below).
Proof sketch: The value approximations are the exact values of a time-limited version of the game.
How long to get within 0.01 of the actual values? Even for SSGs this takes exponential time (Condon '93). For CRGs, an open problem until recently (see later).
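As a toy illustration of the procedure (a sketch, reusing the one-state conditionally repeated matching pennies game from earlier, whose value is 1): with current approximation v, the single non-sink node becomes the matrix game [[1, v], [0, 1]], whose value is 1/(2 − v). The iterates are t/(t+1), so convergence from below is slow even in this tiny example:

```python
def value_2x2(a, b, c, d):
    # Value of the zero-sum matrix game [[a, b], [c, d]] for the row (max) player.
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if lower == upper:          # pure saddle point
        return lower
    return (a * d - b * c) / (a + d - b - c)

# Value iteration: goal (value 1) on a correct guess, trap (value 0) on a
# wrong "tails" guess, back to the current approximation v on a wrong "heads".
v = 0.0
for t in range(1000):
    v = value_2x2(1.0, v, 0.0, 1.0)

# After t rounds the approximation is t/(t+1): still roughly 0.001 below the
# true value 1 after 1000 rounds.
assert 0.998 < v < 1.0
```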

Another algorithm for approximating values
– The property of being a number larger or smaller than the value of a CRG can be expressed by a polynomial length formula in the existential first order theory of the reals: "There exists a stationary strategy such that…"
– As a corollary to Renegar '89, approximating the value is in PSPACE.
– This is the best known "complexity class" upper bound!
– … and also the best known concrete "big-O" complexity bound (using Basu et al. instead of Renegar).

Why no NP ∩ coNP upper bound?
Guess a strategy and verify that it works?
Chatterjee, Majumdar, Jurdzinski, "On Nash equilibria in stochastic games", CSL'04 claims such a result. In 2007, Kousha Etessami found a technical issue in the proof and the authors retracted the claim.

Computing values vs. finding strategies
It is not obvious that computing the values gives any information about the strategies. In contrast, for SSGs, optimal strategies can be computed from values in linear time (Andersson and M., ISAAC'09).
[Figure: example node with a MAX sink]

Algorithms strategically solving concurrent reachability games
– Chatterjee, de Alfaro, Henzinger. Strategy improvement for concurrent reachability games. QEST'06.
– Chatterjee, de Alfaro, Henzinger. Termination criteria for solving concurrent safety and reachability games. SODA'09.
Policy improvement! No time bounds given…

"Hardness" of solving CRGs
Theorem [Hansen, Koucký and M., LICS'09]:
– Any algorithm that manipulates ε-optimal strategies of concurrent reachability games must use exponential space (so no NP ∩ coNP algorithm comes from guessing strategies).
– Value iteration requires worst case doubly exponential time to come within non-trivial distance of the actual values (in contrast, value iteration on SSGs converges in only exponential time).

Dante in Purgatory
– Dante enters Purgatory at terrace 1. Purgatory has 7 terraces.
– While in Purgatory, once a second, Dante must play Matching Pennies with Lucifer.
– If Dante wins, he proceeds to the next terrace.
– If Dante wins Matching Pennies at terrace 7, he wins the game of Purgatory.
– If Dante loses Matching Pennies guessing Heads, he goes back to terrace 1.
– If Dante loses Matching Pennies guessing Tails… he loses the game of Purgatory!

Dante in Purgatory
Is there a strategy for Dante so that he is guaranteed to win the game of Purgatory with probability at least 90%?
– Yes. (Apply the algorithm of de Alfaro, Henzinger and Kupferman.)
A bit surprising: when Dante wins, he has guessed correctly seven times in a row!
How long can Lucifer confine Dante to Purgatory if Dante plays by such a strategy?
– years.

Purgatory is a game of doubly exponential patience
– The patience of a mixed strategy is 1/p, where p is the smallest non-zero probability used by the strategy (Everett, 1957).
– To win with probability 1−ε, Dante must choose "Heads" at terrace i with probability greater than (approximately) 1 − ε^(2^(7−i)).
– On the other hand, choosing "Heads" with probability 1 is no good: then Lucifer can respond by always choosing "Tails" at terrace 1.
– To win with probability 9/10, he must choose "Heads" at terrace 1 with probability greater than 1 − (1/10)^64.
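The magnitudes on this slide are easy to tabulate exactly with rational arithmetic (a sketch based on the approximate probabilities quoted above; the exact ε-optimal probabilities in the Hansen, Koucký and Miltersen analysis are more delicate):

```python
from fractions import Fraction

eps = Fraction(1, 10)   # target: win with probability 1 - eps = 9/10
n = 7                   # number of terraces

# Near-optimal play: at terrace i, guess "Tails" with probability ~ eps^(2^(n-i)),
# i.e. guess "Heads" with probability ~ 1 - eps^(2^(n-i)).
tails = {i: eps ** (2 ** (n - i)) for i in range(1, n + 1)}

patience = 1 / min(tails.values())   # 1 / (smallest non-zero probability used)
assert tails[7] == Fraction(1, 10)          # top terrace: plain matching pennies
assert tails[1] == Fraction(1, 10) ** 64    # terrace 1: probability 10^-64
assert patience == 10 ** 64                 # doubly exponential in n
```

With n terraces the smallest probability is on the order of ε^(2^(n−1)), so the patience, and hence the number of bits needed just to write the strategy down, grows doubly exponentially in the size of the game.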

"Hardness" of solving CRGs
Theorem [Hansen, Koucký and M.]:
– Any algorithm that manipulates ε-optimal strategies of concurrent reachability games must use exponential space.
Proof: Storing such probabilities takes up a lot of space!

Time of play and value iteration
– To win Purgatory with probability 1−ε, almost all probability mass has to be assigned to strategies leading to plays of length at least (1/ε)^(2^(n−1)).
– On the other hand, (1/ε)^(2^(116n)) is the worst possible expected time of play for any game with n nodes.
– Corollary: To solve Purgatory quantitatively using value iteration, (1/ε)^(2^(n−1)) iterations are needed to get anywhere near the correct values. But (1/ε)^(2^(116n)) iterations are enough to get ε-close for any n-node game.
– Upper bounds shown (again) by appealing to the first order theory of the reals (semi-algebraic geometry), in particular Basu et al.

Patience of Purgatory with n terraces and ε < 1/2
– Upper bound: (1/ε)^(2^(n−1))
– Lower bound: ((1−ε)/ε^2)^(2^(n−2))

Proof of lower bound
δ > δ^2: WLOG, consider the first place from above where this happens…
[Remainder of the argument given on the board/figure]

Open problems
– What is the exact patience of Purgatory? Probably not a closed expression.
– Is Purgatory extremal with respect to patience among n-node CRGs? If yes, this gives a better upper bound on the number of iterations of value iteration for CRGs, replacing 116 with 1!

Compare Condon's example, which is extremal with respect to, e.g., expected absorption time.

Open problems
– The fact that the values can be approximated in PSPACE strongly suggests that PSPACE should be enough for "understanding" CRGs. Is there a "natural" representation of probabilities so that ε-optimal strategies of CRGs can be represented succinctly and computed using polynomial space?
– De Alfaro, Henzinger, Kupferman, FOCS'98: Yes, for the restricted case of CRGs where the values of all positions are 0 or 1.
– CRGs seem much harder to analyze than SSGs. Are there any formal arguments for this (beyond SUM-OF-SQRT hardness)?

Thank you!