Uri Zwick – Tel Aviv Univ. Randomized pivoting rules for the simplex algorithm: Lower bounds. MDS summer school “The Combinatorics of Linear and Semidefinite Programming”, August 14-16, 2012

Deterministic pivoting rules: Largest improvement; Largest slope; Dantzig’s rule (largest modified cost); Bland’s rule (avoids cycling); Lexicographic rule (also avoids cycling). All are known to require an exponential number of steps in the worst case: Klee-Minty (1972), Jeroslow (1973), Avis-Chvátal (1978), Goldfarb-Sit (1979), …, Amenta-Ziegler (1996).

Klee-Minty cubes (1972). [Figure taken from a paper by Gärtner-Henk-Ziegler.]

Randomized pivoting rules. Random-Edge: choose a random improving edge. Random-Facet: described in the previous lecture ☺. Random-Facet is sub-exponential! [Kalai (1992)] [Matoušek-Sharir-Welzl (1996)] Are Random-Edge and Random-Facet polynomial???
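The Random-Edge rule can be sketched on a cube as follows. This is a minimal illustration, assuming vertices of the n-cube are encoded as bitmasks and the objective is supplied as a function `value` to be maximized; both the encoding and the function names are hypothetical and do not come from the slides.

```python
import random

def random_edge(n, value, start, rng=random):
    """Run Random-Edge on the n-cube; vertices are ints in [0, 2**n).

    `value` (hypothetical interface) maps a vertex to a number to maximize.
    Returns the list of visited vertices, ending at a vertex with no
    improving neighbor.
    """
    path = [start]
    v = start
    while True:
        # Neighbors of v differ from it in exactly one bit.
        improving = [v ^ (1 << i) for i in range(n)
                     if value(v ^ (1 << i)) > value(v)]
        if not improving:
            return path
        # Random-Edge: pick one improving edge uniformly at random.
        v = rng.choice(improving)
        path.append(v)

def popcount(v):
    """Toy objective: number of 1-bits, maximized at the all-ones vertex."""
    return bin(v).count("1")
```

With the toy objective `popcount`, every improving step sets one more bit, so a run from vertex 0 on the 4-cube takes exactly 4 steps regardless of the random choices.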

Abstract objective functions (AOFs) / Acyclic Unique Sink Orientations (AUSOs). Every face should have a unique sink.
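The unique-sink condition can be checked by brute force on small cubes. The sketch below assumes the orientation is induced by a function `value` on bitmask-encoded vertices (edges point toward the larger value) and that adjacent vertices never tie; the function name `is_uso` and the encoding are illustrative assumptions.

```python
from itertools import combinations

def is_uso(n, value):
    """Return True if every face of the n-cube has exactly one sink.

    A sink of a face is a vertex with no better neighbor inside that face.
    """
    for k in range(n + 1):
        for dims in combinations(range(n), k):
            free = sum(1 << i for i in dims)  # coordinates that vary in the face
            for base in range(1 << n):
                if base & free:
                    continue  # each face is visited once, via its base vertex
                sinks = 0
                sub = free
                while True:  # enumerate all submasks of `free`
                    v = base | sub
                    if all(value(v ^ (1 << i)) < value(v) for i in dims):
                        sinks += 1
                    if sub == 0:
                        break
                    sub = (sub - 1) & free
                if sinks != 1:
                    return False
    return True
```

Because the orientation comes from a value function it is automatically acyclic, so a `True` answer means the orientation is an AUSO in the sense of the slide.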

AUSOs of n-cubes. 2n facets, 2^n vertices. The directed diameter is exactly n. Exercise: Prove it. USOs and AUSOs: Stickney, Watson (1978); Morris (2001); Szabó, Welzl (2001); Gärtner (2002).

AUSO results. Random-Facet is sub-exponential [Kalai (1992)] [Matoušek-Sharir-Welzl (1996)]. Sub-exponential lower bound for Random-Facet [Matoušek (1994)]. Sub-exponential lower bound for Random-Edge [Matoušek-Szabó (2006)]. These lower bounds do not correspond to actual linear programs. Can geometry help?

Random-Edge and Random-Facet are not polynomial for LPs. Consider LPs that correspond to Markov Decision Processes (MDPs): Simplex ⟺ Policy iteration. Obtain sub-exponential lower bounds for the Random-Edge and Random-Facet variants of the Policy Iteration algorithm for MDPs.

Randomized Pivoting Rules. [Table: columns Algorithm, Lower bound, Upper bound; rows RANDOM EDGE, RANDOM FACET; entries cite [Kalai ’92], [Matoušek-Sharir-Welzl ’92], [Friedmann-Hansen-Z ’11].] Lower bounds obtained for LPs whose diameter is n.

3-bit counter

Turn-based 2-Player Stochastic Games [Shapley ’53] [Gillette ’57] … [Condon ’92]. Versions: limiting average, discounted, total reward. Both players have optimal positional strategies. Can optimal strategies be found in polynomial time?

Stopping condition. For the total reward version assume: no matter what the players do, the game stops with probability 1. Exercise: Show that discounted games correspond directly to stopping total reward games.
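As a hint for the exercise, one way to see the correspondence is the following identity. It is a sketch under assumed notation that is not on the slide: γ is the discount factor, r_t is the reward collected in step t, and T is the number of steps played when the game stops after each step with probability 1 − γ, independently of the play.

```latex
\mathbb{E}\Big[\sum_{t < T} r_t\Big]
  = \sum_{t \ge 0} \Pr[T > t]\;\mathbb{E}[r_t]
  = \sum_{t \ge 0} \gamma^{t}\,\mathbb{E}[r_t]
  = \mathbb{E}\Big[\sum_{t \ge 0} \gamma^{t} r_t\Big],
\qquad \Pr[T > t] = \gamma^{t}.
```

The left-hand side is the total reward of the stopping game and the right-hand side is the discounted reward, so the two objectives coincide for every pair of strategies.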

Strategies / Policies. A deterministic strategy specifies which action to take given every possible history. A mixed strategy is a probability distribution over deterministic strategies. A memoryless strategy is a strategy that depends only on the current state. A positional strategy is a deterministic memoryless strategy.

Values. Both players have positional optimal strategies. [Formula labels: positional, general, positional, general.] There are positional strategies that are optimal for every starting position.

Markov Decision Processes [Shapley ’53] [Bellman ’57] [Howard ’60] … Versions: limiting average, discounted, total reward. Optimal positional policies can be found using LP. Is there a strongly polynomial time algorithm?

Stochastic shortest paths (SSPs). Minimize the expected cost of getting to the target.

Turn-based non-Stochastic Games [Ehrenfeucht-Mycielski (1979)]. Versions: limiting average, discounted, total reward. Both players have optimal positional strategies. Still no polynomial time algorithms known! [Slide label: Easy]

Turn-based Stochastic Games (SGs), 2½ players: long-term planning in a stochastic and adversarial environment. Non-Stochastic Games (MPGs), 2 players: adversarial, non-stochastic. Markov Decision Processes (MDPs), 1½ players: non-adversarial, stochastic. Deterministic MDPs (DMDPs), 1 player: non-stochastic, non-adversarial.

Parity Games (PGs): a simple example. [Figure: a small game graph whose vertices are labeled with priorities.] EVEN wins if the largest priority seen infinitely often is even.

Parity Games (PGs). [Figure labels: EVEN 3, ODD 8.] EVEN wins if the largest priority seen infinitely often is even. Equivalent to many interesting problems in automata and verification: non-emptiness of ω-tree automata; modal μ-calculus model checking.

From Parity Games (PGs) to Mean Payoff Games (MPGs). [Figure labels: EVEN 3, ODD 8.] Replace priority k by payoff (−n)^k. Move payoffs to outgoing edges. [Stirling (1993)] [Puri (1995)]
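The effect of the payoff (−n)^k can be checked numerically on individual cycles. The sketch below assumes n is the number of vertices, that a cycle is given as the list of priorities it visits (at most n of them), and that EVEN plays the maximizer in the resulting mean payoff game; the function names are hypothetical.

```python
def payoff(priority, n):
    """Payoff assigned to a vertex of the given priority in an n-vertex game."""
    return (-n) ** priority

def parity_winner(cycle):
    """Winner of a play that repeats `cycle` forever, by the parity rule."""
    return "EVEN" if max(cycle) % 2 == 0 else "ODD"

def mean_payoff_winner(cycle, n):
    """Winner of the same play judged by the sign of the cycle's total payoff."""
    total = sum(payoff(k, n) for k in cycle)
    return "EVEN" if total > 0 else "ODD"
```

The two rules agree because the largest priority on a cycle contributes magnitude n^k, while the other (at most n − 1) smaller priorities contribute at most (n − 1)·n^(k−1) in total. For instance, with n = 3 the cycle [3, 8, 1] has total payoff −27 + 6561 − 3, which is positive, and its largest priority 8 is even.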

Let’s focus on MDPs

Evaluating a policy. MDP + policy ⇒ Markov Chain. Values of a fixed policy can be found by solving a system of linear equations.
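For the total reward version, the linear system can be written out as follows. The notation is an assumption, not taken from the slide: π(i) is the action the policy uses in state i, r_a is the reward of action a, and p_{a,j} is the probability that action a moves to state j (the remaining probability mass leads to termination).

```latex
v_\pi(i) = r_{\pi(i)} + \sum_{j} p_{\pi(i),j}\, v_\pi(j) \quad \text{for every state } i,
\qquad\text{equivalently}\qquad
(I - P_\pi)\, v_\pi = r_\pi .
```

Under the stopping condition the matrix I − P_π is invertible, so the values v_π are uniquely determined.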

Improving a policy (using a single switch)

Policy iteration for MDPs [Howard ’60]
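A minimal sketch of policy iteration for a total-reward MDP that satisfies the stopping condition, using the Switch-All variant (every state with an improving switch moves to its best action). The encoding is an assumption made for illustration: `mdp[i]` is a list of actions for state i, each action is a pair `(reward, {next_state: probability})`, missing probability mass means the process stops, and rewards are maximized.

```python
from fractions import Fraction

def evaluate(mdp, policy):
    """Solve v = r + P v exactly for the Markov chain induced by `policy`."""
    m = len(mdp)
    rows = []
    for i in range(m):
        reward, trans = mdp[i][policy[i]]
        row = [Fraction(int(i == j)) - Fraction(trans.get(j, 0)) for j in range(m)]
        rows.append(row + [Fraction(reward)])
    for c in range(m):  # Gauss-Jordan elimination on the augmented matrix
        # next() raises StopIteration if the system is singular,
        # i.e. if the stopping condition is violated.
        p = next(r for r in range(c, m) if rows[r][c] != 0)
        rows[c], rows[p] = rows[p], rows[c]
        pivot = rows[c][c]
        rows[c] = [x / pivot for x in rows[c]]
        for r in range(m):
            if r != c:
                f = rows[r][c]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[c])]
    return [row[m] for row in rows]

def appraisal(mdp, values, i, a):
    """One-step lookahead value of playing action a in state i."""
    reward, trans = mdp[i][a]
    return Fraction(reward) + sum(Fraction(p) * values[j] for j, p in trans.items())

def policy_iteration(mdp, policy):
    """Repeat evaluate / improve until no improving switch remains."""
    policy = list(policy)
    rounds = 0
    while True:
        values = evaluate(mdp, policy)
        changed = False
        for i in range(len(mdp)):
            best = max(range(len(mdp[i])), key=lambda a: appraisal(mdp, values, i, a))
            if appraisal(mdp, values, i, best) > values[i]:
                policy[i] = best  # perform the improving switch
                changed = True
        if not changed:
            return policy, values, rounds
        rounds += 1

# Toy two-state MDP (hypothetical example, not from the slides).
TOY_MDP = [
    [(1, {}), (0, {1: 1})],               # state 0: stop with reward 1, or go to state 1
    [(5, {}), (0, {0: Fraction(1, 2)})],  # state 1: stop with reward 5, or go to 0 w.p. 1/2
]
```

Starting from the policy `[0, 1]` on the toy instance, the sketch performs two rounds of switches and ends with policy `[1, 0]`, where both states have value 5.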

Dual LP formulation for MDPs

Basic solution ⟺ (positional) Policy. [Formula label: a is not an improving switch.]

Primal LP formulation for MDPs. Vertex ⟺ Complement of a Policy.
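The LPs themselves are not preserved in this transcript. For orientation, a standard pair of LPs for a reward-maximizing, total-reward MDP with the stopping condition is sketched below. The notation is assumed: A_i is the set of actions available in state i, r_a is the reward of action a, and p_{a,j} is the probability that action a moves to state j. Which of the two the slides call primal is inferred only from the basis correspondences stated on them.

```latex
% Assumed notation: A_i, r_a, p_{a,j} as described in the lead-in.
\begin{aligned}
\text{(values)}\quad & \min \sum_i y_i
  && \text{s.t. } y_i - \sum_j p_{a,j}\, y_j \ge r_a \quad \forall i,\ \forall a \in A_i, \\
\text{(fluxes)}\quad & \max \sum_a r_a x_a
  && \text{s.t. } \sum_{a \in A_i} x_a - \sum_a p_{a,i}\, x_a = 1 \quad \forall i,\qquad x \ge 0 .
\end{aligned}
```

In the flux LP a basic feasible solution has one positive variable x_a per state, which matches "basic solution ⟺ policy". In the value LP written with slack variables, the basic variables are the y_i together with the slacks of the actions outside the policy, which matches "vertex ⟺ complement of a policy".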

TB2SG ∈ NP ∩ co-NP. TB2SG ∈ P ???

Policy iteration variants

Random-Facet for MDPs
• Choose a random action not in the current policy and ignore it.
• Solve recursively without this action.
• If the ignored action is not an improving switch with respect to the returned policy, we are done.
• Otherwise, switch to the ignored action and solve recursively.
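The recursion can be sketched against an abstract interface. The sketch assumes a policy is a frozenset of actions, and that the caller supplies two hypothetical callbacks: `is_improving(policy, action)` and `switch(policy, action)`. The toy instance is a 3-cube rather than a real MDP and is only there to make the sketch runnable.

```python
import random

def random_facet(available, policy, is_improving, switch, rng=random):
    """Random-Facet over the actions in `available`, starting from `policy`."""
    unused = sorted(available - policy)
    if not unused:
        return policy  # no choice left: the current policy is optimal here
    a = rng.choice(unused)  # random action outside the policy, ignored for now
    best = random_facet(available - {a}, policy, is_improving, switch, rng)
    if not is_improving(best, a):
        return best  # the ignored action does not help: done
    # Otherwise switch to the ignored action and solve again.
    return random_facet(available, switch(best, a), is_improving, switch, rng)

# Toy instance (an assumption, not from the slides): actions are
# (coordinate, bit) pairs and a policy picks one bit per coordinate.
def cube_switch(policy, action):
    i, b = action
    return (policy - {(i, 1 - b)}) | {action}

def cube_value(policy):
    return sum(b for _, b in policy)

def cube_improving(policy, action):
    return cube_value(cube_switch(policy, action)) > cube_value(policy)

N = 3
ALL_ACTIONS = frozenset((i, b) for i in range(N) for b in (0, 1))
START = frozenset((i, 0) for i in range(N))
```

Calling `random_facet(ALL_ACTIONS, START, cube_improving, cube_switch)` returns the all-ones policy on the toy instance. For an actual MDP, `is_improving` would compare the appraisal of the action against the value of its state under the returned policy.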

Policy iteration for 2-player games
• Keep a strategy of player 1 and an optimal counter-strategy of player 2.
• Perform improving switches for player 1 and recompute an optimal counter-strategy for player 2.
Exercise: Does it really work?
Random-Facet yields a sub-exponential algorithm for turn-based 2-player stochastic games!

Lower bounds for Policy Iteration. Switch-All for Parity Games is exponential [Friedmann ’09]. Switch-All for MDPs is exponential [Fearnley ’10]. Random-Facet for Parity Games requires a sub-exponential number of steps [Friedmann-Hansen-Z ’11]. Random-Facet and Random-Edge for MDPs, and hence for LPs, require a sub-exponential number of steps [FHZ ’11].

Lower bound for Random-Facet: implement a randomized counter.

Lower bound for Random-Facet: implement a randomized counter. Lower bound for Random-Edge: implement a standard counter.

Upper bounds for Policy Iteration. Dantzig’s pivoting rule and the standard policy iteration algorithm, Switch-All, are polynomial for discounted MDPs with a fixed discount factor [Ye ’10]. Switch-All is almost linear for discounted MDPs and discounted turn-based 2-player Stochastic Games with a fixed discount factor [Hansen-Miltersen-Z ’11].

Deterministic Algorithms. [Table: columns Algorithm, Discounted, Non-discounted; rows SWITCH BEST, SWITCH ALL; entries cite [Ye ’10], [Hansen-Miltersen-Z ’11], [Friedmann ’09], [Fearnley ’10], [Condon ’93].]

3-bit counter. [Figure; one label reads (−N)^15.]

3-bit counter 010

3-bit counter, state 010: improving switches. Random-Edge can choose either one of these improving switches…

Cycle gadgets. Cycles close one edge at a time. Shorter cycles close faster.

Cycle gadgets. Cycles open “simultaneously”.

3-bit counter: 2 → 3 (010 → 011)

From b to b+1 in seven phases
1. B_k-cycle closes
2. C_k-cycle closes
3. U-lane realigns
4. A_i-cycles and B_i-cycles for i<k open
5. A_k-cycle closes
6. W-lane realigns
7. C_i-cycles of 0-bits open

3-bit counter: 3 → 4 (011 → 100)

Size of cycles. Various cycles and lanes compete with each other. Some are trying to open while some are trying to close. We need to make sure that our candidates win! Length of all A-cycles = 8n. Length of all C-cycles = 22n. Length of B_i-cycles = 25 i^2 n. O(n^4) vertices for an n-bit counter. Can be improved using a more complicated construction and an improved analysis (work in progress).
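The O(n^4) count follows from summing the cycle lengths. The sketch below assumes one A-cycle, one B-cycle and one C-cycle per bit i = 1, …, n and counts only the vertices on these cycles; that bookkeeping is an assumption, since the slide gives only the lengths.

```python
def cycle_vertices(n):
    """Total length of all A-, B- and C-cycles of an n-bit counter."""
    a_total = n * (8 * n)                                   # n A-cycles of length 8n
    c_total = n * (22 * n)                                  # n C-cycles of length 22n
    b_total = sum(25 * i * i * n for i in range(1, n + 1))  # B_i has length 25 i^2 n
    return a_total + b_total + c_total
```

The B-cycles dominate: their total length is 25n · n(n+1)(2n+1)/6, which grows like (25/3)·n^4.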

Related results. Sub-exponential lower bound for Zadeh’s pivoting rule [Friedmann ’10]. Dantzig’s pivoting rule and the standard policy iteration algorithm, Switch-All, are polynomial for discounted MDPs with a fixed discount factor [Ye ’10]. Switch-All is almost linear for discounted MDPs and discounted turn-based 2-player Stochastic Games with a fixed discount factor [Hansen-Miltersen-Z ’11].

Concluding remarks and open problems. A “game-theoretic” perspective helps us understand the behavior of randomized pivoting rules. Polynomial pivoting rule? Polynomial bound on diameter? Strongly polynomial algorithms for MDPs? Polynomial algorithms for 2-player games?

THE END

Which AUSOs can result from MDPs / PGs / MPGs / SPGs??? [Diagram of nested classes: AUSOs, 2½-player games, 2-player games, 1½-player games, 1-player games, parity games, LP on cubes.] Are all containments strict?

Hard Parity Games for Random Facet [Friedmann-Hansen-Z ’10]
