
1
Randomness for Free
Laurent Doyen (LSV, ENS Cachan & CNRS)
Joint work with Krishnendu Chatterjee, Hugo Gimbert, and Tom Henzinger

2
Stochastic games on graphs
Games for synthesis: reactive system synthesis = finding a winning strategy in a two-player game.
Game played on a graph:
- infinite number of rounds
- players' moves determine the successor state
- outcome = infinite path in the graph
When is randomness more powerful? When is randomness for free?

3
Stochastic games on graphs
Games for synthesis: reactive system synthesis = finding a winning strategy in a game.
Game played on a graph:
- infinite number of rounds
- players' moves determine the successor state
- outcome = infinite path in the graph
When is randomness more powerful? When is randomness for free?
… in game structures? … in strategies?

4
Stochastic games on graphs
Classification according to Information & Interaction
Interaction: how the players' moves are combined.
Information: what is visible to the players.


7
Mode of interaction
Interaction, general case: concurrent & stochastic (2½-player games).
Players choose their moves simultaneously and independently; Player 1's move and Player 2's move determine a probability distribution on the successor state.

8
Mode of interaction
Interaction, general case: concurrent & stochastic (2½-player games).
Players choose their moves simultaneously and independently; Player 1's move and Player 2's move determine a probability distribution on the successor state.
Special case: 1½-player games (MDP) if A1 or A2 is a singleton.
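To make the interaction mode concrete, here is a minimal sketch (the game, states, and numbers are hypothetical, not from the talk) of a concurrent stochastic transition function δ : S × A1 × A2 → Dist(S); the MDP special case falls out when one action set is a singleton.

```python
import random

# Hypothetical 2-state game: in state "s", Player 1 picks a/b, Player 2 picks x/y,
# and the joint move determines a distribution over successor states.
delta = {
    ("s", "a", "x"): {"s": 0.5, "t": 0.5},
    ("s", "a", "y"): {"t": 1.0},
    ("s", "b", "x"): {"s": 1.0},
    ("s", "b", "y"): {"s": 0.25, "t": 0.75},
    ("t", "a", "x"): {"t": 1.0},  # "t" is absorbing under every joint move
    ("t", "a", "y"): {"t": 1.0},
    ("t", "b", "x"): {"t": 1.0},
    ("t", "b", "y"): {"t": 1.0},
}

def step(state, a1, a2, rng=random):
    """Sample a successor state from the distribution delta(state, a1, a2)."""
    dist = delta[(state, a1, a2)]
    states, probs = zip(*dist.items())
    return rng.choices(states, weights=probs)[0]
```

With A2 a singleton, each entry of `delta` would depend on Player 1's move alone, which is exactly the 1½-player (MDP) case.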

9
Mode of interaction
Interaction, general case: concurrent & stochastic.
Separation of concurrency & probabilities.

10
Mode of interaction
Interaction, special case: turn-based (Player 1 states and Player 2 states).
In each state, one player's move determines the successor.

11
Knowledge
Classification according to Information & Interaction
Information, general case: partial observation.
Two partitions O1 and O2 of the state space: Player 1's view and Player 2's view.
In state s, player i sees the observation o ∈ Oi such that s ∈ o.

12
Knowledge
Information, special case 1: one-sided complete observation.
Player 1's view or Player 2's view is complete (one of the two partitions consists of singletons).

13
Knowledge
Information, special case 2: complete observation.
Both Player 1's view and Player 2's view are complete (both partitions consist of singletons).

14
Classification
Classification according to Information & Interaction
Information: complete observation / one-sided complete obs. / partial observation
Interaction: turn-based / concurrent
General case: 2½-player games.

15
Classification
Classification according to Information & Interaction
For 1½-player games:
- complete observation = MDP (Markov Decision Process)
- one-sided complete obs. / partial observation = POMDP

16
Strategies
A strategy for Player i is a function that maps histories to probability distributions over actions.
Strategies are observation-based: they depend only on the observed history.

17
Strategies
A strategy for Player i is a function that maps histories to probability distributions over actions.
Strategies are observation-based: they depend only on the observed history.
Special case: pure strategies, which map each history to a single action.
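As a hedged illustration (the histories, observations, and action names are made up), a strategy can be modeled as a function from an observation history to a distribution over actions, with pure strategies as the Dirac special case:

```python
# A randomized strategy: observation history (a tuple) -> distribution over actions.
def sigma_random(history):
    # Hypothetical rule: favor "a" early on, then mix evenly.
    if len(history) < 2:
        return {"a": 0.75, "b": 0.25}
    return {"a": 0.5, "b": 0.5}

# A pure strategy is the special case where the distribution is Dirac.
def sigma_pure(history):
    return {"a": 1.0} if len(history) % 2 == 0 else {"b": 1.0}

def is_pure(strategy, histories):
    """Check (on sample histories) that the strategy always plays one action surely."""
    return all(max(strategy(h).values()) == 1.0 for h in histories)
```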

18
Objectives
An objective (for Player 1) is a measurable set of infinite sequences of states.

19
Objectives
An objective (for Player 1) is a measurable set of infinite sequences of states.
Examples:
- Reachability, safety
- Büchi, coBüchi
- Parity
- Borel

20
Value
The probability of a finite prefix of a play induces a unique probability measure on measurable sets of plays, and a value function for Player 1.

21
Value
The probability of a finite prefix of a play induces a unique probability measure on measurable sets of plays, and a value function for Player 1.
Our reductions preserve values and existence of optimal strategies.
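For intuition about the value function, the following sketch (a toy MDP of my own, not an example from the talk) computes Player 1's value for a reachability objective by value iteration:

```python
# Toy MDP: in state "s0", action "a" reaches "goal" with probability 0.5,
# action "b" stays in "s0" with probability 0.9; "goal" and "sink" are absorbing.
delta = {
    ("s0", "a"): {"goal": 0.5, "sink": 0.5},
    ("s0", "b"): {"s0": 0.9, "sink": 0.1},
}

def value_reach(init, iters=1000):
    """Value iteration for the objective 'reach goal':
    v(s0) = max over actions of the expected value of the successor."""
    v = {"s0": 0.0, "goal": 1.0, "sink": 0.0}
    for _ in range(iters):
        v["s0"] = max(
            sum(p * v[t] for t, p in delta[("s0", act)].items())
            for act in ("a", "b")
        )
    return v[init]

# The fixed point is v(s0) = max(0.5, 0.9 * v(s0)) = 0.5: an MDP can have a
# value strictly between 0 and 1, unlike a deterministic two-player game.
```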

22
Outline
Randomness in game structures
- for free with complete-observation, concurrent
- for free with one-sided, turn-based
Randomness in strategies
- for free in (PO)MDP
Corollary: undecidability results

23
Preliminary: rational probabilities → probabilities ½ only
1. Make all probabilistic states have two successors.

24
Preliminary: rational probabilities → probabilities ½ only
2. Use the binary encodings of p and q-p to simulate the rational probability p/q with fair coin flips [Zwick, Paterson 96].


26
Preliminary: rational probabilities → probabilities ½ only
2. Use the binary encodings of p and q-p to simulate the rational probability p/q with fair coin flips [Zwick, Paterson 96].
This is a polynomial-time reduction for parity objectives.
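The idea behind this step can be sketched as follows (my own encoding of the comparison trick, not necessarily the slide's exact gadget): to realize a rational probability p/q using only fair coin flips, draw a uniform x ∈ [0,1) one bit at a time and compare it with the binary expansion of p/q; the outcome is decided at the first bit where they differ.

```python
from fractions import Fraction
import random

def binary_digits(frac):
    """Lazily yield the binary digits of a fraction in [0,1)."""
    while True:
        frac *= 2
        bit = 1 if frac >= 1 else 0
        if bit:
            frac -= 1
        yield bit

def bernoulli(p, q, rng=random):
    """Return True with probability exactly p/q, using only fair coin flips."""
    target = binary_digits(Fraction(p, q))
    for bit in target:
        coin = rng.randrange(2)  # one fair coin flip per round
        if coin != bit:
            # x < p/q iff at the first differing position the coin bit is 0
            # while the target bit is 1
            return coin < bit
    # unreachable: the digit generator is infinite
```

Each round costs one fair coin flip, and the comparison terminates after two flips in expectation, which is what makes the simulation efficient.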

27
Outline
Randomness in game structures
- for free with complete-observation, concurrent
- for free with one-sided, turn-based
Randomness in strategies
- for free in (PO)MDP
Corollary: undecidability results

28
Classification
Classification according to Information & Interaction
Information: complete observation / one-sided complete obs. / partial observation
Interaction: turn-based / concurrent

29
Reduction Simulate probabilistic state with concurrent state

30
Reduction
Simulate probabilistic state with concurrent state.
Probability to go from s to s'0: …

31
Reduction
Simulate probabilistic state with concurrent state.
Probability to go from s to s'0: if …, then …


33
Reduction
Simulate probabilistic state with concurrent state.
Probability to go from s to s'0: if …, then …
Each player can unilaterally decide to simulate the original game.

34
Reduction
Simulate probabilistic state with concurrent state.
Player 1 can obtain at least the value v(s) by playing all actions uniformly at random: v'(s) ≥ v(s).
Player 2 can force the value to be at most v(s) by playing all actions uniformly at random: v'(s) ≤ v(s).
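For the probability-½, fan-out-2 case (which the preliminary step reduces to), one concrete version of this gadget (my formulation) replaces the probabilistic state by a concurrent state in which each player picks a bit and the successor is indexed by the XOR of the bits, so a uniform choice by either player makes the successor uniform regardless of what the other does:

```python
from fractions import Fraction

def successor(i, j):
    """Concurrent gadget: players pick bits i and j; successor index is i XOR j."""
    return i ^ j

def dist_under_uniform_p1(j):
    """Successor distribution when Player 1 plays 0/1 uniformly at random
    and Player 2 fixes an arbitrary bit j."""
    dist = {0: Fraction(0), 1: Fraction(0)}
    for i in (0, 1):
        dist[successor(i, j)] += Fraction(1, 2)
    return dist
```

By symmetry the same holds with the players' roles swapped, and by convexity it also holds against any randomized choice, which is why each player can unilaterally force the simulation of the original game.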

35
Reduction
For {complete, one-sided, partial} observation: given a game with rational probabilities, we can construct a concurrent game with a deterministic transition function such that the values coincide and the existence of optimal observation-based strategies is preserved.
The reduction is in polynomial time for complete-observation parity games.

36
Partial information
Information leak from the back edges…

37
Partial information
Information leak from the back edges…
No information leakage if all probabilities have the same denominator q: use 1/q = gcd of all probabilities in G.

38
Partial information
Information leak from the back edges…
No information leakage if all probabilities have the same denominator q: use 1/q = gcd of all probabilities in G.
The reduction is then exponential.

39
Example

40
Outline
Randomness in game structures
- for free with complete-observation, concurrent
- for free with one-sided, turn-based
Randomness in strategies
- for free in (PO)MDP
Corollary: undecidability results

41
Overview
Classification according to Information & Interaction
Information: complete observation / one-sided complete obs. / partial observation
Interaction: turn-based / concurrent

42
Reduction
Simulate probabilistic state with imperfect-information turn-based states (Player 1 states, a Player 2 state, a Player 1 observation).

43
Reduction
Simulate probabilistic state with imperfect-information turn-based states.
Each player can unilaterally decide to simulate the probabilistic state by playing uniformly at random:
- Player 2 chooses states (s,0), (s,1) uniformly at random
- Player 1 chooses actions 0, 1 uniformly at random

44
Reduction
Games with rational probabilities can be reduced to turn-based games with deterministic transitions and (at least) one-sided complete observation. Values and existence of optimal strategies are preserved.

45
Randomness for free In transition function (this talk):

46
When randomness is not for free
Complete-information turn-based games:
- in deterministic games, the value is either 0 or 1 [Martin98]
- MDPs with reachability objectives can have values in [0,1]
Randomness is not for free.
1½-player games (MDP & POMDP):
- in deterministic partial-information 1½-player games, the value is either 0 or 1 [see later]
- MDPs have values in [0,1]
Randomness is not for free.


48
Randomness for free
In transition function (this talk):
In strategies (Everett'57, Martin'98, CDHR'07):

49
Randomness in strategies
Example: concurrent, complete observation, reachability.
Reminder: a randomized strategy for Player i maps histories to distributions over actions; a pure strategy maps each history to a single action.
Randomized strategies are more powerful.

50
Randomness in strategies
Example: turn-based, one-sided complete observation, reachability.
Randomized strategies are more powerful.

51
Outline
Randomness in game structures
- for free with complete-observation, concurrent
- for free with one-sided, turn-based
Randomness in strategies
- for free in (PO)MDP
Corollary: undecidability results

52
Randomness in strategies
A randomized strategy plays a with probability p and b with probability 1-p:

RandStrategy(p) {
  x = rand(0…1)
  if x ≤ p then out(a) else out(b)
}

53
Randomness in strategies
The randomized strategy (a with probability p, b with probability 1-p) splits into a random generator, producing a uniform x ∈ [0,1], and a pure strategy: play a if x ≤ p, play b if x > p.
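A minimal sketch of this splitting (the function names are mine): sample the uniform coin x once, and let the pure strategy σx be the threshold rule at p; averaging over x reproduces the randomized behavior.

```python
import random

def sigma_x(x, p):
    """Pure strategy determined by the coin value x: threshold rule at p."""
    return "a" if x <= p else "b"

def play_randomized(p, rng=random):
    """Playing the randomized strategy = drawing x uniformly from [0,1],
    then playing the pure strategy sigma_x."""
    return sigma_x(rng.random(), p)
```

Since P(x ≤ p) = p, the induced action distribution is a with probability p and b with probability 1-p, exactly as RandStrategy(p) prescribes.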

54
Randomness for free in strategies
1½-player game (POMDP), s0 initial state. For every randomized observation-based strategy, there exists a pure observation-based strategy that achieves at least the same value.
Proof (assume an alphabet of size 2, and fan-out = 2). Given σ, we show that the value of σ can be obtained as the average of the values of pure strategies σx.

55
Randomness for free in strategies
Strategies σx are obtained by "de-randomization" of σ.
Assume σ(s1)(a) = p and σ(s1)(b) = 1-p. Then σ is equivalent to playing σ0 with frequency p and σ1 with frequency 1-p, where σ0 plays a and σ1 plays b at s1.


57
Randomness for free in strategies
Equivalently: toss a coin x ∈ [0,1], play σ0 if x ≤ p, and play σ1 if x > p.
Playing σ in G can be viewed as a sequence of coin tosses: from s0, toss x ∈ [0,1] and play a if x ≤ p, b if x > p [p = σ(s1)(a)].

58
Randomness for free in strategies
Equivalently: toss a coin x ∈ [0,1], play σ0 if x ≤ p, and play σ1 if x > p.
Playing σ in G can be viewed as a sequence of coin tosses: from s0, toss x ∈ [0,1] to choose the action (a if x ≤ p, b if x > p) [p = σ(s1)(a)], then toss y ∈ [0,1] to resolve the probabilistic transition [qa = δ(s0,a)(s1), qb = δ(s0,b)(s1)].

59
Randomness for free in strategies
Equivalently: toss a coin x ∈ [0,1], play σ0 if x ≤ p, and play σ1 if x > p.
Playing σ in G can be viewed as a sequence of coin tosses: in each round, one toss chooses the action [p = σ(s1)(a)] and one toss resolves the probabilistic transition [qa = δ(s1,a)(s1), qb = δ(s1,b)(s1)], and so on.

60
Randomness for free in strategies
Given an infinite sequence x = (xn)n≥0 ∈ [0,1]ω, define σx for all histories s0, s1, …, sn.
σx is a pure and observation-based strategy!
σx plays like σ, assuming that the results of the coin tosses are given by the sequence x.
The value of σ is the "average of the outcomes" of the strategies σx.

61
Randomness for free in strategies
Assume x = (xn)n≥0 and y = (yn)n≥0 are fixed. Then σ determines a unique outcome… which is the outcome of some pure strategy!

62
Randomness for free in strategies
Assume x = (xn)n≥0 and y = (yn)n≥0 are fixed. Let … where …

63
Randomness for free in strategies
Assume x = (xn)n≥0 and y = (yn)n≥0 are fixed. Let … where …
Theorem: …

64
Randomness for free in strategies
Assume x = (xn)n≥0 and y = (yn)n≥0 are fixed.
Playing σ in G is equivalent to choosing (xn) and (yn) uniformly at random, and then producing the corresponding play.

65
Randomness for free in strategies
Assume x = (xn)n≥0 and y = (yn)n≥0 are fixed.
Playing σ in G is equivalent to choosing (xn) and (yn) uniformly at random, and then producing the corresponding play.
The random sequences (xn) and (yn) can be generated separately, and independently!
σx is a pure strategy!

66
Proof
The value of σ is the "average of the outcomes" of the strategies σx.

67
Proof
The value of σ is the "average of the outcomes" of the strategies σx.
By Fubini's theorem…

68
Proof
The value of σ is the "average of the outcomes" of the strategies σx.
Pure and randomized strategies are equally powerful in POMDPs!
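A tiny numeric instance of the averaging argument (a one-step toy example of my own): if action a surely reaches the goal and b surely misses it, a strategy playing a with probability p has value p, and that is exactly the Lebesgue-average of the values of the pure strategies σx over the coin x.

```python
# One-step toy POMDP: action "a" reaches the goal (value 1), "b" does not (value 0).
def value_pure(action):
    return 1.0 if action == "a" else 0.0

def value_randomized(p):
    """Direct value of the randomized strategy playing a with probability p."""
    return p * value_pure("a") + (1 - p) * value_pure("b")

def value_by_averaging(p):
    """Average the values of the pure strategies sigma_x over x ~ U[0,1]:
    sigma_x plays 'a' iff x <= p, so {x : sigma_x plays a} has measure p."""
    measure_a = p          # Lebesgue measure of {x in [0,1] : x <= p}
    measure_b = 1 - p      # measure of {x : sigma_x plays b}
    return measure_a * value_pure("a") + measure_b * value_pure("b")
```

The first computation integrates over actions first, the second over coin values first; their equality is the Fubini step of the proof.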

69
Randomness for free
In transition function:
In strategies:

70
Outline
Randomness in game structures
- for free with complete-observation, concurrent
- for free with one-sided, turn-based
Randomness in strategies
- for free in (PO)MDP
Corollary: undecidability results

71
Corollary
Using [BaierBertrandGrößer'08]: almost-sure coBüchi (and positive Büchi) with randomized (or pure) strategies is undecidable for POMDPs.
Using randomness for free in transition functions: almost-sure coBüchi (and positive Büchi) is undecidable for deterministic turn-based one-sided complete-observation games.

72
Thank you! Questions?
