Published by Easton Levers. Modified over 2 years ago.

1
The Big Match in small space
Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen, Michal Koucký

2
Overview
- Playing the Big Match in small space
- Open problem: Playing CMPGs in small space

3
The Big Match (Gillette ’57)
[Figure: animated payoff matrix of the Big Match, with entries 0 and 1. Over successive rounds the numbers written down are 0, 1, 1, 1.]

18
Outcome
- The column player pays the row player the limit (either lim sup or lim inf) of the average of the numbers written down
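
As a small illustration of the outcome definition, the sketch below computes the running averages of the numbers written down, using the sequence 0, 1, 1, 1 from the preceding slides. The function name `running_averages` is made up for this example. A finite prefix cannot determine a lim sup or lim inf, so this only shows the quantity whose limit is taken.

```python
from fractions import Fraction


def running_averages(numbers):
    """Return the average of the first t numbers, for t = 1..len(numbers)."""
    averages = []
    total = 0
    for t, value in enumerate(numbers, start=1):
        total += value
        averages.append(Fraction(total, t))
    return averages


# Sequence written down in the animated example.
written_down = [0, 1, 1, 1]
print([str(a) for a in running_averages(written_down)])
# prints ['0', '1/2', '2/3', '3/4']
```

The outcome of an infinite play is the lim sup or lim inf of this sequence of averages as the number of rounds grows.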

19
Value
- Each player can ensure ½ (in the limit)
  - Blackwell and Ferguson ’68
- The column player has a simple optimal strategy
  - Play uniformly at random
- The row player's strategy is complicated and not optimal
  - Blackwell and Ferguson ’68

20
The Big Match (Gillette ’57)
[Figure: payoff matrix of the Big Match, with entries 0 and 1. The column player mixes ½, ½; the row player mixes x, 1-x.]
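
The ½, ½ and x, 1-x labels on this slide suggest a one-round calculation: if the column player mixes uniformly, the row player's expected payoff in that round does not depend on x. The sketch below checks this with exact arithmetic. The matrix `PAYOFF` is an assumption read off the slide residue (each row contains one 0 and one 1), and the sketch ignores which entries end the game, so it illustrates only the single-round expectation, not the full limit-average argument.

```python
from fractions import Fraction

# Assumed stage payoffs to the row player (rows: row player's actions,
# columns: column player's actions). This matrix is an assumption.
PAYOFF = [[0, 1],
          [1, 0]]


def expected_stage_payoff(x, y=Fraction(1, 2)):
    """Expected one-round payoff when row mixes (x, 1-x) and column mixes (y, 1-y)."""
    row_mix = [x, 1 - x]
    col_mix = [y, 1 - y]
    return sum(row_mix[i] * col_mix[j] * PAYOFF[i][j]
               for i in range(2) for j in range(2))


# Against a uniform column player, every row mix gives exactly 1/2.
for x in (Fraction(0), Fraction(1, 3), Fraction(1)):
    assert expected_stage_payoff(x) == Fraction(1, 2)
```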

21
Worthless strategies for row player
- Worthless strategy = ensures only outcome 0
- Markov (and stationary) strategies
  - Markov = depends only on the length of the history
  - Blackwell and Ferguson ’68
- Finite memory strategies
  - Sorin ’02
- Markov strategies extended with deterministic-update finite memory
  - Deterministic-update: one history ⇒ one memory state
  - Hansen, Ibsen-Jensen, Koucký

22
Memory based strategies
- Memories M ⊆ {0,1}*
- Strategy σ consists of
  - an action map σ_a : M × S → Δ(A)
  - an update map σ_u : M × S × A × A → Δ(M)
- In round T, with memory m_T and state s_T:
  - Pr[a_T] is assigned by σ_a(m_T, s_T)
- Play goes to s_{T+1}, and the other player used b_T:
  - Pr[m_{T+1}] is assigned by σ_u(m_T, s_{T+1}, a_T, b_T)
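
The two maps on this slide can be sketched as a small data structure. This is a minimal illustration under assumptions: the names `MemoryStrategy`, `act`, `update` and `sample`, and the representation of a distribution as a dictionary from outcomes to probabilities, are all invented for the example and do not come from the talk.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Hashable

# Finite distribution: outcome -> probability (assumed representation).
Dist = Dict[Hashable, float]


def sample(dist: Dist, rng: random.Random):
    """Draw one outcome from a finite distribution."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


@dataclass
class MemoryStrategy:
    # sigma_a(m, s): distribution over own actions.
    action_map: Callable[[str, Hashable], Dist]
    # sigma_u(m, s_next, a, b): distribution over next memory states.
    update_map: Callable[[str, Hashable, Hashable, Hashable], Dist]
    # Memory states are bit strings, i.e. elements of {0,1}*.
    memory: str = ""

    def act(self, state, rng):
        """Sample a_T from sigma_a(m_T, s_T)."""
        return sample(self.action_map(self.memory, state), rng)

    def update(self, next_state, own_action, other_action, rng):
        """Sample m_{T+1} from sigma_u(m_T, s_{T+1}, a_T, b_T)."""
        dist = self.update_map(self.memory, next_state, own_action, other_action)
        self.memory = sample(dist, rng)


# Toy strategy (hypothetical): remember the opponent's last action as one bit
# and play "bottom" only after seeing a 1.
toy = MemoryStrategy(
    action_map=lambda m, s: {"bottom": 1.0} if m == "1" else {"top": 1.0},
    update_map=lambda m, s_next, a, b: {str(b): 1.0},
)
rng = random.Random(0)
first = toy.act("s", rng)
toy.update("s", first, 1, rng)
second = toy.act("s", rng)
```

The toy strategy happens to update its memory deterministically; a randomized update would simply return a distribution with more than one memory state in its support.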

23
Asymptotic memory usage
- An MDP with numbered states uses log f(T) space w.h.p. if, w.h.p., for all T only states numbered below f(T) have been visited before round T, no matter the choices of the player
- A strategy uses f(T) space w.h.p. if the MDP that it defines on its memory states, controlled by the other player, uses f(T) space w.h.p.

24
Results
- ε-optimal strategy using O(log T) space
  - Both inf and sup
  - Blackwell and Ferguson ’68
- ε-optimal strategy using O(S(T)) space w.h.p.
  - For any given increasing unbounded function S
  - Only sup
  - Hansen, Ibsen-Jensen, Koucký
- ε-optimal strategy using O(log log T) space w.h.p.
  - Both inf and sup
  - Hansen, Ibsen-Jensen, Koucký

25
Simplified strategy construction
- Split the rounds into epochs such that |epoch i| = S(i)
- Sample i times in epoch i uniformly at random
- Play like the classic ε-optimal strategy on the samples, and otherwise play the top row
- The strategy is ε-optimal for sup avg. (and for inf avg. if S(T) = 2^T) and uses space log S^{-1}(T) w.h.p.
- Real strategy: more messy
[Figure: timeline of rounds split into epochs of lengths S(1), S(2), S(3).]
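
The epoch-and-sampling structure of the simplified construction can be sketched as follows. This is an illustration under assumptions, not the strategy from the talk: the function names are invented, samples are drawn without replacement, and the whole sample set is materialised in memory, which a genuinely small-space strategy could not afford.

```python
import random


def epoch_bounds(S, num_epochs):
    """Return (first_round, last_round) for epochs 1..num_epochs; rounds start at 1."""
    bounds = []
    start = 1
    for i in range(1, num_epochs + 1):
        length = S(i)
        bounds.append((start, start + length - 1))
        start += length
    return bounds


def sample_rounds(S, num_epochs, rng):
    """For each epoch i, pick i distinct rounds uniformly at random inside it."""
    samples = []
    for i, (first, last) in enumerate(epoch_bounds(S, num_epochs), start=1):
        k = min(i, last - first + 1)  # guard for epochs shorter than i
        samples.append(sorted(rng.sample(range(first, last + 1), k)))
    return samples


def epoch_of_round(S, T):
    """Index of the epoch that contains round T."""
    i, end = 1, S(1)
    while T > end:
        i += 1
        end += S(i)
    return i


# With S(i) = 2^i the first three epochs cover rounds 1-2, 3-6 and 7-14.
S = lambda i: 2 ** i
print(epoch_bounds(S, 3))
print(sample_rounds(S, 3, random.Random(0)))
```

On the sampled rounds the player would follow the classic ε-optimal strategy, and on all other rounds it would play the top row. The epoch index grows roughly like S^{-1}(T), which is the quantity appearing in the space bound on the slide.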

30
Extensions
- Concurrent mean-payoff game with 1 state that can be left
  - Hansen, Ibsen-Jensen, Koucký
  - Result is similar
- Any concurrent mean-payoff game (CMPG)
  - Open

31
Overview
- Playing the Big Match in small space
- Open problem: Playing CMPGs in small space

32
Extension to CMPGs
- Known: O(log T)
  - Mertens and Neyman ’81

33
Known strategy (Mertens and Neyman ’81)
- Split the rounds into epochs such that |epoch i| = L(i)
- For each epoch i:
  - Let [formula lost in extraction]
  - Find discount factor λ(s(i))
  - Play optimally in the game with discount factor λ(s(i)) in epoch i+1
[Figure: timeline of rounds split into epochs of lengths L(1), L(2), L(3), labelled with discount factors λ(s(0)), λ(s(1)), λ(s(2)), λ(s(3)).]

36
Problems
- Remembering enough for s(i) and the number of rounds
  - Requires O(log T) memory
  - Likely: can use an approximation
  - Find using sampling
- The function L(i) can only increase linearly in i
  - A similar result requires fast growth
  - Unsure

37
Thanks!
