Presentation is loading. Please wait.

Presentation is loading. Please wait.

Energy and Mean-Payoff Parity Markov Decision Processes Laurent Doyen LSV, ENS Cachan & CNRS Krishnendu Chatterjee IST Austria MFCS 2011.

Similar presentations


Presentation on theme: "Energy and Mean-Payoff Parity Markov Decision Processes Laurent Doyen LSV, ENS Cachan & CNRS Krishnendu Chatterjee IST Austria MFCS 2011."— Presentation transcript:

1 Energy and Mean-Payoff Parity Markov Decision Processes Laurent Doyen LSV, ENS Cachan & CNRS Krishnendu Chatterjee IST Austria MFCS 2011

2 Games for system analysis Verification: check if a given system is correct  reduces to graph searching System input output Spec: φ( input,output ) Environment

3 Games for system analysis ? input output Spec: φ( input,output ) Environment Verification: check if a given system is correct  reduces to graph searching Synthesis : construct a correct system  reduces to game solving – finding a winning strategy

4 Games for system analysis Spec: φ( input,output ) Verification: check if a given system is correct  reduces to graph searching Synthesis : construct a correct system  reduces to game solving – finding a winning strategy This talk: environment is abstracted as a stochastic process ? input output Environment input output = Markov decision process (MDP) ?

5 Markov decision process

6 Markov decision process (MDP) Nondeterministic (player 1) Probabilistic

7 Markov decision process (MDP) Nondeterministic (player 1) Probabilistic

8 Markov decision process (MDP) Nondeterministic (player 1) Probabilistic

9 Markov decision process (MDP) Nondeterministic (player 1) Probabilistic

10 Markov decision process (MDP) Nondeterministic (player 1) Probabilistic

11 Markov decision process (MDP) Nondeterministic (player 1) Probabilistic

12 Markov decision process (MDP) Nondeterministic (player 1) Probabilistic

13 Markov decision process (MDP) Nondeterministic (player 1) Probabilistic Strategy (policy) = recipe to extend the play prefix

14 Markov decision process (MDP) Nondeterministic (player 1) Probabilistic Strategy (policy) = recipe to extend the play prefix

15 Markov decision process (MDP) Nondeterministic (player 1) Probabilistic (player 2) Strategy (policy) = recipe to extend the play prefix weak

16 Objective Strategy is almost-sure winning, if with probability 1: - Büchi: visit accepting states infinitely often. - Parity: least priority visited infinitely often is even. Fix a strategy  (infinite) Markov chain

17 Decision problem Given an MDP, decide whether there exists an almost-sure winning strategy for parity objective.

18 Decision problem End-component = set of states U s.t. if then some successor of q is in U if then all successors of q are in U strongly connected U Given an MDP, decide whether there exists an almost-sure winning strategy for parity objective.

19 Decision problem End-component is good if least priority is even Almost-sure reachability to good end-components in PTIME strongly connected U Given an MDP, decide whether there exists an almost-sure winning strategy for parity objective. End-component = set of states U s.t. if then some successor of q is in U if then all successors of q are in U strongly connected

20 Given an MDP, decide whether there exists an almost-sure winning strategy for parity objective. Decision problem End-component is good if least priority is even Almost-sure reachability to good end-components in PTIME strongly connected End-component = set of states U s.t. if then some successor of q is in U if then all successors of q are in U strongly connected

21 Parity condition Qualitative ω-regular specifications (reactivity, liveness,…) Objectives

22 Parity conditionEnergy condition QualitativeQuantitative ω-regular specifications (reactivity, liveness,…) Resource-constrained specifications Objectives

23 Energy objective Positive and negative weights (encoded in binary)

24 Energy objective Energy level: 20, 21, 11, 1, 2,… (sum of weights) Positive and negative weights (encoded in binary)

25 Energy objective Positive and negative weights Energy level: 20, 21, 11, 1, 2,… (sum of weights) Initial credit

26 Energy objective Positive and negative weights Energy level: 20, 21, 11, 1, 2,… (sum of weights) Initial credit 20 A play is winning if the energy level is always nonnegative. “Never exhaust the resource (memory, battery, …)”

27 Decision problem Given a weighted MDP, decide whether there exist an initial credit c 0 and an almost-sure winning strategy to maintain the energy level always nonnegative.

28 Decision problem Equivalent to a two-player game: If player 2 can force a negative energy level on a path, then the path is finite and has positive probability in MDP player 2 state Given a weighted MDP, decide whether there exist an initial credit c 0 and an almost-sure winning strategy to maintain the energy level always nonnegative.

29 Decision problem Equivalent to a two-player game: If player 2 can force a negative energy level on a path, then the path is finite and has positive probability in MDP player 2 state

30 Energy Parity MDP

31 Objectives Energy parity MDP QualitativeQuantitative Mixed qualitative-quantitative ω-regular specifications (reactivity, liveness,…) Resource-constrained specifications Parity condition Energy condition

32 Strategy is almost-sure winning with initial credit c 0, if with probability 1: energy condition and parity condition hold Energy parity MDP “never exhaust the resource” and “always eventually do something useful”

33 Algorithm for Energy Büchi MDP For parity, probabilistic player is my friend For energy, probabilistic player is my opponent

34 Algorithm for Energy Büchi MDP Replace each probabilistic state by the gadget: For parity, probabilistic player is my friend For energy, probabilistic player is my opponent

35 Algorithm for Energy Büchi MDP Reduction of energy Büchi MDP to energy Büchi game

36 Algorithm for Energy Büchi MDP Reduction of energy Büchi MDP to energy Büchi game Player 1 can guess an even priority 2i, and win in the energy Büchi MDP where: Büchi states are 2i-states, and transitions to states with priority <2i are disallowed Reduction of energy parity MDP to energy Büchi MDP

37 Mean-payoff Parity MDP

38 Mean-payoff Mean-payoff value of a play = limit- average of the visited weights Optimal mean-payoff value can be achieved with a memoryless strategy. Decision problem: Given a rational threshold, decide if there exists a strategy for player 1 to ensure mean-payoff value at least with probability 1.

39 Mean-payoff Mean-payoff value of a play = limit- average of the visited weights Optimal mean-payoff value can be achieved with a memoryless strategy. Decision problem: Given a rational threshold, decide if there exists a strategy for player 1 to ensure mean-payoff value at least with probability 1.

40 Mean-payoff games Memoryless strategy σ ensures nonnegative mean-payoff value iff all cycles are nonnegative in G σ iff memoryless strategy σ is winning in energy game Mean-payoff games with threshold 0 are equivalent to energy games.

41 Mean-payoff vs. Energy EnergyMean-Payoff Games NP  coNP MDP NP  coNP PTIME

42 Mean-payoff parity MDPs Find a strategy which ensures with probability 1: - parity condition, and - mean-payoff value ≥ Gadget reduction does not work: Player 1 almost- surely wins Player 1 loses

43 Algorithm for mean-payoff parity End-component analysis almost-surely all states of end-component can be visited infinitely often expected mean-payoff value of all states in end-component is same strongly connected End-component is good if - least priority is even - expected mean-payoff value ≥ Almost-sure reachability to good end-component in PTIME

44 Algorithm for mean-payoff parity End-component analysis almost-surely all states of end-component can be visited infinitely often expected mean-payoff value of all states in end-component is same strongly connected End-component is good if - least priority is even - expected mean-payoff value ≥ Almost-sure reachability to even end-component in PTIME

45 MDP Parity MDPsEnergy MDPs Mean-payoff MDPs Energy parity MDPs Mean-payoff parity MDPs QualitativeQuantitative Mixed qualitative-quantitative

46 MDP Parity MDPsEnergy MDPs Mean-payoff MDPs Energy parity MDPs Mean-payoff parity MDPs QualitativeQuantitative Mixed qualitative-quantitative

47 Summary Energy parityMean-Payoff parity Games NP  coNP MDP NP  coNP PTIME Algorithmic complexity

48 Summary Energy parityMean-Payoff parity Games NP  coNP MDP NP  coNP PTIME Strategy complexity Energy parityMean-Payoff parity Gamesn·d·Winfinite MDP2·n·Winfinite Algorithmic complexity

49 Thank you ! Questions ? The end

50

51 References [CdAHS03]A.Chakrabarti, L. de Alfaro, T.A. Henzinger, and M. Stoelinga. Resource interfaces, Proc. of EMSOFT: Embedded Software, LNCS 2855, Springer, pp , 2003 [EM79]A. Ehrenfeucht, and J. Mycielski, Positional Strategies for Mean-Payoff Games, International Journal of Game Theory, vol. 8, pp , 1979 [BFL+08]P. Bouyer, U. Fahrenberg, K.G. Larsen, N. Markey, and J. Srba, Infinite Runs in Weighted Timed Automata with Energy Constraints, Proc. of FORMATS: Formal Modeling and Analysis of Timed Systems, LNCS 5215, Springer, pp , 2008 [CHJ05]K. Chatterjee, T.A. Henzinger, M. Jurdzinski. Mean-payoff Parity Games, Proc. of LICS: Logic in Computer Science, IEEE, pp , 2005.

52 Energy parity

53 Mean-payoff parity

54 Complexity StrategyAlgorithmic Player 1Player 2complexity Energymemoryless NP  coNP Paritymemoryless NP  coNP Energy parityexponentialmemoryless NP  coNP Mean-payoffmemoryless NP  coNP Mean-payoff parity infinitememoryless NP  coNP


Download ppt "Energy and Mean-Payoff Parity Markov Decision Processes Laurent Doyen LSV, ENS Cachan & CNRS Krishnendu Chatterjee IST Austria MFCS 2011."

Similar presentations


Ads by Google