1 Regret Minimization in Stochastic Games
Shie Mannor and Nahum Shimkin
Dept. of Electrical Engineering, Technion, Israel Institute of Technology
UAI 2000, June 30, 2000

2 Introduction
Modeling of a dynamic decision process as a stochastic game:
- Non-stationarity of the environment
- Environments are not (necessarily) hostile
- Looking for the best possible strategy in light of the environment's actions

3 Repeated Matrix Games
- The sets of single-stage mixed strategies P and Q are simplices.
- Rewards are defined by a reward matrix G: r(p,q) = pGq.
- Reward criterion: the average reward over the first t stages. It need not converge, since stationarity of the opponent is not assumed.
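As a small concrete sketch of this setup (the 2x2 matrix G below is an arbitrary example, not one from the talk), the single-stage reward r(p,q) = pGq and the running average reward can be computed as:

```python
# Sketch of the repeated-matrix-game setup; the matrix G is an invented example.
G = [[1.0, 0.0],
     [0.0, 1.0]]  # reward matrix: rows = P1 actions, columns = P2 actions

def reward(p, q, G):
    """Expected single-stage reward r(p,q) = p G q for mixed strategies p, q."""
    return sum(p[a] * G[a][b] * q[b]
               for a in range(len(p)) for b in range(len(q)))

# Running average reward over several stages; it need not converge,
# since the opponent's stage strategies may change over time.
rewards = [reward([0.5, 0.5], q, G) for q in ([1, 0], [0, 1], [1, 0])]
avg = sum(rewards) / len(rewards)
```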

4 Regret for Repeated Matrix Games
- Suppose that by time t the average reward is r_t and the opponent's empirical strategy is q_t.
- The regret is defined as: L_t = r*(q_t) - r_t, where r*(q) = max_{p in P} r(p,q) is the best-response (Bayes) reward against q.
- A policy of P1 is called regret minimizing if: limsup_{t -> infinity} L_t <= 0 almost surely, for every strategy of the opponent.
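For a matrix game the maximum over mixed p is attained at a pure row, so the regret against the empirical strategy is easy to compute. A minimal sketch (the matrix and the numbers are invented for illustration):

```python
# Regret against the opponent's empirical mixed strategy q_t.  For a matrix
# game, max_p p G q is attained at a pure row, so r*(q) = max_a (G q)_a.
G = [[1.0, 0.0],
     [0.0, 1.0]]

def bayes_reward(q, G):
    """Best-response (Bayes) reward r*(q) = max_a sum_b G[a][b] q[b]."""
    return max(sum(G[a][b] * q[b] for b in range(len(q)))
               for a in range(len(G)))

def regret(avg_reward, q_emp, G):
    """L_t = r*(q_t) - (average reward obtained so far)."""
    return bayes_reward(q_emp, G) - avg_reward

# Example: the opponent played action 0 on 3 of 4 stages, P1 averaged 0.6.
print(regret(0.6, [0.75, 0.25], G))
```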

5 Regret Minimization for Repeated Matrix Games
- Such policies do exist (Hannan, 1957).
- A proof using approachability theory (Blackwell, 1956).
- Extended also to games with partial observation (Auer et al., 1995; Rustichini, 1999).
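One standard regret-minimizing scheme (exponential / multiplicative weights; this is an illustrative choice, not Hannan's original construction, and the game and parameters below are invented) can be demonstrated against a fixed opponent, where the regret shrinks with the horizon:

```python
import math

# Multiplicative-weights play for the row player P1 -- one standard
# regret-minimizing scheme, shown here against a fixed (hence learnable)
# opponent.  Matrix, step size, and horizon are arbitrary examples.
G = [[1.0, 0.0],
     [0.0, 1.0]]
eta = 0.1          # step size
w = [1.0, 1.0]     # action weights
T = 2000
total = 0.0
counts = [0, 0]    # opponent's empirical action counts

for t in range(T):
    b = 1                              # the opponent always plays action 1
    s = sum(w)
    p = [wi / s for wi in w]
    total += sum(p[a] * G[a][b] for a in range(2))  # expected stage reward
    counts[b] += 1
    for a in range(2):                 # reward-based multiplicative update
        w[a] *= math.exp(eta * G[a][b])

q_emp = [c / T for c in counts]
avg = total / T
best = max(sum(G[a][b] * q_emp[b] for b in range(2)) for a in range(2))
print(best - avg)                      # regret; small for large T
```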

6 Stochastic Games
Formal model:
- S = {1, ..., s}: state space
- A = A(s): actions of the regret-minimizing player, P1
- B = B(s): actions of the "environment", P2
- r: reward function, r(s,a,b)
- P: transition kernel, P(s'|s,a,b)
- Expected average reward for stationary strategies p in P, q in Q is r(p,q)
- Single-state recurrence assumption

7 Bayes Reward in Strategy Space
- For every stationary strategy q in Q, the Bayes reward is defined as the best-response reward: r*(q) = max_{p in P} r(p,q).
- Problems:
  - P2's strategy is not completely observed.
  - P1's observations may depend on the strategies of both players.

8 Bayes Reward in State-Action Space
- Let f_sb be the observed frequency of P2's action b in state s.
- A natural estimate of q is: q^(b|s) = f_sb / sum_b' f_sb'.
- The associated Bayes envelope is then defined over the state-action frequencies, via the Bayes reward r*(q^).
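The per-state normalization of the observed counts can be sketched as follows (the counts below are made up for illustration):

```python
# Estimating the opponent's stationary strategy q(b|s) from observed
# state-action counts f[s][b]; the counts here are an invented example.
f = {0: {"-1": 3, "1": 1},
     1: {"-1": 2, "1": 2}}

def estimate_q(f):
    """q_hat(b|s) = f[s][b] / sum_b' f[s][b'], normalized per state s."""
    q = {}
    for s, counts in f.items():
        n = sum(counts.values())
        q[s] = {b: c / n for b, c in counts.items()}
    return q

q_hat = estimate_q(f)   # e.g. q_hat[0]["-1"] == 0.75
```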

9 Approachability Theory
- A standard tool in the theory of repeated matrix games (Blackwell, 1956), for games with vector rewards under the average-reward criterion.
- A set C is approachable by P1 with a policy if the average vector reward converges to C (its distance from C tends to 0) almost surely, for any strategy of P2.
- Was extended to recurrent stochastic games (Shimkin and Shwartz, 1993).

10 The Convex Bayes Envelope
- In general, BE is not approachable.
- Define CBE = co(BE), that is, the envelope obtained from the lower convex hull of the Bayes reward r*(.).
- Theorem: CBE is approachable. (val is the value of the game.)
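For a two-action opponent the convexification can be visualized by sampling r*(q) on a grid and taking the lower convex hull of the sampled points. A sketch (the matrix G is an arbitrary example; the hull routine is a standard monotone-chain lower hull):

```python
# Lower convex hull of the Bayes envelope r*(q) for a 2-action opponent,
# computed on a grid.  G is an invented example matrix.
G = [[1.0, 0.0],
     [0.0, 1.0]]

def r_star(q0, G):
    """r*(q) = max_a sum_b G[a][b] q[b], with q = (q0, 1 - q0)."""
    q = [q0, 1.0 - q0]
    return max(sum(G[a][b] * q[b] for b in range(2)) for a in range(2))

pts = [(i / 100.0, r_star(i / 100.0, G)) for i in range(101)]

def lower_hull(points):
    """Andrew's monotone chain, lower hull only (points sorted by x)."""
    hull = []
    for p in points:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # drop the middle point if it lies on or above the new chord
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

hull = lower_hull(pts)  # CBE's lower boundary lies on or below r*(q) everywhere
```

Here r*(q) is V-shaped with its minimum at q = (0.5, 0.5), and the lower hull keeps that kink while discarding interior grid points.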

11 Single Controller Games
Theorem: Assume that P2 alone controls the transitions, i.e. P(s'|s,a,b) = P(s'|s,b); then BE itself is approachable.

12 An Application to Prediction with Expert Advice
- Given a channel and a set of experts.
- At each time epoch, each expert states his prediction of the next symbol, and P1 has to choose his own prediction a.
- Then a letter b appears in the channel, and P1 receives the prediction reward r(a,b).
- The problem can be formulated as a stochastic game in which P2 stands for all the experts together with the channel.

13 Prediction Example (cont.)
Theorem: P1 has a zero-regret strategy.
[State diagram: states labeled by the experts' recommendation vectors, from (0,...,0) up to (k,...,k); prediction rewards r(a,b), with r = 0 on some transitions.]

14 An Example in Which BE Is Not Approachable
[Two-state diagram: states S0 and S1, each with reward r = b; P1's actions a = 0, a = 1 determine the transitions, with transition probability P = 0.99; P2's action sets are B(0) = B(1) = {-1, 1}.]
It can be proved that BE for the above game is not approachable.

15 Example (cont.)
In r*(q) space the envelopes are:
[Figure: the envelopes BE and CBE plotted in r*(q) space.]

16 Open Questions
- Characterization of minimal approachable sets in the joint reward and state-action space
- On-line learning schemes for stochastic games with unknown parameters
- Other ways of formulating optimality with respect to observed state-action frequencies

17 Conclusions
- The problem of regret minimization for stochastic games was considered.
- The proposed solution concept, CBE, is based on convexification of the Bayes envelope in the natural state-action space.
- The CBE concept ensures an average reward that is higher than the value of the game whenever the opponent plays suboptimally.

18 Regret Minimization in Stochastic Games
Shie Mannor and Nahum Shimkin
Dept. of Electrical Engineering, Technion, Israel Institute of Technology

19 Approachability Theory
- Let m(p,q) be the average vector-valued reward in a game when P1 and P2 play p and q, and let the running average of the vector rewards after t stages be (1/t) sum_{tau <= t} m_tau.
- Theorem [Blackwell, 1956]: A convex set C is approachable if and only if for every q in Q there exists p in P such that m(p,q) lies in C.
- Extended to stochastic games (Shimkin and Shwartz, 1993).
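Blackwell's condition for a convex set can be sanity-checked numerically on a grid. Everything below (the two-dimensional pure-action rewards and the half-space target C) is an invented example; a grid check like this is only illustrative, not a proof of approachability:

```python
# Numeric sanity check of Blackwell's condition for a convex set C:
# for every q there should exist some p with m(p,q) in C.
m_ab = {  # vector reward m(a,b) in R^2 for pure action pairs (invented)
    (0, 0): (1.0, 0.0), (0, 1): (-1.0, 1.0),
    (1, 0): (-1.0, 1.0), (1, 1): (1.0, 0.0),
}

def m(p, q):
    """Bilinear extension m(p,q) = sum_ab p_a q_b m(a,b)."""
    x = sum(p[a] * q[b] * m_ab[(a, b)][0] for a in range(2) for b in range(2))
    y = sum(p[a] * q[b] * m_ab[(a, b)][1] for a in range(2) for b in range(2))
    return (x, y)

def in_C(v):
    """Half-space C = {(x, y): x >= 0} -- a convex target set (with tolerance)."""
    return v[0] >= -1e-9

# Grid check: for each q on a grid, search for a responding p on a grid.
ok = all(
    any(in_C(m((pa, 1 - pa), (qb, 1 - qb))) for pa in [i / 20 for i in range(21)])
    for qb in [j / 20 for j in range(21)]
)
```

In this example, responding with the pure action matching the opponent's majority action always pushes the first coordinate to |2q - 1| >= 0, so the check succeeds for every grid point q.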

20 A Related Vector-Valued Game
Define the following vector-valued game: if in state s action b is played by P2 and a reward r is gained, then the vector-valued reward m_t stacks the scalar reward together with the indicator of the current state-action pair (s,b), so that its running average records both the average reward and the empirical frequencies f_sb.
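Under this reading of the construction (one plausible interpretation; the trajectory below is invented), the running average of m_t indeed collects the average reward and the state-action frequencies:

```python
# Building the vector-valued reward: m_t stacks the scalar reward with the
# indicator of P2's current state-action pair, so its running average is
# (average reward, empirical frequencies f_sb).  Trajectory is invented.
trajectory = [  # (state, P2-action, reward) triples
    (0, "b1", 1.0), (0, "b2", 0.0), (1, "b1", 0.5), (0, "b1", 1.0),
]
pairs = [(0, "b1"), (0, "b2"), (1, "b1"), (1, "b2")]

def vector_reward(s, b, r):
    """m_t = (r, indicator of each state-action pair (s,b))."""
    return [r] + [1.0 if (s, b) == pr else 0.0 for pr in pairs]

T = len(trajectory)
avg = [sum(vector_reward(s, b, r)[i] for (s, b, r) in trajectory) / T
       for i in range(1 + len(pairs))]
# avg[0] is the average reward; avg[1:] are the empirical frequencies f_sb.
```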

