Stochastic tree search and stochastic games


1 Stochastic tree search and stochastic games

2 Monte Carlo Tree Search
Minimax search fails for games with deep trees, large branching factors, and no simple heuristics
Go: branching factor 361 (19x19 board)

3 Monte Carlo Tree Search
Instead of depth-limited search with an evaluation function, use randomized simulations
For each simulation:
Traverse the tree using a tree policy (trading off exploration and exploitation) until a leaf node is reached
Run a rollout using a default policy (e.g., random moves) until a terminal state is reached
Update the value estimates of nodes along the path to reflect the win percentage of simulations passing through each node
C. Browne et al., A Survey of Monte Carlo Tree Search Methods, 2012
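A minimal sketch of one MCTS iteration, assuming a hypothetical `game` object with `legal_moves(state)`, `next_state(state, move)`, `is_terminal(state)`, and `outcome(state)` methods (these names are illustrative, not from the survey):

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # move -> Node
        self.visits = 0
        self.value = 0.0     # running sum of simulation outcomes

def uct_select(node, c=1.4):
    # Tree policy: UCB1 trades off exploitation (mean outcome) against
    # exploration (bonus for rarely visited children).
    return max(node.children.values(),
               key=lambda ch: ch.value / ch.visits
                              + c * math.sqrt(math.log(node.visits) / ch.visits))

def mcts_iteration(root, game):
    node = root
    # 1. Selection: descend with the tree policy; stop to expand as soon
    #    as a node still has untried moves.
    while not game.is_terminal(node.state):
        untried = [m for m in game.legal_moves(node.state)
                   if m not in node.children]
        if untried:
            # 2. Expansion: add one child for a random untried move.
            move = random.choice(untried)
            node.children[move] = Node(game.next_state(node.state, move),
                                       parent=node)
            node = node.children[move]
            break
        node = uct_select(node)
    # 3. Rollout: default policy plays uniformly random moves to the end.
    state = node.state
    while not game.is_terminal(state):
        state = game.next_state(state, random.choice(game.legal_moves(state)))
    result = game.outcome(state)  # e.g., 1 if the root player won, else 0
    # 4. Backpropagation: update statistics on the path back to the root.
    while node is not None:
        node.visits += 1
        node.value += result
        node = node.parent
```

Running many iterations and then playing the root move with the highest visit count is the usual move-selection rule.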

4 AlphaGo

5 AlphaGo Deep convolutional neural networks
Treat the Go board as an image
Powerful function approximation machinery
Can be trained to predict a distribution over possible moves (policy) or the expected value of a position
D. Silver et al., Mastering the Game of Go with Deep Neural Networks and Tree Search, Nature 529, January 2016
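As a rough illustration only (not the paper's 13-layer architecture; all layer sizes and plane counts here are made up), a tiny PyTorch-style convolutional policy network that treats the board as a multi-channel image and outputs one logit per board point:

```python
import torch.nn as nn

class PolicyNet(nn.Module):
    """Toy convolutional policy network: board feature planes in,
    361 move logits out.  Depth and widths are illustrative, far
    smaller than AlphaGo's actual network."""
    def __init__(self, in_planes=17, width=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_planes, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU(),
        )
        # 1x1 convolution collapses features to one logit per board point.
        self.head = nn.Conv2d(width, 1, kernel_size=1)

    def forward(self, boards):            # boards: (batch, planes, 19, 19)
        x = self.trunk(boards)
        return self.head(x).flatten(1)    # (batch, 361) move logits
```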

6 AlphaGo SL policy network RL policy network
SL policy network
Idea: perform supervised learning (SL) to predict human moves
Given state s, predict a probability distribution over moves a, P(a|s)
Trained on 30M positions; 57% accuracy at predicting human moves
Also train a smaller, faster rollout policy network (24% accurate)
RL policy network
Idea: fine-tune the policy network using reinforcement learning (RL)
Initialize the RL network to the SL network
Play two snapshots of the network against each other, updating parameters to maximize the expected final outcome
The RL network wins against the SL network 80% of the time and against the open-source Pachi Go program 85% of the time
D. Silver et al., Mastering the Game of Go with Deep Neural Networks and Tree Search, Nature 529, January 2016
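The two training signals, sketched in PyTorch (`policy_net`, `optimizer`, and the tensor arguments are all hypothetical; the paper's actual training setup differs): the SL step minimizes cross-entropy against the human move, while the RL step is a REINFORCE-style update that weights each move's log-probability by the final outcome z (+1 win, -1 loss).

```python
import torch.nn.functional as F

def sl_step(policy_net, optimizer, boards, expert_moves):
    # Supervised learning: maximize log P(a|s) for the human expert's move a.
    logits = policy_net(boards)                   # (batch, 361)
    loss = F.cross_entropy(logits, expert_moves)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

def rl_step(policy_net, optimizer, boards, moves_played, z):
    # Reinforcement learning (REINFORCE): after a self-play game, push up
    # the probability of moves from won games and down from lost ones.
    log_probs = F.log_softmax(policy_net(boards), dim=-1)
    chosen = log_probs.gather(1, moves_played.unsqueeze(1)).squeeze(1)
    loss = -(z * chosen).mean()                   # ascend expected outcome
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```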

7 AlphaGo SL policy network RL policy network Value network
Idea: train a network for position evaluation
Given state s, estimate v(s), the expected outcome of play starting from position s and following the learned policy for both players
Train the network by minimizing the mean squared error between actual and predicted outcomes
Trained on 30M positions sampled from different self-play games
D. Silver et al., Mastering the Game of Go with Deep Neural Networks and Tree Search, Nature 529, January 2016
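The corresponding regression step, again as a hypothetical PyTorch sketch (`value_net` maps a batch of board tensors to scalar predictions):

```python
import torch.nn.functional as F

def value_step(value_net, optimizer, boards, outcomes):
    # Minimize mean squared error between the predicted value v(s) and
    # the actual game outcome z (e.g., +1 for a win, -1 for a loss).
    v = value_net(boards).squeeze(-1)   # (batch,)
    loss = F.mse_loss(v, outcomes)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```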

8 AlphaGo D. Silver et al., Mastering the Game of Go with Deep Neural Networks and Tree Search, Nature 529, January 2016

9 AlphaGo Monte Carlo Tree Search
Each edge in the search tree maintains prior probabilities P(s,a), visit counts N(s,a), and action values Q(s,a)
P(s,a) comes from the SL policy network
The tree traversal policy selects actions that maximize the Q value plus an exploration bonus (proportional to P but inversely proportional to N)
An expanded leaf node gets a value estimate that combines the value network's estimate with the outcome of a game simulated with the rollout network
At the end of each simulation, Q values are updated to the average value of all simulations passing through each edge
D. Silver et al., Mastering the Game of Go with Deep Neural Networks and Tree Search, Nature 529, January 2016
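A sketch of the selection rule and the leaf evaluation; the `edges` data structure and the constant c_puct are illustrative, but the bonus form u(s,a) ∝ P(s,a)/(1 + N(s,a)) and the mixing weight λ = 0.5 follow the paper.

```python
import math

def select_action(edges, c_puct=5.0):
    # Pick the action maximizing Q(s,a) + u(s,a).  Here `edges` maps each
    # action to a dict with prior P, visit count N, and action value Q;
    # c_puct is an illustrative exploration constant.
    total_n = sum(e["N"] for e in edges.values())
    def score(e):
        u = c_puct * e["P"] * math.sqrt(total_n) / (1 + e["N"])
        return e["Q"] + u
    return max(edges, key=lambda a: score(edges[a]))

def leaf_value(value_net_estimate, rollout_outcome, lam=0.5):
    # Expanded leaves are scored by mixing the value network's estimate
    # with the outcome of a rollout-network game (lambda = 0.5 in the paper).
    return (1 - lam) * value_net_estimate + lam * rollout_outcome
```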

10 AlphaGo Monte Carlo Tree Search
D. Silver et al., Mastering the Game of Go with Deep Neural Networks and Tree Search, Nature 529, January 2016

11 AlphaGo D. Silver et al., Mastering the Game of Go with Deep Neural Networks and Tree Search, Nature 529, January 2016

12 Stochastic games How to incorporate dice throwing into the game tree?

13 Stochastic games

14 Stochastic games Expectiminimax: for chance nodes, sum values of successor states weighted by the probability of each successor Value(node) = Utility(node) if node is terminal maxaction Value(Succ(node, action)) if type = MAX minaction Value(Succ(node, action)) if type = MIN sumaction P(Succ(node, action)) * Value(Succ(node, action)) if type = CHANCE

15 Stochastic games Expectiminimax: for chance nodes, sum values of successor states weighted by the probability of each successor Nasty branching factor, defining evaluation functions and pruning algorithms more difficult Monte Carlo simulation: when you get to a chance node, simulate a large number of games with random dice rolls and use win percentage as evaluation function Can work well for games like Backgammon

16 Stochastic games of imperfect information
States are grouped into information sets for each player

17 Stochastic games of imperfect information
Simple Monte Carlo approach: run multiple simulations with randomly dealt hidden cards, pretending the game is fully observable
“Averaging over clairvoyance”
Problem: this strategy does not account for bluffing, information gathering, etc.
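A sketch of this sampling scheme; `sample_hidden`, `legal_moves`, and `perfect_info_value` are hypothetical helpers standing in for a deal sampler and a perfect-information solver:

```python
from collections import defaultdict

def clairvoyant_move(state, game, n_deals=200):
    # "Averaging over clairvoyance": sample completions of the hidden
    # cards consistent with what we have observed, score each move as if
    # the sampled deal were fully observable, and pick the best average.
    totals = defaultdict(float)
    for _ in range(n_deals):
        deal = game.sample_hidden(state)
        for move in game.legal_moves(state):
            totals[move] += game.perfect_info_value(state, deal, move)
    return max(totals, key=lambda m: totals[m])
```

The slide's caveat applies: because every sampled world is solved with full information, the resulting strategy never bluffs and never plays to gather information.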

18 Game AI: Origins Minimax algorithm: Ernst Zermelo, 1912
Chess playing with an evaluation function, quiescence search, and selective search: Claude Shannon, 1949 (paper)
Alpha-beta search: John McCarthy, 1956
Checkers program that learns its own evaluation function by playing against itself: Arthur Samuel, 1956

19 Game AI: State of the art
Computers are better than humans:
Checkers: solved in 2007
Chess: state-of-the-art search-based systems are now better than humans
Deep learning machine teaches itself chess in 72 hours, plays at International Master level (arXiv, September 2015)
Computers are competitive with top human players:
Backgammon: the TD-Gammon system (1992) used reinforcement learning to learn a good evaluation function
Bridge: top systems use Monte Carlo simulation and alpha-beta search
Go: computers were not considered competitive until AlphaGo in 2016

20 Game AI: State of the art
Computers are not competitive with top human players: Poker
Heads-up limit hold’em poker has been solved (Science, Jan. 2015)
It is the simplest variant played competitively by humans
It has fewer states than checkers, but partial observability makes it difficult
“Essentially weakly solved” = cannot be beaten with statistical significance in a lifetime of playing
There is a huge increase in difficulty from limit to no-limit poker, but AI has made progress

21 See also:

22 UIUC robot (2009)

