Presentation on theme: "Intelligence for Games and Puzzles1 Backgammon Backgammon is an ancient game of skill with a."— Presentation transcript:
Intelligence for Games and Puzzles1 Backgammon Backgammon is an ancient game of skill with a chance element: players throw two dice to determine the moves that are possible, then choose among them. Each die roll permits the player to move one man that many points. Both rolls must be used if possible. The same man may be moved twice. 2 rolls the same 4 moves Blacks direction of travel Blacks inner table Whites inner table Whites direction of travel A man may not land on a block - a point occupied by two or more enemy men. If a man lands on a point occupied by one enemy - hits a blot - the enemy is removed. Removed men must start again, from an imaginary off-board point at the players start point. No other move while men are off. Players race to a) get all their men into their inner table, then b) move them to any imaginary point beyond their end point. Max 15 men per point
Intelligence for Games and Puzzles2 Gambling, Sizes of win Backgammon is played for stakes (matchsticks? money?) and for championships. If a player wins before his opponent has removed any men, he wins double. If in addition the opponent still has men in his first quadrant, he wins triple. There is a doubling cube, bearing the numbers 2,4,8,16,32,64. A player may double - offer to continue the game for twice the stake. If the opponent refuses, he resigns the game and loses the current stake. If he accepts, only he may offer the next double.
Intelligence for Games and Puzzles3 Search complexity of backgammon On each turn, the two dice can be rolled in 21 different ways. Typically there are around 20 moves possible for any result of the dice. Therefore the branching factor is approx 400. It is estimated there are around possible arrangements of the men. Less than chess, go; comparable with checkers, othello; still much too large for exhaustive analysis.
Intelligence for Games and Puzzles4 Berliners BKG program In 1979 Berliners BKG9.8 won a 7-game exhibition match against Luigi Villa, the then World Champion. Caveats: 7 games is too short a series to be conclusive, since luck plays a part Villa had reason not to take the comic-robot program very seriously It was an exhibition match, not a championship BKG was a pure no-search Type-B program, it evaluated moves at 1 ply. Its hand-crafted evaluation method was called SNAC Smoothness (as opposed to generalised form of Horizon Effect) Non-linearity (importance of features inter-dependent, not just additive) Application Coefficients (program computed higher-level features, the evaluation function was not expressed directly in terms of positions of men)
Intelligence for Games and Puzzles5 Tesauros TD-GAMMON TD-Gammon was (primarily) a connectionist system. It learnt move decisions by self-play, it had hand-crafted doubling algorithm. Its two main features were 1.Use of the Method of Temporal Differences to solve the (temporal) credit assignment problem 2.Use of Rollout analysis to generate reliable statistics TD-Gammon did (in mid-1990s) play the then champion, and other world-class players, in substantial numbers of games, and showed near parity with them.
Intelligence for Games and Puzzles6 MLP (Multi Layer Perceptron) InputsHidden layerOutputs all inputs connect to all hidden all hidden connect to all outputs win x2 win x1 lose x1 lose x2
Intelligence for Games and Puzzles7 Temporal Difference learning TD learning applies when a series of choices leads to an eventual outcome. Like reinforcement learning, over many trials, those choices that lead to good outcomes should be preferred those choices that lead to bad outcome should be deprecated The essence of TD learning is that the valuation of a situation before a choice is made should become more similar to the valuation of the situation after the choice is made Situations just before a win (or loss) should be assessed virtually as if they themselves were a win (loss) - and so on back through all prior choices.
Intelligence for Games and Puzzles8 Learning from self play vs Tuition Human experts are good, but fallible. They get tired, make mistakes, cannot communicate everything. Community opinions change over time Indeed TD-Gammon rollouts have helped change expert opinion! Very large numbers of games can be self-played by machine. 200k training games and TD-Gammon showed feature discovery Successive versions with more training showed continuing improvements vis à vis top class players Danger of self-play: random strategy at start may be very bad
Intelligence for Games and Puzzles9 Early learning A danger in learning by self play is that the random strategy at the beginning of learning may be very bad. Games may last 1000s of moves, rather than the typical Nothing of value might then be learnt. TD-Gammon however did learn some elementary tactics and strategy from its early games. The trend towards the end of the game, forced by the generally forward movement of men, meant that even random games were brief. By scoring every possible move, choosing the one with highest expected outcome, it learnt early on some basic concepts: hit the opponent; play safe; build blocks Later learning could build on this to discover interesting evaluation features
Intelligence for Games and Puzzles10 Judgment and Calculation Experts commenting on TD-Gammons play noted contrast with chess programs: Chess programs excel at laborious calculation but show poor judgment TD-Gammon showed good positional judgment Pure TD-learning from experience, using only raw coding of positions, enabled TD-Gammon to match the performance of Tesauros earlier NeuroGammon. Adding further inputs - giving access to precomputed tables of strength of blockade, precomputed probability of being hit - brought TD-Gammon to near parity with world championship level.
Intelligence for Games and Puzzles11 Rollout Analysis TD-Gammon also performs the most trusted rollout analysis operation: Where commentators & analysts look over championship games, they sometimes disagree about the merits of positions resulting from alternative possible moves. A rollout can be used to determine which is better: from each resulting position, generate and play out 10,000 random dice sequences, pick the move that gives the best result Even though the absolute error in the original scoring might be large, the relative error would not be - faults in the evaluations of related positions are highly correlated.
Intelligence for Games and Puzzles12 Rare success for TD learning TD learning was very successfully applied in TD-Gammon; but it has not proved so successful in other domains. Possible reasons for its success at backgammon: because of the nature of the rules of backgammon, even a random strategy gives sequences brief enough to be learned from. Go could go on for ages. the randomness of the dice causes noise, meaning that the same strategy will on different occasions explore different parts of the search space. The learner will not play deterministically and get stuck on a poor strategy. (Noise Injection has been found helpful in other applications) there are simple linear concepts that can be learned early (blots are bad, blocks are good, bear off asap), and later learning can bootstrap from these.
Intelligence for Games and Puzzles13 References Berliner, H Performance Note: Backgammon computer program beats world champion. Artificial Intelligence v.14 pp Tesauro, G Temporal Difference learning and TD-Gammon. CACM v38:3 pp58-68