© 2003, G.Tecuci, Learning Agents Laboratory 1 Learning Agents Laboratory, Computer Science Department, George Mason University. Prof. Gheorghe Tecuci. 2. Rote Learning

© 2003, G.Tecuci, Learning Agents Laboratory 2 Overview
Game playing as a performance task
Rote learning in game playing
Learning a static evaluation function
Rote learning issues
Recommended reading

© 2003, G.Tecuci, Learning Agents Laboratory 3 Rote Learning
Rote learning consists of memorizing the solutions of solved problems so that the system need not solve them again: once f(X1, ..., Xn) = (Y1, ..., Yp) has been computed and stored, during subsequent computations of f(X1, ..., Xn) the performance element can simply retrieve (Y1, ..., Yp) from memory rather than recomputing it.
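As a minimal sketch of the idea (illustrative names only, not from the slides), rote learning amounts to caching: the performance element is wrapped so that each newly computed result is stored, and later requests for the same arguments are answered by retrieval.

```python
memory = {}  # maps argument tuples to previously computed results

def rote_learned(f):
    """Wrap a deterministic performance element f with a rote-learning memory."""
    def wrapper(*args):
        if args not in memory:        # not seen before: compute and memorize
            memory[args] = f(*args)
        return memory[args]           # otherwise: simple retrieval
    return wrapper

@rote_learned
def solve(x1, x2):
    # stand-in for an expensive computation (e.g. a deep game-tree search)
    return (x1 + x2, x1 * x2)

print(solve(3, 4))   # computed once and memorized
print(solve(3, 4))   # answered by retrieval from memory
```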

© 2003, G.Tecuci, Learning Agents Laboratory 4 Issues in the design of rote learning systems
Memory organization: rote learning requires a useful organization of the memory so that retrieval of the desired information is very fast.
Stability of the environment: the information stored at one time should still be valid later.
Store-versus-compute trade-off: the cost of storing and retrieving the memorized information should be smaller than the cost of recomputing it.

© 2003, G.Tecuci, Learning Agents Laboratory 5 Overview
Game playing as a performance task
Rote learning in game playing
Learning a static evaluation function
Rote learning issues
Recommended reading

© 2003, G.Tecuci, Learning Agents Laboratory 6 Game playing as a performance task: Checkers
There are two players (Grey and White), each having 12 men. They alternately move one of their men. A man can be moved forward diagonally from one black square to another, or it can jump over an opponent's man if the square behind it is vacant; in that case the opponent's man is captured. Any number of men can be jumped (and captured) if the square behind each is vacant. If a man reaches the opponent's last row, it is transformed into a king by placing another man on top of it. A king can move both forward and backward (as opposed to men, which can move only forward). The winning player is the one who succeeds in blocking all of the opponent's men (so that they cannot move) or in capturing all of them.

© 2003, G.Tecuci, Learning Agents Laboratory 7 Game tree search
All the possible plays of a game can be represented as a tree. The root node is the initial state, in which it is the first player's turn to move. The successors of the initial state are the states he can reach in one move, their successors are the states resulting from the other player's possible replies, and so on. Terminal states are those representing a win for the Grey player, a loss for the Grey player, or a draw. Each path from the root node to a terminal node gives a different complete play of the game.
For instance, Grey has seven possible moves at the start of the game, namely 9-13, 9-14, 10-14, 10-15, 11-15, 11-16, and 12-16, and White has seven possible responses: 21-17, 22-17, 22-18, 23-18, 23-19, 24-19, and 24-20. Some of these responses are better, while others are worse. For instance, if Grey opens 9-14 and White plays 21-17, then Grey can jump over White's man and capture it.

© 2003, G.Tecuci, Learning Agents Laboratory 8 The minimax procedure
Minimax is a procedure for assigning values to the nodes in a game tree. The value of a node expresses how good that node is for the first player (called the Max player) and how bad it is for the second player (called the Min player). Therefore, the Max player will always choose to move to the node that has the maximum value among the possible successors of the current node. Similarly, the Min player will always choose to move to the node that has the minimum value among the possible successors of the current node. In the case of checkers, we consider Grey to be the Max player and White the Min player.
Given the values of the terminal nodes, the values of the nonterminal nodes are computed as follows:
- the value of a node where it is the Grey player's turn to move is the maximum of the values of its successors (because Grey tries to maximize its outcome);
- the value of a node where it is the White player's turn to move is the minimum of the values of its successors (because White tries to minimize Grey's outcome).
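A compact sketch of the procedure over an explicit game tree (the small tree below is illustrative only and is not the tree in the slide's figure):

```python
# Minimax over an explicit game tree. Leaves are numeric values expressing how
# good the position is for Max (Grey in the checkers example); interior nodes
# are dicts mapping move labels to subtrees.

def minimax(node, max_to_move):
    if not isinstance(node, dict):        # terminal node: its value is given
        return node
    values = [minimax(child, not max_to_move) for child in node.values()]
    return max(values) if max_to_move else min(values)

def best_move(node, max_to_move=True):
    """Return the move (child label) the player to move should choose."""
    choose = max if max_to_move else min
    return choose(node, key=lambda m: minimax(node[m], not max_to_move))

# A made-up tree for illustration.
tree = {"b": {"d": 3, "e": 5},
        "c": {"f": {"m": 6, "n": 2}, "g": 1}}
print(minimax(tree, True))   # backed-up value of the root for Max
print(best_move(tree))       # Max's best opening move
```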

© 2003, G.Tecuci, Learning Agents Laboratory 9 Problem
Consider the following game tree, in which the numbers associated with the leaves represent how good they are from the point of view of the maximizing player: What move should be chosen by the Max player, and what should be the response of the Min player, assuming that both are using the minimax procedure?

© 2003, G.Tecuci, Learning Agents Laboratory 10 Solution
Max will move to c, Min will respond by moving to f, and Max will move to m.

© 2003, G.Tecuci, Learning Agents Laboratory 11 Size of the search space
A complete game tree for checkers has been estimated to have about 10^40 nonterminal nodes. If one assumes that these nodes could be generated at a rate of 3 billion per second, the generation of the whole tree would still require around 10^21 centuries! Checkers is far simpler than chess which, in turn, is generally far simpler than business competitions or military games.
Searching a partial game tree
The tree of possibilities is far too large to be fully generated and searched backward from the terminal nodes for an optimal move.
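As a quick sanity check of this arithmetic (assuming the figures above, roughly 10^40 nonterminal positions generated at 3 × 10^9 per second), a few lines of Python reproduce the order of magnitude:

```python
# Back-of-the-envelope check of the search-space claim.
positions = 10**40                      # assumed estimate of nonterminal nodes
rate = 3e9                              # positions generated per second
seconds = positions / rate
seconds_per_century = 100 * 365.25 * 24 * 3600
print(seconds / seconds_per_century)    # ~1e21 centuries
```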

© 2003, G.Tecuci, Learning Agents Laboratory 12 Searching a partial game tree
Heuristic function for board position evaluation: w1·f1 + w2·f2 + w3·f3 + …, where the wi are real-valued weights and the fi are numeric board features (e.g., the number of white pieces, the number of white kings).
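A minimal sketch of such a linear evaluation function; the board attributes and weights below are illustrative assumptions, not Samuel's actual features:

```python
def static_eval(board, weights, features):
    """Weighted sum of numeric board features: w1*f1 + w2*f2 + ..."""
    return sum(w * f(board) for w, f in zip(weights, features))

# Hypothetical features of a position object `board` (assumed to expose simple
# piece counts); a real program would use richer features such as mobility.
features = [lambda b: b.grey_men - b.white_men,      # material advantage (men)
            lambda b: b.grey_kings - b.white_kings]  # material advantage (kings)
weights = [1.0, 2.5]                                 # illustrative weights
```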

© 2003, G.Tecuci, Learning Agents Laboratory 13 What is the justification for this approach? The idea is that the static evaluation function produces more accurate results when the evaluated nodes are closer to a goal node.

© 2003, G.Tecuci, Learning Agents Laboratory 14 Overview
Game playing as a performance task
Rote learning in game playing
Learning a static evaluation function
Rote learning issues
Recommended reading

© 2003, G.Tecuci, Learning Agents Laboratory 15 An illustration of rote learning in game playing: Samuel's checkers player
[Figure: a look-ahead search tree rooted at board position A; the program estimates the value of A by searching ahead and then memorizes the pair (A, 8).]

© 2003, G.Tecuci, Learning Agents Laboratory 16 Samuel's program was provided with procedures for playing checkers correctly. At each turn it chooses its move by conducting a minimax game-tree search (in fact it employs alpha-beta search, an optimized version of minimax). Because of the huge search space of checkers, the program searches only a few moves and countermoves into the future and then applies a static evaluation function to the leaves of the tree, in order to estimate which side is winning. The program then chooses the move that leads to the position estimated to be the best.
Suppose that at board position A it is the program's turn to move. The program builds the search tree three moves ahead. Then it applies a static evaluation function to estimate the value of the position corresponding to each leaf. These values are then backed up by using the minimax procedure. Thus, the best move for the program is the one that leads to position B. The program expects that the opponent will countermove to C, to which the program can reply with D.
The static evaluation function used is value = Σ wi·fi, where the fi are numeric board features and the wi are real-valued weights. An example of a board feature is the relative exchange advantage of the player whose turn it is to move. This feature is defined as EXCH = Tcurrent - Tprevious, where Tcurrent is the total number of squares into which the player to move may advance a piece and, in doing so, force an exchange, and Tprevious is the corresponding number for the previous move by the opposing player. Other features considered are MOB (total mobility), GUARD (back-row control), and KCENT (king center control). Each such feature has an associated weight which estimates its contribution to the value of the current board position.
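The kind of search described here can be sketched as a depth-limited alpha-beta procedure that applies a static evaluation at the frontier. The helpers `moves` and `static_eval` below are hypothetical placeholders, and the whole block is a generic illustration rather than Samuel's implementation:

```python
def moves(pos, max_to_move):
    # hypothetical move generator; returns successor positions (empty = terminal)
    return []

def static_eval(pos):
    # hypothetical heuristic value of the position from Max's point of view
    return 0.0

def alphabeta(pos, depth, alpha, beta, max_to_move):
    """Depth-limited search; the static evaluation is applied at the frontier."""
    successors = moves(pos, max_to_move)
    if depth == 0 or not successors:
        return static_eval(pos)
    if max_to_move:
        value = float("-inf")
        for child in successors:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:        # cut-off: Min will never allow this branch
                break
        return value
    else:
        value = float("inf")
        for child in successors:
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:        # cut-off: Max will never allow this branch
                break
        return value

# e.g. backed-up value of the current position, searching three moves ahead:
# alphabeta(current_position, 3, float("-inf"), float("inf"), True)
```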

© 2003, G.Tecuci, Learning Agents Laboratory 17 Improving the performance of the checkers player
Question: Using the memorized value (A, 8) improves the performance. Why?
[Figure: a search tree rooted at the current position E, one of whose frontier positions is A, for which the value 8 has been memorized as the pair (A, 8).]

© 2003, G.Tecuci, Learning Agents Laboratory 18 Improving the look-ahead power by rote learning
[Figure: the search from the current position reaches A, whose memorized value 8 is retrieved instead of being computed by the static evaluation function.]
Answer: This makes the program more efficient for two reasons: it does not have to compute the value of A with the static evaluation function; and the memorized value of A is more accurate than the static value of A, because it is based on a look-ahead search.

© 2003, G.Tecuci, Learning Agents Laboratory 19 One way to improve the performance of a game-tree search is to search further into the search tree and thus better approximate a full search of the tree. This is known as improving the look-ahead power of the program. The same effect may be obtained by using rote learning. The program saves every board position encountered during play, along with its backed-up value. In the above case, it will save the description of the board position A and its backed-up value of 8 as a pair (A, 8). When position A is encountered in subsequent games, its evaluation score is retrieved from memory rather than recomputed.
This makes the program more efficient for the following two reasons: it does not have to compute the value of A with the static evaluation function; and the memorized value of A is more accurate than the static value of A, because it is based on a look-ahead search. Thus, the look-ahead power of the program is improved.
In the above figure, the program is considering which move to make at position E. It searches three moves ahead and then applies the static evaluation function. At position A, however, it is able to retrieve the memorized value based on the previous search to position D. As more and more positions are memorized, the effective search depth improves from its original value of 3 moves, up to 6, then to 9, and so on. In conclusion, rote learning converts a computation (tree search) into a retrieval from memory.
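A sketch of how this memory plugs into the search. All helper names here (board_key, moves, static_eval) are hypothetical placeholders, as in the earlier sketches; this illustrates the idea rather than reproducing Samuel's code:

```python
def board_key(pos):
    # hypothetical canonical encoding of a board position
    return str(pos)

def moves(pos, max_to_move):
    # hypothetical move generator (empty list = no legal moves)
    return []

def static_eval(pos):
    # hypothetical static evaluation, as sketched earlier
    return 0.0

memorized = {}   # board_key(position) -> value backed up by an earlier search

def lookahead_value(pos, depth, max_to_move):
    key = board_key(pos)
    if key in memorized:
        # retrieval instead of recomputation; the stored value already reflects
        # the deeper search performed when this position was first analyzed
        return memorized[key]
    successors = moves(pos, max_to_move)
    if depth == 0 or not successors:
        return static_eval(pos)
    values = [lookahead_value(c, depth - 1, not max_to_move) for c in successors]
    return max(values) if max_to_move else min(values)

def choose_and_memorize(pos, depth=3):
    """Pick the best move from pos and memorize (pos, backed-up value)."""
    successors = moves(pos, True)
    values = [lookahead_value(c, depth - 1, False) for c in successors]
    memorized[board_key(pos)] = max(values)   # store the backed-up value of pos
    return successors[values.index(max(values))]
```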

© 2003, G.Tecuci, Learning Agents Laboratory 20 Samuel's results and conclusion
The program developed by Samuel was trained by playing against itself, by playing against people, and by following book games. After training, the memory contained roughly 53,000 positions, and the program became a "rather better-than-average novice, but definitely not... an expert" (Samuel, 1959). Samuel estimated that his program would need to memorize about one million positions to approximate a master level of checkers play.
Samuel's experiments demonstrated that significant and measurable learning can result from rote learning alone. By retrieving the stored results of extensive computations, the program can proceed deeper in its reasoning. The price is storage space, access time, and effort in organizing the stored knowledge.

© 2003, G.Tecuci, Learning Agents Laboratory 21 Overview
Game playing as a performance task
Rote learning in game playing
Learning a static evaluation function
Rote learning issues
Recommended reading

© 2003, G.Tecuci, Learning Agents Laboratory 22 Learning a polynomial evaluation function: value = Σ wi·fi
What are the main problems to be solved?
a) Discovering which features fi to use in the function
b) Learning the weights wi of the features to obtain an accurate value for the board position

© 2003, G.Tecuci, Learning Agents Laboratory 23 Learning the weights of the features (reinforcement learning)
The learning procedure is to compare, at each move, the value of the static evaluation function for the current board position with a performance standard that provides a more accurate estimate of that value. The difference between these two estimates controls the adjustment of the weights in the evaluation function so as to better approximate the performance standard.

© 2003, G.Tecuci, Learning Agents Laboratory 24 Performance standards
What performance standards could be used?
One performance standard could be obtained by conducting a deeper minimax search into future board positions, applying the evaluation function to the tip (frontier) board positions, and backing up these values. The idea is that the static evaluation function produces more accurate results when the evaluated nodes are closer to a goal node.

© 2003, G.Tecuci, Learning Agents Laboratory 25 Performance standards: using "f" itself
How could this be implemented? One considers an iterative procedure of updating "f". The performance standard for a certain position B is f(successor(B)); that is, one adjusts the weights so as to reduce the difference between f(B) and f(successor(B)).
[Figure: position B and its successor, with the values f(B) and f(successor(B)).]
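A minimal sketch of this kind of update, assuming a simple delta-rule style step with a hypothetical learning rate eta and feature extractor features(board); Samuel's actual weight-adjustment procedure differed in its details:

```python
def features(board):
    # hypothetical numeric board features f_i (e.g. EXCH, MOB, GUARD, KCENT)
    return [0.0, 0.0, 0.0, 0.0]

def f(board, weights):
    """Linear static evaluation: value = sum_i w_i * f_i(board)."""
    return sum(w * x for w, x in zip(weights, features(board)))

def update_weights(weights, board, standard, eta=0.01):
    """Nudge f(board) toward the performance standard (delta-rule style step)."""
    error = standard - f(board, weights)
    return [w + eta * error * x for w, x in zip(weights, features(board))]

# During play, with B the current position and succ_B the position selected by
# the search, the weights would be adjusted using f(succ_B) as the standard:
# weights = update_weights(weights, B, standard=f(succ_B, weights))
```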

© 2003, G.Tecuci, Learning Agents Laboratory 26 Performance standards
What other performance standards could be used?
Another possible performance standard could be obtained from "book games" played between two human experts. In such a case, the static evaluation function should be modified so that the value of the board position corresponding to the move indicated by the book is higher than the values of the positions corresponding to the other possible moves.

© 2003, G.Tecuci, Learning Agents Laboratory 27 Discovering features to use in the evaluation function
The problem of new terms: how could a learning system discover the appropriate terms for representing the knowledge to be learned?
A partial solution is term selection: provide a list of terms from which the most relevant terms are to be chosen. Samuel started with 38 terms, out of which only 16 are used in the static evaluation function at any one time. The remaining 22 features are maintained on a standby feature list. Periodically, the feature that has the lowest weight among the 16 features currently in use in the evaluation function is replaced with the first feature from the standby list, and the replaced feature is placed at the end of the standby list.
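A small sketch of this term-selection scheme. The slide's program keeps 16 features in use and 22 on standby; the lists and weights below are much smaller, illustrative stand-ins (only EXCH, MOB, GUARD, and KCENT come from the slide), and "lowest weight" is interpreted here as smallest magnitude:

```python
in_use = ["EXCH", "MOB", "GUARD", "KCENT"]          # features in the evaluation
standby = ["FEATURE_A", "FEATURE_B", "FEATURE_C"]   # standby feature list
weights = {"EXCH": 1.2, "MOB": 0.8, "GUARD": 0.3, "KCENT": 0.6}

def rotate_lowest_weight_feature(in_use, standby, weights):
    """Swap the least influential in-use feature with the first standby feature."""
    worst = min(in_use, key=lambda name: abs(weights[name]))  # lowest-weight feature
    replacement = standby.pop(0)              # bring in the head of the standby list
    in_use[in_use.index(worst)] = replacement
    standby.append(worst)                     # demoted feature goes to the end
    weights[replacement] = 0.0                # its weight will be learned from scratch
    return in_use, standby
```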

© 2003, G.Tecuci, Learning Agents Laboratory 28 Other types of static evaluation functions
Signature table: an explicit representation of a function which gives the value of the function for each possible combination of argument values. The inputs are the features and the output is the value of the function. The signature table is a more general representation than a linear polynomial function. Because such a table may be very large, one may reduce it by considering only special combinations of argument values. Learning the signature table means determining the values of the function for particular combinations of the arguments.
Neural network
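A sketch of a signature table as a lookup structure, assuming an illustrative quantization of each feature into three levels and a default value of 0.0 for combinations not yet learned; a real signature table would choose its own discrete ranges per feature:

```python
signature_table = {}   # (level of f1, level of f2, ...) -> value of the function

def quantize(x):
    """Map a raw feature value to one of three levels (an illustrative reduction)."""
    return -1 if x < 0 else (1 if x > 0 else 0)

def table_value(feature_values):
    """Look up the value stored for this combination of (quantized) features."""
    return signature_table.get(tuple(quantize(x) for x in feature_values), 0.0)

def table_learn(feature_values, target):
    """Learning the table = recording the value for a particular combination."""
    signature_table[tuple(quantize(x) for x in feature_values)] = target
```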

© 2003, G.Tecuci, Learning Agents Laboratory 29 Results of Samuel's experiments
Learning based on signature tables was much more efficient than learning based on a linear polynomial function.
Learning a signature table from book moves was more efficient than rote learning.

© 2003, G.Tecuci, Learning Agents Laboratory 30 Recommended reading
Mitchell T.M., Machine Learning, Chapter 1: Introduction, pp. 5-14, McGraw Hill, 1997.
Samuel A.L., Some Studies in Machine Learning Using the Game of Checkers, 1959; reprinted in Readings in Machine Learning.
The Handbook of Artificial Intelligence, vol. III.