Pac-Man
Will Britt and Bryan Silinski

Pac-Man Background Information
In Pac-Man, the agent has to decide among at most 5 moves (North, South, East, West, and Stop). Ghosts move randomly around the stage. The goal is to eat all the dots while avoiding the ghosts.
Score manipulators:
- Eat dot: +10
- Win: +500
- Eat ghost: +200
- Eaten by ghost: -500
- Move (time): -1

Formal Statement
Given a set of N moves, our agent should choose the move that maximizes utility. Utility is determined by a performance evaluation function, an objective criterion for the success of an agent's behavior. For the set of N moves, the agent chooses the move N_i achieving max(U(N_i)), where N_i is a move from the set and U() is the utility evaluation function.

Utility
Utility represents the motivation of an agent. In our game, the motivations are things such as eating dots, eating power pellets, avoiding ghosts, etc. A utility function assigns a score to every possible outcome; a higher score represents a higher preference for that particular outcome. Our utility function is ordinal, which means that decisions are based on the relative ordering of possible outcomes and the degree of difference between scores does not matter.

Informal Statement
We aim to navigate the Pac-Man agent to best avoid ghosts and eat the pellets. Given context from the environment (proximity of ghosts, dots, etc.), we want Pac-Man to make the most rational choice of movement, in the hope that this will lead to the agent performing best at the game. A rational agent is one that maximizes utility based on current knowledge.

Algorithms

Algorithms Chosen
- Reflex Agent
- Minimax (Depth 2)
- Expectimax
- qLearning (50, 100, 500, and 1000 training episodes)

Reflex Agent
A reflex agent only looks at the current state and a potential move on the game board in order to choose its next move. It does not consider the consequences of the chosen move in terms of what happens afterward: "I am here; which move appears to have the best utility?"

Reflex Agent (continued)
In Pac-Man, the agent has to decide among at most 5 moves (North, South, East, West, and Stop). In order to use a reflex agent, we needed to implement a function to evaluate how "good" each move is (i.e., calculate its utility). This performance evaluation function looked at things such as: will the next move bring the agent closer to food? Closer to a ghost? Onto a power pellet? Each possible move is run through the evaluation function, and the move with the best score is chosen (an ordinal utility function).
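A minimal sketch of this selection loop (Python; the getLegalMoves() method and the evaluation_function signature are illustrative assumptions, not the project's exact interfaces):

```python
def reflex_choose_move(state, evaluation_function):
    """Pick the legal move whose evaluation score is highest.

    `state.getLegalMoves()` and `evaluation_function(state, move)` are assumed
    interfaces used for illustration only.
    """
    best_move, best_score = None, float("-inf")
    for move in state.getLegalMoves():             # at most North, South, East, West, Stop
        score = evaluation_function(state, move)   # e.g. closer to food? closer to a ghost?
        if score > best_score:
            best_move, best_score = move, score
    return best_move
```

Since only the relative ordering of the scores matters here, this is exactly the ordinal use of the utility function described above.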

Reflex Agent (continued)
Potential move scores are calculated very quickly: O(n), where n is the number of possible moves evaluated by the utility function. In our example, Pac-Man has 3-5 possible moves at any given state, each of which is run through the utility function to score it. One disadvantage is that the agent does not look far enough ahead to consider the consequences of its actions.

Minimax
Minimax is often used in two-player "full information" games, i.e., games in which each player knows all possible moves of the adversary (e.g., Chess, Tic-Tac-Toe). One player tries to maximize their score (i.e., Pac-Man) while the adversary tries to minimize the opponent's score (i.e., the ghosts). Minimax takes into account future moves by both the player and the opponent in order to choose the best move, and it operates under the assumption that the adversary will make the optimal choice.

Minimax Implementation
- If a game-over state is reached, return the score from the player's point of view.
- Otherwise, generate the game states for every possible move for whichever player's turn it is, and create a list of scores for those states using a performance evaluation function (utility function), as in the sketch below.
- If it is the opponent's turn, return the minimum score from the score list.
- If it is the player's turn, return the maximum score from the score list.
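A minimal, depth-limited sketch of this recursion (Python; the state interface — is_game_over(), legal_moves(), successor() — and the evaluate() utility function are assumptions for illustration, not the project's exact code):

```python
def minimax(state, depth, maximizing, evaluate):
    """Return the minimax value of `state`, looking `depth` plies ahead.

    `state.is_game_over()`, `state.legal_moves(agent)` and
    `state.successor(agent, move)` are assumed interfaces; `evaluate` is the
    performance evaluation (utility) function applied at the leaves.
    """
    if state.is_game_over() or depth == 0:
        return evaluate(state)                       # score from the player's point of view

    agent = "pacman" if maximizing else "ghost"
    scores = [minimax(state.successor(agent, move), depth - 1, not maximizing, evaluate)
              for move in state.legal_moves(agent)]

    return max(scores) if maximizing else min(scores)   # player maximizes, opponent minimizes
```

At the root, Pac-Man takes the move whose successor has the highest minimax value; with several ghosts, each ghost would get its own minimizing layer within a single ply.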

Minimax
The time complexity of the minimax algorithm is O(b^n), where b^n represents the number of game states sent to the utility function.
- b is the number of game states per depth level; in Pac-Man this is 3-5 (Pac-Man successor states) multiplied by 4-16 (ghost successor states).
- n is the search depth.
For example, with b ≈ 20 and n = 4, roughly 20^4 = 160,000 states would be sent to the utility function.

Expectimax
Expectimax is similar to minimax but does not assume an optimal adversary; instead, it takes into account the probabilities of outcomes. The algorithm follows the same structure as minimax but replaces the opponent's min nodes with chance nodes, so decisions are based on expected utilities.
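A minimal sketch of the chance-node variant, using the same assumed state interface as the minimax sketch above; treating the ghost's moves as uniformly random is an illustrative assumption:

```python
def expectimax(state, depth, maximizing, evaluate):
    """Like minimax, but opponent layers average over outcomes instead of minimizing."""
    if state.is_game_over() or depth == 0:
        return evaluate(state)

    if maximizing:                                   # Pac-Man still picks the best move
        return max(expectimax(state.successor("pacman", move), depth - 1, False, evaluate)
                   for move in state.legal_moves("pacman"))

    # Chance node: expected utility over the ghost's moves (uniform probabilities assumed)
    moves = state.legal_moves("ghost")
    return sum(expectimax(state.successor("ghost", move), depth - 1, True, evaluate)
               for move in moves) / len(moves)
```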

Expectimax
The time complexity, O(b^n), is the same as minimax, where b^n represents the number of game states evaluated by the utility function. Once again, b is the number of game states per depth level (in Pac-Man, 3-5 Pac-Man successor states multiplied by 4-16 ghost successor states), and n is the search depth.

Minimax vs. Expectimax

Q Learning
- A (state, action) based machine learning algorithm.
- Good for room traversal or mapping; not equipped for larger problems such as moving ghosts.
- We implemented an approximate qLearning algorithm, in that it attempts to find similarities between states while training.
- Uses "features" to determine which information about the game board is important or unimportant, updating the weight of each feature in order to converge on the best weights.
- Provided the update works, it should not matter if you duplicate features, because training will adjust the weights to compensate for the error.
- Decisions are O(1), since they are made from a lookup table of learned values.

qLearning Update
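For reference, a standard form of the approximate Q-learning update (assumed here to match the project's feature-weighted approach, with learning rate α, discount γ, and observed reward r) is:

    Q(s,a) = Σᵢ wᵢ · fᵢ(s,a)
    difference = [r + γ · maxₐ' Q(s',a')] − Q(s,a)
    wᵢ ← wᵢ + α · difference · fᵢ(s,a)

Each weight wᵢ moves in proportion to both the prediction error and how strongly its feature fᵢ was active in the (state, action) pair.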

qLearning Features
- Bias: a way to minimize error in machine learning algorithms.
- (State, Action): navigate the map more efficiently.
- Ghosts one step away: avoid the ghosts.
- Eats Food: eating food is crucial to winning the game.
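A minimal sketch of how such features might be computed and combined into a Q-value (Python; the feature names and the state queries used here are illustrative assumptions, not the project's exact extractor):

```python
def manhattan(a, b):
    """Manhattan distance between two grid positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def get_features(state, action):
    """Map a (state, action) pair to a small dictionary of feature values.

    `state.next_position(action)`, `state.ghost_positions()` and
    `state.food_at(pos)` are assumed interfaces used for illustration only.
    """
    nxt = state.next_position(action)
    return {
        "bias": 1.0,                                   # constant feature that absorbs error
        "ghosts-1-step-away": sum(1 for g in state.ghost_positions()
                                  if manhattan(g, nxt) <= 1),
        "eats-food": 1.0 if state.food_at(nxt) else 0.0,
    }

def q_value(weights, state, action):
    """Q(s, a) is the weighted sum of the feature values (see the update above)."""
    return sum(weights.get(name, 0.0) * value
               for name, value in get_features(state, action).items())
```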

Algorithm Results

Algorithm              Avg Move Time   Avg # Moves    Avg Score      Win %        Move STD
Reflex                 0.001919824     68.144         -445.024                    52.22509669
Minimax 2              0.004229125     189.043        975.877        0.672        81.1781562
Minimax 3              0.012895651     322.443        801.657        0.648        151.2438741
Minimax 4              0.061221232     411.292        818.098        0.682        193.9309773
ExpectiMax 2           0.016378297     168.175        1223.215       0.836        60.59668583
ExpectiMax 3           0.048551765     192.308        1283.672       0.915        54.26267201
ExpectiMax 4           0.234599232     200.735        1305.865       0.927        52.37050166
Qlearn 50              0.000527238     132.411        1191.929       0.9          29.7713354
Qlearn 100             0.000494471     129.223        1214.287       0.91         25.68413802
Qlearn 500             0.000504648     130.534        1230.316       0.921        24.23758041
Qlearn 1000            0.000488254     130.6677632    1205.631579    0.90296053   25.28541812
Qlearn 50 w/ Train*    0.001037794
Qlearn 100 w/ Train*   0.000993991
Qlearn 500 w/ Train*   0.000958897
Qlearn 1000 w/ Train*  0.000942771

Issues and Future Considerations
- The utility function is subjective: we assigned weights to what we thought was important (avoiding ghosts, eating dots), and these weights may not have been the best possible choices. It might have been useful to develop an algorithm to come up with the weights.
- Alpha-beta pruning.
- Reflex Agent issues.

Questions
- What is utility?
- What is a rational agent?
- Why does a reflex agent have a time complexity of O(n)?
- When would it be more beneficial to use expectimax instead of minimax?

Questions
- What is utility? Utility represents the motivation of an agent, or the usefulness of the consequences of a particular action.
- What is a rational agent? A rational agent is one that maximizes utility based on current knowledge.
- Why does a reflex agent have a time complexity of O(n)? The reflex agent runs the evaluation function once for each of the choices available when making a decision; n represents the number of these choices.
- When would it be more beneficial to use expectimax instead of minimax? When there are probabilities involved and it would be more favorable to calculate expected utilities.

Questions?