
1 Pac-Man
Will Britt and Bryan Silinski

2 Pac-Man Background Information
In Pac-Man, the agent chooses among at most 5 moves (North, South, East, West, and Stop). The ghosts move randomly around the stage. The goal is to eat all the dots while avoiding the ghosts.
Score manipulators:
Eat Dot +10
Win +500
Eat Ghost +200
Eaten by Ghost -500
Move (time) -1

3 Formal Statement
Given a set N of moves, our agent should choose the move that maximizes utility. Utility is determined by a performance evaluation function: an objective criterion for the success of an agent's behavior. For the set of moves N, the agent chooses argmax over Ni of U(Ni), where Ni is a move from the set and U() is the utility evaluation function.
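As a minimal sketch of this statement (the function names here are hypothetical placeholders, not the authors' code), move selection is just an argmax over the legal moves:

```python
# Minimal sketch of the formal statement: choose the move that
# maximizes utility. Both arguments are hypothetical placeholders.
def choose_move(legal_moves, utility):
    # argmax over Ni in N of U(Ni); ties broken arbitrarily by max()
    return max(legal_moves, key=utility)
```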

4 Utility
Utility represents the motivation of an agent. In our game, the motivations are things such as eating dots, eating power pellets, and avoiding ghosts. A utility function assigns a score to every possible outcome; a higher score represents a higher preference for that particular outcome. Our utility function is ordinal, which means that decisions are based on the relative ordering of possible outcomes and the degree of difference does not matter.

5 Informal Statement
We aim to navigate the Pac-Man agent so that it best avoids ghosts and eats the pellets. Given context from the environment (proximity of ghosts, dots, etc.), we want Pac-Man to make the most rational choice of movement, in the hope that this leads to the agent performing best at the game. A rational agent is one that maximizes utility based on current knowledge.

6 Algorithms

7 Algorithms Chosen
Reflex Agent
Minimax (depths 2, 3, and 4)
Expectimax (depths 2, 3, and 4)
Q-Learning (50, 100, 500, and 1000 training episodes)

8 Reflex Agent
A reflex agent looks only at the current state and a potential move on the game board in order to choose its next move. It does not consider the consequences of the chosen move in terms of what happens afterward: "I am here; which move appears to have the best utility?"

9 Reflex Agent (continued)
In Pac-Man, the agent chooses among at most 5 moves (North, South, East, West, and Stop). In order to use a reflex agent, we needed to implement a function that evaluates how "good" each move is (i.e., calculates its utility). This performance evaluation function looks at things such as: will the next move bring the agent closer to food? Closer to a ghost? Will it obtain a power pellet? Each possible move is run through the evaluation function and the move with the best score is chosen (an ordinal utility function), as sketched below.
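A minimal sketch of such a one-step evaluation, assuming a hypothetical state API (generate_successor, pacman_position, food_positions, ghost_positions, legal_moves) rather than the authors' actual implementation:

```python
def manhattan(a, b):
    # Manhattan distance between two (x, y) grid positions
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def evaluate_move(state, move):
    # Score the state that would result from taking `move`.
    nxt = state.generate_successor(move)
    pac = nxt.pacman_position()
    score = nxt.score()
    food = nxt.food_positions()
    if food:
        # Prefer moves that bring Pac-Man closer to the nearest dot.
        score -= min(manhattan(pac, f) for f in food)
    if any(manhattan(pac, g) <= 1 for g in nxt.ghost_positions()):
        # Heavily penalize stepping next to a ghost.
        score -= 500
    return score

def reflex_choice(state):
    # Ordinal utility: only the relative ordering of scores matters.
    return max(state.legal_moves(), key=lambda m: evaluate_move(state, m))
```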

10 Reflex Agent (continued)
Potential move scores are calculated very quickly: O(n), where n is the number of possible moves evaluated by the utility function. In our example, Pac-Man has 3 to 5 possible moves at any given state, each of which is run through the utility function to score it. One disadvantage is that the agent does not look far enough ahead to consider the consequences of its actions.

11 Minimax
Minimax is often used in two-player "full information" games: games in which each player knows all possible moves of the adversary, e.g. Chess or Tic-Tac-Toe. One player (Pac-Man) tries to maximize the score while the adversary (the ghosts) tries to minimize it. Minimax takes future moves by both the player and the opponent into account when choosing a move, and it operates under the assumption that the adversary will make the optimal choice.

12 Minimax Implementation
If the game-over state (or the depth limit) is reached, return the score from the player's point of view. Otherwise, generate the game states for every possible move of whichever player's turn it is, and build a list of scores for those states by recursing (or, at the leaves, by applying a performance evaluation / utility function). If it is the opponent's turn, return the minimum score from the list; if it is the player's turn, return the maximum (see the sketch below).
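A hedged sketch of that recursion; the state API and evaluate() are assumptions, and in Pac-Man agent 0 is the maximizing player while each ghost is a minimizer:

```python
def minimax(state, depth, agent=0):
    # Leaf: game over or depth limit reached; score from Pac-Man's view.
    if state.game_over() or depth == 0:
        return evaluate(state)
    next_agent = (agent + 1) % state.num_agents()
    # One ply of depth is used up once every agent has moved.
    next_depth = depth - 1 if next_agent == 0 else depth
    scores = [minimax(state.successor(agent, m), next_depth, next_agent)
              for m in state.legal_moves(agent)]
    # Pac-Man (agent 0) maximizes; the ghosts minimize.
    return max(scores) if agent == 0 else min(scores)
```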

13 Minimax
The time complexity of minimax is O(b^n), where b^n is the number of game states sent to the utility function: b is the number of game states per depth level (in Pac-Man, 3-5 Pac-Man successor states multiplied by 4-16 ghost successor states) and n is the depth. For example, with b = 5 × 16 = 80 and n = 2, up to 80^2 = 6,400 states are evaluated.

14 Expectimax
Expectimax is similar to minimax but does not assume an optimal adversary: it takes the probabilities of outcomes into account by adding chance nodes, and it makes decisions based on expected utilities.
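The change from the minimax sketch above is a single line: ghost turns become chance nodes, so their scores are averaged (assuming uniformly random ghosts, as the slides describe) instead of minimized. Under the same assumed state API:

```python
def expectimax(state, depth, agent=0):
    if state.game_over() or depth == 0:
        return evaluate(state)
    next_agent = (agent + 1) % state.num_agents()
    next_depth = depth - 1 if next_agent == 0 else depth
    scores = [expectimax(state.successor(agent, m), next_depth, next_agent)
              for m in state.legal_moves(agent)]
    if agent == 0:
        return max(scores)            # Pac-Man still maximizes
    return sum(scores) / len(scores)  # expected utility over random ghost moves
```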

15 Expectimax
The time complexity, O(b^n), is the same as minimax's, where b^n is the number of game states evaluated by the utility function. Once again, b is the number of game states per depth level (in Pac-Man, 3-5 Pac-Man successor states multiplied by 4-16 ghost successor states) and n is the depth.

16 Minimax vs. Expectimax

17 Q-Learning
A state-action based machine learning algorithm. It is good for room traversal or mapping, but not equipped for larger problems such as moving ghosts, so we implemented an approximate Q-learning algorithm, which attempts to find similarities between states while training. It uses "features" to separate the important from the unimportant information about the game board, updating the weight of each feature in order to converge on the best weights. Provided the update works, duplicating a feature should not matter, because training will adjust its weight to compensate for the error. Decisions are O(1), since they are made from a lookup rather than a search.
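As a sketch, the approximate Q-value is a weighted sum of feature values rather than a table entry, which is what keeps each decision cheap (the dict layout is an assumption, not the authors' data structure):

```python
def q_value(weights, features):
    # features: dict of feature name -> value for a given (state, action)
    # Q(s, a) = sum_i  w_i * f_i(s, a)
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())
```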

18 qLearning Update
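The update itself did not survive in the transcript (the slide was presumably an image). The standard approximate Q-learning rule it would show, sketched here with textbook names (alpha = learning rate, gamma = discount factor), nudges every weight in proportion to its feature's contribution to the error:

```python
def update_weights(weights, features, reward, q_sa, max_q_next, alpha, gamma):
    # Temporal-difference error: observed target minus current estimate.
    difference = (reward + gamma * max_q_next) - q_sa
    for name, value in features.items():
        # w_i <- w_i + alpha * difference * f_i(s, a)
        weights[name] = weights.get(name, 0.0) + alpha * difference * value
```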

19 qLearning Features
Bias: a way to minimize error in machine learning algorithms.
(State, Action): navigate the map more efficiently.
Ghost one step away: avoid the ghosts.
Eats food: eating food is crucial to winning the game.
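An illustrative extractor for the features listed above, reusing the same hypothetical state API (and manhattan helper) as the reflex sketch; the per-(state, action) identity features are omitted for brevity:

```python
def extract_features(state, action):
    nxt = state.generate_successor(action)
    pac = nxt.pacman_position()
    return {
        # Bias: constant feature that absorbs systematic error.
        "bias": 1.0,
        # Ghost one step away: 1.0 if any ghost is adjacent, else 0.0.
        "ghost-one-step-away": float(any(manhattan(pac, g) <= 1
                                         for g in nxt.ghost_positions())),
        # Eats food: 1.0 if this action lands on a dot.
        "eats-food": float(pac in state.food_positions()),
    }
```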

20 Algorithm Results

21 Results table. Columns: Algorithm, Avg Move Time, Avg # Moves, Avg Score, Win %, Move STD. Rows: Reflex; Minimax (depths 2, 3, 4); ExpectiMax (depths 2, 3, 4); Qlearn (50, 100, 500, and 1000 training episodes, each also measured w/ training time*). Values surviving in the transcript (column assignment unclear): Reflex 68.144; Minimax 2: 0.672; Minimax 3: 0.648; Minimax 4: 0.682; ExpectiMax 2: 0.836; ExpectiMax 3: 0.915; ExpectiMax 4: 0.927; Qlearn 50: 0.9; Qlearn 100: 0.91; Qlearn 500: 0.921.


24 Issues and Future Considerations
The utility function is subjective: we assigned weights to what we thought was important (avoiding ghosts, eating dots), these weights may not have been the best choices possible, and it might have been useful to develop an algorithm to come up with the weights. Other future considerations: alpha-beta pruning, and fixing the reflex agent's issues.

25 Questions
What is utility?
What is a rational agent?
Why does a reflex agent have a time complexity of O(n)?
When would it be more beneficial to use expectimax instead of minimax?

26 Questions
What is utility? Utility represents the motivation of an agent, or the usefulness of the consequences of a particular action.
What is a rational agent? A rational agent is one that maximizes utility based on current knowledge.
Why does a reflex agent have a time complexity of O(n)? The evaluation runs once for each choice available when making a decision; n represents the number of these choices.
When would it be more beneficial to use expectimax instead of minimax? When probabilities are involved and it is more favorable to calculate expected utilities.

27 Questions?

