
1 Reinforcement Learning
Game playing: so far, we have told the agent the value of a given board position. How can the agent learn which positions are important?
- Play a whole bunch of games, and receive a reward at the end (+ or -)
- How do we determine the utility of states that aren't terminal states?

2 The setup
- Possible game states
- Terminal states have a reward
- Mission: estimate the utility of all possible game states

3 What is a state?
- For chess: a state is the combination of your position on the board and the locations of the opponent's pieces
- Half of your transitions are controlled by you (your moves)
- The other half of your transitions are probabilistic (they depend on the opponent)
- For now, we assume all moves are probabilistic

4 Passive Learning
- The agent learns by "watching": it observes transitions and rewards rather than choosing actions
- There is a fixed probability of moving from one state to another

5 Sample Results

6 Technique #1: Naive Updating
- Also known as the Least Mean Squares (LMS) approach
- Starting at the home (start) state, obtain a sequence of states ending at a terminal state
- The utility of the terminal state = its reward
- Loop back over all the other states in the sequence: the utility for state i = the running average of all rewards seen for state i (see the sketch below)
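A minimal sketch of this update in Python (the function name, the dictionary representation of the utilities, and the string state ids are illustrative assumptions, not from the slides):

```python
from collections import defaultdict

def lms_update(utility, counts, episode, terminal_reward):
    """Naive (LMS) updating: after one watched game, credit every state visited
    in the episode with the reward observed at the end, as a running average."""
    for state in episode:                 # the non-terminal states visited
        counts[state] += 1
        utility[state] += (terminal_reward - utility[state]) / counts[state]

# usage sketch: one watched game that visited three states and ended in a win (+1)
utility, counts = defaultdict(float), defaultdict(int)
lms_update(utility, counts, episode=["s0", "s3", "s7"], terminal_reward=+1.0)
```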

7 Naive Updating: Analysis
- Works, but converges slowly
- Must play lots of games
- Ignores the fact that the utility of a state should depend on its successors

8 Technique #2: Adaptive Dynamic Programming
- The utility of a state depends entirely on its successor states
- If a state has one successor, its utility should be the same as that successor's
- If a state has multiple successors, its utility should be the expected value of the successors' utilities (written out below)
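Written out, that expected-value statement is the fixed-policy Bellman equation (here M_ij is the probability of moving from state i to state j, and R(i) is the reward at state i, which in this setup is zero except at terminal states):

    U(i) = R(i) + Σ_j M_ij · U(j)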

9 Finding the utilities
- To find all the utilities, just solve the equations
- This is just a set of linear equations; solve it with Gaussian elimination (see the numpy sketch below)
- Gets you the right values immediately: no convergence, no iteration
- Completely intractable for large problems: for a real game it means finding the actual utilities of all states
- Assumes that you know the transition probabilities M_ij
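A minimal numpy sketch of solving those equations as the linear system (I - M)U = R; the three-state chain and its numbers are made up for illustration:

```python
import numpy as np

# U(i) = R(i) + sum_j M[i, j] * U(j)  rewritten as  (I - M) U = R
M = np.array([
    [0.0, 0.8, 0.2],   # from state 0, mostly to state 1
    [0.0, 0.0, 1.0],   # state 1 always moves to state 2
    [0.0, 0.0, 0.0],   # state 2 is terminal: no outgoing transitions
])
R = np.array([0.0, 0.0, 1.0])          # reward only at the terminal state

U = np.linalg.solve(np.eye(3) - M, R)  # Gaussian elimination, done for us
print(U)                               # utilities of states 0, 1, 2
```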

10 Technique #3: Temporal Difference Learning
- We want the utility to depend on successors, but we want to solve iteratively
- Whenever you observe a transition from state i to state j, update
      U(i) ← U(i) + α [ R(i) + U(j) - U(i) ]
  where α = the learning rate
- The difference between the utilities of successive states is the "temporal difference" (sketch below)
- Converges faster than naive updating
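A minimal sketch of that update (the function name and the dictionary of utilities are illustrative assumptions):

```python
def td_update(utility, i, j, reward_i, alpha=0.1):
    """One temporal-difference update for an observed transition i -> j.
    The bracketed term is the temporal difference between successive states."""
    utility[i] += alpha * (reward_i + utility[j] - utility[i])

# usage sketch: after observing s0 -> s1 with no reward at s0
U = {"s0": 0.0, "s1": 0.5}
td_update(U, "s0", "s1", reward_i=0.0, alpha=0.1)   # U["s0"] becomes 0.05
```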

11 What if the transition probabilities are unknown?
- This only affects Technique #2, Adaptive Dynamic Programming
- Iterate:
  - Estimate the transition probabilities from the transitions you have seen (see the counting sketch below)
  - Solve the dynamic programming problem with the best estimates so far
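A minimal sketch of estimating M_ij by counting observed transitions (the function name and the (i, j) pair format are illustrative assumptions):

```python
from collections import defaultdict

def estimate_transition_probs(observed):
    """M_hat[i][j] = (# of i -> j transitions seen) / (# of transitions out of i)."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for i, j in observed:
        counts[i][j] += 1
        totals[i] += 1
    return {i: {j: c / totals[i] for j, c in row.items()} for i, row in counts.items()}

M_hat = estimate_transition_probs([("s0", "s1"), ("s0", "s2"), ("s0", "s1")])
# M_hat["s0"] == {"s1": 2/3, "s2": 1/3}
```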

12 Active Learning
- The probability of going from one state to another now depends on the action taken
- The ADP equations become
      U(i) = R(i) + max_a Σ_j M^a_ij · U(j)
  where M^a_ij is the probability of reaching state j from state i when taking action a (see the value-iteration sketch below)
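A minimal value-iteration sketch for those equations; the two actions and their transition matrices are made-up numbers for a tiny three-state game:

```python
import numpy as np

R = np.array([0.0, 0.0, 1.0])            # reward only at terminal state 2
M = {                                    # one transition matrix per action
    "safe":  np.array([[0.0, 0.9, 0.1], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]),
    "risky": np.array([[0.0, 0.5, 0.5], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]),
}

U = np.zeros(3)
for _ in range(100):                     # iterate until the utilities settle
    U = R + np.max([M[a] @ U for a in M], axis=0)
print(U)
```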

13 Exploration: where should the agent go to learn utilities?
- Suppose you are trying to learn an optimal game-playing strategy
- Do you follow the best utility, in order to win?
- Or do you move around at random, hoping to learn more (and losing lots of games in the process)?
- Following the best utility all the time can get you stuck at an imperfect solution
- Following random moves can lose a lot

14 Where should the agent go to learn utilities?
- f(u, n) = an exploration function; it depends on the utility of a move (u) and the number of times the agent has tried it (n)
- One possibility: instead of using the utility alone to decide where to go, use an optimistic value such as
      f(u, n) = R+ if n < N_e, and u otherwise
  where R+ is an optimistic estimate of the best possible reward and N_e is the number of tries before the agent trusts the learned utility
- Try a move a bunch of times, then eventually settle (see the sketch below)
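A minimal sketch of that exploration function (r_plus and n_e are assumed parameters, not values from the slides):

```python
def exploration_value(u, n, r_plus=1.0, n_e=5):
    """Pretend an under-tried move is great (r_plus) until it has been tried
    n_e times, then trust its learned utility u."""
    return r_plus if n < n_e else u

# pick the move with the highest exploration value, not the highest raw utility
def choose_move(moves, utility, tries):
    return max(moves, key=lambda m: exploration_value(utility[m], tries[m]))
```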

15 Generalization in Reinforcement Learning
- Maintaining utilities for all the states seen in a real game is intractable
- Instead, treat it as a supervised learning problem (see the sketch below)
- The training set consists of (state, utility) pairs
- Learn to predict the utility from the state
- This is a regression problem, not a classification problem
- Can use a neural network with multiple outputs
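A minimal sketch of the supervised-learning view, using state feature vectors and a least-squares fit (the features and utilities are made-up placeholders; a neural network could replace the linear model):

```python
import numpy as np

X = np.array([[1.0, 0.2, 3.0],       # one row of features per observed state
              [1.0, 0.8, 1.0],
              [1.0, 0.5, 2.0]])
y = np.array([0.1, 0.9, 0.5])        # estimated utilities for those states

w, *_ = np.linalg.lstsq(X, y, rcond=None)   # fit the regression weights

def predict_utility(state_features):
    """Predicted utility of a (possibly unseen) state."""
    return state_features @ w
```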

16 Other Applications
- Applies to any situation where an agent must learn from reinforcement
- Possible examples: toy robot dogs, Petz, that darn paperclip
- "The only winning move is not to play"

