1. Learning and Memory: Reinforcement Learning
Lyle Ungar, University of Pennsylvania

2. Learning Levels
- Darwinian: trial -> death or children
- Skinnerian: reinforcement learning
- Popperian: our hypotheses die in our stead
- Gregorian: tools and artifacts

3. Machine Learning
- Unsupervised: cluster similar items; find associations (no "right" answer)
- Supervised: for given observations/features, a teacher supplies the correct "answer" (e.g., learn to recognize categories)
- Reinforcement: take an action, observe the consequence ("bad dog!")

4. Pavlovian Conditioning
- Pavlov: food causes salivation
- Sound before food -> sound causes salivation
- The dog learns to associate the sound with food

5. Operant Conditioning

6. Associative Memory
- Hebbian learning: when two connected neurons are both excited, the connection between them is strengthened
- "Neurons that fire together, wire together"
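A minimal sketch of the Hebbian rule in code, assuming its simplest form (delta w_ij = c * x_i * x_j); the activity pattern and learning rate below are invented for illustration:

```python
import numpy as np

# Hebbian sketch: units that are active together strengthen their connection.
c = 0.1                           # learning rate (illustrative)
x = np.array([1.0, 1.0, 0.0])     # units 0 and 1 fire together; unit 2 is silent
W = np.zeros((3, 3))              # connection weights

for _ in range(10):               # repeated co-activation
    W += c * np.outer(x, x)       # "fire together, wire together"
np.fill_diagonal(W, 0.0)          # no self-connections

print(W)  # strong 0<->1 weights; connections to the silent unit stay 0
```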

7. Explanations of Pavlov
- S-S (stimulus-stimulus): dogs learn to associate the sound with food (and salivate based on "thinking" of food)
- S-R (stimulus-response): dogs learn to salivate based on the tone (and salivate directly, without "thinking" of food)
- How to test? Do dogs think lights are food?

8. Conditioning in Humans
- Two pathways: the "slow" pathway dogs use, and cognitive (conscious) learning
- How to test this hypothesis: learn to blink based on a stimulus associated with a puff of air

9. Blocking
- Tone -> shock -> fear
- Tone -> fear
- Tone + light -> shock -> fear
- Light -> ?

10. Rescorla-Wagner Model
- Hypothesis: learn from observations that are surprising
- Update rule: V_n <- V_n + c (V_max - V_n), i.e., delta V_n = c (V_max - V_n)
  - V_n is the strength of the association between the US and the CS
  - c is the learning rate
  - V_max is the maximum associative strength the US will support
- Predictions: contingency, blocking
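A small simulation of the rule, using its standard compound-cue form (all cues present on a trial share one prediction error, V_max minus the summed associative strength); the learning rate and trial counts are illustrative choices. It reproduces the blocking result from slide 9:

```python
# Rescorla-Wagner with compound cues: delta V = c * (V_max - sum of V over
# the cues present). Constants and trial counts are illustrative.
c, V_max = 0.2, 1.0
V = {"tone": 0.0, "light": 0.0}

def trial(cues):
    """One conditioning trial: all cues are present and the US follows."""
    surprise = V_max - sum(V[cue] for cue in cues)  # shared prediction error
    for cue in cues:
        V[cue] += c * surprise

for _ in range(30):              # phase 1: tone alone predicts the shock/food
    trial(["tone"])
for _ in range(30):              # phase 2: tone + light predict it
    trial(["tone", "light"])

print(V)  # tone ~ 1.0, light ~ 0.0: the tone "blocks" learning about the light
```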

11. Limitations of Rescorla-Wagner
- Tone -> food
- Light -> food
- Tone + light -> ?

12. Reinforcement Learning
- One often takes a long sequence of actions and only discovers the result of those actions later (e.g., when you win or lose a game)
- Q: How can one ascribe credit (or blame) to one action in a sequence of actions?
- A: By noting surprises

13. Consider a game
- Estimate the probability of winning
- Take an action; see how the opponent (or the world) responds
- Re-estimate the probability of winning:
  - If it is unchanged, you learned nothing
  - If it is higher, the initial state was better than you thought
  - If it is lower, the state was worse than you thought

14. Tic-tac-toe example
- Decision tree: alternating layers give the possible moves for each player

15. Reinforcement Learning
- State: e.g., board position
- Action: e.g., a move
- Policy: state -> action
- Reward function: state -> utility
- Model of the environment: state, action -> state

16. Definitions of key terms
- State: what you need to know about the world to predict the effect of an action
- Policy: what action to take in each state
- Reward function: the cost or benefit of being in a state (e.g., points won or lost, happiness gained or lost)
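As a sketch of how these terms might map onto code for a board game, mirroring the arrows on slide 15 (the type names here are illustrative, not from the slides):

```python
from typing import Callable, Tuple

State = Tuple[int, ...]                      # e.g., a flattened board position
Action = int                                 # e.g., which square to play

Policy = Callable[[State], Action]           # policy: state -> action
RewardFunction = Callable[[State], float]    # reward function: state -> utility
Model = Callable[[State, Action], State]     # model: state, action -> state
```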

17. Value Iteration
- Value function: the expected value of a policy over time = the sum of the expected rewards
- V(s) <- V(s) + c [V(s') - V(s)]
  - s = state before the move, s' = state after the move
  - "temporal difference" learning
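A minimal sketch of this temporal-difference update on a toy "game": a random walk where reaching the right end wins. The state space, learning rate, and episode count are illustrative assumptions; terminal states are pinned to the game's outcome, so the update propagates win probabilities backward through the states:

```python
import random

# TD learning with the slide's update V(s) <- V(s) + c [V(s') - V(s)].
N = 7                   # states 0..6; 0 = loss, 6 = win
V = [0.5] * N           # initial guess: every state is a coin flip
V[0], V[6] = 0.0, 1.0   # terminal values are the game outcomes
c = 0.1                 # learning rate

for episode in range(5000):
    s = 3  # start in the middle
    while s not in (0, 6):
        s2 = s + random.choice((-1, 1))  # "move"; the world responds at random
        V[s] += c * (V[s2] - V[s])       # surprise-driven update from the slide
        s = s2

print([round(v, 2) for v in V])  # close to the true win probabilities s/6
```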

18. Mouse in Maze Example
[Figure: a maze, showing the mouse's learned policy and the value function]

19. Dopamine & Reinforcement

20. Exploration vs. Exploitation
- Exploration: always try a different route to work
- Exploitation: always take the best route to work that you have found so far
- Learning requires exploration
  - unless the environment is noisy
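One standard way to balance the two is an epsilon-greedy rule: exploit the best-looking option most of the time, but explore at random a small fraction of the time. A minimal sketch on a three-armed bandit; the payoff probabilities and epsilon are made-up numbers for illustration:

```python
import random

payoffs = [0.3, 0.5, 0.7]     # hidden mean reward of each "route to work"
Q = [0.0, 0.0, 0.0]           # estimated value of each option
counts = [0, 0, 0]
epsilon = 0.1                 # fraction of the time we explore

for t in range(10000):
    if random.random() < epsilon:
        a = random.randrange(3)                # explore: try a random route
    else:
        a = max(range(3), key=lambda i: Q[i])  # exploit: best route so far
    r = 1.0 if random.random() < payoffs[a] else 0.0
    counts[a] += 1
    Q[a] += (r - Q[a]) / counts[a]             # incremental average of rewards

print([round(q, 2) for q in Q])  # estimates approach 0.3, 0.5, 0.7
```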

21. RL can be very simple
- A simple learning algorithm leads to an optimal policy:
  - without predicting the effects of the agent's actions
  - without predicting immediate payoffs
  - without planning
  - without an explicit model of the world

22. How to play chess
- Computer: an evaluation function for board positions + fast search
- Human (grandmaster): memorize tens of thousands of board positions and what to do in them; do a much smaller search!
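A toy sketch of "evaluation function + search": depth-limited negamax, which searches a few plies and then falls back on the evaluation function. The game interface (moves, apply_move, evaluate) is hypothetical, and evaluate must score a position from the point of view of the player to move:

```python
def negamax(state, depth, moves, apply_move, evaluate):
    """Value of `state` for the player to move, searching `depth` plies."""
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)       # leaf: trust the evaluation function
    # My best move is the one that leaves the opponent worst off.
    return max(-negamax(apply_move(state, m), depth - 1,
                        moves, apply_move, evaluate)
               for m in legal)

# Tiny demo game: take 1 or 2 stones; whoever takes the last stone wins.
moves = lambda n: [m for m in (1, 2) if m <= n]
apply_move = lambda n, m: n - m
evaluate = lambda n: -1.0 if n == 0 else 0.0  # no stones: player to move lost
print(negamax(4, 10, moves, apply_move, evaluate))  # 1.0: take 1, leaving 3
```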

23. AI and Games
- Chess: deterministic; position evaluation + search
- Backgammon: stochastic; policy evaluation + search

24. Scaling up value functions
- For a small number of states: learn the value of each state directly
- Not possible for backgammon: ~10^20 states
  - Learn a mapping from features to value
  - Then use reinforcement learning to get improved value estimates
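A sketch of the idea under simple assumptions: replace the table of state values with weights w so that V(s) is approximately w · features(s), and train the weights with the same temporal-difference surprise. This reuses the random-walk game from the slide 17 sketch; the feature encoding and constants are invented for illustration:

```python
import random
import numpy as np

N = 7  # states 0..6; 0 = loss (value 0), 6 = win (value 1)

def phi(s):
    x = s / (N - 1)
    return np.array([1.0, x, x * x])  # tiny feature vector instead of a table

w = np.zeros(3)
c = 0.01  # learning rate

for episode in range(20000):
    s = 3
    while s not in (0, N - 1):
        s2 = s + random.choice((-1, 1))
        # Terminal "values" are the game outcomes, not predictions.
        v2 = 1.0 if s2 == N - 1 else 0.0 if s2 == 0 else w @ phi(s2)
        w += c * (v2 - w @ phi(s)) * phi(s)  # TD step on the weights
        s = s2

print([round(float(w @ phi(s)), 2) for s in range(N)])  # close to s/6
```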

25. Q-Learning
- Instead of the value of a state, learn the value Q(s, a) of taking action a from state s
- Optimal policy: take the action that maximizes Q(s, a)
- Learning rule: Q(s, a) <- Q(s, a) + c [r_t + max_b Q(s', b) - Q(s, a)]
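A minimal sketch of the slide's rule on a one-dimensional corridor: states 0..5, actions left/right, a cost of 1 per move, and the episode ends at state 5, so the learned policy minimizes steps to the goal. The environment and constants are illustrative assumptions; no discount factor is needed because every episode terminates:

```python
import random

N, GOAL = 6, 5
Q = [[0.0, 0.0] for _ in range(N)]   # Q[s][a]; a=0 is left, a=1 is right
c, epsilon = 0.1, 0.1

for episode in range(2000):
    s = 0
    while s != GOAL:
        # epsilon-greedy behavior: explore a little, exploit mostly
        a = (random.randrange(2) if random.random() < epsilon
             else max((0, 1), key=lambda i: Q[s][i]))
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = -1.0                                  # each move costs one step
        best_next = 0.0 if s2 == GOAL else max(Q[s2])
        Q[s][a] += c * (r + best_next - Q[s][a])  # the slide's update rule
        s = s2

# Greedy policy after learning: move right toward the goal from every state.
print([max((0, 1), key=lambda i: Q[s][i]) for s in range(GOAL)])  # all 1s
```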

26. Learning to Sing
- A zebra finch hears its father's song
- Memorizes it
- Then practices for months to learn to reproduce it
- What kind of learning is this?

27. Controversies?
- Is conditioning good?
- How much learning do people do?
- Innateness, learning, and free will

