Learning and Memory: Reinforcement Learning. Lyle Ungar, University of Pennsylvania.

Presentation transcript:

Slide 1: Learning and Memory: Reinforcement Learning (Lyle Ungar, University of Pennsylvania)

Slide 2: Learning Levels
- Darwinian: trial -> death or children
- Skinnerian: reinforcement learning
- Popperian: our hypotheses die in our stead
- Gregorian: tools and artifacts

Slide 3: Machine Learning
- Unsupervised: cluster similar items, find associations (no "right" answer)
- Supervised: for each observation/feature vector, a teacher gives the correct "answer" (e.g., learn to recognize categories)
- Reinforcement: take an action, observe the consequence ("bad dog!")

Slide 4: Pavlovian Conditioning
- Pavlov: food causes salivation; a sound presented before the food -> the sound alone causes salivation
- The dog learns to associate the sound with food

Slide 5: Operant Conditioning

Slide 6: Associative Memory
- Hebbian learning: when two connected neurons are both excited, the connection between them is strengthened
- "Neurons that fire together, wire together"
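
The rule above can be written as a one-line weight update. The following is a minimal sketch, assuming a simple rate-based model in which the weight change is proportional to the product of pre- and post-synaptic activity; the learning rate and the toy activity pattern are illustrative, not from the slides.

    import numpy as np

    def hebbian_update(w, pre, post, eta=0.1):
        """Strengthen each connection in proportion to how strongly its
        pre- and post-synaptic neurons are active together."""
        # w[i, j] grows when post[i] and pre[j] fire together.
        return w + eta * np.outer(post, pre)

    # Repeatedly presenting the same pattern wires its co-active units together.
    w = np.zeros((3, 3))
    pattern = np.array([1.0, 0.0, 1.0])
    for _ in range(5):
        w = hebbian_update(w, pre=pattern, post=pattern)
    print(w)  # connections among units 0 and 2 are now strong; the rest stay 0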

Slide 7: Explanations of Pavlov
- S-S (stimulus-stimulus): dogs learn to associate the sound with food (and salivate by "thinking" of food)
- S-R (stimulus-response): dogs learn to salivate to the tone itself (directly, without "thinking" of food)
- How to test? Do dogs think lights are food?

Slide 8: Conditioning in Humans
- Two pathways: the "slow" pathway that dogs use, and cognitive (conscious) learning
- How to test this hypothesis: learn to blink in response to a stimulus associated with a puff of air

Slide 9: Blocking
- First: Tone -> Shock -> Fear, so that Tone -> Fear
- Then: Tone + Light -> Shock -> Fear
- Test: Light -> ?

Slide 10: Rescorla-Wagner Model
- Hypothesis: learn from observations that are surprising
- Update rule: V_n <- V_n + c (V_max - V_n), i.e. the per-trial change is ΔV_n = c (V_max - V_n)
  - V_n is the strength of association between the US and the CS
  - c is the learning rate
- Predictions: contingency
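
A short simulation of this update rule, with an illustrative learning rate c = 0.2 and V_max = 1 (neither value comes from the slides). The associative strength climbs fastest early on and then levels off, because the surprise term (V_max - V_n) shrinks as learning proceeds.

    def rescorla_wagner(trials=10, c=0.2, v_max=1.0):
        """Associative strength V over repeated CS-US pairings."""
        v, history = 0.0, []
        for _ in range(trials):
            v += c * (v_max - v)   # learn only from the surprising part
            history.append(round(v, 3))
        return history

    print(rescorla_wagner())  # [0.2, 0.36, 0.488, ...] -> approaches 1.0

If the prediction V_n is taken to be the sum over all stimuli present on a trial, the same rule accounts for the blocking result on the previous slide: once the tone alone fully predicts the shock, adding the light produces no surprise, so the light acquires no associative strength.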

Slide 11: Limitations of Rescorla-Wagner
- Tone -> food
- Light -> food
- Tone + light -> ?

Slide 12: Reinforcement Learning
- Often one takes a long sequence of actions and only discovers their result later (e.g., when you win or lose a game)
- Q: How can one ascribe credit (or blame) to a single action in a sequence of actions?
- A: By noting surprises

Slide 13: Consider a Game
- Estimate the probability of winning
- Take an action and see how the opponent (or the world) responds
- Re-estimate the probability of winning:
  - If it is unchanged, you learned nothing
  - If it is higher, the initial state was better than you thought
  - If it is lower, the initial state was worse than you thought

Slide 14: Tic-Tac-Toe Example
- Decision tree: alternating layers give the possible moves for each player

Slide 15: Reinforcement Learning
- State (e.g., board position)
- Action (e.g., move)
- Policy: state -> action
- Reward function: state -> utility
- Model of the environment: (state, action) -> state

Slide 16: Definitions of Key Terms
- State: what you need to know about the world to predict the effect of an action
- Policy: what action to take in each state
- Reward function: the cost or benefit of being in a state (e.g., points won or lost, happiness gained or lost)
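
These pieces can be written down directly. The sketch below uses a made-up two-room world (the state names, rewards, and transitions are purely illustrative) to show how state, action, policy, reward function, and environment model fit together.

    # A toy world: states, actions, a policy, a reward function, and a
    # model of the environment (all values are illustrative).
    states = ["room_A", "room_B"]
    actions = ["stay", "move"]

    policy = {"room_A": "move", "room_B": "stay"}     # state -> action
    reward = {"room_A": 0.0, "room_B": 1.0}           # state -> utility
    model = {                                         # (state, action) -> next state
        ("room_A", "stay"): "room_A",
        ("room_A", "move"): "room_B",
        ("room_B", "stay"): "room_B",
        ("room_B", "move"): "room_A",
    }

    s = "room_A"
    for _ in range(3):
        s = model[(s, policy[s])]    # follow the policy
        print(s, reward[s])          # room_B 1.0, room_B 1.0, room_B 1.0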

Slide 17: Value Iteration
- Value function: the expected value of a policy over time = the sum of the expected rewards
- Update rule: V(s) <- V(s) + c [V(s') - V(s)]
  - s = state before the move, s' = state after the move
- This is "temporal difference" learning
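
A minimal sketch of this temporal-difference update for the game setting of the earlier slides: V holds a win-probability estimate for each state, terminal states hold the actual outcome, and after every move the estimate for the previous state is nudged toward the estimate for the new state. The state names and the learning rate are made up for illustration.

    from collections import defaultdict

    V = defaultdict(float)   # estimated win probability for each state
    c = 0.1                  # learning rate

    def td_update(s, s_next):
        """V(s) <- V(s) + c [ V(s') - V(s) ]"""
        V[s] += c * (V[s_next] - V[s])

    # Replaying one winning sequence lets credit seep backwards one step per
    # pass, from the terminal win toward the opening position.
    episode = ["start", "mid_game", "good_position", "win"]
    V["win"] = 1.0                            # terminal state holds the outcome
    for _ in range(3):
        for s, s_next in zip(episode, episode[1:]):
            td_update(s, s_next)
    print(dict(V))   # good_position has risen most, mid_game a little, start barely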

Slide 18: Mouse in Maze Example (figures: the learned policy and the value function)

Slide 19: Dopamine & Reinforcement

Slide 20: Exploration vs. Exploitation
- Exploration: always try a different route to work
- Exploitation: always take the best route to work that you have found so far
- Learning requires exploration (unless the environment is noisy)
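
One common way to balance the two is epsilon-greedy action selection, shown below purely as an illustration (the slides do not commit to a particular scheme): usually exploit the best-known option, occasionally explore a random one. The route names and their estimated values are invented.

    import random

    def epsilon_greedy(values, epsilon=0.1):
        """Pick the highest-valued option most of the time, a random one otherwise."""
        if random.random() < epsilon:
            return random.choice(list(values))      # explore
        return max(values, key=values.get)          # exploit

    routes = {"highway": 0.8, "back_roads": 0.5, "new_shortcut": 0.0}
    print(epsilon_greedy(routes))   # usually "highway", occasionally something else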

Slide 21: RL Can Be Very Simple
- A simple learning algorithm leads to an optimal policy:
  - without predicting the effects of the agent's actions
  - without predicting immediate payoffs
  - without planning
  - without an explicit model of the world

Slide 22: How to Play Chess
- Computer: an evaluation function for board positions plus fast search
- Human (grandmaster): memorize tens of thousands of board positions and what to do in them; do a much smaller search!

Slide 23: AI and Games
- Chess: deterministic; position evaluation + search
- Backgammon: stochastic; policy

Slide 24: Scaling Up Value Functions
- For a small number of states: learn the value of each state directly
- Not possible for Backgammon, which has far too many states:
  - learn a mapping from features to value
  - then use reinforcement learning to get improved value estimates
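
A sketch of the feature-based idea on this slide, assuming a simple linear value function (TD-Gammon actually used a neural network; the features, weights, and reward here are illustrative). The value estimate is a weighted sum of board features, and the temporal-difference error now updates the weights rather than a table entry.

    import numpy as np

    def value(w, features):
        """Linear value estimate: a weighted sum of board features."""
        return float(w @ features)

    def td_weight_update(w, features, features_next, reward, c=0.01):
        """Nudge the weights so that V(s) better matches reward + V(s')."""
        td_error = reward + value(w, features_next) - value(w, features)
        return w + c * td_error * features

    w = np.zeros(3)
    phi_s      = np.array([2.0, 0.0, 1.0])   # made-up counts of board features
    phi_s_next = np.array([2.0, 1.0, 0.0])
    w = td_weight_update(w, phi_s, phi_s_next, reward=1.0)   # the move led to a win
    print(w)   # weights move toward the features that preceded the win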

Slide 25: Q-Learning
- Instead of the value of a state, learn the value Q(s, a) of taking action a from state s
- Optimal policy: take the best action, argmax_a Q(s, a)
- Learning rule: Q(s, a) <- Q(s, a) + c [r_t + max_b Q(s', b) - Q(s, a)]
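
A tabular sketch of this learning rule. The grid states, actions, and reward are made up for illustration, and a discount factor gamma is included (set gamma = 1 to match the slide's rule, which omits it).

    from collections import defaultdict

    Q = defaultdict(float)                  # Q[(state, action)]
    actions = ["up", "down", "left", "right"]

    def q_update(s, a, r, s_next, c=0.1, gamma=0.9):
        """Q(s,a) <- Q(s,a) + c [ r + gamma * max_b Q(s',b) - Q(s,a) ]"""
        best_next = max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += c * (r + gamma * best_next - Q[(s, a)])

    def greedy_action(s):
        """The learned policy: take the action with the highest Q value."""
        return max(actions, key=lambda a: Q[(s, a)])

    # One observed transition: moving right from cell (0, 0) earned a reward of 1.
    q_update(s=(0, 0), a="right", r=1.0, s_next=(0, 1))
    print(Q[((0, 0), "right")], greedy_action((0, 0)))   # 0.1 right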

Slide 26: Learning to Sing
- A zebra finch hears its father's song
- Memorizes it
- Then practices for months to learn to reproduce it
- What kind of learning is this?

Slide 27: Controversies
- Is conditioning good?
- How much learning do people do?
- Innateness, learning, and free will