University Paderborn 07 January 2009 RG Knowledge Based Systems Prof. Dr. Hans Kleine Büning Reinforcement Learning.

Outline
– Motivation
– Applications
– Markov Decision Processes
– Q-learning
– Examples

Reinforcement Learning: The Idea
A way of programming agents by reward and punishment without specifying how the task is to be achieved

Learning to Ride a Bicycle
[Figure: agent-environment loop with the labels "Environment", "state", and "action"]

Learning to Ride a Bicycle
States:
– Angle of handlebars
– Angular velocity of handlebars
– Angle of bicycle to vertical
– Angular velocity of bicycle to vertical
– Acceleration of angle of bicycle to vertical

Learning to Ride a Bicycle
Actions:
– Torque to be applied to the handlebars
– Displacement of the center of mass from the bicycle's plane (in cm)

Is the angle of the bicycle to vertical greater than 12°?
– no: Reward = 0
– yes: Reward = -1
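
The reward rule above can be written as a one-line function. This is a minimal sketch: the name `bicycle_reward` is hypothetical, and taking the absolute value of the angle (so that a tilt to either side counts) is an assumption that the slide does not state.

```python
def bicycle_reward(tilt_angle_deg: float) -> int:
    """Return -1 when the bicycle tilts more than 12 degrees from vertical, else 0.

    Assumption: the tilt is measured in degrees and may be signed, so the
    absolute value is compared against the 12 degree threshold.
    """
    return -1 if abs(tilt_angle_deg) > 12.0 else 0
```

Because the only non-zero reward is the penalty for tipping over, the agent is never told how to balance; it only learns that some sequences of actions eventually lead to a fall.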

Learning to Ride a Bicycle
Reinforcement Learning

Reinforcement Learning: Applications
Board Games
– The TD-Gammon program, based on reinforcement learning, has become a world-class backgammon player
Mobile Robot Controlling
– Learning to ride a bicycle
– Navigation
– Pole-balancing
– Acrobot
Sequential Process Controlling
– Elevator dispatching

History of Reinforcement Learning
– Trial-and-error learning in the psychology of animal learning
– Optimal control and dynamic programming
– Temporal-difference methods

Key Features of Reinforcement Learning
– Learner is not told which actions to take
– Trial-and-error search
– Possibility of delayed reward:
  – Sacrifice of short-term gains for greater long-term gains
– Explore/exploit trade-off
– Considers the whole problem of a goal-directed agent interacting with an uncertain environment

The Agent-Environment Interaction
Agent and environment interact at discrete time steps: $t = 0, 1, 2, \ldots$
– Agent observes state at step $t$: $s_t \in S$
– produces action at step $t$: $a_t \in A$
– gets resulting reward: $r_{t+1} \in \mathbb{R}$
– and resulting next state: $s_{t+1} \in S$
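
The interaction cycle described above can be sketched as a short loop. The names `run_episode`, `policy`, `step`, and `toy_step` are hypothetical and chosen only for illustration; the toy environment (a counter that pays a reward of 1 on reaching 3) is an assumption, not something from the slides.

```python
def run_episode(policy, step, start_state, max_steps):
    """Run the observe/act/reward cycle for a fixed number of time steps.

    policy(s) returns the action a_t for state s_t.
    step(s, a) returns the pair (s_{t+1}, r_{t+1}).
    The result is the list of (s_t, a_t, r_{t+1}, s_{t+1}) tuples.
    """
    trajectory = []
    s = start_state
    for _ in range(max_steps):
        a = policy(s)               # agent produces action a_t
        s_next, r = step(s, a)      # environment answers with s_{t+1} and r_{t+1}
        trajectory.append((s, a, r, s_next))
        s = s_next
    return trajectory


def toy_step(s, a):
    """Hypothetical environment: the state is a counter, the action is added to it."""
    s_next = s + a
    reward = 1.0 if s_next == 3 else 0.0
    return s_next, reward
```

For example, `run_episode(lambda s: 1, toy_step, 0, 4)` walks the counter from 0 to 4 and records a reward of 1.0 on the step that reaches 3.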

The Agent's Goal
– Coarsely, the agent's goal is to get as much reward as it can over the long run
– A policy $\pi$ is a mapping from states to actions: $\pi(s) = a$
– Reinforcement learning methods specify how the agent changes its policy as a result of experience

Deterministic Markov Decision Process

Example

Example: Corresponding MDP

Example: Policy

Value of Policy and Rewards

Value of Policy and Agent's Task

Nondeterministic Markov Decision Process
[Figure: transition diagram labelled with the probabilities P = 0.8 and P = 0.1]

Example with South-Eastern Wind

Methods
Model (reward function and transition probabilities) is known:
– discrete states: Dynamic Programming
– continuous states: Value Function Approximation + Dynamic Programming
Model (reward function or transition probabilities) is unknown:
– discrete states: Reinforcement Learning (Q-learning, Monte Carlo Methods)
– continuous states: Value Function Approximation + Reinforcement Learning

Q-learning Algorithm
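
The transcript keeps only the title of the algorithm slides, so what follows is a minimal sketch of standard tabular Q-learning and not necessarily the formulation used in the lecture. It applies the usual update Q(s, a) ← Q(s, a) + α (r + γ max_a' Q(s', a') − Q(s, a)) with epsilon-greedy exploration. The function names, the parameter values (`alpha`, `gamma`, `epsilon`, `episodes`), and the four-cell corridor environment are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def q_learning(step, start_state, actions, is_terminal,
               episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; returns a dict-like table indexed by (state, action)."""
    rng = random.Random(seed)
    q = defaultdict(float)          # every Q-value starts at 0
    for _ in range(episodes):
        s = start_state
        while not is_terminal(s):
            if rng.random() < epsilon:
                a = rng.choice(actions)                      # explore
            else:
                best = max(q[(s, b)] for b in actions)       # exploit,
                a = rng.choice([b for b in actions           # ties broken at random
                                if q[(s, b)] == best])
            s_next, r = step(s, a)
            # Value of the best action in the successor state (0 if terminal).
            best_next = 0.0 if is_terminal(s_next) else max(
                q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
    return q


def corridor_step(s, a):
    """Hypothetical deterministic corridor with cells 0..3; reaching cell 3 pays 1."""
    s_next = min(3, max(0, s + (1 if a == "right" else -1)))
    return s_next, (1.0 if s_next == 3 else 0.0)
```

Calling `q_learning(corridor_step, 0, ["left", "right"], lambda s: s == 3)` yields a table in which "right" has the larger Q-value in every non-terminal cell, so the greedy policy read off the table walks straight to the goal. With `alpha=1` the update reduces to Q(s, a) ← r + γ max_a' Q(s', a'), which is sufficient when rewards and transitions are deterministic.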

Example

Example: Q-table Initialization

Example: Episode 1

Example: Q-table

Episode 1

Example: Q-table

Example: Q-table after Convergence

Example: Value Function after Convergence

Example: Optimal Policy

Q-learning

Convergence of Q-learning

Blackjack
Standard rules of blackjack hold
State space:
– element[0] - current value of player's hand (4-21)
– element[1] - value of dealer's face-up card (2-11)
– element[2] - player does not have usable ace (0/1)
Starting states:
– player has any 2 cards (uniformly distributed), dealer has any 1 card (uniformly distributed)
Actions:
– HIT
– STICK
Rewards:
– -1 for a loss
– 0 for a draw
– 1 for a win
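
The state encoding above can be sketched as a three-element tuple. The constant and function names are hypothetical, and the reward values follow the usual reading of the slide (-1 for a loss, 0 for a draw, 1 for a win). The enumeration is an upper bound on the state space, since some combinations cannot occur in play.

```python
from itertools import product

PLAYER_VALUES = range(4, 22)    # element[0]: value of the player's hand, 4..21
DEALER_CARDS = range(2, 12)     # element[1]: dealer's face-up card, 2..11
ACE_FLAGS = (0, 1)              # element[2]: usable-ace flag as given on the slide
ACTIONS = ("HIT", "STICK")
REWARDS = {"loss": -1, "draw": 0, "win": 1}


def all_states():
    """Enumerate every (player value, dealer card, ace flag) combination."""
    return list(product(PLAYER_VALUES, DEALER_CARDS, ACE_FLAGS))


def is_valid_state(state):
    """Check that each element of the state lies in its documented range."""
    player, dealer, ace = state
    return player in PLAYER_VALUES and dealer in DEALER_CARDS and ace in ACE_FLAGS
```

With 18 hand values, 10 dealer cards, and 2 flag values, the table has at most 360 states and 2 actions per state, which is small enough for a tabular method.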

Blackjack: Optimal Policy

Reinforcement Learning: Example
States
– Grids
Actions
– Left
– Up
– Right
– Down
Rewards
– Bonus 20
– Food 1
– Predator -10
– Empty grid -0.1
Transition probabilities
– 0.80: the agent goes where it intends to go
– 0.20: the agent goes to any other adjacent grid or remains where it was (if it is on the border of the grid world, it goes to the other side)
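
The transition rule above can be sketched as a function that returns the distribution over next cells. The slide does not say how the remaining 0.20 is divided, so this sketch assumes it is split evenly over the three other adjacent cells and staying put (0.05 each). The function names and the (row, column) encoding are likewise assumptions.

```python
import random

# (row delta, column delta) for each action
ACTIONS = {"Left": (0, -1), "Up": (-1, 0), "Right": (0, 1), "Down": (1, 0)}
REWARDS = {"Bonus": 20, "Food": 1, "Predator": -10, "Empty": -0.1}


def transition_distribution(state, action, rows, cols):
    """Return {next_state: probability} for taking `action` in `state`."""
    r, c = state

    def move(delta):
        # Modulo arithmetic wraps the agent to the other side at the border.
        return ((r + delta[0]) % rows, (c + delta[1]) % cols)

    probs = {}
    intended = move(ACTIONS[action])
    probs[intended] = probs.get(intended, 0.0) + 0.80
    # Assumption: the remaining 0.20 is shared evenly by the other outcomes.
    others = [move(d) for name, d in ACTIONS.items() if name != action] + [state]
    for s in others:
        probs[s] = probs.get(s, 0.0) + 0.20 / len(others)
    return probs


def sample_next_state(state, action, rows, cols, rng=random):
    """Draw one next state according to transition_distribution."""
    dist = transition_distribution(state, action, rows, cols)
    states = list(dist)
    return rng.choices(states, weights=[dist[s] for s in states], k=1)[0]
```

On very small grids several outcomes can coincide (for example, in a grid with one row, Up and Down both lead back to the current cell), which is why the probabilities are accumulated per cell instead of assigned.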
