Published by Mary Stephens. Modified over 3 years ago.

1
Reinforcement Learning
Hans Kleine Büning
RG Knowledge Based Systems, University Paderborn
16 January 2009

2
Reinforcement Learning, Prof. Dr. Hans Kleine Büning, University Paderborn
Outline
– Motivation
– Applications
– Markov Decision Processes
– Q-learning
– Examples

4
Reinforcement Learning: The Idea
A way of programming agents by reward and punishment, without specifying how the task is to be achieved.

5
Learning to Ride a Bicycle
[Diagram labels: Environment, state, action]

6
Learning to Ride a Bicycle
States:
– Angle of handlebars
– Angular velocity of handlebars
– Angle of bicycle to vertical
– Angular velocity of bicycle to vertical
– Acceleration of angle of bicycle to vertical

8
Learning to Ride a Bicycle
Actions:
– Torque to be applied to the handlebars
– Displacement of the center of mass from the bicycle's plane (in cm)

10
Is the angle of the bicycle to vertical greater than 12°?
– no: Reward = 0
– yes: Reward = -1
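The reward rule on this slide is small enough to write down directly. A minimal sketch, assuming the angle is measured in degrees and that a tilt to either side counts (the function name and the use of the absolute value are illustrative assumptions, not taken from the slides):

```python
FALL_ANGLE_DEG = 12.0  # threshold stated on the slide

def bicycle_reward(angle_to_vertical_deg: float) -> int:
    """Return -1 if the bicycle tilts more than 12 degrees from vertical, else 0."""
    # Assumption: tilt in either direction counts, so compare the absolute value.
    return -1 if abs(angle_to_vertical_deg) > FALL_ANGLE_DEG else 0
```

With this rule the agent receives no positive signal for riding well; it only learns from the punishment that arrives when the bicycle falls.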

11
Learning to Ride a Bicycle
Reinforcement Learning

12
Reinforcement Learning: Applications
Board Games
– TD-Gammon, a program based on reinforcement learning, has become a world-class backgammon player
Mobile Robot Controlling
– Learning to Drive a Bicycle
– Navigation
– Pole-balancing
– Acrobot
Sequential Process Controlling
– Elevator Dispatching

13
Key Features of Reinforcement Learning
Learner is not told which actions to take
Trial-and-error search
Possibility of delayed reward:
– Sacrifice of short-term gains for greater long-term gains
Explore/exploit trade-off
Considers the whole problem of a goal-directed agent interacting with an uncertain environment
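One common way to handle the explore/exploit trade-off is epsilon-greedy action selection. The slides do not name a specific strategy, so the following is only an illustrative sketch; `epsilon_greedy` and its parameters are hypothetical names:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index: a random one with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: try any action
    # exploit: index of the largest estimated value (first one on ties)
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Setting `epsilon=0.0` gives pure exploitation and `epsilon=1.0` pure exploration; values in between trade one against the other.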

14
The Agent-Environment Interaction
Agent and environment interact at discrete time steps: $t = 0, 1, 2, \ldots$
– Agent observes state at step $t$: $s_t \in S$
– produces action at step $t$: $a_t \in A$
– gets resulting reward: $r_{t+1} \in \mathbb{R}$
– and resulting next state: $s_{t+1} \in S$
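The interaction cycle can be sketched as a simple loop. The environment below is a hypothetical toy corridor invented for illustration (it is not the example from the slides), and the `reset`/`step` interface is an assumption:

```python
class CorridorEnv:
    """Hypothetical toy environment: positions 0..3, reaching 3 ends the episode."""

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        # a is -1 (left) or +1 (right); the position is clamped to [0, 3]
        self.s = min(3, max(0, self.s + a))
        done = self.s == 3
        reward = 1.0 if done else 0.0
        return self.s, reward, done

def run_episode(env, policy, max_steps=100):
    """Observe s_t, choose a_t, then receive r_{t+1} and s_{t+1}, step by step."""
    s = env.reset()
    trajectory = []
    for _ in range(max_steps):
        a = policy(s)                    # a_t
        s_next, r, done = env.step(a)    # s_{t+1}, r_{t+1}
        trajectory.append((s, a, r, s_next))
        s = s_next
        if done:
            break
    return trajectory

# A policy that always moves right reaches the goal in three steps.
trajectory = run_episode(CorridorEnv(), policy=lambda s: +1)
```

Each tuple in `trajectory` records one time step as (state, action, reward, next state).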

15
The Agent's Goal
Roughly, the agent's goal is to get as much reward as it can over the long run.
A policy $\pi$ is a mapping from states to actions: $\pi(s) = a$
Reinforcement learning methods specify how the agent changes its policy as a result of experience.

16
Deterministic Markov Decision Process

17
Example

18
Example: Corresponding MDP

21
Example: Policy

22
Value of Policy and Rewards

23
Value of Policy and Agent's Task

24
Nondeterministic Markov Decision Process
[Diagram labels: P = 0.8, P = 0.1]

27
Example with South-Eastern Wind

29
Methods
Model (reward function and transition probabilities) is known:
– discrete states: Dynamic Programming
– continuous states: Value Function Approximation + Dynamic Programming
Model (reward function or transition probabilities) is unknown:
– discrete states: Reinforcement Learning, Monte Carlo Methods
– continuous states: Value Function Approximation + Reinforcement Learning
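For the first case (discrete states, known model), value iteration is one standard dynamic-programming method. The slides do not say which variant they have in mind, so this is only a sketch; the discount factor `gamma`, the callback signatures, and the two-state example are all assumptions made for illustration:

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-8):
    """Compute state values for a known model.

    transition(s, a) -> list of (probability, next_state)
    reward(s, a, s2) -> float
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Best expected one-step reward plus discounted value of the successor.
            best = max(
                sum(p * (reward(s, a, s2) + gamma * V[s2]) for p, s2 in transition(s, a))
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# Hypothetical two-state model: "go" moves from state 0 to the absorbing state 1.
def transition(s, a):
    if s == 0 and a == "go":
        return [(1.0, 1)]
    return [(1.0, s)]

def reward(s, a, s2):
    return 1.0 if (s == 0 and s2 == 1) else 0.0

V = value_iteration([0, 1], ["stay", "go"], transition, reward)
```

Because the method sweeps over every state and reads the transition probabilities directly, it applies only when the model is known and the state set is small enough to enumerate.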

30
Q-learning Algorithm
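The body of this slide did not survive extraction, so the following is a sketch of the standard tabular Q-learning update, $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$, and not necessarily the exact form shown in the lecture. The corridor task, the `step` callback signature, and all parameter values are assumptions made for illustration:

```python
import random
from collections import defaultdict

def q_learning(step, start_state, actions, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning; step(s, a) -> (next_state, reward, done) is supplied by the caller."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)], initialised to 0
    for _ in range(episodes):
        s = start_state
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.choice(actions)  # explore
            else:
                # exploit, breaking ties between equally good actions at random
                best = max(Q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if Q[(s, b)] == best])
            s2, r, done = step(s, a)
            # No bootstrapping from a terminal state.
            target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
            if done:
                break
    return Q

def corridor_step(s, a):
    """Hypothetical task: positions 0..3, reward 1 on reaching position 3."""
    s2 = min(3, max(0, s + a))
    return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

Q = q_learning(corridor_step, start_state=0, actions=[-1, +1])
greedy = {s: max([-1, +1], key=lambda b: Q[(s, b)]) for s in range(3)}
```

After training, `greedy` maps each non-terminal position to the action with the highest learned Q-value. The update never consults transition probabilities, which is why the method fits the "model unknown" case.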

32
Example

33
Example: Q-table Initialization

34
Example: Episode 1

39
Example: Q-table

43
Example: Episode 2

46
Example: Q-table after Convergence

47
Example: Value Function after Convergence

48
Example: Optimal Policy

50
Q-learning

51
Convergence of Q-learning

52
Blackjack
Standard rules of blackjack hold
State space:
– element[0]: current value of player's hand (4-21)
– element[1]: value of dealer's face-up card (2-11)
– element[2]: player does or does not have a usable ace (0/1)
Starting states:
– player has any 2 cards (uniformly distributed), dealer has any 1 card (uniformly distributed)
Actions:
– HIT
– STICK
Rewards:
– -1 for a loss
– 0 for a draw
– +1 for a win
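The state encoding and reward rule can be sketched as follows. This assumes cards are given by value with aces as 1 and face cards as 10, that a dealer ace is shown as 11, and that element[2] is 1 when the player holds a usable ace; the function names are hypothetical:

```python
def hand_value(cards):
    """Return (value, usable_ace) for a list of card values, with aces given as 1."""
    total = sum(cards)
    # An ace is "usable" if counting one ace as 11 does not bust the hand.
    usable_ace = 1 in cards and total + 10 <= 21
    return (total + 10 if usable_ace else total), usable_ace

def make_state(player_cards, dealer_card):
    """Encode the three-element state: (player value, dealer card, usable ace flag)."""
    value, usable_ace = hand_value(player_cards)
    dealer_value = 11 if dealer_card == 1 else dealer_card  # dealer ace shown as 11
    return (value, dealer_value, int(usable_ace))

def final_reward(player_value, dealer_value):
    """Return -1 for a loss, 0 for a draw, +1 for a win; a hand over 21 is bust."""
    if player_value > 21:
        return -1
    if dealer_value > 21 or player_value > dealer_value:
        return 1
    return 0 if player_value == dealer_value else -1
```

For example, an ace and a six against a dealer ten give the state `(17, 10, 1)`, while ten, six and ace give a hard 17 with no usable ace.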

53
Blackjack: Optimal Policy

54
Reinforcement Learning: Example
States
– Grid cells
Actions
– Left
– Up
– Right
– Down
Rewards
– Bonus: 20
– Food: 1
– Predator: -10
– Empty grid cell: -0.1
Transition probabilities
– 0.80: the agent goes where it intends to go
– 0.20: the agent moves to any other adjacent grid cell or remains where it was (if it is on the border of the grid world, it goes to the other side)
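The transition rule of this grid world can be sketched as a sampling function. The slide does not say how the remaining 0.20 is divided, so the uniform split over the three other directions and staying put is an assumption, as are the coordinate convention (row 0 at the top) and the names used here:

```python
import random

MOVES = {"Left": (-1, 0), "Up": (0, -1), "Right": (1, 0), "Down": (0, 1)}
REWARDS = {"Bonus": 20.0, "Food": 1.0, "Predator": -10.0, "Empty": -0.1}

def sample_next_cell(cell, action, width, height, rng=random):
    """Sample the next (x, y) cell for an intended action on a wrap-around grid."""
    if rng.random() < 0.8:
        dx, dy = MOVES[action]  # the agent goes where it intends to go
    else:
        # Assumption: the remaining 0.2 is split uniformly over the three
        # other directions and staying in place.
        others = [d for name, d in MOVES.items() if name != action] + [(0, 0)]
        dx, dy = rng.choice(others)
    x, y = cell
    # Modulo arithmetic moves the agent to the opposite side at the border.
    return ((x + dx) % width, (y + dy) % height)
```

Calling `sample_next_cell((0, 0), "Left", 4, 3)` returns `(3, 0)` in the intended case, since the agent leaves the left border and re-enters on the right.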

