University Paderborn 07 January 2009 RG Knowledge Based Systems Prof. Dr. Hans Kleine Büning Reinforcement Learning.


1 University Paderborn 07 January 2009 RG Knowledge Based Systems Prof. Dr. Hans Kleine Büning Reinforcement Learning

2 Outline
–Motivation
–Applications
–Markov Decision Processes
–Q-learning
–Examples

4 Reinforcement Learning: The Idea
A way of programming agents by reward and punishment without specifying how the task is to be achieved.

5 Learning to Ride a Bicycle
(Diagram labels: Environment, state, action)

6 Learning to Ride a Bicycle
States:
–Angle of handlebars
–Angular velocity of handlebars
–Angle of bicycle to vertical
–Angular velocity of bicycle to vertical
–Acceleration of angle of bicycle to vertical

8 Learning to Ride a Bicycle
Actions:
–Torque to be applied to the handlebars
–Displacement of the center of mass from the bicycle's plane (in cm)

10 Is the angle of bicycle to vertical greater than 12°?
–yes: Reward = -1
–no: Reward = 0
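The reward rule on slide 10 can be written as a one-line function. This is a minimal sketch, not the lecture's code: the function name, the angle being given in degrees, and the use of the absolute value (so that a tilt to either side counts) are assumptions.

```python
FALL_ANGLE_DEG = 12.0  # threshold taken from the slide


def bicycle_reward(angle_to_vertical_deg: float) -> int:
    """Return -1 if the bicycle tilts more than 12 degrees from vertical, else 0.

    Assumption: the tilt is measured in degrees and may be negative
    (leaning the other way), so the absolute value is compared.
    """
    if abs(angle_to_vertical_deg) > FALL_ANGLE_DEG:
        return -1
    return 0
```

Because the reward is 0 everywhere except on failure, the agent is told only that it fell, not how to avoid falling.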

11 Learning To Ride a Bicycle
Reinforcement Learning

12 Reinforcement Learning: Applications
Board Games
–TD-Gammon, a program based on reinforcement learning, has become a world-class backgammon player
Mobile Robot Controlling
–Learning to ride a bicycle
–Navigation
–Pole-balancing
–Acrobot
Sequential Process Controlling
–Elevator dispatching

13 History of Reinforcement Learning
–Trial-and-error learning in the psychology of animal learning
–Optimal control and dynamic programming
–Temporal-difference methods

14 Key Features of Reinforcement Learning
–Learner is not told which actions to take
–Trial-and-error search
–Possibility of delayed reward: sacrifice of short-term gains for greater long-term gains
–Explore/exploit trade-off
–Considers the whole problem of a goal-directed agent interacting with an uncertain environment

15 The Agent-Environment Interaction
Agent and environment interact at discrete time steps: $t = 0, 1, 2, \dots$
–Agent observes state at step $t$: $s_t \in S$
–produces action at step $t$: $a_t \in A$
–gets resulting reward: $r_{t+1} \in \mathbb{R}$
–and resulting next state: $s_{t+1} \in S$
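The interaction loop on slide 15 can be sketched in a few lines. The corridor environment below (states 0 to 3, reward 1 on entering state 3) and all names are hypothetical and serve only to make the loop runnable; they are not taken from the lecture.

```python
from typing import Callable, Tuple

State = int
Action = str


def corridor_step(state: State, action: Action) -> Tuple[float, State]:
    """Hypothetical toy environment: a corridor with states 0..3.

    Returns (r_{t+1}, s_{t+1}). Entering state 3 yields reward 1.
    """
    if action == "right":
        next_state = min(state + 1, 3)
    else:
        next_state = max(state - 1, 0)
    reward = 1.0 if (next_state == 3 and state != 3) else 0.0
    return reward, next_state


def run_episode(policy: Callable[[State], Action],
                step: Callable[[State, Action], Tuple[float, State]],
                start_state: State,
                num_steps: int) -> float:
    """Run the agent-environment loop and return the summed reward."""
    state = start_state
    total = 0.0
    for _ in range(num_steps):
        action = policy(state)                    # agent produces a_t
        reward, next_state = step(state, action)  # environment returns r_{t+1}, s_{t+1}
        total += reward
        state = next_state
    return total
```

For example, `run_episode(lambda s: "right", corridor_step, 0, 3)` walks from state 0 to state 3 and collects the single reward.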

16 The Agent's Goal
–Coarsely, the agent's goal is to get as much reward as it can over the long run
–A policy $\pi$ is a mapping from states to actions: $\pi(s) = a$
–Reinforcement learning methods specify how the agent changes its policy as a result of experience

17 Deterministic Markov Decision Process

18 Example

19-21 Example: Corresponding MDP

22 Example: Policy

23 Value of Policy and Rewards

24 Value of Policy and Agent's Task
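The transcript keeps only the titles of the value slides, so the following is a sketch under an assumption: that the value of a policy is the usual discounted sum of the rewards collected while following it, with a discount factor `gamma` between 0 and 1. Both the function name and `gamma` are introduced here for illustration.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for a finite reward sequence."""
    total = 0.0
    # Accumulate from the last reward backwards so each step is one multiply-add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

With `gamma = 0.5`, the reward sequence `[1, 1, 1]` is worth `1 + 0.5 + 0.25 = 1.75`; later rewards count less than earlier ones.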

25 Nondeterministic Markov Decision Process
(Diagram labels: P = 0.8, P = 0.1)

26-27 Nondeterministic Markov Decision Process

28-29 Example with South-Eastern Wind

30 Methods
Model (reward function and transition probabilities) is known:
–discrete states: Dynamic Programming
–continuous states: Value Function Approximation + Dynamic Programming
Model (reward function or transition probabilities) is unknown:
–discrete states: Reinforcement Learning (Q-learning, Monte Carlo Methods)
–continuous states: Value Function Approximation + Reinforcement Learning

31-32 Q-learning Algorithm
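The algorithm slides survive only as titles, so the following is a minimal sketch of standard tabular Q-learning, not a reconstruction of the lecture's pseudocode. The parameter names (`alpha`, `gamma`, `epsilon`), the epsilon-greedy exploration with random tie-breaking, and the corridor environment are all assumptions. Setting `alpha = 1` gives the simpler update commonly used for deterministic MDPs.

```python
import random
from collections import defaultdict


def corridor_step(state, action):
    """Hypothetical toy environment: states 0..3, reward 1 on entering state 3."""
    next_state = min(state + 1, 3) if action == "right" else max(state - 1, 0)
    reward = 1.0 if (next_state == 3 and state != 3) else 0.0
    return reward, next_state


def q_learning(step, actions, start_state, is_terminal,
               episodes=200, max_steps=100,
               alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; returns a dict mapping (state, action) to a Q-value."""
    rng = random.Random(seed)
    q = defaultdict(float)  # unseen entries start at 0
    for _ in range(episodes):
        state = start_state
        for _ in range(max_steps):
            if is_terminal(state):
                break
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                # Exploit: pick a best-valued action, breaking ties at random.
                best = max(q[(state, a)] for a in actions)
                action = rng.choice([a for a in actions if q[(state, a)] == best])
            reward, next_state = step(state, action)
            if is_terminal(next_state):
                best_next = 0.0
            else:
                best_next = max(q[(next_state, a)] for a in actions)
            # Move Q(s, a) toward the target r + gamma * max_a' Q(s', a').
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q_table = q_learning(corridor_step, ["left", "right"], 0, lambda s: s == 3)
```

After training, `q_table[(2, "right")]` exceeds `q_table[(2, "left")]`, so the greedy policy steps toward the rewarded state.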

33 Example

34 Example: Q-table Initialization

35-39 Example: Episode 1

40 Example: Q-table

41 Example: Episode 1

42 Episode 1

43 Example: Q-table

44-46 Example: Episode 2

47 Example: Q-table after Convergence

48 Example: Value Function after Convergence

49-50 Example: Optimal Policy

51 Q-learning

52 Convergence of Q-learning

53 Blackjack
Standard rules of blackjack hold
State space:
–element[0] - current value of player's hand (4-21)
–element[1] - value of dealer's face-up card (2-11)
–element[2] - player does not have usable ace (0/1)
Starting states:
–player has any 2 cards (uniformly distributed), dealer has any 1 card (uniformly distributed)
Actions:
–HIT
–STICK
Rewards:
–Loss: -1
–Draw: 0
–Win: 1
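The state and reward definitions above can be illustrated with two small helpers. This is a sketch under assumptions: cards are given by value with an ace written as 11, special rules such as a natural blackjack are ignored, and the function names are hypothetical. The first helper reports whether an ace is usable; how that maps onto the 0/1 flag in element[2] is left open, since the slide does not settle the polarity.

```python
from typing import List, Tuple


def hand_value(cards: List[int]) -> Tuple[int, bool]:
    """Return (total, usable_ace) for a hand of card values, aces given as 11.

    An ace is 'usable' while it still counts as 11 without busting the hand.
    """
    total = sum(cards)
    soft_aces = cards.count(11)
    # Downgrade aces from 11 to 1 one at a time while the hand is bust.
    while total > 21 and soft_aces > 0:
        total -= 10
        soft_aces -= 1
    return total, soft_aces > 0


def outcome_reward(player_total: int, dealer_total: int) -> int:
    """Return -1 for a loss, 0 for a draw, 1 for a win (final totals given)."""
    if player_total > 21:
        return -1  # player busts first
    if dealer_total > 21 or player_total > dealer_total:
        return 1
    if player_total < dealer_total:
        return -1
    return 0
```

For example, an ace and a five give 16 with a usable ace; after drawing a ten, the ace drops to 1 and the hand is again worth 16, now without a usable ace.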

54 Blackjack: Optimal Policy

55 Reinforcement Learning: Example
States
–Grid cells
Actions
–Left
–Up
–Right
–Down
Rewards
–Bonus 20
–Food 1
–Predator -10
–Empty grid cell -0.1
Transition probabilities
–0.80: agent goes where it intends to go
–0.20: agent goes to any other adjacent grid cell or remains where it was (if it is on the border of the grid world, it goes to the other side)
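The transition rule of this grid world can be sketched as a function that returns the probability of each successor cell. The slide does not say how the remaining 0.20 is divided, so the sketch assumes it is split evenly over the three other adjacent cells and staying in place. The coordinate convention, the names, and the reward table keys are also assumptions.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]

# Assumed convention: x grows to the right, y grows downwards.
MOVES = {"Left": (-1, 0), "Up": (0, -1), "Right": (1, 0), "Down": (0, 1)}

# Rewards from the slide, keyed by hypothetical cell-content labels.
REWARDS = {"bonus": 20.0, "food": 1.0, "predator": -10.0, "empty": -0.1}


def transition_distribution(pos: Cell, action: str,
                            width: int, height: int) -> Dict[Cell, float]:
    """Return {next_cell: probability} for taking `action` in `pos`."""
    x, y = pos

    def moved(delta: Tuple[int, int]) -> Cell:
        dx, dy = delta
        # Modulo arithmetic wraps the agent around to the other side at the border.
        return ((x + dx) % width, (y + dy) % height)

    dist: Dict[Cell, float] = {}
    intended = moved(MOVES[action])
    dist[intended] = dist.get(intended, 0.0) + 0.8
    # Assumption: the other 0.2 is shared equally by the three
    # unintended neighbours and by staying put.
    others = [moved(d) for name, d in MOVES.items() if name != action] + [pos]
    for cell in others:
        dist[cell] = dist.get(cell, 0.0) + 0.2 / len(others)
    return dist


example = transition_distribution((0, 2), "Left", 5, 5)
```

In `example`, the intended move from the left border wraps to cell `(4, 2)` with probability 0.8, and the four other outcomes share the remaining 0.2.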

56-80 Reinforcement Learning: Example

