Introduction to Reinforcement Learning and Q-Learning


1 Introduction to Reinforcement Learning and Q-Learning
Andrew L. Nelson Visiting Research Faculty University of South Florida 2/28/2019 Q-Learning

2 Overview
Outline: References, Introduction, Agent and environment, Nomenclature, Cell World, Policy Learning in known space, Example, Reinforcement Policy Learning, Q-Function, Q-Algorithm, Generalization, Summary
Topics covered:
  References
  Introduction
  Learning an optimal policy in a known environment
  Learning an approximate optimal policy in an unknown environment
  Example
  Generalization and representation
  Knowledge-based vs. general function approximation methods

3 References
C. Watkins, P. Dayan, “Q-Learning,” Machine Learning, vol. 8, pp , 1989.
T.M. Mitchell, Machine Learning, WCB/McGraw-Hill, 1997.

4 Introduction Situated Learning Agents
The goal of a learning agent is to learn to choose actions (a) so that the net reward over a sequence of actions is maximized.
Supervised learning methods make use of knowledge of the world and of known reward functions.
Reinforcement learning methods use rewards to learn an optimal policy in a given (unknown) environment.

5 Agent and Environment
An agent produces an action (a) and receives a reward (r) from a given environment; the action also changes the state (s).
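The agent-environment interaction can be sketched as a simple loop. This is a minimal illustration, not taken from the slides: the four-state corridor, the names `step` and `run_episode`, the goal reward of 10, and the step reward of -1 are all assumptions chosen for the example.

```python
# Hypothetical corridor world: states 0..3, state 3 is the terminal goal.
GOAL = 3
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Environment: return (next_state, reward) for an action taken in state."""
    next_state = min(max(state + action, 0), GOAL)  # walls clamp the move
    reward = 10 if next_state == GOAL else -1       # assumed reward values
    return next_state, reward


def run_episode(policy, start=0, max_steps=50):
    """Agent: follow policy until the goal or the step limit; return net reward."""
    state, net_reward = start, 0
    for _ in range(max_steps):
        action = policy(state)                # agent produces an action
        state, reward = step(state, action)   # environment answers
        net_reward += reward
        if state == GOAL:
            break
    return net_reward


always_right = lambda s: +1
always_left = lambda s: -1
```

Under these assumed rewards, `run_episode(always_right)` collects -1, -1, and 10, while `always_left` never leaves state 0 and only accumulates step costs. The difference in net reward is what a learning agent tries to maximize.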

6 Nomenclature
Action: a ∈ A
State: s ∈ S
Reward: r = R(s)
Policy: π: S → A
Optimal policy: π*
World model: s' = T(s, a)
Utility: U(s)
Value: Q(s, a)

7 Cell World
[Figure: cell world, with labels for the agent, states, transitions, and reward]

8 Learning π* in Known Environments
The supervised method:
  Find the maximum possible utility for each state (iterative search).
  Learn the optimal policy π*: S → A by learning, for each state s, the action that leads to the next state s' with the maximum possible utility, U*.
Requirements:
  Known world model, T(s, a)
  Known reward function, R(s)

9 Known Rewards and Transitions
R(s) and s' = T(s, a) are known for all s ∈ S and a ∈ A.

10 Calculate U* for Each State
(Using an iterative search algorithm, for example)
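One possible iterative search is sketched below. The slides do not name a specific algorithm, so this is an assumption: it repeatedly applies U(s) = R(s) + max_a U(T(s, a)) on a hypothetical four-state corridor until the values stop changing. The world, the reward values, and the function names are illustrative only.

```python
# Hypothetical known world: states 0..3 in a corridor, state 3 is terminal.
GOAL = 3
STATES = range(4)
ACTIONS = (-1, +1)


def T(s, a):
    """Known world model: next state for action a in state s."""
    return min(max(s + a, 0), GOAL)


def R(s):
    """Known reward function (assumed values): 10 at the goal, -1 elsewhere."""
    return 10 if s == GOAL else -1


def compute_utilities():
    """Sweep all states until the utility table no longer changes."""
    U = {s: (R(s) if s == GOAL else 0) for s in STATES}
    while True:
        new_U = dict(U)
        for s in STATES:
            if s != GOAL:  # the terminal state keeps U = R(GOAL)
                new_U[s] = R(s) + max(U[T(s, a)] for a in ACTIONS)
        if new_U == U:
            return U
        U = new_U


U_star = compute_utilities()
```

Each sweep propagates the goal's utility one state further from the goal, so the loop finishes after a handful of passes on a world this small.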

11 Calculate π* using the known U* values
π*: derived from U*(s), for all s. In each state s, π* selects the action a whose next state s' = T(s, a) has the highest U*.
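Reading a policy off a utility table can be sketched as follows. The corridor world, the model `T`, and the utility values in `U_STAR` are assumptions made up for illustration; only the rule itself (pick the action whose next state has the highest utility) comes from the slides.

```python
# Hypothetical corridor: states 0..3, state 3 is the terminal goal.
GOAL = 3
ACTIONS = (-1, +1)

# Assumed optimal utilities for this world (illustrative values).
U_STAR = {0: 7, 1: 8, 2: 9, 3: 10}


def T(s, a):
    """Known world model: next state for action a in state s."""
    return min(max(s + a, 0), GOAL)


def greedy_policy(U):
    """For each non-terminal state, choose the action leading to the best next state."""
    return {
        s: max(ACTIONS, key=lambda a: U[T(s, a)])
        for s in U
        if s != GOAL
    }


pi_star = greedy_policy(U_STAR)
```

With these values every state prefers +1, since moving right always leads to a next state with higher utility.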

12 Notes
Supervised learning methods work well when a complete model of the environment and the reward function are known.
Since R(s) and T(s, a) are known, we can reduce learning to a standard iterative learning process.

13 Unknown Environments
What if the environment is unknown?

14 Policy Learning in known space

15 The Q-Function
Instead of learning utilities, state-action values (Q) will be learned.
U(s) = max_a Q(s, a)
Local action and exploration can be used to discover and learn Q(s, a) values in an unknown environment.
We will use the following equation: Q(s, a) ← r + max_{a'} Q(s', a')
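A single application of this update can be traced by hand. The sketch below stores Q in a dictionary keyed by (state, action); the helper names `q_update` and `utility`, the corridor states, and the reward values 10 and -1 are assumptions for illustration.

```python
from collections import defaultdict

ACTIONS = (-1, +1)  # hypothetical action set: move left, move right


def q_update(Q, s, a, r, s_next):
    """Apply Q(s, a) <- r + max_a' Q(s', a') in place and return the new value."""
    Q[(s, a)] = r + max(Q[(s_next, a2)] for a2 in ACTIONS)
    return Q[(s, a)]


def utility(Q, s):
    """U(s) = max_a Q(s, a)."""
    return max(Q[(s, a)] for a in ACTIONS)


Q = defaultdict(int)  # unseen entries read as 0

# Step from state 2 into terminal state 3 with reward 10:
# Q(2, +1) <- 10 + max(Q(3, -1), Q(3, +1)) = 10 + 0
first = q_update(Q, 2, +1, 10, 3)

# Step from state 1 into state 2 with reward -1:
# Q(1, +1) <- -1 + max(Q(2, -1), Q(2, +1)) = -1 + 10
second = q_update(Q, 1, +1, -1, 2)
```

The second update shows how value learned near the goal flows backwards: state 1 becomes attractive only because state 2 already holds a high Q value.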

16 The Q-Learning Algorithm
Build up a table of Q(s, a) values as follows:
Do forever, from the current state s:
  Set each uninitialized state-action value Q(s, a) to 0 and add it to the table of Q values.
  With probability p, select the action a with the maximum Q value (otherwise select a at random).
  Execute a and receive the immediate reward r.
  Update the table entry for Q(s, a) as Q(s, a) ← r + max_{a'} Q(s', a')
  s ← s'
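The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the corridor world, its rewards (10 at the goal, -1 per step), the restart at the start state after reaching the terminal state, the finite step count standing in for "forever", and all names are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical corridor: states 0..3, state 3 is terminal.
GOAL, START = 3, 0
ACTIONS = (-1, +1)


def step(s, a):
    """Environment, unknown to the agent: return (next_state, reward)."""
    s_next = min(max(s + a, 0), GOAL)
    return s_next, (10 if s_next == GOAL else -1)


def q_learning(n_steps=5000, p_greedy=0.8, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # uninitialized entries start at 0
    s = START
    for _ in range(n_steps):  # finite stand-in for "do forever"
        if rng.random() < p_greedy:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])  # exploit
        else:
            a = rng.choice(ACTIONS)                        # explore
        s_next, r = step(s, a)
        Q[(s, a)] = r + max(Q[(s_next, a2)] for a2 in ACTIONS)
        s = START if s_next == GOAL else s_next  # terminal state: start over
    return Q


def greedy_policy(Q):
    """Best known action for each non-terminal state."""
    return {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)}


Q = q_learning()
policy = greedy_policy(Q)
```

The update has no learning rate or discount factor, matching the equation on the slide. That is workable here because the assumed world is deterministic and every episode ends at the goal; stochastic or non-terminating worlds would need a different update.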

17 Q-Learning Example
Initialize table and first position

18 Q-Learning Example
Move to s'... iterate

19 Q-Learning Example
Continue

20 Q-Learning Example
Terminal state, start over

21 Q-Learning Example
Starting new iteration

22 Q-Learning Example
After a few more iterations...

23 Representation and Generalization
Policies learned using state transition representations do not generalize to unvisited states.
Functional representations allow for generalization to states not explored:
f(s) = p1·a + p2·a^2 + p3·a^3 + ...
Functional representations might cover search spaces that do not contain the target policy.
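The generalization point can be illustrated with the simplest functional form, a first-order polynomial fitted by least squares. This is a sketch under assumptions: the visited states, their utility values, and the function `fit_line` are invented for the example, and a line is used in place of the higher-order polynomial shown above.

```python
def fit_line(points):
    """Least-squares fit of y = p0 + p1 * x to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    p1 = sxy / sxx
    p0 = mean_y - p1 * mean_x
    return p0, p1


# Assumed utilities learned for the visited states 0, 1, and 3 only.
visited = [(0, 7.0), (1, 8.0), (3, 10.0)]
p0, p1 = fit_line(visited)

# The fitted function gives an estimate for state 2, which was never visited.
estimate_for_state_2 = p0 + p1 * 2
```

A lookup table has no entry for state 2, while the fitted line does. The cost is the one noted above: if the true utilities did not lie near a line, no choice of p0 and p1 could represent them.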

24 Summary
Reinforcement learning (RL) is useful for learning policies in uncharacterized environments.
RL uses reward from actions taken during exploration.
RL is useful on small state transition spaces.
Functional representations increase the power of RL both in terms of generalization and representation.

