Reinforcement Learning 2018-02-04
Overview: Tabular Methods, Approximate Methods, Deep Reinforcement Learning
Tabular Methods
Model: mathematical model of the environment’s dynamics and rewards
Policy: function mapping the agent’s states to actions
Value function: expected future rewards from being in a state and/or taking an action when following a particular policy
MDP
Markov Reward Process
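The value of each state in a Markov reward process satisfies V = R + γ P V. A minimal sketch of solving this by repeated substitution is given here. The function name `mrp_values`, the two-state transition matrix, the rewards and the discount of 0.9 are all illustrative assumptions, not taken from the slides.

```python
def mrp_values(P, R, gamma, tol=1e-10, max_iters=100_000):
    """Iterate V <- R + gamma * P V until the largest change is below tol."""
    n = len(R)
    V = [0.0] * n
    for _ in range(max_iters):
        V_new = [
            R[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical two-state MRP: state 1 is absorbing and pays nothing.
P = [[0.5, 0.5],   # P[s][t] = probability of moving from s to t
     [0.0, 1.0]]
R = [1.0, 0.0]     # R[s] = expected immediate reward in state s
V = mrp_values(P, R, gamma=0.9)
```

For this toy process, state 0 satisfies V(0) = 1 + 0.9 · 0.5 · V(0), so V(0) = 1 / 0.55, and the absorbing state has value 0.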
MDP = MRP + Actions
MDP + Policy
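Fixing a policy in an MDP leaves a Markov reward process: the policy averages the per-action transitions and rewards into one transition matrix and one reward vector. The sketch below shows that reduction. The name `induced_mrp`, the array layout (`P[s][a][t]`, `R[s][a]`, `pi[s][a]`) and the toy numbers are assumptions made for illustration.

```python
def induced_mrp(P, R, pi):
    """Return (P_pi, R_pi), the MRP obtained by following policy pi in the MDP.

    P[s][a][t]: probability of reaching t from s under action a
    R[s][a]:    expected immediate reward for action a in state s
    pi[s][a]:   probability that the policy picks action a in state s
    """
    n_states, n_actions = len(P), len(P[0])
    P_pi = [
        [sum(pi[s][a] * P[s][a][t] for a in range(n_actions))
         for t in range(n_states)]
        for s in range(n_states)
    ]
    R_pi = [
        sum(pi[s][a] * R[s][a] for a in range(n_actions))
        for s in range(n_states)
    ]
    return P_pi, R_pi


# Hypothetical MDP with two states and two actions.
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: action 0 stays, action 1 moves to 1
     [[0.0, 1.0], [1.0, 0.0]]]   # state 1: action 0 stays, action 1 moves to 0
R = [[0.0, 1.0],
     [2.0, 0.0]]
pi = [[0.5, 0.5],                # state 0: choose uniformly at random
      [1.0, 0.0]]                # state 1: always action 0
P_pi, R_pi = induced_mrp(P, R, pi)
```

Once `P_pi` and `R_pi` are available, any method for evaluating an MRP applies directly to the policy.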
Compare
How to Control?
Policy Search
State-Action Value Q
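The state-action value can be written in terms of the state value as Q(s, a) = R(s, a) + γ Σ_t P(t | s, a) V(t): take action a once, then follow the policy. A small sketch of that one-step lookahead follows. The name `q_from_v`, the array layout and the toy MDP are illustrative assumptions.

```python
def q_from_v(P, R, V, gamma):
    """One-step lookahead: Q[s][a] = R[s][a] + gamma * sum_t P[s][a][t] * V[t]."""
    n_states, n_actions = len(P), len(P[0])
    return [
        [R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_states))
         for a in range(n_actions)]
        for s in range(n_states)
    ]


# Hypothetical MDP with two states and two actions, deterministic transitions.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 1.0],
     [2.0, 0.0]]
V = [19.0, 20.0]   # state values assumed to be known already
Q = q_from_v(P, R, V, gamma=0.9)
```

Taking the best action per row of `Q` gives a greedy policy with respect to `V`, which is the improvement step used in control.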
Policy Iteration
In the worst case, policy iteration takes at most |A|^|S| iterations (the number of deterministic policies)
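Policy iteration alternates two steps: evaluate the current deterministic policy, then make it greedy with respect to the resulting values, stopping once the policy no longer changes. The sketch below is one possible tabular version, not a reference implementation. The function names, the iterative evaluation, the tie-breaking rule and the toy MDP are all assumptions for illustration.

```python
def evaluate(P, R, policy, gamma, tol=1e-10):
    """Iterative policy evaluation for a deterministic policy (in-place sweeps)."""
    n = len(P)
    V = [0.0] * n
    while True:
        delta = 0.0
        for s in range(n):
            a = policy[s]
            v = R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def policy_iteration(P, R, gamma):
    """Return (policy, V, iterations) once greedy improvement changes nothing."""
    n_states, n_actions = len(P), len(P[0])
    policy = [0] * n_states
    iterations = 0
    while True:
        iterations += 1
        V = evaluate(P, R, policy, gamma)
        new_policy = []
        for s in range(n_states):
            q = [R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_states))
                 for a in range(n_actions)]
            best = max(range(n_actions), key=lambda a: q[a])
            # Keep the current action on (near) ties so the loop cannot cycle.
            if q[policy[s]] >= q[best] - 1e-12:
                best = policy[s]
            new_policy.append(best)
        if new_policy == policy:
            return policy, V, iterations
        policy = new_policy


# Hypothetical MDP with two states and two actions, deterministic transitions.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 1.0],
     [2.0, 0.0]]
policy, V, iterations = policy_iteration(P, R, gamma=0.9)
```

With 2 actions and 2 states there are 2^2 = 4 deterministic policies, so the loop cannot run more than 4 times here. In practice it usually stops far sooner than the |A|^|S| bound suggests.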
Value Iteration
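Value iteration skips the explicit policy and repeatedly applies the Bellman optimality backup V(s) ← max_a [R(s, a) + γ Σ_t P(t | s, a) V(t)], reading off a greedy policy at the end. A minimal tabular sketch under assumed names and an assumed toy MDP:

```python
def value_iteration(P, R, gamma, tol=1e-10):
    """Return (V, policy) after the Bellman optimality backup has converged."""
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states
    while True:
        Q = [
            [R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_states))
             for a in range(n_actions)]
            for s in range(n_states)
        ]
        V_new = [max(row) for row in Q]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            # Greedy policy with respect to the final Q values.
            policy = [row.index(max(row)) for row in Q]
            return V_new, policy
        V = V_new


# Hypothetical MDP with two states and two actions, deterministic transitions.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 1.0],
     [2.0, 0.0]]
V, policy = value_iteration(P, R, gamma=0.9)
```

For this toy problem the best plan is to move to state 1 and stay there, giving V(1) = 2 / (1 - 0.9) = 20 and V(0) = 1 + 0.9 · 20 = 19. Unlike policy iteration, the stopping rule here is a tolerance on the values, so the result is exact only in the limit.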