Hidden Markov Models (cont.) Markov Decision Processes

1 Hidden Markov Models (cont.) Markov Decision Processes
CHAPTER 9 Hidden Markov Models (cont.) Markov Decision Processes

2 Markov Models

3 Conditional Independence

4 Weather Example

5 Mini-Forward Algorithm

6 Example

7 Stationary Distributions
If we simulate the chain long enough, what happens?
- Uncertainty accumulates
- Eventually, we have no idea what the state is!

Stationary distributions:
- For most chains, the distribution we end up in is independent of the initial distribution
- This is called the stationary distribution of the chain
- Usually, we can only predict a short time out
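The convergence described above can be sketched with a hypothetical two-state weather chain (the transition numbers are invented for illustration); repeatedly applying the mini-forward update washes out the initial distribution:

```python
import numpy as np

# Hypothetical 2-state weather chain (sun, rain); rows are the current
# state, columns the next state.  These probabilities are illustrative only.
T = np.array([[0.9, 0.1],   # P(next | sun)
              [0.3, 0.7]])  # P(next | rain)

def simulate_forward(T, steps=1000):
    """Push an initial distribution through the chain for many steps."""
    p = np.array([1.0, 0.0])   # start certain it is sunny
    for _ in range(steps):
        p = p @ T              # one mini-forward update
    return p

print(simulate_forward(T))     # ≈ [0.75, 0.25]
```

Starting from "certainly rainy" instead gives the same limit, which is exactly the sense in which the stationary distribution is independent of where the chain starts.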

8 Example: Web Link Analysis

9 Mini-Viterbi Algorithm

10 Hidden Markov Models

11 HMM Applications

12 Filtering: Forward Algorithm

13 Filtering Example

14 MLE: Viterbi Algorithm

15 Viterbi Properties

16 Markov Decision Processes

17 MDP Solutions

18 Example Optimal Policies

19 Stationarity

20 How (Not) to Solve an MDP
The inefficient way:
- Enumerate all policies
- For each, calculate the expected utility (discounted rewards) starting from the start state, e.g. by simulating a bunch of runs
- Choose the best policy

We'll return to a (better) idea like this later.
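The brute-force recipe above can be sketched on a tiny hypothetical MDP (two states, two actions, all numbers invented). The point of the slide is that this scales exponentially in the number of states, since there are |A|^|S| deterministic policies to enumerate:

```python
import itertools
import random

random.seed(0)  # make the Monte Carlo estimates repeatable

# Tiny invented MDP: P[s][a] = [(next_state, prob), ...]; R[s][a] = reward.
P = {0: {'a': [(0, 0.5), (1, 0.5)], 'b': [(0, 1.0)]},
     1: {'a': [(1, 1.0)],           'b': [(0, 0.9), (1, 0.1)]}}
R = {0: {'a': 1.0, 'b': 0.0},
     1: {'a': 2.0, 'b': 0.5}}
gamma = 0.9

def simulate(policy, start=0, horizon=100):
    """One run: total discounted reward following a fixed policy."""
    s, total, discount = start, 0.0, 1.0
    for _ in range(horizon):
        a = policy[s]
        total += discount * R[s][a]
        discount *= gamma
        nexts, probs = zip(*P[s][a])
        s = random.choices(nexts, probs)[0]
    return total

def estimate_utility(policy, runs=2000):
    """Expected discounted reward, estimated by simulating a bunch of runs."""
    return sum(simulate(policy) for _ in range(runs)) / runs

# Enumerate every deterministic policy -- |A|^|S| of them -- and keep the best.
best = max((dict(zip(P, acts))
            for acts in itertools.product('ab', repeat=len(P))),
           key=estimate_utility)
print(best)   # {0: 'a', 1: 'a'} for these invented numbers
```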

21 Utilities of States

22 Infinite Utilities?

23 The Bellman Equation

24 Example: Bellman Equations

25 Value Iteration

26 Policy Iteration
Alternate approach:
- Policy evaluation: calculate utilities for a fixed policy
- Policy improvement: update the policy based on the resulting utilities
- Repeat until convergence

This is policy iteration. It can converge faster than value iteration under some conditions.
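The evaluate/improve loop can be sketched as follows on a tiny hypothetical MDP (all numbers invented). For a fixed policy the utilities satisfy a *linear* system, so policy evaluation can be done exactly with a linear solve:

```python
import numpy as np

# Tiny invented MDP: P[s][a] = {next_state: prob}; R[s][a] = reward.
P = {0: {'a': {0: 0.5, 1: 0.5}, 'b': {0: 1.0}},
     1: {'a': {1: 1.0},         'b': {0: 0.9, 1: 0.1}}}
R = {0: {'a': 1.0, 'b': 0.0},
     1: {'a': 2.0, 'b': 0.5}}
gamma, states = 0.9, [0, 1]

def evaluate(policy):
    """Policy evaluation: solve V = R_pi + gamma * P_pi V exactly."""
    n = len(states)
    A, b = np.eye(n), np.zeros(n)
    for s in states:
        a = policy[s]
        b[s] = R[s][a]
        for s2, p in P[s][a].items():
            A[s, s2] -= gamma * p
    return np.linalg.solve(A, b)

def improve(V):
    """Policy improvement: greedy one-step lookahead with respect to V."""
    return {s: max(P[s], key=lambda a: R[s][a] +
                   gamma * sum(p * V[s2] for s2, p in P[s][a].items()))
            for s in states}

policy = {0: 'b', 1: 'b'}        # arbitrary initial policy
while True:
    V = evaluate(policy)
    new = improve(V)
    if new == policy:            # policy unchanged => converged
        break
    policy = new
print(policy, V)
```

With only finitely many deterministic policies and each improvement step never decreasing utilities, the loop must terminate at a fixed point.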

27 Comparison
In value iteration:
- Every pass (or “backup”) updates both the policy (based on current utilities) and the utilities (based on the current policy)

In policy iteration:
- Several passes to update utilities
- Occasional passes to update the policy

Hybrid approaches (asynchronous policy iteration):
- Any sequence of partial updates to either policy entries or utilities will converge if every state is visited infinitely often
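For contrast with policy iteration, value iteration can be sketched on a tiny hypothetical MDP (all numbers invented): every sweep backs up every state's utility with a Bellman update, and the policy exists only implicitly as the argmax, extracted at the end:

```python
# Tiny invented MDP: P[s][a] = {next_state: prob}; R[s][a] = reward.
P = {0: {'a': {0: 0.5, 1: 0.5}, 'b': {0: 1.0}},
     1: {'a': {1: 1.0},         'b': {0: 0.9, 1: 0.1}}}
R = {0: {'a': 1.0, 'b': 0.0},
     1: {'a': 2.0, 'b': 0.5}}
gamma = 0.9

V = {s: 0.0 for s in P}
while True:
    # One pass: Bellman backup of every state's utility.
    newV = {s: max(R[s][a] +
                   gamma * sum(p * V[s2] for s2, p in P[s][a].items())
                   for a in P[s])
            for s in P}
    if max(abs(newV[s] - V[s]) for s in P) < 1e-9:
        break
    V = newV

# The policy is implicit: read it off as the greedy argmax at the end.
policy = {s: max(P[s], key=lambda a: R[s][a] +
                 gamma * sum(p * V[s2] for s2, p in P[s][a].items()))
          for s in P}
print(V, policy)
```

Note the structural difference from policy iteration: here utilities and the (implicit) policy change together on every pass, whereas policy iteration does several evaluation passes per occasional policy update.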

