Download presentation

1
**… Hidden Markov Models Markov assumption: Transition model:**

The current state is conditionally independent of all the other past states given the state in the previous time step The evidence at time t depends only on the state at time t Transition model: P(Xt | X0:t-1) = P(Xt | Xt-1) Observation model: P(Et | X0:t, E1:t-1) = P(Et | Xt) … X0 X1 X2 Xt-1 Xt E1 E2 Et-1 Et

2
**An example HMM States: X = {home, office, cafe}**

Observations: E = {sms, facebook, } Slide credit: Andy White

3
**The Joint Distribution**

Transition model: P(Xt | Xt-1) Observation model: P(Et | Xt) How do we compute the full joint P(X0:t, E1:t)? … X0 X1 X2 Xt-1 Xt E1 E2 Et-1 Et

4
HMM inference tasks Filtering: what is the distribution over the current state Xt given all the evidence so far, e1:t ? Query variable … … X0 X1 Xk Xt-1 Xt E1 Ek Et-1 Et Evidence variables

5
HMM inference tasks Filtering: what is the distribution over the current state Xt given all the evidence so far, e1:t ? Smoothing: what is the distribution of some state Xk given the entire observation sequence e1:t? … … X0 X1 Xk Xt-1 Xt E1 Ek Et-1 Et

6
HMM inference tasks Filtering: what is the distribution over the current state Xt given all the evidence so far, e1:t ? Smoothing: what is the distribution of some state Xk given the entire observation sequence e1:t? Evaluation: compute the probability of a given observation sequence e1:t … … X0 X1 Xk Xt-1 Xt E1 Ek Et-1 Et

7
HMM inference tasks Filtering: what is the distribution over the current state Xt given all the evidence so far, e1:t Smoothing: what is the distribution of some state Xk given the entire observation sequence e1:t? Evaluation: compute the probability of a given observation sequence e1:t Decoding: what is the most likely state sequence X0:t given the observation sequence e1:t? … … X0 X1 Xk Xt-1 Xt E1 Ek Et-1 Et

8
Filtering Task: compute the probability distribution over the current state given all the evidence so far: P(Xt | e1:t) Recursive formulation: suppose we know P(Xt-1 | e1:t-1) Query variable … … X0 X1 Xk Xt-1 Xt E1 Ek Et-1 Et Evidence variables

9
Filtering Task: compute the probability distribution over the current state given all the evidence so far: P(Xt | e1:t) Recursive formulation: suppose we know P(Xt-1 | e1:t-1) Time: t – 1 Time: t What is P(Xt = Office | e1:t-1) ? et-1 = Facebook 0.6 * * * 0.1 = 0.5 Home 0.6 Home ?? 0.6 0.2 Office 0.3 Office ?? 0.8 Cafe 0.1 Cafe ?? P(Xt-1 | e1:t-1) P(Xt | Xt-1)

10
Filtering Task: compute the probability distribution over the current state given all the evidence so far: P(Xt | e1:t) Recursive formulation: suppose we know P(Xt-1 | e1:t-1) Time: t – 1 Time: t What is P(Xt = Office | e1:t-1) ? et-1 = Facebook 0.6 * * * 0.1 = 0.5 Home 0.6 Home ?? 0.6 0.2 Office 0.3 Office ?? 0.8 Cafe 0.1 Cafe ?? P(Xt-1 | e1:t-1) P(Xt | Xt-1)

11
Filtering Task: compute the probability distribution over the current state given all the evidence so far: P(Xt | e1:t) Recursive formulation: suppose we know P(Xt-1 | e1:t-1) Time: t – 1 Time: t What is P(Xt = Office | e1:t-1) ? et-1 = Facebook 0.6 * * * 0.1 = 0.5 Home 0.6 Home ?? 0.6 0.2 Office 0.3 Office ?? 0.8 Cafe 0.1 Cafe ?? P(Xt-1 | e1:t-1) P(Xt | Xt-1)

12
Filtering Task: compute the probability distribution over the current state given all the evidence so far: P(Xt | e1:t) Recursive formulation: suppose we know P(Xt-1 | e1:t-1) Time: t – 1 Time: t What is P(Xt = Office | e1:t-1) ? et-1 = Facebook et = 0.6 * * * 0.1 = 0.5 Home 0.6 Home ?? 0.6 0.2 Office 0.3 Office ?? What is P(Xt = Office | e1:t) ? 0.8 Cafe 0.1 Cafe ?? P(Xt-1 | e1:t-1) P(Xt | Xt-1) P(et | Xt) = 0.8

13
Filtering Task: compute the probability distribution over the current state given all the evidence so far: P(Xt | e1:t) Recursive formulation: suppose we know P(Xt-1 | e1:t-1) Time: t – 1 Time: t What is P(Xt = Office | e1:t-1) ? et-1 = Facebook et = 0.6 * * * 0.1 = 0.5 Home 0.6 Home ?? 0.6 0.2 Office 0.3 Office ?? What is P(Xt = Office | e1:t) ? 0.8 Cafe 0.1 Cafe ?? P(Xt-1 | e1:t-1) P(Xt | Xt-1) P(et | Xt) = 0.8

14
Filtering Task: compute the probability distribution over the current state given all the evidence so far: P(Xt | e1:t) Recursive formulation: suppose we know P(Xt-1 | e1:t-1) Time: t – 1 Time: t What is P(Xt = Office | e1:t-1) ? et-1 = Facebook et = 0.6 * * * 0.1 = 0.5 Home 0.6 Home ?? 0.6 0.2 Office 0.3 Office ?? What is P(Xt = Office | e1:t) ? 0.8 Cafe 0.1 Cafe ?? 0.5 * 0.8 = 0.4 Note: must also compute this value for Home and Cafe, and renormalize to sum to 1 P(Xt-1 | e1:t-1) P(Xt | Xt-1) P(et | Xt) = 0.8

15
**Filtering: The Forward Algorithm**

Task: compute the probability distribution over the current state given all the evidence so far: P(Xt | e1:t) Recursive formulation: suppose we know P(Xt-1 | e1:t-1) Base case: priors P(X0) Prediction: propagate belief from Xt-1 to Xt Correction: weight by evidence et Renormalize to have all P(Xt = x | e1:t) sum to 1

16
**Filtering: The Forward Algorithm**

Home Office Cafe Time: t – 1 Time: t et-1 et … prior Time: 0

17
**Evaluation Compute the probability of the current sequence: P(e1:t)**

Recursive formulation: suppose we know P(e1:t-1)

18
**Evaluation Compute the probability of the current sequence: P(e1:t)**

Recursive formulation: suppose we know P(e1:t-1) recursion filtering

19
Smoothing What is the distribution of some state Xk given the entire observation sequence e1:t? … … X0 X1 Xk Xt-1 Xt E1 Ek Et-1 Et

20
Smoothing What is the distribution of some state Xk given the entire observation sequence e1:t? Solution: the forward-backward algorithm Time: 0 Time: k Time: t ek et Home … Home … Home Office … Office … Office Cafe … Cafe … Cafe Forward message: P(Xk | e1:k) Backward message: P(ek+1:t | Xk)

21
**Decoding: Viterbi Algorithm**

Task: given observation sequence e1:t, compute most likely state sequence x0:t … … X0 X1 Xk Xt-1 Xt E1 Ek Et-1 Et

22
**Decoding: Viterbi Algorithm**

Task: given observation sequence e1:t, compute most likely state sequence x0:t The most likely path that ends in a particular state xt consists of the most likely path to some state xt-1 followed by the transition to xt Time: 0 Time: t – 1 Time: t xt-1 xt

23
**Decoding: Viterbi Algorithm**

Let mt(xt) denote the probability of the most likely path that ends in xt: Time: 0 Time: t – 1 Time: t P(xt | xt-1) xt-1 mt-1(xt-1) xt

24
**Learning Given: a training sample of observation sequences**

Goal: compute model parameters Transition probabilities P(Xt | Xt-1) Observation probabilities P(Et | Xt) What if we had complete data, i.e., e1:t and x0:t ? Then we could estimate all the parameters by relative frequencies # of times state b follows state a P(Xt = b | Xt-1= a) total # of transitions from state a # of times e is emitted from state a P(E = e | X = a) total # of emissions from state a

25
**Learning Given: a training sample of observation sequences**

Goal: compute model parameters Transition probabilities P(Xt | Xt-1) Observation probabilities P(Et | Xt) What if we had complete data, i.e., e1:t and x0:t ? Then we could estimate all the parameters by relative frequencies What if we knew the model parameters? Then we could use inference to find the posterior distribution of the hidden states given the observations

26
**Learning Given: a training sample of observation sequences**

Goal: compute model parameters Transition probabilities P(Xt | Xt-1) Observation probabilities P(Et | Xt) The EM (expectation-maximization) algorithm: Starting with a random initialization of parameters: E-step: find the posterior distribution of the hidden variables given observations and current parameter estimate M-step: re-estimate parameter values given the expected values of the hidden variables

Similar presentations

© 2020 SlidePlayer.com Inc.

All rights reserved.

To make this website work, we log user data and share it with processors. To use this website, you must agree to our Privacy Policy, including cookie policy.

Ads by Google