
1
**Hidden Markov Models (HMMs): Rabiner’s Paper**

Markoviana Reading Group Computer Eng. & Science Dept. Arizona State University

2
**Stationary and Non-stationary**

Stationary process: its statistical properties do not vary with time.
Non-stationary process: the signal’s statistical properties vary over time.

Markoviana Reading Group, Fatih Gelgi – Feb, 2005

3
**HMM Example - Casino Coin**

A two-state HMM for a casino that occasionally switches between a Fair and an Unfair coin. The model is given by two probability tables:

State transition probabilities: Fair→Fair 0.9, Fair→Unfair 0.1, Unfair→Fair 0.2, Unfair→Unfair 0.8.
Symbol emission probabilities: Fair: H 0.5, T 0.5; Unfair: H 0.3, T 0.7.
Observation symbols: H, T.

Observation sequence: HTHHTTHHHTHTHTHHTHHHHHHTHTHH
State sequence: FFFFFFUUUFFFFFFUUUUUUUFFFFFF

Motivation: given a sequence of Hs and Ts, can you tell at what times the casino cheated?

4
**Properties of an HMM**

First-order Markov process: the state qt depends only on qt-1.
Time is discrete.

5
**Elements of an HMM**

N, the number of states; M, the number of distinct observation symbols.
States S1, S2, …, SN; observation symbols O1, O2, …, OM.
λ = (A, B, π), the probability distributions: the state transition probabilities A = {aij}, the symbol emission probabilities B = {bj(k)}, and the initial state distribution π.

6
**HMM Basic Problems**

1. Given an observation sequence O = O1O2O3…OT and λ, find P(O|λ). → Forward / Backward algorithm.
2. Given O = O1O2O3…OT and λ, find the most likely state sequence Q = q1q2…qT. → Viterbi algorithm.
3. Given O = O1O2O3…OT and λ, re-estimate λ so that P(O|λ) is higher than it is now. → Baum-Welch re-estimation.

7
**Forward Algorithm Illustration**

αt(i) is the probability of observing the partial sequence O1O2O3…Ot and being in state Si at time t.

8
**Forward Algorithm Illustration (cont’d)**

αt(i) is the probability of observing the partial sequence O1O2O3…Ot and being in state Si at time t.

Trellis of α values: rows are the states S1…SN, columns are the observations O1…OT. The first column holds α1(j) = πj bj(O1); the second holds α2(j) = [Σi α1(i) aij] bj(O2); and so on. The total of the final column (t = T) gives the solution, P(O|λ).

9
**Forward Algorithm**

Definition: αt(i) = P(O1O2…Ot, qt = Si | λ) — the probability of observing the partial sequence O1O2O3…Ot and being in state Si at time t.
Initialization: α1(i) = πi bi(O1), 1 ≤ i ≤ N
Induction: αt+1(j) = [Σi=1..N αt(i) aij] bj(Ot+1)
Problem 1 answer: P(O|λ) = Σi=1..N αT(i)
Complexity: O(N²T)
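A minimal sketch of the recursion above in Python, using the casino model from slide 3. The uniform initial distribution π, the Fair/Unfair state ordering, and the test sequence "HTH" are illustrative assumptions, not from the slides:

```python
# Forward algorithm sketch for the casino HMM (slide 3).
# States: 0 = Fair, 1 = Unfair (ordering assumed for illustration).
pi = [0.5, 0.5]                      # initial distribution (assumed uniform)
A = [[0.9, 0.1],                     # a[i][j]: transition probabilities, slide 3
     [0.2, 0.8]]
B = [{'H': 0.5, 'T': 0.5},           # emission probabilities: Fair coin
     {'H': 0.3, 'T': 0.7}]           # Unfair coin (reading of slide 3's table)

def forward(obs):
    """Return P(O | lambda) via the forward recursion; O(N^2 T) time."""
    N = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(O_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(O_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)

print(forward("HTH"))                # P(O | lambda), approx. 0.09571
```

Note the O(N²T) cost: each of the T induction steps sums over N predecessors for each of N states, versus the O(N^T) cost of enumerating all state paths.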

10
**Backward Algorithm Illustration**

βt(i) is the probability of observing the partial sequence Ot+1Ot+2Ot+3…OT, given state Si at time t.

11
**Backward Algorithm**

Definition: βt(i) = P(Ot+1Ot+2…OT | qt = Si, λ) — the probability of observing the partial sequence Ot+1Ot+2Ot+3…OT, given state Si at time t.
Initialization: βT(i) = 1
Induction: βt(i) = Σj=1..N aij bj(Ot+1) βt+1(j)
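The same quantity P(O|λ) can be computed with the backward recursion; a sketch on the same casino model (state ordering and test sequence are, as before, illustrative assumptions):

```python
# Backward recursion sketch on the casino HMM (slide 3).
pi = [0.5, 0.5]                      # assumed uniform initial distribution
A = [[0.9, 0.1], [0.2, 0.8]]         # transition probabilities from slide 3
B = [{'H': 0.5, 'T': 0.5},           # Fair coin emissions
     {'H': 0.3, 'T': 0.7}]           # Unfair coin emissions

def backward(obs):
    """Return P(O | lambda) via the backward recursion."""
    N = len(pi)
    beta = [1.0] * N                 # Initialization: beta_T(i) = 1
    for o in reversed(obs[1:]):      # Induction, t = T-1 down to 1
        beta = [sum(A[i][j] * B[j][o] * beta[j] for j in range(N))
                for i in range(N)]
    # Termination: P(O | lambda) = sum_i pi_i b_i(O_1) beta_1(i)
    return sum(pi[i] * B[i][obs[0]] * beta[i] for i in range(N))
```

For any observation sequence this agrees with the forward result, since both compute P(O|λ).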

12
**Q2: Optimality Criterion 1**

* Maximize the expected number of correct individual states.
Definition: γt(i) = P(qt = Si | O, λ) = αt(i) βt(i) / P(O|λ) is the probability of being in state Si at time t, given the observation sequence O and the model λ.
Problem 2 answer: qt* = argmaxi γt(i), for each t.
Problem: if some aij = 0, the resulting optimal state sequence may not even be a valid state sequence.
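This per-state (posterior) decoding can be sketched directly from the forward and backward lattices; casino model numbers from slide 3, with the initial distribution, state ordering, and test sequence assumed for illustration:

```python
# Posterior (gamma) decoding sketch: pick argmax_i gamma_t(i) at each t.
pi = [0.5, 0.5]                      # assumed uniform initial distribution
A = [[0.9, 0.1], [0.2, 0.8]]         # transition probabilities from slide 3
B = [{'H': 0.5, 'T': 0.5},           # Fair coin emissions
     {'H': 0.3, 'T': 0.7}]           # Unfair coin emissions

def posterior_decode(obs):
    """Return the individually most likely state at each time step."""
    N, T = len(pi), len(obs)
    # Forward lattice alpha[t][i]
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    # Backward lattice beta[t][i]
    beta = [[1.0] * N]
    for o in reversed(obs[1:]):
        beta.insert(0, [sum(A[i][j] * B[j][o] * beta[0][j] for j in range(N))
                        for i in range(N)])
    p_obs = sum(alpha[-1])
    # gamma_t(i) = alpha_t(i) beta_t(i) / P(O | lambda); pick the argmax per t
    return [max(range(N), key=lambda i: alpha[t][i] * beta[t][i] / p_obs)
            for t in range(T)]
```

Each time step is decided independently, which is exactly why the resulting sequence can step through a forbidden transition (aij = 0).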

13
**Q2: Optimality Criterion 2**

* Find the single best state sequence (path), i.e. maximize P(Q|O,λ).
Definition: δt(i) is the highest probability of any state path accounting for the partial observation sequence O1O2O3…Ot and ending in state Si.

14
**Viterbi Algorithm**

The major difference from the forward algorithm: maximization instead of summation.

15
**Viterbi Algorithm Illustration**

δt(i) is the highest probability of any state path accounting for the partial observation sequence O1O2O3…Ot and ending in state Si.

Trellis of δ values: rows are the states S1…SN, columns are the observations O1…OT. The first column holds δ1(j) = πj bj(O1); the second holds δ2(j) = maxi [δ1(i) aij] bj(O2); and so on. The maximum of the final column (t = T) indicates where the traceback starts.

16
**Relations with DBN**

Forward function: αt+1(j) = bj(Ot+1) Σi aij αt(i)
Backward function: βt(i) = Σj aij bj(Ot+1) βt+1(j), with βT(i) = 1
Viterbi algorithm: δt+1(j) = bj(Ot+1) maxi [aij δt(i)]

17
**Some more definitions**

γt(i) is the probability of being in state Si at time t, given O and λ: γt(i) = αt(i) βt(i) / P(O|λ).
ξt(i,j) is the probability of being in state Si at time t and in Sj at time t+1, given O and λ: ξt(i,j) = αt(i) aij bj(Ot+1) βt+1(j) / P(O|λ).
Note that γt(i) = Σj ξt(i,j).

18
**Baum-Welch Re-estimation**

An Expectation-Maximization (EM) algorithm.
Expectation step: using the current model λ, compute the expected counts γt(i) and ξt(i,j) from the forward and backward variables.

19
**Baum-Welch Re-estimation (cont’d)**

Maximization step:
π̄i = γ1(i) (expected frequency in state Si at time t = 1)
āij = Σt=1..T-1 ξt(i,j) / Σt=1..T-1 γt(i) (expected number of transitions Si→Sj over expected number of transitions out of Si)
b̄j(k) = Σt: Ot=vk γt(j) / Σt=1..T γt(j) (expected number of times in Sj emitting symbol vk over expected number of times in Sj)
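One full E + M iteration can be sketched as below, on the casino model from slide 3. The initial distribution, state ordering, and the observation sequence used in testing are illustrative assumptions:

```python
# One Baum-Welch iteration sketch on the casino HMM (slide 3).
pi = [0.5, 0.5]                      # assumed uniform initial distribution
A = [[0.9, 0.1], [0.2, 0.8]]         # transition probabilities from slide 3
B = [{'H': 0.5, 'T': 0.5},           # Fair coin emissions
     {'H': 0.3, 'T': 0.7}]           # Unfair coin emissions

def baum_welch_step(obs, pi, A, B):
    """Return re-estimated (pi, A, B) after one E + M step."""
    N, T = len(pi), len(obs)
    # E-step: forward lattice alpha[t][i] and backward lattice beta[t][i]
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    beta = [[1.0] * N]
    for o in reversed(obs[1:]):
        beta.insert(0, [sum(A[i][j] * B[j][o] * beta[0][j] for j in range(N))
                        for i in range(N)])
    p_obs = sum(alpha[-1])
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # M-step: the re-estimation formulas from the slide above
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [{k: sum(gamma[t][j] for t in range(T) if obs[t] == k) /
                 sum(gamma[t][j] for t in range(T))
              for k in B[j]} for j in range(N)]
    return new_pi, new_A, new_B
```

Because each re-estimate is a ratio of expected counts, the new π, rows of A, and rows of B each sum to 1 by construction, matching the note on slide 20.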

20
**Notes on the Re-estimation**

If the model does not change, it has reached a local maximum. Depending on the model, many local maxima can exist. The re-estimated probabilities always sum to 1.

21
**Implementation issues**

- Scaling
- Multiple observation sequences
- Initial parameter estimation
- Missing data
- Choice of model size and type

22
**Scaling**

α̂ calculation: at each time t, rescale the forward variables by ct = 1 / Σi αt(i), so that α̂t(i) = ct αt(i) and Σi α̂t(i) = 1.
Recursion to calculate α̂t+1: apply the ordinary forward induction to the already-scaled α̂t, then rescale with ct+1.

23
**Scaling (cont’d)**

β̂ calculation: rescale the backward variables with the same coefficients, β̂t(i) = ct βt(i).
Desired condition: Σi α̂t(i) = 1 at every t.
* Note that Σi β̂t(i) = 1 is not true!

24
Scaling (cont’d): in the re-estimation formulas, the scale factors in the numerator and denominator cancel, so γt(i) and ξt(i,j) can be computed directly from the scaled α̂ and β̂.

25
**Maximum log-likelihood**

Initialization and recursion: run the forward algorithm, rescaling α with ct at each step.
Termination: P(O|λ) = 1 / Πt=1..T ct would underflow for long sequences, so compute log P(O|λ) = −Σt=1..T log ct instead.
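A sketch of the scaled forward pass and log-likelihood termination above, again on the casino model from slide 3 (initial distribution, state ordering, and test sequence assumed):

```python
# Scaled forward pass: accumulate log P(O | lambda) = -sum_t log c_t.
import math

pi = [0.5, 0.5]                      # assumed uniform initial distribution
A = [[0.9, 0.1], [0.2, 0.8]]         # transition probabilities from slide 3
B = [{'H': 0.5, 'T': 0.5},           # Fair coin emissions
     {'H': 0.3, 'T': 0.7}]           # Unfair coin emissions

def log_likelihood(obs):
    """Return log P(O | lambda), rescaling alpha at every step."""
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    log_p = 0.0
    for t, o in enumerate(obs):
        if t > 0:                    # ordinary induction on scaled alphas
            alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                     for j in range(N)]
        c = 1.0 / sum(alpha)         # scaling coefficient c_t
        alpha = [c * a for a in alpha]   # now sum_i alpha_hat_t(i) = 1
        log_p -= math.log(c)         # accumulate -log c_t
    return log_p
```

For a short sequence this agrees with the log of the unscaled forward probability; for long sequences it keeps every intermediate value in a safe floating-point range.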

26
**Multiple Observation Sequences**

Problem with re-estimation from a single sequence: with K observation sequences, accumulate the numerator and denominator sums of the re-estimation formulas over all K sequences before dividing.

27
**Initial estimates of parameters**

For π and A, random or uniform initial estimates are sufficient. For B (discrete symbol probabilities), a good initial estimate is needed.

28
**Insufficient training data**

Solutions:
- Increase the size of the training data
- Reduce the size of the model
- Interpolate parameters with those of another model

29
**References**

- L. Rabiner. “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.” Proceedings of the IEEE, 1989.
- S. Russell, P. Norvig. “Probabilistic Reasoning over Time.” Artificial Intelligence: A Modern Approach, Ch. 15, 2002 (draft).
- V. Borkar, K. Deshmukh, S. Sarawagi. “Automatic Segmentation of Text into Structured Records.” ACM SIGMOD, 2001.
- T. Scheffer, C. Decomain, S. Wrobel. “Active Hidden Markov Models for Information Extraction.” Proceedings of the International Symposium on Intelligent Data Analysis, 2001.
- S. Ray, M. Craven. “Representing Sentence Structure in Hidden Markov Models for Information Extraction.” Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2001.
