
1
**Hidden Markov Models (HMMs): Rabiner’s Paper**

Markoviana Reading Group Computer Eng. & Science Dept. Arizona State University

2
**Stationary and Non-stationary**

Stationary process: its statistical properties do not vary with time.
Non-stationary process: its signal properties vary over time.

Markoviana Reading Group Fatih Gelgi – Feb, 2005

3
**HMM Example - Casino Coin**

Two states: Fair (F) and Unfair (U). State transition probabilities: F→F 0.9, F→U 0.1, U→F 0.2, U→U 0.8. Symbol emission probabilities: Fair emits H and T with 0.5 each; Unfair emits H with 0.3 and T with 0.7. Observation symbols: H, T.

Observation sequence: HTHHTTHHHTHTHTHHTHHHHHHTHTHH
State sequence: FFFFFFUUUFFFFFFUUUUUUUFFFFFF

Motivation: given a sequence of Hs and Ts, can you tell at what times the casino cheated?
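As a concrete sketch of this model (symbols H/T encoded as 0/1, states Fair/Unfair as 0/1, and a uniform initial distribution π assumed here, since the slide does not give one), the joint probability P(O, Q | λ) of an observation/state pair can be computed directly:

```python
# Casino coin HMM from the slide. States: 0 = Fair, 1 = Unfair;
# symbols: 0 = H, 1 = T. The initial distribution pi is not shown
# on the slide, so a uniform pi is assumed for illustration.
PI = [0.5, 0.5]
A = [[0.9, 0.1],   # Fair   -> Fair, Unfair
     [0.2, 0.8]]   # Unfair -> Fair, Unfair
B = [[0.5, 0.5],   # Fair emits H, T
     [0.3, 0.7]]   # Unfair emits H, T

def joint_prob(obs, states, pi=PI, a=A, b=B):
    """P(O, Q | lambda): initial term, then transition * emission per step."""
    p = pi[states[0]] * b[states[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= a[states[t - 1]][states[t]] * b[states[t]][obs[t]]
    return p

# P(O = HT, Q = FF) = 0.5 * 0.5 * 0.9 * 0.5
p_ht_ff = joint_prob([0, 1], [0, 0])
```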

4
**Properties of an HMM**

First-order Markov process: q_t depends only on q_{t-1}.
Time is discrete.

5
**Elements of an HMM**

N, the number of states; M, the number of symbols.
States S_1, S_2, …, S_N; observation symbols O_1, O_2, …, O_M.
λ, the probability distributions: A (state transitions), B (symbol emissions), π (initial states).

6
HMM Basic Problems

1. Given an observation sequence O = O_1 O_2 O_3 … O_T and λ, find P(O|λ). (Forward Algorithm / Backward Algorithm)
2. Given O = O_1 O_2 O_3 … O_T and λ, find the most likely state sequence Q = q_1 q_2 … q_T. (Viterbi Algorithm)
3. Given O = O_1 O_2 O_3 … O_T and λ, re-estimate λ so that P(O|λ) is higher than it is now. (Baum-Welch Re-estimation)

7
**Forward Algorithm Illustration**

α_t(i) is the probability of observing the partial sequence O_1 O_2 O_3 … O_t and being in state S_i at time t.

8
**Forward Algorithm Illustration (cont’d)**

α_t(i) is the probability of observing the partial sequence O_1 O_2 O_3 … O_t and being in state S_i at time t. In the trellis of states S_1 … S_N (rows) against observations O_1 … O_T (columns), each cell α_t(j) is filled column by column:

First column: α_1(j) = π_j b_j(O_1)
Second column: α_2(j) = [Σ_i α_1(i) a_ij] b_j(O_2)
…
The total of the last column (t = T) gives the solution P(O|λ).

9
**Forward Algorithm**

Definition: α_t(i) = P(O_1 O_2 … O_t, q_t = S_i | λ), the probability of observing the partial sequence O_1 O_2 … O_t and being in state S_i at time t.
Initialization: α_1(i) = π_i b_i(O_1), 1 ≤ i ≤ N
Induction: α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(O_{t+1})
Problem 1 answer (termination): P(O|λ) = Σ_{i=1}^{N} α_T(i)
Complexity: O(N²T)
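The recursion above can be sketched in a few lines of Python; the casino-coin parameters from slide 3 are reused, with a uniform π assumed since the slide does not give one:

```python
def forward(obs, pi, a, b):
    """Forward trellis: alpha[t][i] = P(O_1..O_t, q_t = S_i | lambda)."""
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]   # initialization
    for t in range(1, len(obs)):
        prev = alpha[-1]
        # induction: sum over predecessors, then multiply by the emission
        alpha.append([sum(prev[i] * a[i][j] for i in range(n)) * b[j][obs[t]]
                      for j in range(n)])
    return alpha

def likelihood(obs, pi, a, b):
    """P(O | lambda): total of the last trellis column."""
    return sum(forward(obs, pi, a, b)[-1])

# Casino parameters (uniform pi assumed); obs = HH with H = 0, T = 1
pi_, a_, b_ = [0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.3, 0.7]]
p_hh = likelihood([0, 0], pi_, a_, b_)
```

The double loop over states at each time step is where the O(N²T) complexity comes from.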

10
**Backward Algorithm Illustration**

β_t(i) is the probability of observing the partial sequence O_{t+1} O_{t+2} O_{t+3} … O_T given that the state at time t is S_i.

11
**Backward Algorithm**

Definition: β_t(i) = P(O_{t+1} O_{t+2} … O_T | q_t = S_i, λ), the probability of observing the partial sequence O_{t+1} O_{t+2} O_{t+3} … O_T given that the state at time t is S_i.
Initialization: β_T(i) = 1, 1 ≤ i ≤ N
Induction: β_t(i) = Σ_{j=1}^{N} a_ij b_j(O_{t+1}) β_{t+1}(j)
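The backward pass can be sketched the same way (casino parameters, uniform π assumed); as a consistency check, Σ_i π_i b_i(O_1) β_1(i) reproduces the same P(O|λ) as the forward pass:

```python
def backward(obs, pi, a, b):
    """Backward trellis: beta[t][i] = P(O_{t+1}..O_T | q_t = S_i, lambda)."""
    n, T = len(pi), len(obs)
    beta = [[1.0] * n]                      # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        nxt = beta[0]
        # induction: transition * emission * next beta, summed over successors
        beta.insert(0, [sum(a[i][j] * b[j][obs[t + 1]] * nxt[j]
                            for j in range(n))
                        for i in range(n)])
    return beta

pi_, a_, b_ = [0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.3, 0.7]]
obs = [0, 0]  # HH
beta = backward(obs, pi_, a_, b_)
# P(O|lambda) recovered from the first column
p_hh = sum(pi_[i] * b_[i][obs[0]] * beta[0][i] for i in range(2))
```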

12
**Q2: Optimality Criterion 1**

* Maximize the expected number of correct individual states.
Definition: γ_t(i) = P(q_t = S_i | O, λ) = α_t(i) β_t(i) / P(O|λ); γ_t(i) is the probability of being in state S_i at time t given the observation sequence O and the model λ.
Problem 2 answer: choose q_t = argmax_{1≤i≤N} γ_t(i) at each time t.
Problem: if some a_ij = 0, the resulting "optimal" state sequence may not even be a valid state sequence.

13
**Q2: Optimality Criterion 2**

* Find the single best state sequence (path), i.e., maximize P(Q|O,λ).
Definition: δ_t(i) is the highest probability of a state path for the partial observation sequence O_1 O_2 O_3 … O_t that ends in state S_i at time t:
δ_t(i) = max_{q_1 … q_{t-1}} P(q_1 … q_{t-1}, q_t = S_i, O_1 … O_t | λ)

14
**Viterbi Algorithm**

The major difference from the forward algorithm: maximization instead of summation.

15
**Viterbi Algorithm Illustration**

δ_t(i) is the highest probability of a state path for the partial observation sequence O_1 O_2 O_3 … O_t that ends in state S_i at time t. The trellis is filled column by column like the forward trellis:

First column: δ_1(j) = π_j b_j(O_1)
Second column: δ_2(j) = [max_i δ_1(i) a_ij] b_j(O_2)
…
The max of the last column (t = T) indicates where the traceback starts.
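A sketch of the full recursion with backpointers (ψ) and traceback, again on the casino parameters with a uniform π assumed; on the all-tails sequence TTT it decodes the Unfair state throughout:

```python
def viterbi(obs, pi, a, b):
    """Most likely state path: delta/psi recursion, then traceback."""
    n = len(pi)
    delta = [pi[i] * b[i][obs[0]] for i in range(n)]   # first column
    psi = []                                           # backpointers
    for t in range(1, len(obs)):
        step, back = [], []
        for j in range(n):
            # max over predecessors instead of the forward algorithm's sum
            best_i = max(range(n), key=lambda i: delta[i] * a[i][j])
            step.append(delta[best_i] * a[best_i][j] * b[j][obs[t]])
            back.append(best_i)
        delta, psi = step, psi + [back]
    # traceback starts at the max of the last column
    q = [max(range(n), key=lambda i: delta[i])]
    for back in reversed(psi):
        q.insert(0, back[q[0]])
    return q

pi_, a_, b_ = [0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.3, 0.7]]
path = viterbi([1, 1, 1], pi_, a_, b_)  # obs = TTT (0 = H, 1 = T)
```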

16
**Relations with DBN**

Forward function: α_{t+1}(j) = [Σ_i α_t(i) a_ij] b_j(O_{t+1})
Backward function: β_t(i) = Σ_j a_ij b_j(O_{t+1}) β_{t+1}(j), with β_T(i) = 1
Viterbi algorithm: δ_{t+1}(j) = [max_i δ_t(i) a_ij] b_j(O_{t+1})

17
Some more definitions

γ_t(i) is the probability of being in state S_i at time t: γ_t(i) = α_t(i) β_t(i) / P(O|λ)
ξ_t(i,j) is the probability of being in state S_i at time t and S_j at time t+1: ξ_t(i,j) = α_t(i) a_ij b_j(O_{t+1}) β_{t+1}(j) / P(O|λ)

18
**Baum-Welch Re-estimation**

Expectation-Maximization (EM) algorithm.
Expectation: compute γ_t(i) and ξ_t(i,j) under the current model λ.

19
**Baum-Welch Re-estimation (cont’d)**

Maximization (re-estimation formulas):
Re-estimated π_i = γ_1(i)
Re-estimated a_ij = Σ_{t=1}^{T-1} ξ_t(i,j) / Σ_{t=1}^{T-1} γ_t(i)
Re-estimated b_j(k) = [Σ over t with O_t = symbol k of γ_t(j)] / Σ_{t=1}^{T} γ_t(j)
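The two steps can be sketched for a single observation sequence (casino parameters, uniform π assumed); by the EM guarantee, the re-estimated model's likelihood is never lower than the old one's:

```python
def forward(obs, pi, a, b):
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * a[i][j] for i in range(n)) * b[j][obs[t]]
                      for j in range(n)])
    return alpha

def backward(obs, a, b):
    n, T = len(a), len(obs)
    beta = [[1.0] * n]
    for t in range(T - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(a[i][j] * b[j][obs[t + 1]] * nxt[j]
                            for j in range(n)) for i in range(n)])
    return beta

def baum_welch_step(obs, pi, a, b):
    """One EM iteration: gamma/xi (E-step), then re-estimate pi, A, B."""
    n, T, m = len(pi), len(obs), len(b[0])
    al, be = forward(obs, pi, a, b), backward(obs, a, b)
    po = sum(al[-1])
    gamma = [[al[t][i] * be[t][i] / po for i in range(n)] for t in range(T)]
    xi = [[[al[t][i] * a[i][j] * b[j][obs[t + 1]] * be[t + 1][j] / po
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_a = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_b = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(m)] for j in range(n)]
    return new_pi, new_a, new_b

pi_, a_, b_ = [0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.3, 0.7]]
obs = [0, 1, 1, 0, 1, 1, 1, 0]  # HTTHTTTH
p_old = sum(forward(obs, pi_, a_, b_)[-1])
pi2, a2, b2 = baum_welch_step(obs, pi_, a_, b_)
p_new = sum(forward(obs, pi2, a2, b2)[-1])
```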

20
**Notes on the Re-estimation**

If the model does not change under re-estimation, it has reached a local maximum. Depending on the model, many local maxima can exist. The re-estimated probabilities will sum to 1.

21
**Implementation issues**

Scaling
Multiple observation sequences
Initial parameter estimation
Missing data
Choice of model size and type

22
**Scaling**

α_t(i) tends to 0 exponentially as t grows, so each column of the forward trellis is normalized. Denote the scaled values by α′:

Recursion on the already-scaled previous column: apply the forward induction to α′_{t-1}, giving column values v_t(i) = [Σ_j α′_{t-1}(j) a_ji] b_i(O_t)
Scaling coefficient: c_t = 1 / Σ_i v_t(i)
Scaled values: α′_t(i) = c_t v_t(i), so that Σ_i α′_t(i) = 1

23
**Scaling (cont’d)**

β′ calculation: the backward values are scaled with the same coefficients, β′_t(i) = c_t β_t(i).
Desired condition: the scale factors cancel out of the re-estimation formulas.
* Note that Σ_i β′_t(i) = 1 is not true!

24
**Scaling (cont’d)**

25
**Maximum log-likelihood**

Initialization and recursion: as in the scaled forward pass, keeping the coefficients c_t.
Termination: log P(O|λ) = −Σ_{t=1}^{T} log c_t
(P(O|λ) itself would underflow for long sequences; the log-likelihood is computed instead.)
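A sketch of the scaled forward pass with this log-likelihood termination (casino parameters, uniform π assumed); on a short sequence the result can still be checked against the unscaled forward recursion:

```python
import math

def forward_scaled(obs, pi, a, b):
    """Scaled forward pass: every column normalized to sum to 1."""
    n = len(pi)
    col = [pi[i] * b[i][obs[0]] for i in range(n)]
    c = [1.0 / sum(col)]
    alphas = [[x * c[0] for x in col]]
    for t in range(1, len(obs)):
        prev = alphas[-1]
        col = [sum(prev[i] * a[i][j] for i in range(n)) * b[j][obs[t]]
               for j in range(n)]
        ct = 1.0 / sum(col)
        c.append(ct)
        alphas.append([x * ct for x in col])
    return alphas, c

def log_likelihood(obs, pi, a, b):
    """Termination: log P(O|lambda) = -sum_t log c_t."""
    _, c = forward_scaled(obs, pi, a, b)
    return -sum(math.log(ct) for ct in c)

def forward(obs, pi, a, b):
    """Unscaled forward pass, for the sanity check only."""
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * a[i][j] for i in range(n)) * b[j][obs[t]]
                      for j in range(n)])
    return alpha

pi_, a_, b_ = [0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.3, 0.7]]
obs = [0, 1, 0, 0, 1]
ll_scaled = log_likelihood(obs, pi_, a_, b_)
ll_direct = math.log(sum(forward(obs, pi_, a_, b_)[-1]))
```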

26
**Multiple observation sequences**

Problem with re-estimation: the formulas must be modified to accumulate the expected counts (γ and ξ) over all observation sequences rather than a single one.

27
**Initial estimates of parameters**

For π and A, random or uniform initialization is sufficient.
For B (discrete symbol probabilities), a good initial estimate is needed.

28
**Insufficient training data**

Solutions:
Increase the size of the training data
Reduce the size of the model
Interpolate parameters using another model

29
References

L. Rabiner. "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition." Proceedings of the IEEE, 1989.
S. Russell, P. Norvig. "Probabilistic Reasoning Over Time." AI: A Modern Approach, Ch. 15, 2002 (draft).
V. Borkar, K. Deshmukh, S. Sarawagi. "Automatic Segmentation of Text into Structured Records." ACM SIGMOD, 2001.
T. Scheffer, C. Decomain, S. Wrobel. "Active Hidden Markov Models for Information Extraction." Proceedings of the International Symposium on Intelligent Data Analysis, 2001.
S. Ray, M. Craven. "Representing Sentence Structure in Hidden Markov Models for Information Extraction." Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2001.
