
1 Hidden Markov Models
Hsin-min Wang
whm@iis.sinica.edu.tw
References:
1. L. R. Rabiner and B. H. Juang (1993), Fundamentals of Speech Recognition, Chapter 6
2. X. Huang et al. (2001), Spoken Language Processing, Chapter 8
3. L. R. Rabiner (1989), "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, vol. 77, no. 2, February 1989

2 Hidden Markov Model (HMM)
History
– Published in Baum's papers in the late 1960s and early 1970s
– Introduced to speech processing by Baker (CMU) and Jelinek (IBM) in the 1970s
– Introduced to DNA sequencing in the 1990s
Assumptions
– A speech signal (or DNA sequence) can be characterized as a parametric random process
– The parameters can be estimated in a precise, well-defined manner
Three fundamental problems
– Evaluation of the probability (likelihood) of a sequence of observations given a specific HMM
– Determination of the best sequence of model states
– Adjustment of the model parameters so as to best account for the observed signal/sequence

3 Hidden Markov Model (HMM)
Given an initial 3-state model with near-uniform parameters (each state emits A, B, and C with probabilities ≈ .33–.34, and near-uniform transition probabilities), we can train one HMM for each of the following two classes using their respective training data.
Training set for class 1:
1. ABBCABCAABC
2. ABCABC
3. ABCA ABC
4. BBABCAB
5. BCAABCCAB
6. CACCABCA
7. CABCABCA
8. CABCA
9. CABCA
Training set for class 2:
1. BBBCCBC
2. CCBABB
3. AACCBBB
4. BBABBAC
5. CCAABBAB
6. BBBCCBAA
7. ABBBBABA
8. CCCCC
9. BBAAA
We can then decide which class each of the following test sequences belongs to:
ABCABCCAB
AABABCCCCBBB

4 The Markov Chain (Rabiner 1989) – An Observable Markov Model
– The parameters of a first-order Markov chain with N states labeled {1,…,N}, where q_t denotes the state at time t, can be described as
  a_ij = P(q_t = j | q_{t-1} = i), 1 ≤ i, j ≤ N
  π_i = P(q_1 = i), 1 ≤ i ≤ N
– The output of the process is the set of states at each time instant t, where each state corresponds to an observable event X_i
– There is a one-to-one correspondence between the observable sequence and the Markov chain state sequence (the observation is deterministic!)

5 The Markov Chain – Ex 1
Example 1: a 3-state Markov chain
– State 1 generates symbol A only, state 2 generates symbol B only, and state 3 generates symbol C only
– [State diagram; transition probabilities consistent with the computation below: a_11=0.6, a_12=0.3, a_13=0.1; a_21=0.1, a_22=0.7, a_23=0.2; a_31=0.3, a_32=0.2, a_33=0.5]
– Given a sequence of observed symbols O={CABBCABC}, the only corresponding state sequence is Q={S_3 S_1 S_2 S_2 S_3 S_1 S_2 S_3}, and the corresponding probability is
  P(O|λ) = P(CABBCABC|λ) = P(Q|λ) = P(S_3 S_1 S_2 S_2 S_3 S_1 S_2 S_3|λ)
         = π(S_3) P(S_1|S_3) P(S_2|S_1) P(S_2|S_2) P(S_3|S_2) P(S_1|S_3) P(S_2|S_1) P(S_3|S_2)
         = 0.1 × 0.3 × 0.3 × 0.7 × 0.2 × 0.3 × 0.3 × 0.2 = 0.00002268
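A quick sketch that reproduces the Example 1 computation. The transition matrix is read off the slide's state diagram (only the entries used in the product are confirmed by the worked computation), and only π(S_3)=0.1 is given on the slide:

```python
# Transition matrix from the Example 1 state diagram; states 0..2 stand for S1, S2, S3.
A = [[0.6, 0.3, 0.1],   # from S1: a11, a12, a13
     [0.1, 0.7, 0.2],   # from S2: a21, a22, a23
     [0.3, 0.2, 0.5]]   # from S3: a31, a32, a33
pi = {2: 0.1}           # pi(S3); pi(S1) and pi(S2) are not used in this example
state_of = {"A": 0, "B": 1, "C": 2}   # each state emits exactly one symbol

def markov_prob(obs: str) -> float:
    """P(O|lambda) for an observable Markov model (state sequence = observation)."""
    q = [state_of[o] for o in obs]
    p = pi[q[0]]
    for i, j in zip(q, q[1:]):
        p *= A[i][j]
    return p

print(markov_prob("CABBCABC"))   # ~2.268e-05, as on the slide
```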

6 The Markov Chain – Ex 2
Example 2: a three-state Markov chain for the Dow Jones Industrial average (Huang et al., 2001)
– The probability of 5 consecutive "up" days
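For an observable Markov chain, "5 consecutive up days" pins down the state path (up, up, up, up, up), so the probability reduces to π_up · a_uu^4. As a sketch, the values π_1=0.5 and a_11=0.6 are borrowed from the Dow Jones parameters listed later on slides 20 and 24:

```python
# Observable Markov chain: the observation fixes the state path, so
# P(up,up,up,up,up) = pi_up * a_uu^4 (one initial probability, four self-transitions).
pi_up, a_uu = 0.5, 0.6   # assumed from the Dow Jones parameters on slides 20/24

p_five_ups = pi_up * a_uu ** 4
print(p_five_ups)   # ~0.0648
```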

7 Extension to Hidden Markov Models
HMM: an extended version of the observable Markov model
– The observation is a probabilistic function (discrete or continuous) of a state, instead of being in one-to-one correspondence with the state
– The model is a doubly embedded stochastic process with an underlying stochastic process that is not directly observable (hidden)
What is hidden? The state sequence! Given the observation sequence, we are not sure which state sequence generated it!

8 Hidden Markov Models – Ex 1
Example: a 3-state discrete HMM
– Initial model: [state diagram with the same transition probabilities as Example 1, and per-state emission distributions {A:.3, B:.2, C:.5}, {A:.7, B:.1, C:.2}, {A:.3, B:.6, C:.1}]
– Given a sequence of observations O={ABC}, there are 27 possible corresponding state sequences, and therefore the probability P(O|λ) is the sum of P(O,Q|λ) over all of them

9 Hidden Markov Models – Ex 2 (Huang et al., 2001)
Given a three-state hidden Markov model for the Dow Jones Industrial average:
– How do we find the probability P(up, up, up, up, up|λ)?
– How do we find the optimal state sequence of the model that generates the observation sequence "up, up, up, up, up"?

10 Elements of an HMM
An HMM is characterized by the following:
1. N, the number of states in the model
2. M, the number of distinct observation symbols per state
3. The state transition probability distribution A={a_ij}, where a_ij=P[q_{t+1}=j|q_t=i], 1≤i,j≤N
4. The observation symbol probability distribution in state j, B={b_j(v_k)}, where b_j(v_k)=P[o_t=v_k|q_t=j], 1≤j≤N, 1≤k≤M
5. The initial state distribution π={π_i}, where π_i=P[q_1=i], 1≤i≤N
For convenience, we usually use the compact notation λ=(A,B,π) to indicate the complete parameter set of an HMM
– This requires specification of two model sizes (N and M)
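The five elements map directly onto a small container for λ=(A,B,π). A minimal sketch, where the example numbers echo the 3-state, 3-symbol discrete HMM of slide 8 (the π values are illustrative assumptions, since the slide does not give them):

```python
# A minimal container for lambda = (A, B, pi), as enumerated above.
from dataclasses import dataclass

@dataclass
class HMM:
    A: list[list[float]]    # A[i][j] = P(q_{t+1}=j | q_t=i),  N x N
    B: list[list[float]]    # B[j][k] = P(o_t = v_k | q_t = j), N x M
    pi: list[float]         # pi[i]  = P(q_1 = i),              length N

    def validate(self) -> None:
        n = len(self.pi)
        assert len(self.A) == len(self.B) == n
        assert abs(sum(self.pi) - 1.0) < 1e-9
        for row in self.A + self.B:          # every distribution sums to 1
            assert abs(sum(row) - 1.0) < 1e-9

hmm = HMM(A=[[0.6, 0.3, 0.1], [0.1, 0.7, 0.2], [0.3, 0.2, 0.5]],
          B=[[0.3, 0.2, 0.5], [0.7, 0.1, 0.2], [0.3, 0.6, 0.1]],
          pi=[0.4, 0.3, 0.3])               # pi is an assumed example
hmm.validate()   # N = 3 states, M = 3 symbols
```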

11 Two Major Assumptions for HMMs
First-order Markov assumption
– The state transition depends only on the origin and destination states
– The state transition probability is time invariant:
  a_ij = P(q_{t+1}=j|q_t=i), 1≤i,j≤N
Output-independence assumption
– Each observation depends only on the state that generates it, not on its neighboring observations

12 Three Basic Problems for HMMs
Given an observation sequence O=(o_1,o_2,…,o_T) and an HMM λ=(A,B,π):
– Problem 1: How do we efficiently compute P(O|λ)? → the Evaluation problem
– Problem 2: How do we choose an optimal state sequence Q=(q_1,q_2,…,q_T) which best explains the observations? → the Decoding problem
– Problem 3: How do we adjust the model parameters λ=(A,B,π) to maximize P(O|λ)? → the Learning/Training problem

13 Solution to Problem 1 - Direct Evaluation
Given O and λ, find P(O|λ) = Pr{observing O given λ}
Evaluating all possible state sequences Q of length T that could generate the observation sequence O:
  P(O|λ) = Σ_Q P(O,Q|λ) = Σ_Q P(Q|λ) P(O|Q,λ)
The probability of the path Q, by the first-order Markov assumption:
  P(Q|λ) = π_{q_1} a_{q_1 q_2} a_{q_2 q_3} … a_{q_{T-1} q_T}
The joint output probability along the path Q, by the output-independence assumption:
  P(O|Q,λ) = b_{q_1}(o_1) b_{q_2}(o_2) … b_{q_T}(o_T)

14 Solution to Problem 1 - Direct Evaluation (cont.)
[Trellis diagram: states S_1, S_2, S_3 on the vertical axis; times 1, 2, 3, …, T-1, T with observations o_1, o_2, o_3, …, o_{T-1}, o_T on the horizontal axis. A marked state S_i means b_j(o_t) has been computed; a marked arc means a_ij has been computed]

15 Solution to Problem 1 - Direct Evaluation (cont.)
– Huge computation requirements: O(N^T) (there are N^T state sequences)
  → exponential computational complexity
A more efficient algorithm can be used to evaluate P(O|λ)
– The Forward Procedure/Algorithm
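Direct evaluation can be sketched by literally enumerating all N^T state sequences and summing P(Q|λ)·P(O|Q,λ), which is only feasible for tiny T. The demo model is the Dow Jones HMM: the first column of A and the b(up) values appear on slide 20; the remaining entries of A are assumptions filled in so the recursion has a complete model to run on:

```python
# Direct evaluation: sum P(Q|lambda) * P(O|Q,lambda) over all N^T paths Q.
from itertools import product

pi = [0.5, 0.2, 0.3]
A = [[0.6, 0.2, 0.2],     # first column (0.6, 0.5, 0.4) is from slide 20;
     [0.5, 0.3, 0.2],     # the other entries are assumed for illustration
     [0.4, 0.1, 0.5]]
B = [{"up": 0.7}, {"up": 0.1}, {"up": 0.3}]   # only b(up) is given

def direct_eval(obs):
    total = 0.0
    for q in product(range(3), repeat=len(obs)):        # all N^T state paths
        p = pi[q[0]] * B[q[0]][obs[0]]                  # pi_{q1} * b_{q1}(o_1)
        for t in range(1, len(obs)):
            p *= A[q[t - 1]][q[t]] * B[q[t]][obs[t]]    # a_{q(t-1)q(t)} * b_{q(t)}(o_t)
        total += p
    return total

print(direct_eval(["up"]))       # ~0.46 = 0.5*0.7 + 0.2*0.1 + 0.3*0.3
print(direct_eval(["up"] * 5))   # P(up,up,up,up,up | lambda), 3^5 = 243 paths
```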

16 Solution to Problem 1 - The Forward Procedure
Based on the HMM assumptions, the calculation of P(q_t|q_{t-1},λ) and P(o_t|q_t,λ) involves only q_{t-1}, q_t, and o_t, so it is possible to compute the likelihood with recursion on t
Forward variable:
  α_t(i) = P(o_1,o_2,…,o_t, q_t=i|λ)
– The probability of the joint event that o_1,o_2,…,o_t are observed and the state at time t is i, given the model λ

17 Solution to Problem 1 - The Forward Procedure (cont.)
Derivation of the recursion:
  α_t(j) = P(o_1,…,o_t, q_t=j|λ)
         = Σ_{i=1}^{N} P(o_1,…,o_{t-1}, q_{t-1}=i, q_t=j|λ) · P(o_t|q_t=j,λ)   ← output-independence assumption
         = [Σ_{i=1}^{N} α_{t-1}(i) a_ij] · b_j(o_t)                            ← first-order Markov assumption

18 Solution to Problem 1 - The Forward Procedure (cont.)
  α_3(2) = P(o_1,o_2,o_3, q_3=2|λ)
         = [α_2(1)·a_12 + α_2(2)·a_22 + α_2(3)·a_32] · b_2(o_3)
[Trellis diagram: α_2(1), α_2(2), α_2(3) at time 2 feed state 2 at time 3 through a_12, a_22, a_32, then multiply by b_2(o_3)]

19 Solution to Problem 1 - The Forward Procedure (cont.)
Algorithm
1. Initialization: α_1(i) = π_i b_i(o_1), 1≤i≤N
2. Induction: α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1}), 1≤t≤T-1, 1≤j≤N
3. Termination: P(O|λ) = Σ_{i=1}^{N} α_T(i)
– Complexity: O(N^2 T)
Based on the lattice (trellis) structure
– Computed in a time-synchronous fashion from left to right, where each cell for time t is completely computed before proceeding to time t+1
– All state sequences, regardless of their earlier history, merge into N nodes (states) at each time instant t
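The three steps above can be sketched directly in code. The demo uses the Dow Jones numbers from slide 20; the first column of A and b(up) appear on that slide, while the remaining entries of A are assumptions filled in so the recursion can run past t=2:

```python
# Forward procedure, O(N^2 * T): initialization, induction, termination.
pi = [0.5, 0.2, 0.3]
A = [[0.6, 0.2, 0.2],     # column (0.6, 0.5, 0.4) is from slide 20;
     [0.5, 0.3, 0.2],     # the other entries are assumed for illustration
     [0.4, 0.1, 0.5]]
B = [{"up": 0.7}, {"up": 0.1}, {"up": 0.3}]

def forward(obs, pi, A, B):
    n = len(pi)
    # 1. Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    trellis = [alpha]
    # 2. Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
        trellis.append(alpha)
    # 3. Termination: P(O|lambda) = sum_i alpha_T(i)
    return sum(alpha), trellis

p, tr = forward(["up"] * 5, pi, A, B)
print(tr[0])      # alpha_1 ~ [0.35, 0.02, 0.09], matching slide 20
print(tr[1][0])   # alpha_2(1) ~ 0.1792, matching slide 20
print(p)          # P(up x5 | lambda)
```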

20 Solution to Problem 1 - The Forward Procedure (cont.)
A three-state hidden Markov model for the Dow Jones Industrial average (Huang et al., 2001):
  b_1(up)=0.7, b_2(up)=0.1, b_3(up)=0.3
  a_11=0.6, a_21=0.5, a_31=0.4
  π_1=0.5, π_2=0.2, π_3=0.3
Forward computation:
  α_1(1)=0.5×0.7=0.35, α_1(2)=0.2×0.1=0.02, α_1(3)=0.3×0.3=0.09
  α_2(1)=(0.35×0.6+0.02×0.5+0.09×0.4)×0.7=0.1792

21 Solution to Problem 2 - The Viterbi Algorithm
The Viterbi algorithm can be regarded as dynamic programming applied to the HMM, or as a modified forward algorithm
– Instead of summing up the probabilities of the different paths coming into the same destination state, the Viterbi algorithm picks and remembers the best path
Finds the single optimal state sequence Q=(q_1,q_2,…,q_T)
– The Viterbi algorithm can also be illustrated in a trellis framework similar to the one for the forward algorithm

22 Solution to Problem 2 - The Viterbi Algorithm (cont.)
[Trellis diagram: states S_1, S_2, S_3 over times 1, 2, 3, …, T-1, T with observations o_1, o_2, o_3, …, o_{T-1}, o_T; only the best surviving path into each state is kept at every time step]

23 Solution to Problem 2 - The Viterbi Algorithm (cont.)
1. Initialization: δ_1(i)=π_i b_i(o_1), ψ_1(i)=0, 1≤i≤N
2. Induction: δ_t(j)=max_i [δ_{t-1}(i) a_ij] b_j(o_t), ψ_t(j)=argmax_i [δ_{t-1}(i) a_ij], 2≤t≤T, 1≤j≤N
3. Termination: P* = max_i δ_T(i), q*_T = argmax_i δ_T(i)
4. Backtracking: q*_t = ψ_{t+1}(q*_{t+1}), t=T-1,…,1
Q* = (q*_1, q*_2, …, q*_T) is the best state sequence
Complexity: O(N^2 T)
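The four steps above can be sketched as follows, again on the Dow Jones model of slide 24 (first column of A and b(up) from the slide; the other entries of A are assumptions). States are 0-based here, so the slide's ψ_2(1)=1 appears as a back-pointer of 0:

```python
# Viterbi algorithm, O(N^2 * T): keep only the best incoming path per state.
pi = [0.5, 0.2, 0.3]
A = [[0.6, 0.2, 0.2],     # column (0.6, 0.5, 0.4) is from slide 24;
     [0.5, 0.3, 0.2],     # the other entries are assumed for illustration
     [0.4, 0.1, 0.5]]
B = [{"up": 0.7}, {"up": 0.1}, {"up": 0.3}]

def viterbi(obs, pi, A, B):
    n = len(pi)
    # 1. Initialization: delta_1(i) = pi_i * b_i(o_1), psi_1(i) = 0
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    psi = [[0] * n]
    # 2. Induction: max (not sum) over predecessors, remembering the argmax
    for o in obs[1:]:
        best = [max(range(n), key=lambda i: delta[i] * A[i][j]) for j in range(n)]
        delta = [delta[best[j]] * A[best[j]][j] * B[j][o] for j in range(n)]
        psi.append(best)
    # 3. Termination: best final state
    q_T = max(range(n), key=lambda i: delta[i])
    # 4. Backtracking through the psi pointers
    path = [q_T]
    for back in reversed(psi[1:]):
        path.append(back[path[-1]])
    return delta[q_T], path[::-1]

p_star, path = viterbi(["up"] * 5, pi, A, B)
print(path)   # best 0-based state sequence: [0, 0, 0, 0, 0]
```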

24 Solution to Problem 2 - The Viterbi Algorithm (cont.)
A three-state hidden Markov model for the Dow Jones Industrial average (Huang et al., 2001):
  b_1(up)=0.7, b_2(up)=0.1, b_3(up)=0.3
  a_11=0.6, a_21=0.5, a_31=0.4
  π_1=0.5, π_2=0.2, π_3=0.3
Viterbi computation:
  δ_1(1)=0.5×0.7=0.35, δ_1(2)=0.2×0.1=0.02, δ_1(3)=0.3×0.3=0.09
  δ_2(1)=max(0.35×0.6, 0.02×0.5, 0.09×0.4)×0.7=0.35×0.6×0.7=0.147
  ψ_2(1)=1

25 Solution to Problem 3 - The Baum-Welch Algorithm
How do we adjust (re-estimate) the model parameters λ=(A,B,π) to maximize P(O|λ)?
– This is the most difficult of the three problems, because there is no known analytical method that maximizes the joint probability of the training data in closed form
  The data is incomplete because of the hidden state sequence
– The problem can be solved by the iterative Baum-Welch algorithm, also known as the forward-backward algorithm
  The EM (Expectation-Maximization) algorithm is perfectly suited to this problem

26 Solution to Problem 3 - The Backward Procedure
Backward variable:
  β_t(i) = P(o_{t+1},o_{t+2},…,o_T | q_t=i, λ)
– The probability of the partial observation sequence o_{t+1},o_{t+2},…,o_T, given state i at time t and the model λ
Example:
  β_2(3) = P(o_3,o_4,…,o_T | q_2=3, λ)
         = a_31·b_1(o_3)·β_3(1) + a_32·b_2(o_3)·β_3(2) + a_33·b_3(o_3)·β_3(3)
[Trellis diagram: β_3(1), b_1(o_3), and a_31 annotated]

27 Solution to Problem 3 - The Backward Procedure (cont.)
Algorithm
1. Initialization: β_T(i)=1, 1≤i≤N
2. Induction: β_t(i)=Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j), t=T-1,…,1, 1≤i≤N
– Complexity: O(N^2 T)
cf. the forward procedure; in terms of the backward variable, P(O|λ) = Σ_{i=1}^{N} π_i b_i(o_1) β_1(i)
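The backward pass mirrors the forward pass from right to left. A sketch on the Dow Jones model with O=(up, up): with the assumed row A[0]=[0.6, 0.2, 0.2] (only the first column of A and b(up) are given on the slides), β_1(1) = 0.6·0.7 + 0.2·0.1 + 0.2·0.3 = 0.5:

```python
# Backward procedure: beta_T(i) = 1, then recurse from t = T-1 down to 1.
A = [[0.6, 0.2, 0.2],     # column (0.6, 0.5, 0.4) is from the slides;
     [0.5, 0.3, 0.2],     # the other entries are assumed for illustration
     [0.4, 0.1, 0.5]]
B = [{"up": 0.7}, {"up": 0.1}, {"up": 0.3}]
pi = [0.5, 0.2, 0.3]

def backward(obs, A, B):
    n = len(A)
    beta = [1.0] * n                      # 1. Initialization: beta_T(i) = 1
    trellis = [beta]
    for o in reversed(obs[1:]):           # 2. Induction, t = T-1, ..., 1
        beta = [sum(A[i][j] * B[j][o] * beta[j] for j in range(n))
                for i in range(n)]
        trellis.append(beta)
    return trellis[::-1]                  # trellis[t-1] holds beta_t

betas = backward(["up", "up"], A, B)
print(betas[0])   # beta_1; beta_1(1) ~ 0.5 with the assumed A row

# Consistency with the forward view: P(O|lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
p = sum(pi[i] * B[i]["up"] * betas[0][i] for i in range(3))
print(p)
```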

28 Solution to Problem 3 - The Forward-Backward Algorithm
Relation between the forward and backward variables (Huang et al., 2001):
  α_t(i) β_t(i) = P(O, q_t=i | λ)
  P(O|λ) = Σ_{i=1}^{N} α_t(i) β_t(i), for any t

29 Solution to Problem 3 - The Forward-Backward Algorithm (cont.)

30 Solution to Problem 3 - The Intuitive View
Define two new variables:
  γ_t(i) = P(q_t=i | O, λ)
– The probability of being in state i at time t, given O and λ
  ξ_t(i,j) = P(q_t=i, q_{t+1}=j | O, λ)
– The probability of being in state i at time t and state j at time t+1, given O and λ

31 Solution to Problem 3 - The Intuitive View (cont.)
  P(q_3=3, O | λ) = α_3(3) · β_3(3)
[Trellis diagram: α_3(3) and β_3(3) meet at state 3, time 3]

32 Solution to Problem 3 - The Intuitive View (cont.)
  P(q_3=3, q_4=1, O | λ) = α_3(3) · a_31 · b_1(o_4) · β_4(1)
[Trellis diagram: α_3(3), a_31, b_1(o_4), and β_4(1) along the path from state 3 at time 3 to state 1 at time 4]

33 Solution to Problem 3 - The Intuitive View (cont.)
  ξ_t(i,j) = P(q_t=i, q_{t+1}=j | O, λ) = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / P(O|λ)
  γ_t(i) = P(q_t=i | O, λ) = α_t(i) β_t(i) / P(O|λ) = Σ_{j=1}^{N} ξ_t(i,j), for t ≤ T-1

34 Solution to Problem 3 - The Intuitive View (cont.)
Re-estimation formulae for π, A, and B:
  π̄_i = expected frequency in state i at time t=1 = γ_1(i)
  ā_ij = (expected number of transitions from state i to state j) / (expected number of transitions from state i)
       = Σ_{t=1}^{T-1} ξ_t(i,j) / Σ_{t=1}^{T-1} γ_t(i)
  b̄_j(v_k) = (expected number of times in state j observing symbol v_k) / (expected number of times in state j)
       = Σ_{t: o_t=v_k} γ_t(j) / Σ_{t=1}^{T} γ_t(j)
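One Baum-Welch iteration implementing the re-estimation formulae above can be sketched end to end: forward and backward passes, then γ and ξ, then the three updates. The 2-state, 2-symbol model and observation sequence below are hypothetical, used only to exercise the formulae; EM guarantees the updated λ does not decrease P(O|λ):

```python
# One Baum-Welch (forward-backward) re-estimation step on a toy 2-state HMM.
def forward(obs, pi, A, B):
    n = len(pi)
    al = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        al.append([sum(al[-1][i] * A[i][j] for i in range(n)) * B[j][o]
                   for j in range(n)])
    return al

def backward(obs, A, B):
    n = len(A)
    be = [[1.0] * n]
    for o in reversed(obs[1:]):
        be.append([sum(A[i][j] * B[j][o] * be[-1][j] for j in range(n))
                   for i in range(n)])
    return be[::-1]

def baum_welch_step(obs, pi, A, B):
    n, T = len(pi), len(obs)
    al, be = forward(obs, pi, A, B), backward(obs, A, B)
    pO = sum(al[-1])                                     # P(O|lambda)
    gamma = [[al[t][i] * be[t][i] / pO for i in range(n)] for t in range(T)]
    xi = [[[al[t][i] * A[i][j] * B[j][obs[t + 1]] * be[t + 1][j] / pO
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]                                 # pi_i = gamma_1(i)
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]      # expected transition counts
    new_B = [{v: sum(g[j] for o, g in zip(obs, gamma) if o == v) /
                 sum(g[j] for g in gamma)
              for v in set(obs)} for j in range(n)]      # expected emission counts
    return new_pi, new_A, new_B, pO

# Hypothetical initial model and training sequence:
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [{"H": 0.9, "T": 0.1}, {"H": 0.2, "T": 0.8}]
obs = list("HHTTHHHT")

pi2, A2, B2, p_old = baum_welch_step(obs, pi, A, B)
_, _, _, p_new = baum_welch_step(obs, pi2, A2, B2)
print(p_old, "->", p_new)   # likelihood is non-decreasing under EM
```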

