Topics in Pattern Recognition
A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
Lawrence R. Rabiner, Proceedings of the IEEE 77.2 (1989)
Presented by: Van-Quyet Nguyen
Outline
- Introduction
- Discrete Markov Process
- Hidden Markov Models (HMMs)
- Three basic problems for HMMs
  - Evaluation: Forward/Backward algorithm
  - Optimal state sequence: Forward-Backward and Viterbi algorithms
  - Parameter estimation: Baum-Welch algorithm
- Summary
Introduction
- Markov models and Hidden Markov Models were introduced in the late 1960s.
- The models are very rich in mathematical structure and have been applied in a wide range of applications.
- Applications:
  - Speech recognition: the spoken sounds that are observed (heard) depend on the underlying words
  - Stock market: the daily up/down movements that are observed depend on the broader market trend
  - Bioinformatics: gene prediction, protein structure prediction, ...
Discrete Markov Process
- N states: S1, S2, ..., SN
- At each time instant t = 1, 2, ..., T the system changes state (makes a transition) to state qt.
- For the special case of a first-order Markov chain:
  P(qt = Sj | qt-1 = Si, qt-2 = Sk, ...) = P(qt = Sj | qt-1 = Si)
- State transition probabilities:
  aij = P(qt = Sj | qt-1 = Si), 1 ≤ i, j ≤ N
  with aij ≥ 0 and Σj=1..N aij = 1, since from state Si at time t-1 the system must move to some state Sj at time t.
[Figure: a Markov chain with 5 states and selected state transitions]
Example
- States: 1 = rain; 2 = cloudy; 3 = sunny
- Discrete times: day 1, 2, 3, ...
- We assume that once a day (e.g., at noon) the weather is observed as being exactly one of the three states above, that the weather on day t is characterized by a single one of these states, and that the matrix A of state transition probabilities is
        | 0.4  0.3  0.3 |
  A =   | 0.2  0.6  0.2 |
        | 0.1  0.1  0.8 |
- Given state 3 (sunny) at day 1 (t = 1), what is the probability that the next 7 days will be "sun-sun-rain-rain-sun-cloudy-sun"?
- Solution: the observation sequence is O = {S3, S3, S3, S1, S1, S3, S2, S3}, so
  P(O|Model) = P(S3, S3, S3, S1, S1, S3, S2, S3 | Model)
             = P(S3) · P(S3|S3) · P(S3|S3) · P(S1|S3) · P(S1|S1) · P(S3|S1) · P(S2|S3) · P(S3|S2)
             = 1.0 · a33 · a33 · a31 · a11 · a13 · a32 · a23
             = 1.0 · 0.8 · 0.8 · 0.1 · 0.4 · 0.3 · 0.1 · 0.2
             = 1.536 × 10⁻⁴
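Because the states are directly observed here, P(O|Model) is just a chain of transition probabilities. A minimal Python sketch of this computation, assuming the matrix A above with indices 0/1/2 standing for rain/cloudy/sunny:

```python
import numpy as np

# State transition matrix from the slide (rows/cols: 0 = rain, 1 = cloudy, 2 = sunny).
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

# Observed state sequence: sunny on day 1, then sun-sun-rain-rain-sun-cloudy-sun.
O = [2, 2, 2, 0, 0, 2, 1, 2]

prob = 1.0  # P(q1 = sunny) = 1, since sunny on day 1 is given
for prev, curr in zip(O, O[1:]):
    prob *= A[prev, curr]  # multiply one-step transition probabilities

print(prob)  # 1.536e-04
```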
Hidden Markov Models (HMMs)
Coin toss example
- Which coin is used at each toss follows a Markov chain.
- The probability of H/T depends on the coin being used.
- Only the sequence of H/T outcomes is observed; the coin state is hidden, hence a hidden Markov model.
[Figure: The Coin Toss Example – 2 coins]
Hidden Markov Models (HMMs)
Elements of an HMM
- N: the number of hidden states in the model
- M: the number of distinct observation symbols per state
- A = {aij}: the state transition probability distribution
- B = {bj(k)}: the observation symbol probability distribution in state j,
  bj(k) = P(vk at t | qt = Sj), 1 ≤ j ≤ N, 1 ≤ k ≤ M
- π = {πi}: the initial state distribution, πi = P(q1 = Si), 1 ≤ i ≤ N
- λ = (A, B, π) denotes the complete parameter set of a model.
- The observation sequence is denoted O = O1 O2 ... OT.
HMM Example – Coin Toss
[Figure: two hidden states, Fair and Bias, with state transition probabilities Fair→Fair 0.9, Fair→Bias 0.1, Bias→Fair 0.3, Bias→Bias 0.7, and observation symbol probabilities over the symbols H and T: Fair emits H 0.5, T 0.5; Bias emits H 0.8, T 0.2]
Motivation: given a sequence of Hs and Ts, can you tell at what times the casino cheated?
  Observation sequence: HTHHTTHHHTHTHTHHTHHHHHHTHTHH
  State sequence:       FFFFFFBBBFFFFFFBBBBBBBBFFFFFF
HMM Example – Coin Toss
Elements of this HMM:
- N = 2 hidden states (F/B)
- M = 2 observation symbols (H/T)
- A = {aij}: the state transition probability distribution shown above
- B = {bj(k)}: the observation symbol probability distribution in state j shown above
- π: the initial state distribution, πF = 0.4, πB = 0.6
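Collected as a concrete parameter set λ = (A, B, π), e.g. in Python/numpy. The state order (Fair, Bias) and symbol order (H, T) are assumptions read off the figure above; these same values are reused in the sketches below:

```python
import numpy as np

# lambda = (A, B, pi) for the coin-toss HMM, states ordered (Fair, Bias),
# observation symbols ordered (H, T); values taken from the figure above.
A  = np.array([[0.9, 0.1],    # Fair -> Fair, Fair -> Bias
               [0.3, 0.7]])   # Bias -> Fair, Bias -> Bias
B  = np.array([[0.5, 0.5],    # b_Fair(H), b_Fair(T)
               [0.8, 0.2]])   # b_Bias(H), b_Bias(T)
pi = np.array([0.4, 0.6])     # P(q1 = Fair), P(q1 = Bias)
```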
Three basic problems for HMMs
- Problem 1 (Evaluation): Given the observation sequence O = O1 O2 ... OT and a model λ = (A, B, π), how do we efficiently compute P(O|λ), the probability of the observation sequence given the model?
- Problem 2 (Optimal state sequence): Given the observation sequence O = O1 O2 ... OT and a model λ, how do we choose a corresponding state sequence Q = q1 q2 ... qT which is optimal in some sense, i.e., best explains the observations?
- Problem 3 (Parameter estimation): Given the observation sequence O = O1 O2 ... OT, how do we adjust the model parameters λ = (A, B, π) to maximize P(O|λ)?
Problem 1: Evaluation
- Input: O = O1 O2 ... OT and λ = (A, B, π)
- Output: P(O|λ)
- Solution: for a fixed sample state sequence Q = q1 q2 ... qT,
  P(O|Q, λ) = b_q1(O1) · b_q2(O2) ··· b_qT(OT)
- The probability of such a state sequence Q is
  P(Q|λ) = π_q1 · a_q1q2 · a_q2q3 ··· a_q(T-1)qT
- Therefore the joint probability is
  P(O, Q|λ) = P(O|Q, λ) · P(Q|λ)
Forward/Backward Algorithm
Problem 1: Evaluation
- Solution (cont.): by considering all possible state sequences,
  P(O|λ) = Σ over all Q of P(O|Q, λ) · P(Q|λ)
         = Σ over q1,...,qT of π_q1 b_q1(O1) · a_q1q2 b_q2(O2) ··· a_q(T-1)qT b_qT(OT)
- Problem: this takes on the order of 2T · N^T calculations: there are N^T possible state sequences, with about 2T calculations for each.
- Remedy: the Forward/Backward algorithm.
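To make the exponential cost concrete, here is a brute-force sketch of the sum over all N^T state sequences, assuming the coin-toss parameters above and HTTH encoded as symbol indices (H = 0, T = 1):

```python
import itertools
import numpy as np

A  = np.array([[0.9, 0.1], [0.3, 0.7]])   # coin-toss model from above
B  = np.array([[0.5, 0.5], [0.8, 0.2]])
pi = np.array([0.4, 0.6])
O  = [0, 1, 1, 0]                          # HTTH, with H = 0 and T = 1

# P(O|lambda) = sum over all N^T state sequences Q of P(O|Q,lambda) * P(Q|lambda)
N, T = len(pi), len(O)
total = 0.0
for Q in itertools.product(range(N), repeat=T):
    p = pi[Q[0]] * B[Q[0], O[0]]           # first state and first emission
    for t in range(1, T):
        p *= A[Q[t-1], Q[t]] * B[Q[t], O[t]]
    total += p

print(total)  # ~0.0537; the forward algorithm gets the same number in O(N^2 T)
```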
Forward Algorithm
- Define the forward variable αt(i) as the probability of the partial observation sequence up to time t, with state Si at time t:
  αt(i) = P(O1 O2 ... Ot, qt = Si | λ)
- Step 1 - Initialization: α1(i) = πi bi(O1), 1 ≤ i ≤ N
- Step 2 - Induction: αt+1(j) = [Σi=1..N αt(i) aij] · bj(Ot+1), 1 ≤ t ≤ T-1, 1 ≤ j ≤ N
- Step 3 - Termination: P(O|λ) = Σi=1..N αT(i)
- Complexity: on the order of T·N² calculations
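A compact numpy sketch of the three steps, with observations encoded as symbol indices as in the brute-force example above:

```python
import numpy as np

def forward(O, A, B, pi):
    """Forward pass: returns the alpha table and P(O | lambda)."""
    N, T = len(pi), len(O)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                    # initialization: alpha_1(i) = pi_i b_i(O_1)
    for t in range(1, T):                         # induction over t
        # alpha_t(j) = [sum_i alpha_{t-1}(i) a_ij] * b_j(O_t)
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
    return alpha, alpha[-1].sum()                 # termination: P(O|lambda) = sum_i alpha_T(i)
```

Run on the coin-toss model with O = [0, 1, 1, 0] (HTTH), forward(O, A, B, pi)[1] agrees with the brute-force sum (about 0.0537) while doing only O(N²T) work.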
Forward Algorithm (cont.)
Figure (a): Sequence of operations required for computing the forward variable αt+1(j)
Figure (b): Computation of αt(i) in terms of a lattice of observations t and states i
Forward Algorithm - Example
- Coin toss, O = HTTH
- Initialization: probability of seeing H1 from F1 or B1, i.e., α1(F) and α1(B)
- Induction: probability of seeing T2 from F2 or B2, then T3 from F3 or B3, then H4 from F4 or B4; at each step the forward variables of the predecessor states are summed over the transitions
- Termination: sum the final forward variables α4(F) and α4(B) to obtain P(O|λ)
[Figure: trellis over states F/B and observations H T T H, one column per time step, showing the summed paths into each node]
Backward Algorithm
- Define the backward variable βt(i) as the probability of the partial observation sequence after time t, given state Si at time t:
  βt(i) = P(Ot+1 Ot+2 ... OT | qt = Si, λ)
- Step 1 - Initialization: βT(i) = 1, 1 ≤ i ≤ N
- Step 2 - Induction: βt(i) = Σj=1..N aij bj(Ot+1) βt+1(j), t = T-1, T-2, ..., 1, 1 ≤ i ≤ N
- Step 3 - Termination: P(O|λ) = Σi=1..N πi bi(O1) β1(i)
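The mirror-image sketch of the forward pass, under the same integer encoding:

```python
import numpy as np

def backward(O, A, B):
    """Backward pass: beta[t, i] = P(O_{t+2} ... O_T | state S_i at step t, lambda)."""
    N, T = A.shape[0], len(O)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                # induction, backwards in time
        # beta_t(i) = sum_j a_ij * b_j(O_{t+1}) * beta_{t+1}(j)
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta
```

As a consistency check, (pi * B[:, O[0]] * backward(O, A, B)[0]).sum() reproduces the same P(O|λ) as the forward pass.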
Problem 2: Optimal
- Input: O = O1 O2 ... OT and λ = (A, B, π)
- Output: Q = q1 q2 ... qT which is optimal in some sense:
  - either maximize the expected number of individually correct states,
  - or find the single best state sequence.
- Solution for the first criterion (Forward-Backward algorithm): define the variable
  γt(i) = P(qt = Si | O, λ)
  which, in terms of the forward and backward variables, is
  γt(i) = αt(i) βt(i) / P(O|λ)
  and pick qt = argmax over i of γt(i) at each t.
- Caveat: what happens if some aij = 0? The individually most likely states may then form a sequence that is not even a valid path through the model.
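In code, γ falls straight out of the two passes above. This sketch assumes the forward() and backward() functions from the earlier sketches and the coin-toss parameters:

```python
import numpy as np

A  = np.array([[0.9, 0.1], [0.3, 0.7]])
B  = np.array([[0.5, 0.5], [0.8, 0.2]])
pi = np.array([0.4, 0.6])
O  = [0, 1, 1, 0]  # HTTH

alpha, prob_O = forward(O, A, B, pi)   # forward()/backward() from the sketches above
beta = backward(O, A, B)

# gamma[t, i] = P(q_t = S_i | O, lambda); each row sums to 1
gamma = alpha * beta / prob_O
q_individually_best = gamma.argmax(axis=1)
```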
Problem 2: Optimal - Viterbi Algorithm
- Finding the best single sequence means computing argmaxQ P(Q|O, λ), which is equivalent to computing argmaxQ P(Q, O|λ).
- Define δt(i), the best score (highest probability) along a single path ending in state Si at time t, and ψt(j), an array keeping track of the argument that maximized δ (for backtracking).
- Initialization: δ1(i) = πi bi(O1), ψ1(i) = 0
- Recursion: δt(j) = maxi [δt-1(i) aij] · bj(Ot), ψt(j) = argmaxi [δt-1(i) aij]
- Termination: P* = maxi δT(i), qT* = argmaxi δT(i)
- Path backtracking: qt* = ψt+1(qt+1*), t = T-1, T-2, ..., 1
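A numpy sketch of the recursion and backtracking, using the same integer encoding of states and symbols as before:

```python
import numpy as np

def viterbi(O, A, B, pi):
    """Single best state sequence: returns (P*, argmax_Q P(Q, O | lambda))."""
    N, T = len(pi), len(O)
    delta = np.zeros((T, N))            # delta[t, i]: best score of a path ending in S_i at t
    psi = np.zeros((T, N), dtype=int)   # psi[t, j]: best predecessor of S_j at time t
    delta[0] = pi * B[:, O[0]]          # initialization
    for t in range(1, T):               # recursion
        scores = delta[t - 1][:, None] * A        # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    path = [int(delta[-1].argmax())]    # termination: best final state
    for t in range(T - 1, 0, -1):       # path backtracking through psi
        path.append(int(psi[t, path[-1]]))
    return delta[-1].max(), path[::-1]
```

In practice the products underflow for long sequences, so implementations usually run the same recursion on log probabilities, replacing products by sums.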
Viterbi Algorithm - Example
- Observe: HTTH
- Initialization:
  δ1(F) = 0.4 · 0.5 = 0.2, δ1(B) = 0.6 · 0.8 = 0.48
- Recursion:
  δ2(F) = max(0.2 · 0.9, 0.48 · 0.3) · 0.5 = 0.09
  δ2(B) = max(0.2 · 0.1, 0.48 · 0.7) · 0.2 = 0.0672
  δ3(F) = max(0.09 · 0.9, 0.0672 · 0.3) · 0.5 = 0.0405
  δ3(B) = max(0.09 · 0.1, 0.0672 · 0.7) · 0.2 ≈ 0.0094
  δ4(F) = max(0.0405 · 0.9, 0.0094 · 0.3) · 0.5 ≈ 0.0182
  δ4(B) = max(0.0405 · 0.1, 0.0094 · 0.7) · 0.8 ≈ 0.0053
- Termination: pick the state that gives the final best δ score, here δ4(F) ≈ 0.0182, and backtrack through ψ to get the path: FFFF is the single best state sequence for HTTH.
[Figure: trellis over states F/B and observations H T T H with the δ score at each node]
Problem 3: Estimate Parameters
- How do we adjust the model parameters λ = (A, B, π) to maximize P(O|λ)?
- There is no known way to analytically solve for the model which maximizes the probability of the observation sequence.
- Solution: the Baum-Welch algorithm, an iterative procedure:
  1. Randomly initialize λ = (A, B, π)
  2. Run the forward-backward computations based on λ and O
  3. Update λ = (A, B, π) using the reestimation formulas
  4. Repeat from step 2 until convergence
Problem 3: Estimate Parameters
Solution: Baum-Welch algorithm
- Define ξt(i, j), the probability of being in state Si at time t and in state Sj at time t + 1, given O and λ:
  ξt(i, j) = P(qt = Si, qt+1 = Sj | O, λ) = αt(i) aij bj(Ot+1) βt+1(j) / P(O|λ)
- Recall that γt(i) is the probability of being in state Si at time t, hence
  γt(i) = Σj=1..N ξt(i, j)
Problem 3: Estimate Parameters
Solution: Baum-Welch algorithm (cont.)
- If we sum over the time index t:
  Σt=1..T-1 γt(i) = expected number of transitions from state Si
  Σt=1..T-1 ξt(i, j) = expected number of transitions from Si to Sj
- Reestimation formulas:
  π̄i = γ1(i) (expected frequency in state Si at time t = 1)
  āij = Σt=1..T-1 ξt(i, j) / Σt=1..T-1 γt(i)
  b̄j(k) = Σt such that Ot = vk γt(j) / Σt=1..T γt(j)
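A minimal sketch of one reestimation step, reusing the forward() and backward() functions from the sketches above; it assumes a single observation sequence, and production code would iterate until P(O|λ) stops improving and would work in log space:

```python
import numpy as np

def baum_welch_step(O, A, B, pi):
    """One EM reestimation of (A, B, pi) for a single observation sequence O."""
    N, T = len(pi), len(O)
    alpha, prob_O = forward(O, A, B, pi)   # forward()/backward() from the sketches above
    beta = backward(O, A, B)
    gamma = alpha * beta / prob_O
    # xi[t, i, j] = alpha_t(i) * a_ij * b_j(O_{t+1}) * beta_{t+1}(j) / P(O|lambda)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, O[1:]].T * beta[1:])[:, None, :]) / prob_O
    new_pi = gamma[0]                                          # expected frequency in S_i at t = 1
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]   # expected i->j transitions / from i
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):                                # expected emissions of symbol v_k
        new_B[:, k] = gamma[np.array(O) == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi
```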
Summary
- Markov chains and Hidden Markov Models: observations, hidden states, initial, transition, and emission probabilities
- Three basic problems:
  - Evaluation: forward and backward procedures
  - Optimal state sequence: forward-backward (state probability at each time step) and Viterbi (best path)
  - Parameter estimation: Baum-Welch
Thanks for listening!
References
[1] Rabiner, Lawrence R. "A tutorial on hidden Markov models and selected applications in speech recognition." Proceedings of the IEEE 77.2 (1989).
[2] Dugad, Rakesh, and Uday B. Desai. "A tutorial on hidden Markov models." Signal Processing and Artificial Neural Networks Laboratory, Department of Electrical Engineering, Indian Institute of Technology (1996).
[3] Forney Jr., G. David. "The Viterbi algorithm." Proceedings of the IEEE 61.3 (1973).