Presentation on theme: "Presented by: Van-Quyet Nguyen"— Presentation transcript:

1 Presented by: Van-Quyet Nguyen
Topics in Pattern Recognition. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Lawrence R. Rabiner, Proceedings of the IEEE 77.2 (1989). Presented by: Van-Quyet Nguyen.

2 Outline
- Introduction
- Discrete Markov Process
- Hidden Markov Models (HMMs)
- Three basic problems for HMMs
  - Evaluation: Forward/Backward algorithm
  - Optimal: Forward-Backward, Viterbi algorithm
  - Estimate parameters: Baum-Welch algorithm
- Summary

3 Introduction
Markov models and hidden Markov models were introduced in the late 1960s. The models are rich in mathematical structure and have been applied in a wide range of applications. Examples: speech recognition (the spoken sound is observed/heard and depends on the underlying words); the stock market (daily up/down movements are observed and depend on the broader market trend); bioinformatics (gene prediction, protein structure prediction, ...).

4 Discrete Markov Process
N states: S1, S2, ..., SN. At each time instant t = 1, 2, ..., T the system changes state (makes a transition) to state qt. For the special case of a first-order Markov chain: P(qt = Sj | qt-1 = Si, qt-2 = Sk, ...) = P(qt = Sj | qt-1 = Si). State transition probabilities: aij = P(qt = Sj | qt-1 = Si), 1 ≤ i, j ≤ N, with aij ≥ 0 and Σj=1..N aij = 1 for each i; aij is the probability of the system jumping from predecessor state Si at time t-1 to state Sj at time t. A Markov chain with 5 states and selected transitions.
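As a minimal sketch of these two constraints (with made-up numbers, not the chain from the figure), a transition matrix whose rows sum to one is all that is needed to score any state path:

```python
# A first-order Markov chain stored as a transition matrix; every row must sum to 1.
# The numbers below are placeholders for illustration only.
A = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],
]

assert all(abs(sum(row) - 1.0) < 1e-9 for row in A)   # a_ij >= 0 and rows sum to 1

def path_probability(path, A, p_start=1.0):
    """P(q1, ..., qT) = P(q1) * a_{q1 q2} * ... * a_{q(T-1) qT}."""
    p = p_start
    for i, j in zip(path, path[1:]):
        p *= A[i][j]
    return p

print(path_probability([0, 0, 1, 2], A))   # 0.7 * 0.2 * 0.2 = 0.028
```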

5 Example
States: 1 = rain; 2 = cloudy; 3 = sunny. Discrete times: day 1, 2, 3, ... We assume that once a day (e.g., at noon) the weather is observed as being one of these three states, and we postulate that the weather on day t is characterized by a single one of the three states; the matrix A of state transition probabilities is given. Given state 3 (sunny) at day 1 (t = 1), question: what is the probability that the next 7 days will be "sun-sun-rain-rain-sun-cloudy-sun"? Solution: the observation sequence is O = {S3, S3, S3, S1, S1, S3, S2, S3}, so
P(O | Model) = P(S3, S3, S3, S1, S1, S3, S2, S3 | Model)
= P(S3) x P(S3|S3) x P(S3|S3) x P(S1|S3) x P(S1|S1) x P(S3|S1) x P(S2|S3) x P(S3|S2)
= 1.0 x a33 x a33 x a31 x a11 x a13 x a32 x a23
= 1.0 x 0.8 x 0.8 x 0.1 x 0.4 x 0.3 x 0.1 x 0.2 = 1.536 x 10^-4
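The same product can be checked in a few lines. The full matrix A appears only as an image on the slide, so the values below are taken from Rabiner's weather example, which the listed factors match; treat the block as a sketch under that assumption:

```python
# Weather example: states 0 = rain, 1 = cloudy, 2 = sunny.
# Transition matrix as in Rabiner's tutorial (shown only as an image on the slide).
A = [
    [0.4, 0.3, 0.3],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
]

# Observation sequence S3 S3 S3 S1 S1 S3 S2 S3, written with 0-based state indices.
O = [2, 2, 2, 0, 0, 2, 1, 2]

p = 1.0                          # P(q1 = sunny) = 1, since day 1 is given as sunny
for i, j in zip(O, O[1:]):       # multiply the transition probability for each day
    p *= A[i][j]

print(p)                         # approximately 1.536e-04
```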

6 Hidden Markov Models (HMMs)
Coin toss example: the coin-to-coin transition is a Markov chain, and the probability of heads/tails (H/T) depends on the coin used. We observe only the H/T outcomes; which coin was used (the state) is hidden, so the process is a hidden Markov model. The Coin Toss Example with 2 coins.

7 Hidden Markov Models (HMMs)
Elements of an HMM: N is the number of hidden states in the model. M is the number of distinct observation symbols per state. A = {aij} is the state transition probability distribution. B = {bj(k)} is the observation symbol probability distribution in state j, bj(k) = P(vk at t | qt = Sj), 1 ≤ j ≤ N and 1 ≤ k ≤ M. π = {πi} is the initial state distribution, πi = P(q1 = Si), 1 ≤ i ≤ N. λ = (A, B, π) denotes the complete parameter set of a model. The observation sequence is denoted O = O1 O2 ... OT.

8 HMM Example – Coin Toss
Two hidden states, Fair (F) and Biased (B). State transition probabilities: aFF = 0.9, aFB = 0.1, aBF = 0.3, aBB = 0.7. Observation symbols H and T, with observation symbol probabilities bF(H) = 0.5, bF(T) = 0.5, bB(H) = 0.8, bB(T) = 0.2. Motivation: given a sequence of Hs and Ts, can you tell at what times the casino cheated? Observation sequence: HTHHTTHHHTHTHTHHTHHHHHHTHTHH. State sequence: FFFFFFBBBFFFFFFBBBBBBBBFFFFFF.

9 HMM Example – Coin Toss
Elements of the HMM: N, the number of hidden states, is 2 (F/B). M, the number of observation symbols, is 2 (H/T). A = {aij} is the state transition probability distribution and B = {bj(k)} is the observation symbol probability distribution in state j, as given on the previous slide. The initial state distribution is πF = 0.4, πB = 0.6.
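For use in the sketches on the following slides, the coin-toss model can be written out as plain Python data. The transition and emission numbers are the ones read off the diagram on slide 8, so treat the whole block as an assumed running example rather than anything canonical:

```python
# Coin-toss HMM: two hidden states (Fair, Biased) and two symbols (H, T).
# All numeric values are assumed from the slide diagrams.
states = ["F", "B"]
symbols = ["H", "T"]

pi = [0.4, 0.6]                 # initial state distribution (F, B)
A = [[0.9, 0.1],                # transition probabilities a_ij
     [0.3, 0.7]]                # rows: from-state, columns: to-state
B = [[0.5, 0.5],                # emission probabilities b_j(k)
     [0.8, 0.2]]                # rows: state (F, B), columns: symbol (H, T)

obs = "HTTH"                                  # observation sequence used in the examples
O = [symbols.index(o) for o in obs]           # symbol indices: [0, 1, 1, 0]
```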

10 Three basic problems for HMMs
Problem 1: Given the observation sequence O = O1 O2 ... OT and a model λ = (A, B, π), how do we efficiently compute P(O|λ), the probability of the observation sequence given the model?
Problem 2: Given the observation sequence O = O1 O2 ... OT and a model λ, how do we choose a corresponding state sequence Q = q1 q2 ... qT which is optimal in some sense, i.e., best explains the observations?
Problem 3: Given the observation sequence O = O1 O2 ... OT, how do we adjust the model parameters λ = (A, B, π) to maximize P(O|λ)?

11 Problem 1: Evaluation
Input: O = O1 O2 ... OT and λ = (A, B, π). Output: P(O|λ). Solution: for a fixed state sequence Q = q1 q2 ... qT, the probability of the observation sequence is P(O | Q, λ) = bq1(O1) bq2(O2) ... bqT(OT), and the probability of such a state sequence Q is P(Q | λ) = πq1 aq1q2 aq2q3 ... aqT-1qT. Therefore the joint probability is P(O, Q | λ) = P(O | Q, λ) P(Q | λ).

12 Forward/Backward Algorithm
Problem 1: Evaluation. Solution (cont.): summing over all possible state sequences, P(O|λ) = ΣQ P(O | Q, λ) P(Q | λ). Problem: this takes on the order of 2T · N^T calculations, since there are N^T possible state sequences and about 2T calculations for each sequence. The Forward/Backward algorithm avoids this.
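To see where the N^T blow-up comes from, here is a hedged sketch of the naive evaluation that literally enumerates every state sequence (usable only for tiny T; the pi/A/B layout follows the coin-toss sketch above):

```python
from itertools import product

def evaluate_naive(O, pi, A, B):
    """P(O | lambda) by summing over all N**T state sequences (exponential cost)."""
    N, T = len(pi), len(O)
    total = 0.0
    for Q in product(range(N), repeat=T):           # all N**T state sequences
        p = pi[Q[0]] * B[Q[0]][O[0]]                # pi_{q1} * b_{q1}(O1)
        for t in range(1, T):
            p *= A[Q[t - 1]][Q[t]] * B[Q[t]][O[t]]  # a_{q(t-1) qt} * b_{qt}(Ot)
        total += p
    return total

# e.g. evaluate_naive([0, 1, 1, 0], pi, A, B) with the coin-toss numbers above
```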

13 Forward Algorithm
We define a forward variable αt(i) as the probability of the partial observation sequence up to time t, with state Si at time t: αt(i) = P(O1 O2 ... Ot, qt = Si | λ). Step 1 - Initialization: α1(i) = πi bi(O1), 1 ≤ i ≤ N. Step 2 - Induction: αt+1(j) = [Σi=1..N αt(i) aij] bj(Ot+1), 1 ≤ t ≤ T-1, 1 ≤ j ≤ N. Step 3 - Termination: P(O|λ) = Σi=1..N αT(i). This requires on the order of T·N^2 calculations.
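A direct transcription of the three steps into code, as a sketch: pi, A, B follow the list layout of the coin-toss sketch above, O is a list of symbol indices, and Python's 0-based alpha[t] plays the role of the slide's αt+1:

```python
def forward(O, pi, A, B):
    """Forward procedure: returns (alpha, P(O|lambda)); alpha[t][i] is the slide's alpha_{t+1}(i)."""
    N, T = len(pi), len(O)
    alpha = [[0.0] * N for _ in range(T)]

    # Step 1 - initialization: alpha_1(i) = pi_i * b_i(O1)
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][O[0]]

    # Step 2 - induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(O_{t+1})
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][O[t]]

    # Step 3 - termination: P(O|lambda) = sum_i alpha_T(i); about T * N^2 operations overall
    return alpha, sum(alpha[T - 1])
```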

14 Forward Algorithm (cont.)
Figure (a): operations for computing the forward variable αt+1(j). Figure (b): computing αt(j) in terms of a lattice of states and times.

15 Forward Algorithm - Example
Coin toss, O = HTTH. Initialization: probability of seeing H1 from F1 or B1, giving the first column of the F/B lattice over H T T H.

16 Forward Algorithm - Example
Coin toss, O = HTTH. Induction: probability of seeing T2 from F2 or B2, summing over the incoming paths from F1 and B1 (second column of the lattice).

17 Forward Algorithm - Example
Coin toss, O = HTTH. Induction continued: probability of seeing T3 from F3 or B3, summing over the incoming paths from F2 and B2 (third column of the lattice).

18 Forward Algorithm - Example
Coin toss, O = HTTH. Induction continued: probability of seeing H4 from F4 or B4, summing over the incoming paths from F3 and B3 (fourth and final column of the lattice).

19 Forward Algorithm - Example
Coin toss, O = HTTH. Termination: P(O|λ) = α4(F) + α4(B), the sum over the last column of the lattice.
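Running the forward recursion on O = HTTH with the coin-toss parameters assumed earlier reproduces the column-by-column flow of slides 15 to 19; the printed numbers depend entirely on those assumed parameters, so they are illustrative only:

```python
pi = [0.4, 0.6]                              # assumed initial distribution (F, B)
A = [[0.9, 0.1], [0.3, 0.7]]                 # assumed transition probabilities
B = [[0.5, 0.5], [0.8, 0.2]]                 # assumed emissions; rows F, B; columns H, T
O = [0, 1, 1, 0]                             # H T T H

alpha = [[pi[i] * B[i][O[0]] for i in range(2)]]        # initialization (t = 1)
for t in range(1, 4):                                   # induction for t = 2, 3, 4
    prev = alpha[-1]
    alpha.append([sum(prev[i] * A[i][j] for i in range(2)) * B[j][O[t]]
                  for j in range(2)])

for t, (aF, aB) in enumerate(alpha, start=1):
    print(f"t={t}: alpha(F)={aF:.4f}  alpha(B)={aB:.4f}")
print("P(O|lambda) =", sum(alpha[-1]))       # roughly 0.054 with these assumed numbers
```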

20 Backward Algorithm
We define a backward variable βt(i) as the probability of the partial observation sequence after time t, given state Si at time t: βt(i) = P(Ot+1 Ot+2 ... OT | qt = Si, λ). Step 1 - Initialization: βT(i) = 1, 1 ≤ i ≤ N. Step 2 - Induction: βt(i) = Σj=1..N aij bj(Ot+1) βt+1(j), for t = T-1, T-2, ..., 1 and 1 ≤ i ≤ N. Step 3 - Termination: P(O|λ) = Σi=1..N πi bi(O1) β1(i).
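The backward recursion mirrors the forward one almost line for line; a sketch under the same layout assumptions, with beta[t][i] playing the role of the slide's βt+1(i):

```python
def backward(O, pi, A, B):
    """Backward procedure: returns (beta, P(O|lambda)); beta[t][i] is the slide's beta_{t+1}(i)."""
    N, T = len(pi), len(O)
    beta = [[0.0] * N for _ in range(T)]

    # Step 1 - initialization: beta_T(i) = 1
    for i in range(N):
        beta[T - 1][i] = 1.0

    # Step 2 - induction: beta_t(i) = sum_j a_ij * b_j(O_{t+1}) * beta_{t+1}(j)
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))

    # Step 3 - termination: P(O|lambda) = sum_i pi_i * b_i(O1) * beta_1(i)
    return beta, sum(pi[i] * B[i][O[0]] * beta[0][i] for i in range(N))
```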

21 Problem 2: Optimal
Input: O = O1 O2 ... OT and λ = (A, B, π). Output: a state sequence Q = q1 q2 ... qT that is optimal in some sense, e.g., one that maximizes the expected number of individually correct states, or the single best state sequence. Solution (Forward-Backward algorithm): define the variable γt(i) = P(qt = Si | O, λ), the probability of being in state Si at time t given the observation sequence and the model; in terms of the forward and backward variables, γt(i) = αt(i) βt(i) / P(O|λ). Picking the individually most likely state at each t maximizes the expected number of correct states, but what happens if some aij = 0? The resulting sequence may contain an impossible transition, i.e., it may not be a valid state sequence at all.
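Given alpha and beta tables (for example from the forward and backward sketches above), gamma and the individually most likely state at each time reduce to a few lines; the function takes the tables as arguments so it stands on its own:

```python
def gamma_states(alpha, beta):
    """gamma_t(i) = alpha_t(i) * beta_t(i) / P(O|lambda); also returns the argmax state per time."""
    T, N = len(alpha), len(alpha[0])
    p_obs = sum(alpha[T - 1])                   # P(O | lambda) from the forward termination step
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)] for t in range(T)]
    best = [g.index(max(g)) for g in gamma]     # individually most likely state at each t
    return gamma, best
```

Reading off the argmax at each time step independently is exactly the choice that can yield an invalid path when some aij = 0, which is what motivates the Viterbi algorithm on the next slide.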

22 Problem 2: Optimal (Viterbi Algorithm)
Finding the single best state sequence means computing argmaxQ P(Q | O, λ), which is equivalent to argmaxQ P(Q, O | λ). δt(i) is the best score (highest probability) along a single path ending in state Si at time t, and ψt(j) is an array that keeps track of the argument that maximized δ. Initialization: δ1(i) = πi bi(O1), ψ1(i) = 0. Recursion: δt(j) = maxi [δt-1(i) aij] bj(Ot), ψt(j) = argmaxi [δt-1(i) aij]. Termination: P* = maxi δT(i), qT* = argmaxi δT(i). Path backtracking: qt* = ψt+1(qt+1*), for t = T-1, T-2, ..., 1.
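The same recursion as code, with delta holding the best scores and psi the backpointers; a sketch under the same parameter-layout assumptions as before:

```python
def viterbi(O, pi, A, B):
    """Single best state sequence: returns (best_path, best_score)."""
    N, T = len(pi), len(O)
    delta = [[0.0] * N for _ in range(T)]   # best score along a single path ending in state j at t
    psi = [[0] * N for _ in range(T)]       # backpointer: argmax over the previous state

    # Initialization: delta_1(i) = pi_i * b_i(O1), psi_1(i) = 0
    for i in range(N):
        delta[0][i] = pi[i] * B[i][O[0]]

    # Recursion: delta_t(j) = max_i [delta_{t-1}(i) * a_ij] * b_j(O_t)
    for t in range(1, T):
        for j in range(N):
            scores = [delta[t - 1][i] * A[i][j] for i in range(N)]
            psi[t][j] = scores.index(max(scores))
            delta[t][j] = max(scores) * B[j][O[t]]

    # Termination: P* = max_i delta_T(i); backtracking: q*_t = psi_{t+1}(q*_{t+1})
    q = delta[T - 1].index(max(delta[T - 1]))
    path = [q]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    return path[::-1], max(delta[T - 1])
```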

23 Viterbi Algorithm - Example
Observe: HTTH. Initialization: δ1(F) = 0.2, δ1(B) = 0.48 (first column of the F/B lattice over H T T H).

24 Viterbi Algorithm - Example
Observe: HTTH. Recursion, t = 2: δ2(F) = 0.09, δ2(B) = 0.0672 (previous column: δ1(F) = 0.2, δ1(B) = 0.48).

25 Viterbi Algorithm - Example
Observe: HTTH. Recursion, t = 3: δ3(F) = 0.0405, δ3(B) = 0.0094 (lattice so far, F: 0.2, 0.09, 0.0405; B: 0.48, 0.0672, 0.0094).

26 Viterbi Algorithm - Example
Observe: HTTH. Recursion, t = 4: the final lattice column is computed. Lattice scores, F: 0.2, 0.09, 0.0405; B: 0.48, 0.0672, 0.0094, 0.0324.

27 Viterbi Algorithm - Example
Observe: HTTH. Termination: pick the state that gives the final best δ score, and backtrack via ψ to get the path: FFFB is the most likely state sequence to give HTTH.

28 Problem 3: Estimate Parameters
How do we adjust the model parameters λ = (A, B, π) to maximize P(O|λ)? There is no known way to analytically solve for the model which maximizes the probability of the observation sequence. Solution: the Baum-Welch algorithm, an iterative procedure: randomly initialize λ = (A, B, π); run the forward-backward procedure on O under the current λ; update λ = (A, B, π) from the resulting expected counts; repeat until P(O|λ) stops improving.

29 Problem 3: Estimate Parameters
Solution: Baum-Welch algorithm (cont.). We define ξt(i, j) as the probability of being in state Si at time t and in state Sj at time t+1, given the model and the observation sequence: ξt(i, j) = P(qt = Si, qt+1 = Sj | O, λ) = αt(i) aij bj(Ot+1) βt+1(j) / P(O|λ). Recall that γt(i) is the probability of being in state Si at time t, hence γt(i) = Σj=1..N ξt(i, j).
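As code, ξ is one more table built from the forward and backward variables; the sketch below takes alpha and beta as arguments (computed, for example, by the earlier forward/backward sketches) rather than recomputing them:

```python
def xi_table(O, A, B, alpha, beta):
    """xi[t][i][j] = P(q_t = S_i, q_{t+1} = S_j | O, lambda), for t = 1 .. T-1."""
    T, N = len(alpha), len(alpha[0])
    p_obs = sum(alpha[T - 1])                                     # P(O | lambda)
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(N)]
           for i in range(N)]
          for t in range(T - 1)]
    gamma = [[sum(xi_t[i]) for i in range(N)] for xi_t in xi]     # gamma_t(i) = sum_j xi_t(i, j)
    return xi, gamma
```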

30 Problem 3: Estimate Parameters
Solution: Baum-Welch algorithm (cont.). If we sum over the time index t, Σt=1..T-1 γt(i) = expected number of transitions made from state Si, and Σt=1..T-1 ξt(i, j) = expected number of transitions from Si to Sj. Reestimation formulas: πi' = γ1(i) (expected frequency in state Si at time t = 1); aij' = Σt=1..T-1 ξt(i, j) / Σt=1..T-1 γt(i); bj'(k) = Σt: Ot = vk γt(j) / Σt=1..T γt(j).
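Putting the counts together, one Baum-Welch update can be sketched as follows; alpha and beta are passed in, M is the number of observation symbols, and this is an illustrative single-observation-sequence version only:

```python
def reestimate(O, A, B, alpha, beta, M):
    """One Baum-Welch update: returns (pi_new, A_new, B_new) from expected counts."""
    T, N = len(alpha), len(alpha[0])
    p_obs = sum(alpha[T - 1])
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]

    pi_new = gamma[0][:]                                           # expected frequency in S_i at t = 1
    A_new = [[sum(xi[t][i][j] for t in range(T - 1)) /             # expected transitions S_i -> S_j
              sum(gamma[t][i] for t in range(T - 1))               # expected transitions out of S_i
              for j in range(N)]
             for i in range(N)]
    B_new = [[sum(gamma[t][j] for t in range(T) if O[t] == k) /    # times in S_j observing symbol v_k
              sum(gamma[t][j] for t in range(T))                   # times in S_j overall
              for k in range(M)]
             for j in range(N)]
    return pi_new, A_new, B_new
```

Alternating forward/backward with this update until P(O|λ) stops improving is the iterative procedure outlined on slide 28; it converges to a local maximum of the likelihood, not necessarily the global one.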

31 Summary
- Markov chain
- Hidden Markov model: observations, hidden states, initial, transition and emission probabilities
- Three problems
  - Evaluation: forward, backward procedure
  - Optimal: forward-backward (probability prediction at each state), Viterbi (best path)
  - Estimate parameters: Baum-Welch

32 Thanks for listening!

33 References
[1] Rabiner, Lawrence R. "A tutorial on hidden Markov models and selected applications in speech recognition." Proceedings of the IEEE 77.2 (1989).
[2] Dugad, Rakesh, and Uday B. Desai. "A tutorial on hidden Markov models." Signal Processing and Artificial Neural Networks Laboratory, Department of Electrical Engineering, Indian Institute of Technology (1996).
[3] Forney Jr., G. David. "The Viterbi algorithm." Proceedings of the IEEE 61.3 (1973).

