Presentation on theme: "Presented by: Van-Quyet Nguyen"— Presentation transcript:

1 Presented by: Van-Quyet Nguyen
Topics in Pattern Recognition. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Lawrence R. Rabiner, Proceedings of the IEEE 77.2 (1989). Presented by: Van-Quyet Nguyen.

2 Outline
- Introduction
- Discrete Markov Process
- Hidden Markov Models (HMMs)
- Three basic problems for HMMs
  - Evaluation: Forward/Backward algorithm
  - Optimal: Forward-Backward, Viterbi algorithm
  - Estimate parameters: Baum-Welch algorithm
- Summary

3 Introduction
Markov models and hidden Markov models were introduced in the late 1960s. The models are rich in mathematical structure and have been applied in a wide range of applications. Examples: speech recognition (the spoken sound is observed/heard and depends on the underlying words); the stock market (daily up/down movements are observed and depend on the broader market trend); bioinformatics (gene prediction, protein structure prediction, ...).

4 Discrete Markov Process
N states: S1, S2, ..., SN. At each time instant t = 1, 2, ..., T the system changes state (makes a transition) to state qt. For the special case of a first-order Markov chain: P(qt = Sj | qt-1 = Si, qt-2 = Sk, ...) = P(qt = Sj | qt-1 = Si). State transition probabilities: aij = P(qt = Sj | qt-1 = Si), 1 ≤ i, j ≤ N, with aij ≥ 0 and Σj=1..N aij = 1 for each i; aij is the probability of the system jumping from predecessor state Si at time t-1 to state Sj at time t. A Markov chain with 5 states and selected transitions.
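As a minimal sketch of these two constraints (with made-up numbers, not the chain from the figure), a transition matrix whose rows sum to one is all that is needed to score any state path:

```python
# A first-order Markov chain stored as a transition matrix; every row must sum to 1.
# The numbers below are placeholders for illustration only.
A = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],
]

assert all(abs(sum(row) - 1.0) < 1e-9 for row in A)   # a_ij >= 0 and rows sum to 1

def path_probability(path, A, p_start=1.0):
    """P(q1, ..., qT) = P(q1) * a_{q1 q2} * ... * a_{q(T-1) qT}."""
    p = p_start
    for i, j in zip(path, path[1:]):
        p *= A[i][j]
    return p

print(path_probability([0, 0, 1, 2], A))   # 0.7 * 0.2 * 0.2 = 0.028
```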

5 Example
States: 1 = rain; 2 = cloudy; 3 = sunny. Discrete times: day 1, 2, 3, ... We assume that once a day (e.g., at noon) the weather is observed as being one of these three states, and we postulate that the weather on day t is characterized by a single one of the three states; the matrix A of state transition probabilities is given. Given state 3 (sunny) at day 1 (t = 1), question: what is the probability that the next 7 days will be "sun-sun-rain-rain-sun-cloudy-sun"? Solution: the observation sequence is O = {S3, S3, S3, S1, S1, S3, S2, S3}, so
P(O | Model) = P(S3, S3, S3, S1, S1, S3, S2, S3 | Model)
= P(S3) x P(S3|S3) x P(S3|S3) x P(S1|S3) x P(S1|S1) x P(S3|S1) x P(S2|S3) x P(S3|S2)
= 1.0 x a33 x a33 x a31 x a11 x a13 x a32 x a23
= 1.0 x 0.8 x 0.8 x 0.1 x 0.4 x 0.3 x 0.1 x 0.2 = 1.536 x 10^-4
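The same product can be checked in a few lines. The full matrix A appears only as an image on the slide, so the values below are taken from Rabiner's weather example, which the listed factors match; treat the block as a sketch under that assumption:

```python
# Weather example: states 0 = rain, 1 = cloudy, 2 = sunny.
# Transition matrix as in Rabiner's tutorial (shown only as an image on the slide).
A = [
    [0.4, 0.3, 0.3],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
]

# Observation sequence S3 S3 S3 S1 S1 S3 S2 S3, written with 0-based state indices.
O = [2, 2, 2, 0, 0, 2, 1, 2]

p = 1.0                          # P(q1 = sunny) = 1, since day 1 is given as sunny
for i, j in zip(O, O[1:]):       # multiply the transition probability for each day
    p *= A[i][j]

print(p)                         # approximately 1.536e-04
```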

6 Hidden Markov Models (HMMs)
Coin toss example: the coin-to-coin transition is a Markov chain, and the probability of heads/tails (H/T) depends on the coin used. We observe only the H/T outcomes; which coin was used (the state) is hidden, so the process is a hidden Markov model. The Coin Toss Example with 2 coins.

7 Hidden Markov Models (HMMs)
Elements of an HMM: N is the number of hidden states in the model. M is the number of distinct observation symbols per state. A = {aij} is the state transition probability distribution. B = {bj(k)} is the observation symbol probability distribution in state j, bj(k) = P(vk at t | qt = Sj), 1 ≤ j ≤ N and 1 ≤ k ≤ M. π = {πi} is the initial state distribution, πi = P(q1 = Si), 1 ≤ i ≤ N. λ = (A, B, π) denotes the complete parameter set of a model. The observation sequence is denoted O = O1 O2 ... OT.

8 HMM Example – Coin Toss
Two hidden states, Fair (F) and Biased (B). State transition probabilities: aFF = 0.9, aFB = 0.1, aBF = 0.3, aBB = 0.7. Observation symbols H and T, with observation symbol probabilities bF(H) = 0.5, bF(T) = 0.5, bB(H) = 0.8, bB(T) = 0.2. Motivation: given a sequence of Hs and Ts, can you tell at what times the casino cheated? Observation sequence: HTHHTTHHHTHTHTHHTHHHHHHTHTHH. State sequence: FFFFFFBBBFFFFFFBBBBBBBBFFFFFF.

9 HMM Example – Coin Toss
Elements of the HMM: N, the number of hidden states, is 2 (F/B). M, the number of observation symbols, is 2 (H/T). A = {aij} is the state transition probability distribution and B = {bj(k)} is the observation symbol probability distribution in state j, as given on the previous slide. The initial state distribution is πF = 0.4, πB = 0.6.
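For use in the sketches on the following slides, the coin-toss model can be written out as plain Python data. The transition and emission numbers are the ones read off the diagram on slide 8, so treat the whole block as an assumed running example rather than anything canonical:

```python
# Coin-toss HMM: two hidden states (Fair, Biased) and two symbols (H, T).
# All numeric values are assumed from the slide diagrams.
states = ["F", "B"]
symbols = ["H", "T"]

pi = [0.4, 0.6]                 # initial state distribution (F, B)
A = [[0.9, 0.1],                # transition probabilities a_ij
     [0.3, 0.7]]                # rows: from-state, columns: to-state
B = [[0.5, 0.5],                # emission probabilities b_j(k)
     [0.8, 0.2]]                # rows: state (F, B), columns: symbol (H, T)

obs = "HTTH"                                  # observation sequence used in the examples
O = [symbols.index(o) for o in obs]           # symbol indices: [0, 1, 1, 0]
```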

10 Three basic problems for HMMs
Problem 1: Given the observation sequence O = O1 O2 ... OT and a model λ = (A, B, π), how do we efficiently compute P(O|λ), the probability of the observation sequence given the model?
Problem 2: Given the observation sequence O = O1 O2 ... OT and a model λ, how do we choose a corresponding state sequence Q = q1 q2 ... qT which is optimal in some sense, i.e., best explains the observations?
Problem 3: Given the observation sequence O = O1 O2 ... OT, how do we adjust the model parameters λ = (A, B, π) to maximize P(O|λ)?

11 Problem 1: Evaluation
Input: O = O1 O2 ... OT and λ = (A, B, π). Output: P(O|λ). Solution: for a fixed state sequence Q = q1 q2 ... qT, the probability of the observation sequence is P(O | Q, λ) = bq1(O1) bq2(O2) ... bqT(OT), and the probability of such a state sequence Q is P(Q | λ) = πq1 aq1q2 aq2q3 ... aqT-1qT. Therefore the joint probability is P(O, Q | λ) = P(O | Q, λ) P(Q | λ).

12 Forward/Backward Algorithm
Problem 1: Evaluation. Solution (cont.): summing over all possible state sequences, P(O|λ) = ΣQ P(O | Q, λ) P(Q | λ). Problem: this takes on the order of 2T · N^T calculations, since there are N^T possible state sequences and about 2T calculations for each sequence. The Forward/Backward algorithm avoids this.
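To see where the N^T blow-up comes from, here is a hedged sketch of the naive evaluation that literally enumerates every state sequence (usable only for tiny T; the pi/A/B layout follows the coin-toss sketch above):

```python
from itertools import product

def evaluate_naive(O, pi, A, B):
    """P(O | lambda) by summing over all N**T state sequences (exponential cost)."""
    N, T = len(pi), len(O)
    total = 0.0
    for Q in product(range(N), repeat=T):           # all N**T state sequences
        p = pi[Q[0]] * B[Q[0]][O[0]]                # pi_{q1} * b_{q1}(O1)
        for t in range(1, T):
            p *= A[Q[t - 1]][Q[t]] * B[Q[t]][O[t]]  # a_{q(t-1) qt} * b_{qt}(Ot)
        total += p
    return total

# e.g. evaluate_naive([0, 1, 1, 0], pi, A, B) with the coin-toss numbers above
```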

13 Forward Algorithm
We define a forward variable αt(i) as the probability of the partial observation sequence up to time t, with state Si at time t: αt(i) = P(O1 O2 ... Ot, qt = Si | λ). Step 1 - Initialization: α1(i) = πi bi(O1), 1 ≤ i ≤ N. Step 2 - Induction: αt+1(j) = [Σi=1..N αt(i) aij] bj(Ot+1), 1 ≤ t ≤ T-1, 1 ≤ j ≤ N. Step 3 - Termination: P(O|λ) = Σi=1..N αT(i). This requires on the order of T·N^2 calculations.
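A direct transcription of the three steps into code, as a sketch: pi, A, B follow the list layout of the coin-toss sketch above, O is a list of symbol indices, and Python's 0-based alpha[t] plays the role of the slide's αt+1:

```python
def forward(O, pi, A, B):
    """Forward procedure: returns (alpha, P(O|lambda)); alpha[t][i] is the slide's alpha_{t+1}(i)."""
    N, T = len(pi), len(O)
    alpha = [[0.0] * N for _ in range(T)]

    # Step 1 - initialization: alpha_1(i) = pi_i * b_i(O1)
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][O[0]]

    # Step 2 - induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(O_{t+1})
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][O[t]]

    # Step 3 - termination: P(O|lambda) = sum_i alpha_T(i); about T * N^2 operations overall
    return alpha, sum(alpha[T - 1])
```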

14 Forward Algorithm (cont.)
Figure (a): operations for computing the forward variable αt+1(j). Figure (b): computing αt(j) in terms of a lattice of states and times.

15 Forward Algorithm - Example
Coin toss, O = HTTH. Initialization: probability of seeing H1 from F1 or B1, giving the first column of the F/B lattice over H T T H.

16 Forward Algorithm - Example
Coin toss, O = HTTH. Induction: probability of seeing T2 from F2 or B2, summing over the incoming paths from F1 and B1 (second column of the lattice).

17 Forward Algorithm - Example
Coin toss, O = HTTH. Induction continued: probability of seeing T3 from F3 or B3, summing over the incoming paths from F2 and B2 (third column of the lattice).

18 Forward Algorithm - Example
Coin toss, O = HTTH. Induction continued: probability of seeing H4 from F4 or B4, summing over the incoming paths from F3 and B3 (fourth and final column of the lattice).

19 Forward Algorithm - Example
Coin toss, O = HTTH. Termination: P(O|λ) = α4(F) + α4(B), the sum over the last column of the lattice.
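Running the forward recursion on O = HTTH with the coin-toss parameters assumed earlier reproduces the column-by-column flow of slides 15 to 19; the printed numbers depend entirely on those assumed parameters, so they are illustrative only:

```python
pi = [0.4, 0.6]                              # assumed initial distribution (F, B)
A = [[0.9, 0.1], [0.3, 0.7]]                 # assumed transition probabilities
B = [[0.5, 0.5], [0.8, 0.2]]                 # assumed emissions; rows F, B; columns H, T
O = [0, 1, 1, 0]                             # H T T H

alpha = [[pi[i] * B[i][O[0]] for i in range(2)]]        # initialization (t = 1)
for t in range(1, 4):                                   # induction for t = 2, 3, 4
    prev = alpha[-1]
    alpha.append([sum(prev[i] * A[i][j] for i in range(2)) * B[j][O[t]]
                  for j in range(2)])

for t, (aF, aB) in enumerate(alpha, start=1):
    print(f"t={t}: alpha(F)={aF:.4f}  alpha(B)={aB:.4f}")
print("P(O|lambda) =", sum(alpha[-1]))       # roughly 0.054 with these assumed numbers
```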

20 Backward Algorithm
We define a backward variable βt(i) as the probability of the partial observation sequence after time t, given state Si at time t: βt(i) = P(Ot+1 Ot+2 ... OT | qt = Si, λ). Step 1 - Initialization: βT(i) = 1, 1 ≤ i ≤ N. Step 2 - Induction: βt(i) = Σj=1..N aij bj(Ot+1) βt+1(j), for t = T-1, T-2, ..., 1 and 1 ≤ i ≤ N. Step 3 - Termination: P(O|λ) = Σi=1..N πi bi(O1) β1(i).
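The backward recursion mirrors the forward one almost line for line; a sketch under the same layout assumptions, with beta[t][i] playing the role of the slide's βt+1(i):

```python
def backward(O, pi, A, B):
    """Backward procedure: returns (beta, P(O|lambda)); beta[t][i] is the slide's beta_{t+1}(i)."""
    N, T = len(pi), len(O)
    beta = [[0.0] * N for _ in range(T)]

    # Step 1 - initialization: beta_T(i) = 1
    for i in range(N):
        beta[T - 1][i] = 1.0

    # Step 2 - induction: beta_t(i) = sum_j a_ij * b_j(O_{t+1}) * beta_{t+1}(j)
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))

    # Step 3 - termination: P(O|lambda) = sum_i pi_i * b_i(O1) * beta_1(i)
    return beta, sum(pi[i] * B[i][O[0]] * beta[0][i] for i in range(N))
```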

21 Problem 2: Optimal
Input: O = O1 O2 ... OT and λ = (A, B, π). Output: a state sequence Q = q1 q2 ... qT that is optimal in some sense, e.g., one that maximizes the expected number of individually correct states, or the single best state sequence. Solution (Forward-Backward algorithm): define the variable γt(i) = P(qt = Si | O, λ), the probability of being in state Si at time t given the observation sequence and the model; in terms of the forward and backward variables, γt(i) = αt(i) βt(i) / P(O|λ). Picking the individually most likely state at each t maximizes the expected number of correct states, but what happens if some aij = 0? The resulting sequence may contain an impossible transition, i.e., it may not be a valid state sequence at all.
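Given alpha and beta tables (for example from the forward and backward sketches above), gamma and the individually most likely state at each time reduce to a few lines; the function takes the tables as arguments so it stands on its own:

```python
def gamma_states(alpha, beta):
    """gamma_t(i) = alpha_t(i) * beta_t(i) / P(O|lambda); also returns the argmax state per time."""
    T, N = len(alpha), len(alpha[0])
    p_obs = sum(alpha[T - 1])                   # P(O | lambda) from the forward termination step
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)] for t in range(T)]
    best = [g.index(max(g)) for g in gamma]     # individually most likely state at each t
    return gamma, best
```

Reading off the argmax at each time step independently is exactly the choice that can yield an invalid path when some aij = 0, which is what motivates the Viterbi algorithm on the next slide.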

22 Problem 2: Optimal (Viterbi Algorithm)
Finding the single best state sequence means computing argmaxQ P(Q | O, λ), which is equivalent to argmaxQ P(Q, O | λ). δt(i) is the best score (highest probability) along a single path ending in state Si at time t, and ψt(j) is an array that keeps track of the argument that maximized δ. Initialization: δ1(i) = πi bi(O1), ψ1(i) = 0. Recursion: δt(j) = maxi [δt-1(i) aij] bj(Ot), ψt(j) = argmaxi [δt-1(i) aij]. Termination: P* = maxi δT(i), qT* = argmaxi δT(i). Path backtracking: qt* = ψt+1(qt+1*), for t = T-1, T-2, ..., 1.
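The same recursion as code, with delta holding the best scores and psi the backpointers; a sketch under the same parameter-layout assumptions as before:

```python
def viterbi(O, pi, A, B):
    """Single best state sequence: returns (best_path, best_score)."""
    N, T = len(pi), len(O)
    delta = [[0.0] * N for _ in range(T)]   # best score along a single path ending in state j at t
    psi = [[0] * N for _ in range(T)]       # backpointer: argmax over the previous state

    # Initialization: delta_1(i) = pi_i * b_i(O1), psi_1(i) = 0
    for i in range(N):
        delta[0][i] = pi[i] * B[i][O[0]]

    # Recursion: delta_t(j) = max_i [delta_{t-1}(i) * a_ij] * b_j(O_t)
    for t in range(1, T):
        for j in range(N):
            scores = [delta[t - 1][i] * A[i][j] for i in range(N)]
            psi[t][j] = scores.index(max(scores))
            delta[t][j] = max(scores) * B[j][O[t]]

    # Termination: P* = max_i delta_T(i); backtracking: q*_t = psi_{t+1}(q*_{t+1})
    q = delta[T - 1].index(max(delta[T - 1]))
    path = [q]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    return path[::-1], max(delta[T - 1])
```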

23 Viterbi Algorithm - Example
Observe: HTTH. Initialization: δ1(F) = 0.2, δ1(B) = 0.48 (first column of the F/B lattice over H T T H).

24 Viterbi Algorithm - Example
Observe: HTTH. Recursion, t = 2: δ2(F) = 0.09, δ2(B) = 0.0672 (previous column: δ1(F) = 0.2, δ1(B) = 0.48).

25 Viterbi Algorithm - Example
Observe: HTTH. Recursion, t = 3: δ3(F) = 0.0405, δ3(B) = 0.0094 (lattice so far, F: 0.2, 0.09, 0.0405; B: 0.48, 0.0672, 0.0094).

26 Viterbi Algorithm - Example
Observe: HTTH. Recursion, t = 4: the final lattice column is computed. Lattice scores, F: 0.2, 0.09, 0.0405; B: 0.48, 0.0672, 0.0094, 0.0324.

27 Viterbi Algorithm - Example
Observe: HTTH. Termination: pick the state that gives the final best δ score, and backtrack via ψ to get the path: FFFB is the most likely state sequence to give HTTH.

28 Problem 3: Estimate Parameters
How do we adjust the model parameters λ = (A, B, π) to maximize P(O|λ)? There is no known way to analytically solve for the model which maximizes the probability of the observation sequence. Solution: the Baum-Welch algorithm, an iterative procedure: randomly initialize λ = (A, B, π); run the forward-backward procedure on O under the current λ; update λ = (A, B, π) from the resulting expected counts; repeat until P(O|λ) stops improving.

29 Problem 3: Estimate Parameters
Solution: Baum-Welch algorithm (cont.). We define ξt(i, j) as the probability of being in state Si at time t and in state Sj at time t+1, given the model and the observation sequence: ξt(i, j) = P(qt = Si, qt+1 = Sj | O, λ) = αt(i) aij bj(Ot+1) βt+1(j) / P(O|λ). Recall that γt(i) is the probability of being in state Si at time t, hence γt(i) = Σj=1..N ξt(i, j).
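As code, ξ is one more table built from the forward and backward variables; the sketch below takes alpha and beta as arguments (computed, for example, by the earlier forward/backward sketches) rather than recomputing them:

```python
def xi_table(O, A, B, alpha, beta):
    """xi[t][i][j] = P(q_t = S_i, q_{t+1} = S_j | O, lambda), for t = 1 .. T-1."""
    T, N = len(alpha), len(alpha[0])
    p_obs = sum(alpha[T - 1])                                     # P(O | lambda)
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(N)]
           for i in range(N)]
          for t in range(T - 1)]
    gamma = [[sum(xi_t[i]) for i in range(N)] for xi_t in xi]     # gamma_t(i) = sum_j xi_t(i, j)
    return xi, gamma
```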

30 Problem 3: Estimate Parameters
Solution: Baum-Welch algorithm (cont.). If we sum over the time index t, Σt=1..T-1 γt(i) = expected number of transitions made from state Si, and Σt=1..T-1 ξt(i, j) = expected number of transitions from Si to Sj. Reestimation formulas: πi' = γ1(i) (expected frequency in state Si at time t = 1); aij' = Σt=1..T-1 ξt(i, j) / Σt=1..T-1 γt(i); bj'(k) = Σt: Ot = vk γt(j) / Σt=1..T γt(j).
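Putting the counts together, one Baum-Welch update can be sketched as follows; alpha and beta are passed in, M is the number of observation symbols, and this is an illustrative single-observation-sequence version only:

```python
def reestimate(O, A, B, alpha, beta, M):
    """One Baum-Welch update: returns (pi_new, A_new, B_new) from expected counts."""
    T, N = len(alpha), len(alpha[0])
    p_obs = sum(alpha[T - 1])
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]

    pi_new = gamma[0][:]                                           # expected frequency in S_i at t = 1
    A_new = [[sum(xi[t][i][j] for t in range(T - 1)) /             # expected transitions S_i -> S_j
              sum(gamma[t][i] for t in range(T - 1))               # expected transitions out of S_i
              for j in range(N)]
             for i in range(N)]
    B_new = [[sum(gamma[t][j] for t in range(T) if O[t] == k) /    # times in S_j observing symbol v_k
              sum(gamma[t][j] for t in range(T))                   # times in S_j overall
              for k in range(M)]
             for j in range(N)]
    return pi_new, A_new, B_new
```

Alternating forward/backward with this update until P(O|λ) stops improving is the iterative procedure outlined on slide 28; it converges to a local maximum of the likelihood, not necessarily the global one.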

31 Summary
- Markov chain
- Hidden Markov model: observations, hidden states, initial, transition and emission probabilities
- Three problems
  - Evaluation: forward, backward procedure
  - Optimal: forward-backward (probability prediction at each state), Viterbi (best path)
  - Estimate parameters: Baum-Welch

32 Thanks for listening!

33 References
[1] Rabiner, Lawrence R. "A tutorial on hidden Markov models and selected applications in speech recognition." Proceedings of the IEEE 77.2 (1989).
[2] Dugad, Rakesh, and Uday B. Desai. "A tutorial on hidden Markov models." Signal Processing and Artificial Neural Networks Laboratory, Department of Electrical Engineering, Indian Institute of Technology (1996).
[3] Forney Jr., G. David. "The Viterbi algorithm." Proceedings of the IEEE 61.3 (1973).

