Hidden Markov Models for Automatic Speech Recognition
Dr. Mike Johnson, Marquette University, EECE Dept.

Overview
- Intro: the problem with sequential data
- Markov chains
- Hidden Markov Models
- Key HMM algorithms: evaluation, alignment, training / parameter estimation
- Examples / applications

Big-Picture View of Statistical Models
From basic Gaussian models to HMMs.

Nonstationary Sequential Data

Historical Method: Dynamic Time Warping
- DTW is a dynamic path search against a template.
- It can be solved using dynamic programming.

Alternative: Sequential Modeling
Use a Markov chain (state machine): S1 → S2 → S3. The state machine defines the structure, and each state carries a data distribution model.

Markov Chains (Discrete Time & State)
A Markov chain is a discrete-time, discrete-state Markov process. The probability of moving to any new state is determined solely by the current state; this is called a transition probability.
Note: since the transition probabilities are fixed, there is also a time-invariance assumption. (Also false in practice, of course, but useful.)

Graphical Representation
Markov chain parameters:
- Transition probabilities a_ij
- Initial state probabilities π_i
(Diagram: states S1, S2, S3 with self-loops a_11, a_22, a_33 and transitions a_12, a_13, a_21, a_23, a_31, a_32.)

Example: Weather Patterns
The probability of Rain, Clouds, or Sunshine can be modeled as a Markov chain with a transition matrix A.
Note: a matrix of this form (square, non-negative, each row summing to 1) is called a stochastic matrix.
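A minimal sketch of such a transition matrix in NumPy. Only the entries implied by the two-step calculation on the next slide (the Rain row 0.7/0.2/0.1 and the Rain column 0.7/0.4/0.1) come from the slides; the remaining entries are hypothetical fill-ins for illustration.

```python
import numpy as np

# Transition matrix over states (Rain, Clouds, Sun).
A = np.array([
    [0.7, 0.2, 0.1],   # from Rain (slide values)
    [0.4, 0.4, 0.2],   # from Clouds (0.4 -> Rain from the slides; rest hypothetical)
    [0.1, 0.3, 0.6],   # from Sun (0.1 -> Rain from the slides; rest hypothetical)
])

# Stochastic matrix: square, non-negative, each row sums to 1.
assert A.shape[0] == A.shape[1]
assert np.all(A >= 0)
assert np.allclose(A.sum(axis=1), 1.0)
```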

Two-Step Probabilities
If it's raining today, what's the probability of rain two days from now? We need the two-step probabilities:
P(rain in 2 days | rain today) = 0.7·0.7 + 0.2·0.4 + 0.1·0.1 = 0.58
These can also be read directly from A².
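The two-step calculation can be checked against A² directly. The Rain row and column match the slide's numbers; the other matrix entries are hypothetical.

```python
import numpy as np

A = np.array([
    [0.7, 0.2, 0.1],   # from Rain (slide values)
    [0.4, 0.4, 0.2],   # from Clouds (rest hypothetical)
    [0.1, 0.3, 0.6],   # from Sun (rest hypothetical)
])

A2 = A @ A
# P(rain in two days | rain today) is the (Rain, Rain) entry of A^2.
assert np.isclose(A2[0, 0], 0.7 * 0.7 + 0.2 * 0.4 + 0.1 * 0.1)   # 0.58
```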

Steady State
The N-step probabilities can be obtained from A^N, so A is sufficient to determine the likelihood of any sequence. What's the limiting case? Does it matter whether it was raining 1000 days ago? In A^1000 every row has converged to the same stationary distribution, so the answer is no.
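A quick numerical check of the limiting case, using the same weather matrix (entries beyond the slides' numbers are hypothetical): after 1000 steps every row of A^N is the same, and that common row is the stationary distribution.

```python
import numpy as np

A = np.array([[0.7, 0.2, 0.1],
              [0.4, 0.4, 0.2],
              [0.1, 0.3, 0.6]])   # some entries hypothetical

A1000 = np.linalg.matrix_power(A, 1000)

# All rows are (numerically) identical: the starting state no longer matters.
assert np.allclose(A1000[0], A1000[1]) and np.allclose(A1000[1], A1000[2])

# The common row is the stationary distribution: pi @ A = pi.
pi = A1000[0]
assert np.allclose(pi @ A, pi)
```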

Probability of a State Sequence
The probability of any state sequence S = (s_1, ..., s_T) is given by:
P(S) = π_{s_1} · a_{s_1 s_2} · a_{s_2 s_3} · ... · a_{s_{T-1} s_T}
Training: learn the transition probabilities by counting the state transitions in the training data.
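The product formula can be sketched in a few lines. The uniform initial probabilities and the non-Rain matrix entries are hypothetical choices for illustration.

```python
import numpy as np

A = np.array([[0.7, 0.2, 0.1],
              [0.4, 0.4, 0.2],
              [0.1, 0.3, 0.6]])      # weather chain (some entries hypothetical)
pi = np.array([1/3, 1/3, 1/3])       # hypothetical uniform initial probabilities

def sequence_prob(states, pi, A):
    """P(S) = pi[s1] * a[s1,s2] * ... * a[s_{T-1},s_T]."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev, cur]
    return p

# Rain, Rain, Clouds: (1/3) * 0.7 * 0.2
assert np.isclose(sequence_prob([0, 0, 1], pi, A), (1/3) * 0.7 * 0.2)
```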

Weather Classification
Using a Markov chain for classification:
- Train one Markov chain model for each class, e.g. a weather transition matrix for each city: Milwaukee, Phoenix, and Miami.
- Given a sequence of state observations, identify the most likely city by choosing the model that gives the highest overall probability.
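A sketch of the classification rule, with entirely hypothetical transition matrices for two cities (log probabilities avoid underflow on long sequences):

```python
import numpy as np

def sequence_log_prob(states, pi, A):
    logp = np.log(pi[states[0]])
    for prev, cur in zip(states, states[1:]):
        logp += np.log(A[prev, cur])
    return logp

# Hypothetical per-city transition matrices (states: Rain, Clouds, Sun).
models = {
    "Milwaukee": (np.full(3, 1/3), np.array([[0.7, 0.2, 0.1],
                                             [0.4, 0.4, 0.2],
                                             [0.1, 0.3, 0.6]])),
    "Phoenix":   (np.full(3, 1/3), np.array([[0.2, 0.2, 0.6],
                                             [0.1, 0.3, 0.6],
                                             [0.05, 0.15, 0.8]])),
}

obs = [2, 2, 2, 1, 2]   # a mostly sunny observed state sequence
best = max(models, key=lambda city: sequence_log_prob(obs, *models[city]))
assert best == "Phoenix"
```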

Hidden States & HMMs
What if you can't directly observe the states, but there are measurements/observations that relate to the probability of being in each state? States hidden from view = Hidden Markov Model.

General Case HMM
- s_i : state i
- a_ij : P(s_j | s_i), the transition probability from state i to state j
- o_t : observation at time t
- b_j(o_t) : P(o_t | s_j), the observation probability in state j
- π_i : initial probability of state i
(Diagram: states 1-4 with observation distributions b_1(o_t), ..., b_4(o_t).)

Weather HMM
Extend the weather Markov chain to an HMM: we can't see whether it's raining, cloudy, or sunny, but we can make some observations: humidity H, temperature T, pressure P.
- How do we calculate the probability of an observation sequence under a model?
- How do we learn the state transition probabilities for unseen states, and the observation probabilities in each state?

Observation Models
How do we characterize these observations?
- Discrete/categorical observations: learn the probability mass function directly.
- Continuous observations: assume a parametric model. In our example, assume a Gaussian distribution: we need to estimate the mean and variance of the humidity, temperature, and pressure for each state (9 means and 9 variances for each city model).
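A minimal sketch of the Gaussian observation model, assuming (as the 9-means/9-variances count implies) that the three features are treated as independent, i.e. a diagonal covariance:

```python
import math

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density N(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def state_likelihood(obs, means, variances):
    """b_j(o) under per-feature independence: product over the features
    (here: humidity, temperature, pressure)."""
    p = 1.0
    for x, m, v in zip(obs, means, variances):
        p *= gaussian_pdf(x, m, v)
    return p

# Sanity check: the standard normal density at 0 is 1/sqrt(2*pi).
assert abs(gaussian_pdf(0.0, 0.0, 1.0) - 1.0 / math.sqrt(2 * math.pi)) < 1e-12
```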

HMM Classification
Using an HMM for classification:
- Training: one HMM for each class, i.e. a transition matrix plus state means and variances (27 parameters) for each city.
- Classification: given a sequence of observations, evaluate P(O | model) for each city (much harder to compute for an HMM than for a Markov chain) and choose the model that gives the highest overall probability.

Using HMMs for Speech Recognition
(Diagram: left-to-right model S1 → S2 → S3 → S4 → S5 with start and end states, self-loops a_22, a_33, a_44, and skip transitions a_13, a_24, a_35.)
- States represent the beginning, middle, and end of a phoneme.
- A Gaussian mixture model (GMM) is used in each state.

Fundamental HMM Computations
- Evaluation: given a model λ and an observation sequence O = (o_1, o_2, ..., o_T), compute P(O | λ).
- Alignment: given λ and O, compute the 'correct' state sequence S = (s_1, s_2, ..., s_T), such as S* = argmax_S P(S | O, λ).
- Training: given a group of observation sequences, find an estimate of λ, such as λ_ML = argmax_λ P(O | λ).

Evaluation: Forward/Backward Algorithm
Define α_i(t) = P(o_1 o_2 ... o_t, s_t = i | λ)
Define β_i(t) = P(o_{t+1} o_{t+2} ... o_T | s_t = i, λ)
Each of these can be computed efficiently via a dynamic programming recursion starting at t = 1 (for α) and at t = T (for β). Putting the forward and backward variables together, for any t:
P(O | λ) = Σ_i α_i(t) β_i(t)

Forward Recursion
1. Initialization: α_i(1) = π_i b_i(o_1)
2. Recursion: α_j(t+1) = [ Σ_i α_i(t) a_ij ] b_j(o_{t+1})
3. Termination: P(O | λ) = Σ_i α_i(T)
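The three steps can be sketched compactly for a discrete HMM. The 2-state model below is hypothetical; the brute-force sum over all N^T state sequences confirms that the recursion computes the same P(O | λ).

```python
import itertools
import numpy as np

# Hypothetical 2-state, 2-symbol discrete HMM.
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.5, 0.5],     # P(o | state 0)
               [0.1, 0.9]])    # P(o | state 1)
obs = [0, 1, 1, 0]

def forward(obs, pi, A, B):
    alpha = pi * B[:, obs[0]]             # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]     # recursion
    return alpha.sum()                    # termination

def brute_force(obs, pi, A, B):
    """Direct sum over all N**T state sequences."""
    N, T = len(pi), len(obs)
    total = 0.0
    for seq in itertools.product(range(N), repeat=T):
        p = pi[seq[0]] * B[seq[0], obs[0]]
        for t in range(1, T):
            p *= A[seq[t-1], seq[t]] * B[seq[t], obs[t]]
        total += p
    return total

assert np.isclose(forward(obs, pi, A, B), brute_force(obs, pi, A, B))
```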

Backward Recursion
1. Initialization: β_i(T) = 1
2. Recursion: β_i(t) = Σ_j a_ij b_j(o_{t+1}) β_j(t+1)
3. Termination: P(O | λ) = Σ_i π_i b_i(o_1) β_i(1)
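A matching sketch of the backward pass, on the same hypothetical 2-state model; both recursions must yield the same P(O | λ).

```python
import numpy as np

# Hypothetical 2-state discrete HMM.
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.5, 0.5],
               [0.1, 0.9]])
obs = [0, 1, 1, 0]

def forward_prob(obs, pi, A, B):
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def backward_prob(obs, pi, A, B):
    beta = np.ones(len(pi))                    # initialization: beta_i(T) = 1
    for o in reversed(obs[1:]):
        beta = A @ (B[:, o] * beta)            # recursion over t = T-1, ..., 1
    return (pi * B[:, obs[0]] * beta).sum()    # termination

# Both recursions compute the same quantity P(O | lambda).
assert np.isclose(forward_prob(obs, pi, A, B), backward_prob(obs, pi, A, B))
```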

Note: Computational Improvement
- Direct computation: P(O | λ) is the sum of the observation probabilities over all N^T possible state sequences, so the time complexity is O(T · N^T).
- F/B algorithm: for each state at each time step, sum over all state values from the previous time step, giving a time complexity of O(T · N²).

From α_i(t) and β_i(t):
One-state occupancy probability: γ_i(t) = α_i(t) β_i(t) / P(O | λ)
Two-state occupancy probability: ξ_ij(t) = α_i(t) a_ij b_j(o_{t+1}) β_j(t+1) / P(O | λ)

Alignment: Viterbi Algorithm
To find the single most likely state sequence S, use the Viterbi dynamic programming algorithm:
1. Initialization: δ_i(1) = π_i b_i(o_1), ψ_i(1) = 0
2. Recursion: δ_j(t) = max_i [ δ_i(t-1) a_ij ] b_j(o_t); ψ_j(t) = argmax_i [ δ_i(t-1) a_ij ]
3. Termination: P* = max_i δ_i(T), s*_T = argmax_i δ_i(T); then backtrack s*_t = ψ_{s*_{t+1}}(t+1).
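A log-domain sketch of the recursion on the same hypothetical 2-state model, checked against brute-force maximization over all state sequences:

```python
import itertools
import numpy as np

pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.5, 0.5],
               [0.1, 0.9]])
obs = [0, 1, 1, 0]

def viterbi(obs, pi, A, B):
    delta = np.log(pi) + np.log(B[:, obs[0]])       # initialization
    backpointers = []
    for o in obs[1:]:
        scores = delta[:, None] + np.log(A)         # delta_i(t-1) + log a_ij
        backpointers.append(scores.argmax(axis=0))  # psi_j(t)
        delta = scores.max(axis=0) + np.log(B[:, o])
    path = [int(delta.argmax())]                    # termination
    for psi in reversed(backpointers):              # backtracking
        path.append(int(psi[path[-1]]))
    return path[::-1], delta.max()

def joint_log_prob(states, obs, pi, A, B):
    lp = np.log(pi[states[0]]) + np.log(B[states[0], obs[0]])
    for t in range(1, len(obs)):
        lp += np.log(A[states[t-1], states[t]]) + np.log(B[states[t], obs[t]])
    return lp

best_path, best_lp = viterbi(obs, pi, A, B)
brute_lp = max(joint_log_prob(s, obs, pi, A, B)
               for s in itertools.product(range(2), repeat=len(obs)))
assert np.isclose(best_lp, brute_lp)
assert np.isclose(joint_log_prob(best_path, obs, pi, A, B), best_lp)
```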

Training
We need to learn the parameters of the model given the training data. Possibilities include:
- Maximum a posteriori (MAP)
- Maximum likelihood (ML)
- Minimum error rate

Expectation Maximization
Expectation maximization (EM) can be used for ML estimation of parameters in the presence of hidden variables. Basic iterative process:
1. E-step: compute the state sequence likelihoods given the current parameters.
2. M-step: estimate new parameter values given the state sequence likelihoods.

EM Training: Baum-Welch for Discrete Observations (e.g. VQ-coded)
Basic idea: using the current λ and the F/B equations, compute the state occupation probabilities γ_i(t) and ξ_ij(t). Then compute new values:
â_ij = Σ_t ξ_ij(t) / Σ_t γ_i(t)
b̂_j(k) = Σ_{t : o_t = k} γ_j(t) / Σ_t γ_j(t)
π̂_i = γ_i(1)
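One re-estimation iteration can be sketched end to end. The 2-state, 2-symbol model and training sequence are hypothetical; the final assertion checks the EM guarantee that the data likelihood does not decrease.

```python
import numpy as np

# Hypothetical 2-state, 2-symbol discrete HMM and a training sequence.
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.5, 0.5],
               [0.1, 0.9]])
obs = [0, 1, 1, 0, 1]
N, T, M = 2, len(obs), 2

def forward_backward(obs, pi, A, B):
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]
    beta[T-1] = 1.0
    for t in range(T-2, -1, -1):
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])
    return alpha, beta

alpha, beta = forward_backward(obs, pi, A, B)
P_O = alpha[-1].sum()

gamma = alpha * beta / P_O                        # gamma_i(t)
xi = np.array([np.outer(alpha[t], B[:, obs[t+1]] * beta[t+1]) * A / P_O
               for t in range(T-1)])              # xi_ij(t)

# Baum-Welch re-estimation.
new_pi = gamma[0]
new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
new_B = np.zeros((N, M))
for k in range(M):
    mask = np.array(obs) == k
    new_B[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)

# EM guarantees the training-data likelihood does not decrease.
new_alpha, _ = forward_backward(obs, new_pi, new_A, new_B)
assert new_alpha[-1].sum() >= P_O - 1e-12
```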

Update Equations for Gaussian Distributions
μ̂_j = Σ_t γ_j(t) o_t / Σ_t γ_j(t)
σ̂²_j = Σ_t γ_j(t) (o_t − μ̂_j)² / Σ_t γ_j(t)
GMMs are similar, but the mixture component likelihoods must be incorporated along with the state likelihoods.

Toy Example: the Genie and the Urns
There are N urns in a nearby room, each containing many balls of M different colors. A genie picks out a sequence of balls from the urns and shows you the result. Can you determine the sequence of urns they came from?
Model as an HMM with N states and M outputs:
- The probabilities of picking from each urn are the state transitions.
- The numbers of differently colored balls in each urn make up the probability mass function for each state.

Working Out the Genie Example
There are three baskets of colored balls:
- Basket one: 10 blue and 10 red
- Basket two: 15 green, 5 blue, and 5 red
- Basket three: 10 green and 10 red
The genie chooses baskets at random: a 25% chance of picking from basket one or from basket two, and a 50% chance of picking from basket three.

Genie Example Diagram
(Diagram of the three-basket HMM.)

Two Questions
Assume that the genie reports a sequence of two balls as {blue, red}. Answer two questions:
1. What is the probability that a two-ball sequence will be {blue, red}?
2. What is the most likely sequence of baskets to produce the sequence {blue, red}?

Probability of {blue, red} for Each Specific Basket Sequence
For each ordered pair of baskets (i, j), the joint probability is P(pick basket i) · P(blue | basket i) · P(pick basket j) · P(red | basket j), giving a 3 × 3 matrix of sequence probabilities.

Probability of {blue, red}
- What is the total probability of {blue, red}? Sum of the matrix values = 0.074375.
- What is the most likely sequence of baskets visited? Argmax over the matrix values = {basket 1, basket 3}, with corresponding maximum likelihood 0.03125.
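The genie numbers can be reproduced by enumerating all nine basket pairs, using the basket contents and pick probabilities from the slides:

```python
import itertools

# Basket contents from the slides, as emission probabilities.
emit = {
    1: {"blue": 0.5, "red": 0.5, "green": 0.0},   # 10 blue, 10 red
    2: {"blue": 0.2, "red": 0.2, "green": 0.6},   # 5 blue, 5 red, 15 green
    3: {"blue": 0.0, "red": 0.5, "green": 0.5},   # 10 red, 10 green
}
# The basket choice is independent each draw: 25% / 25% / 50%.
pick = {1: 0.25, 2: 0.25, 3: 0.50}

obs = ["blue", "red"]
probs = {seq: pick[seq[0]] * emit[seq[0]][obs[0]] *
              pick[seq[1]] * emit[seq[1]][obs[1]]
         for seq in itertools.product([1, 2, 3], repeat=2)}

total = sum(probs.values())
best_seq = max(probs, key=probs.get)

assert abs(total - 0.074375) < 1e-12          # total probability of {blue, red}
assert best_seq == (1, 3)                     # basket 1 then basket 3
assert abs(probs[best_seq] - 0.03125) < 1e-12 # its likelihood
```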

Viterbi Method
Tracing the trellis for {blue, red}: the best path ends in state 3 (basket three), having come previously from state 1 (basket one).

Composite Models
(Diagram: left-to-right five-state HMMs with start and end states, concatenated into a larger model.)
- Training data is labeled at the sentence level and is generally not annotated at the sub-word (HMM model) level.
- We need to be able to form composite models from a sequence of word or phoneme labels.

Viterbi and Token Passing
(Diagram: a recognition network of word models; Viterbi decoding yields the single best sentence, while token passing yields a word graph.)

HMM Notation
- Discrete HMM case: λ = (A, B, π), with transition probabilities A = {a_ij}, observation probability mass functions B = {b_j(k)}, and initial state probabilities π = {π_i}.
- Continuous HMM case: each b_j(o) is a probability density function, e.g. a single Gaussian b_j(o) = N(o; μ_j, Σ_j).
- Multi-mixture, multi-observation case: b_j(o) = Σ_m c_jm N(o; μ_jm, Σ_jm), where the mixture weights c_jm are non-negative and sum to 1 for each state.
