EEL 6586: AUTOMATIC SPEECH PROCESSING Hidden Markov Model Lecture

1 EEL 6586: AUTOMATIC SPEECH PROCESSING Hidden Markov Model Lecture
Mark D. Skowronski
Computational Neuro-Engineering Lab
University of Florida
March 31, 2003

2 Questions to be Answered
What is a Hidden Markov Model?
How do HMMs work?
How are HMMs applied to automatic speech recognition?
What are the strengths/weaknesses of HMMs?

3 What is an HMM? A Hidden Markov Model is a piecewise stationary model of a nonstationary signal. Model parameters:
States -- each represents the domain of a stationary segment of the signal
Interstate connections -- define the model architecture
pdf estimates (one per state): discrete (codebooks) or continuous (means, covariance matrices)
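
For concreteness, here is a minimal sketch of those three parameter sets for a discrete-observation HMM (Python with numpy; the names and sizes are illustrative assumptions, not from the lecture):

```python
import numpy as np

n_states, n_symbols = 5, 256  # e.g. 5 states, a 256-entry codebook

# Initial state probabilities.
pi = np.full(n_states, 1.0 / n_states)

# Interstate connections: A[i, j] = P(next state j | current state i).
# Each row is a probability distribution (sums to 1).
A = np.random.dirichlet(np.ones(n_states), size=n_states)

# Per-state discrete pdf over codebook entries:
# B[i, k] = P(codebook entry k | state i).
B = np.random.dirichlet(np.ones(n_symbols), size=n_states)
```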

4 HMM Depiction (figure-only slide; the diagram is not reproduced in the transcript)

5 PDF Estimation
Discrete:
Codebook of feature-space cluster centers
Probability for each codebook entry
Continuous:
Gaussian mixtures (means, covariances, mixture weights)
Discriminative estimates (neural networks)
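
As a hedged illustration of the continuous case, the log-likelihood of one feature vector under a single state's diagonal-covariance Gaussian mixture might be computed as follows (all names are assumptions for this sketch):

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """log p(x) under a diagonal-covariance Gaussian mixture.

    x: (D,) feature vector; weights: (M,); means, variances: (M, D).
    """
    D = x.shape[0]
    # Per-component log N(x; mu_m, diag(var_m)).
    log_norm = -0.5 * (D * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    comp = np.log(weights) + log_norm + log_exp
    # Log-sum-exp over the M components for numerical stability.
    m = comp.max()
    return m + np.log(np.sum(np.exp(comp - m)))

# Example: a 2-component mixture over 3-dimensional features.
x = np.zeros(3)
print(gmm_log_likelihood(x,
                         np.array([0.4, 0.6]),
                         np.array([[0., 0., 0.], [1., 1., 1.]]),
                         np.ones((2, 3))))
```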

6 How do HMMs Work? Three fundamental issues
Training: Baum-Welch algorithm
Scoring (evaluation): Forward algorithm
Optimal path: Viterbi algorithm
Complete implementation details: L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989

7 HMM Training Baum-Welch algorithm
Iterative procedure (on-line or batch mode)
Guaranteed not to decrease the likelihood of the training data after each iteration
Estimation may be maximum-likelihood (ML) or discriminative (maximum mutual information, MMI); a single reestimation step is sketched below
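
To make the procedure concrete, here is a sketch of one ML reestimation step for a discrete-observation HMM, following the alpha/beta/gamma/xi notation of Rabiner's tutorial (an illustrative implementation, not code from the lecture; note the per-frame scaling, which also addresses the underflow question on slide 13):

```python
import numpy as np

def baum_welch_step(obs, pi, A, B):
    """One EM update. obs: (T,) int symbols; pi: (N,); A: (N, N); B: (N, K)."""
    obs = np.asarray(obs)
    T, N = len(obs), len(pi)

    # Forward pass (alpha), scaled per frame to avoid underflow.
    alpha = np.zeros((T, N))
    c = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]

    # Backward pass (beta), using the same scale factors.
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]

    # State posteriors (gamma) and expected transition counts (xi).
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((N, N))
    for t in range(T - 1):
        x = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi += x / x.sum()

    # ML reestimates; the data log-likelihood never decreases across steps.
    new_pi = gamma[0]
    new_A = xi / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.vstack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])]).T
    new_B /= gamma.sum(axis=0)[:, None]
    log_likelihood = np.sum(np.log(c))
    return new_pi, new_A, new_B, log_likelihood
```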

8 HMM Evaluation Forward algorithm
Calculates P(O|λ), summed over ALL valid state sequences
Complexity: order N^2·T (~5000 computations) versus order 2T·N^T for brute-force enumeration of state sequences (~6E86 computations), for N states and T speech frames
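
The scaled forward recursion itself appears as the alpha pass in the Baum-Welch sketch above. As a quick arithmetic check, the slide's operation counts are consistent with N = 7 and T = 100 (assumed values; the slide does not state them):

```python
N, T = 7, 100                  # assumed: states and speech frames
print(N**2 * T)                # 4900, i.e. order N^2*T, ~5000 computations
print(f"{2 * T * N**T:.1e}")   # 6.5e+86, i.e. order 2T*N^T, ~6E86
```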

9 Optimal Path Viterbi algorithm
Determines the single most likely state sequence for a given model and observation sequence
Dynamic programming solution
Likelihood of the Viterbi path can be used for evaluation instead of the Forward algorithm
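
A minimal log-domain Viterbi sketch for a discrete-observation HMM (illustrative names, not the lecture's code):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state sequence. obs: (T,) ints; returns (path, log_prob)."""
    obs = np.asarray(obs)
    T, N = len(obs), len(pi)
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)

    delta = np.zeros((T, N))              # best log-score ending in each state
    psi = np.zeros((T, N), dtype=int)     # best predecessor for backtracking
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA   # (from_state, to_state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]

    # Backtrack from the best final state.
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path, delta[-1].max()
```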

10 HMMs in ASR
Piecewise stationary model of a nonstationary signal

Type       # Models   +                -
Word       <1000      Coarticulation   Scaling
Phoneme    40         pdf estimation
Biphone    1400
Triphone   40K

TRADEOFF

11 Typical Implementations
Word models:
39-dimension feature vectors
3-15 states
1-50 Gaussian mixtures
Diagonal covariance matrices
First-order HMM
Single-step state transitions
Viterbi used for evaluation (speed)
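
For illustration, one plausible configuration within the ranges above (every value here is an assumption, not a specific system's settings):

```python
word_model_config = {
    "feature_dim": 39,          # e.g. 13 MFCCs + deltas + delta-deltas
    "n_states": 8,              # within the 3-15 range
    "mixtures_per_state": 16,   # within the 1-50 range
    "covariance": "diagonal",   # diagonal covariance matrices
    "hmm_order": 1,             # first-order Markov assumption
    "topology": "left-to-right, single-step transitions",
    "decoder": "viterbi",       # Viterbi used for evaluation (speed)
}
```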

12 Typical Implementations
Triphones: left- and right-context phoneme
3-5 states
Up to 50 mixtures/state
40K models
39-dimension full covariance matrices
Approx. 15 billion parameters to estimate
Approx. 43,000 hours of speech for training
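
The 15-billion figure can be reproduced with a back-of-the-envelope count (assuming 5 states per model, 50 mixtures per state, and counting all 39x39 covariance entries plus the mean; these assumptions are mine, chosen to match the slide):

```python
models, states, mixtures, dim = 40_000, 5, 50, 39
params_per_gaussian = dim + dim * dim       # mean + full covariance matrix
total = models * states * mixtures * params_per_gaussian
print(f"{total:.2e}")                       # 1.56e+10, i.e. ~15 billion
```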

13 Implementation Issues
Same number of states for each word model?
Underflow of evaluation probabilities? (illustrated below)
Full or diagonal covariance matrices?
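
On the underflow question: a likelihood is a product of many per-frame probabilities, which quickly falls below double-precision range, so implementations rescale each frame (as in the Baum-Welch sketch above) or work in the log domain (as in the Viterbi sketch). A short demonstration:

```python
import numpy as np

frame_probs = np.full(1000, 1e-4)   # 1000 frames of typically tiny pdf values
print(np.prod(frame_probs))         # 0.0 -- underflows double precision
print(np.sum(np.log(frame_probs)))  # -9210.34... -- log domain stays finite
```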

14 HMM Limitations
Piecewise stationary assumption:
Diphthongs
Tonal languages
Phonetic information in transitions
iid assumption:
Slow articulators
Temporal information
No modeling beyond a 100 ms time frame
Data intensive
