1 CSE 552/652 Hidden Markov Models for Speech Recognition Spring, 2006 Oregon Health & Science University OGI School of Science & Engineering John-Paul Hosom Lecture Notes for April 24: HMMs for speech; review anatomy/framework of HMM; start Viterbi search

2 HMMs for Speech
Speech is treated as the output of an HMM; the problem is to find the most likely state sequence for a given observation of speech.
Speech is divided into a sequence of 10-msec frames, with one frame per state transition (for faster processing). We assume speech can be recognized in 10-msec chunks.
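As a concrete illustration of this framing step (my sketch, not part of the lecture; the 8 kHz sample rate is an assumption):

```python
import numpy as np

def frame_signal(signal, sample_rate=8000, frame_msec=10):
    """Split a 1-D waveform into non-overlapping 10-msec frames.
    A real front end would then convert each frame into a feature
    vector (e.g. MFCCs); here we only do the slicing."""
    frame_len = int(sample_rate * frame_msec / 1000)  # samples per frame
    n_frames = len(signal) // frame_len               # drop any ragged tail
    return signal[:n_frames * frame_len].reshape(n_frames, frame_len)

# one second of fake speech -> 100 frames of 80 samples each
frames = frame_signal(np.random.randn(8000))
print(frames.shape)  # (100, 80)
```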

3 HMMs for Speech

4 Each state can be associated with:
- a sub-phoneme
- a phoneme
- a sub-word
Usually, sub-phonemes or sub-words are used, to account for coarticulation (spectral dynamics).
One HMM corresponds to one phoneme or word.
For each HMM, determine the most likely state sequence that results in the observed speech. Choose the HMM that best matches the observed speech.
Given the most likely HMM and state sequence, determine the corresponding phoneme and word sequence (simple).

5 HMMs for Speech
Example of states for a word model:
[Figure: a 3-state word model for "cat" (states k, ae, t), and a 5-state word model for "cat" with null states added at the ends]

6 HMMs for Speech
Example of states for a word model:
[Figure: word model for "cat" with null states, using states such as k, ae1, ae2, tcl, and t]
Null states do not emit observations, and are entered and exited at the same time t. Theoretically, they are unnecessary; practically, they can make implementation easier.
States don't have to correspond directly to phonemes.

7 HMMs for Speech
Example of using the HMM for the word "yes" (states sil, y, eh, s) on an utterance with observations o_1 … o_29:
P = b_sil(o_1) · 0.6 · b_sil(o_2) · 0.6 · b_sil(o_3) · 0.6 · b_sil(o_4) · 0.4 · b_y(o_5) · 0.3 · b_y(o_6) · 0.3 · b_y(o_7) · …

8 HMMs for Speech
Example of using the HMM for the word "no" (states sil, n, ow, sil) on the same utterance:
P = b_sil(o_1) · 0.6 · b_sil(o_2) · 0.6 · b_sil(o_3) · 0.4 · b_n(o_4) · 0.8 · b_ow(o_5) · 0.9 · b_ow(o_6) · …

9 HMMs for Speech
Because of coarticulation, states are sometimes made dependent on the preceding and/or following phonemes (context dependent):
- ae (monophone model)
- k-ae+t (triphone model)
- k-ae (diphone model)
- ae+t (diphone model)
Constructing words requires matching the contexts, e.g. "cat": sil-k+ae  k-ae+t  ae-t+sil
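To make the context matching concrete, here is a small sketch (mine, not the lecture's) that expands a phoneme sequence into triphone names in the l-p+r notation used above:

```python
def to_triphones(phones, boundary="sil"):
    """Expand a phoneme sequence into triphone names of the form
    'left-phone+right', padding the ends with silence."""
    padded = [boundary] + list(phones) + [boundary]
    return [f"{padded[i-1]}-{padded[i]}+{padded[i+1]}"
            for i in range(1, len(padded) - 1)]

print(to_triphones(["k", "ae", "t"]))
# ['sil-k+ae', 'k-ae+t', 'ae-t+sil']
```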

10 HMMs for Speech
This permits several different models for each phoneme, depending on the surrounding phonemes (context sensitive):
- k-ae+t
- p-ae+t
- k-ae+p
The probability of an "illegal" state sequence is zero (never used): e.g. sil-k+ae followed by p-ae+t has transition probability 0.0, because the contexts do not match.
Much larger number of states to train… (50 monophones vs. 50^3 = 125,000 triphones)

11 HMMs for Speech
Example of 3-state triphone HMMs (expanded from the previous example):
[Figure: 3-state models for sil-y+eh and y-eh+s for the word "yes", with self-loop and transition probabilities (e.g. 0.5)]

12 HMMs for Speech
[Figure: model topologies for /y/ — a 1-state monophone (context independent), a 3-state monophone (context independent), a 1-state triphone sil-y+eh (context dependent), and a 3-state triphone sil-y+eh with states y_1, y_2, y_3 (context dependent)]
What about a context-independent triphone??

13 HMMs for Speech
Typically, one HMM = one word or phoneme.
Join HMMs to form a sequence of phonemes = a word. Join words to form sentences.
Use null states at the ends of each HMM to simplify implementation.
[Figure: word models for "cat" (k ae t) and "sat" (s ae t), each bracketed by null states and joined by an instantaneous transition (i.t.) with probability 1.0]
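A minimal sketch (mine, not the lecture's) of joining per-phone HMMs into one word-level HMM by chaining each model's exit into the next model's entry; the self-loop value 0.6 is a made-up placeholder:

```python
import numpy as np

def join_hmms(phone_models):
    """Concatenate phone HMMs into one word HMM.

    phone_models: list of (state_names, A) where A is that phone's
    transition matrix. The last state of each phone is linked to the
    first state of the next with probability equal to that state's
    exit mass (the analogue of the 1.0 'instantaneous transition'
    through a null state on the slide)."""
    names, blocks = [], []
    for states, A in phone_models:
        names.extend(states)
        blocks.append(np.asarray(A, dtype=float))
    n = len(names)
    A_word = np.zeros((n, n))
    offset = 0
    for k, A in enumerate(blocks):
        m = A.shape[0]
        A_word[offset:offset+m, offset:offset+m] = A
        exit_mass = 1.0 - A[-1].sum()   # probability of leaving this phone
        if k + 1 < len(blocks):
            A_word[offset + m - 1, offset + m] = exit_mass
        # the final phone's exit mass would go to a word-final null state
        offset += m
    return names, A_word

# three 1-state phone models for "cat", each with self-loop 0.6 (placeholder)
def phone(p):
    return ([p], [[0.6]])

names, A = join_hmms([phone("k"), phone("ae"), phone("t")])
print(names)        # ['k', 'ae', 't']
print(A.round(2))   # chain with 0.4 transitions between phones
```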

14 HMMs for Speech Reminder of big picture:

15 HMMs for Speech
Notes:
- Assume that the speech observation is stationary for one frame.
- If the frame is small enough, and enough states are used, we can approximate the dynamics of speech.
- The use of context-dependent states accounts (somewhat) for the context-dependent nature of speech.
[Figure: states s_1 … s_5 modeling the trajectory of /ay/ with frame size = 4 msec]

16 HMMs for Speech
Prior segmentation of speech into categories is not required before performing classification. This provides robustness over methods that first segment and then classify, because any attempt at prior segmentation will introduce errors. As we move through an HMM to determine the most likely sequence, we get the segmentation.
The first-order and independence assumptions are correct for some phenomena, but not for speech. But the math is easier.

17 HMMs for Word Recognition
Different topologies are possible:
[Figure: "standard" (states A_1 A_2 A_3), "short phoneme" (states A_1 A_2 A_3), and "left-to-right" (states A_1 … A_5) topologies]
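For illustration only (the slide's actual transition values are in the figure, so these matrices are plausible placeholders, not the lecture's numbers): a "standard" topology allows self-loops and single steps, while a "short phoneme" topology adds a skip transition so the model can be traversed in fewer frames:

```python
import numpy as np

# 'standard' 3-state topology: self-loop or advance one state
A_standard = np.array([
    [0.6, 0.4, 0.0],
    [0.0, 0.6, 0.4],
    [0.0, 0.0, 1.0],   # final state (exit handled elsewhere)
])

# 'short phoneme' topology: also allow skipping the middle state
A_short = np.array([
    [0.5, 0.3, 0.2],   # 0.2 probability of skipping state 2
    [0.0, 0.6, 0.4],
    [0.0, 0.0, 1.0],
])

# each row is a probability distribution over next states
assert np.allclose(A_standard.sum(axis=1), 1.0)
assert np.allclose(A_short.sum(axis=1), 1.0)
```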

18 Anatomy of an HMM
HMMs for speech:
- first-order HMM
- one HMM per phoneme or word
- 3 states per phoneme-level HMM; more for a word-level HMM
- a sequential series of states, each with a self-loop
- HMMs linked together to form words and sentences
- GMM: many Gaussian components per state (16)
- context-dependent HMMs: HMMs can be linked together only if their contexts correspond

19 Anatomy of an HMM
HMMs for speech:
- the speech signal is divided into 10-msec quanta
- one HMM state per 10-msec quantum (frame)
- use self-loops for speech units that occupy more frames than the model has states
- trace through an HMM to determine the likelihood of an utterance and the state sequence

20 Anatomy of an HMM
Diagram of one HMM: /y/ in the context of preceding silence, followed by /eh/ (sil-y+eh):
[Figure: 3-state model in which each state j carries GMM parameters for mixture components k = 1…3: a mean vector μ_jk, a covariance matrix Σ_jk, and a scalar component weight c_jk]
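To show what those parameters do, here is a minimal sketch of evaluating a state's observation probability b_j(o_t) from its GMM; the diagonal covariances and all numbers are my assumptions for illustration:

```python
import numpy as np

def gmm_likelihood(o, weights, means, variances):
    """b_j(o) = sum_k c_jk * N(o; mu_jk, Sigma_jk), diagonal covariance."""
    o = np.asarray(o, dtype=float)
    total = 0.0
    for c, mu, var in zip(weights, means, variances):
        mu, var = np.asarray(mu, float), np.asarray(var, float)
        norm = np.prod(1.0 / np.sqrt(2 * np.pi * var))   # Gaussian normalizer
        total += c * norm * np.exp(-0.5 * np.sum((o - mu) ** 2 / var))
    return total

# toy 2-component GMM over 2-dimensional features (made-up numbers)
b = gmm_likelihood([0.5, -0.2],
                   weights=[0.7, 0.3],
                   means=[[0.4, 0.0], [-1.0, 1.0]],
                   variances=[[1.0, 1.0], [0.5, 0.5]])
print(b)
```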

21 Framework for HMMs
N = number of states (3 per phoneme, >3 per word)
S = states {S_1, S_2, S_3, …, S_N}
  Even though any state can output (any) observation, we associate the most likely output with the state name. Often context-dependent phonetic states (triphones) are used: {sil-y+eh, y-eh+s, eh-s+sil, …}
T = final time of output; t = {1, 2, …, T}
O = observations {o_1 o_2 … o_T}
  The actual output generated by the HMM; features (LPC, MFCC, PLP, etc.) of a speech signal.

22 Framework for HMMs
M = number of observation symbols per state
  = number of codewords for a discrete HMM; "infinite" for a continuous HMM
v = symbols {v_1 v_2 … v_M}
  "Codebook indices" generated by a discrete (VQ) HMM. There is no direct correspondence for a continuous HMM; the output of a continuous HMM is a sequence of observations {speech vector 1, speech vector 2, …}
A = matrix of transition probabilities {a_ij}, where a_ij = P(q_t = j | q_t-1 = i)
  (ergodic HMM: all a_ij > 0)
B = set of parameters for determining the probabilities b_j(o_t):
  b_j(o_t) = P(o_t = v_k | q_t = j)  (discrete: codebook)
  b_j(o_t) = P(o_t | q_t = j)  (continuous: GMM)

23 Framework for HMMs
π = initial state distribution {π_i}, where π_i = P(q_1 = i)
λ = the entire model: λ = (A, B, π)
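Pulling the notation together, a minimal container for λ = (A, B, π) might look like this (a sketch for the discrete case, where B is an N × M matrix; the class and field names are mine, not from the lecture):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class DiscreteHMM:
    """lambda = (A, B, pi) for a discrete-observation HMM."""
    A: np.ndarray    # (N, N) transition probabilities a_ij
    B: np.ndarray    # (N, M) observation probabilities b_j(v_k)
    pi: np.ndarray   # (N,)  initial state distribution pi_i

    def __post_init__(self):
        # every row of A and B, and pi itself, must be a distribution
        assert np.allclose(self.A.sum(axis=1), 1.0)
        assert np.allclose(self.B.sum(axis=1), 1.0)
        assert np.isclose(self.pi.sum(), 1.0)
```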

24 Framework for HMMs
Example: "hi"
States: 1 = sil-h+ay, 2 = h-ay+sil
Observed features: o_1 = {0.8}, o_2 = {0.8}, o_3 = {0.2}
What is the probability of O and the state sequence {sil-h+ay, h-ay+sil, h-ay+sil} = {1 2 2}?

25 Framework for HMMs
Example: "hi" (continued), with o_1 = 0.8, o_2 = 0.8, o_3 = 0.2 and state sequence q_1, q_2, q_2:
P = π_1 · b_1(o_1) · a_12 · b_2(o_2) · a_22 · b_2(o_3)
P = 1.0 · 0.76 · 0.7 · 0.27 · 0.4 · 0.82
P ≈ 0.0471
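The same computation in a few lines (the numbers are taken directly from the slide; the b-values stand in for whatever the state output distributions returned):

```python
pi_1 = 1.0                  # P(q_1 = state 1)
a_12, a_22 = 0.7, 0.4       # transition probabilities from the slide
b = [0.76, 0.27, 0.82]      # b_1(o_1), b_2(o_2), b_2(o_3)

P = pi_1 * b[0] * a_12 * b[1] * a_22 * b[2]
print(P)  # 0.04711392
```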

26 Framework for HMMs
What is the probability of an observation sequence and a state sequence, given the model?
P(O, q | λ) = P(O | q, λ) · P(q | λ)
What is the "best" valid state sequence from time 1 to time T, given the model?
At every time t, we can connect to up to N states, so there are up to N^T possible state sequences. For one second of speech (T = 100 frames) with 3 states, N^T = 3^100 ≈ 5 × 10^47 sequences: infeasible!!
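A brute-force search makes the blow-up concrete: enumerating every state sequence works for toy sizes but costs O(N^T). A sketch with a made-up 2-state, 2-symbol model:

```python
import itertools
import numpy as np

def brute_force_best_path(A, B, pi, obs):
    """Score every possible state sequence: O(N^T) sequences."""
    N, T = A.shape[0], len(obs)
    best_p, best_q = -1.0, None
    for q in itertools.product(range(N), repeat=T):
        p = pi[q[0]] * B[q[0], obs[0]]
        for t in range(1, T):
            p *= A[q[t-1], q[t]] * B[q[t], obs[t]]
        if p > best_p:
            best_p, best_q = p, q
    return best_p, best_q

# toy model; T=5 gives only 2^5 = 32 sequences here,
# but T=100 would already be ~1e30 sequences even for N=2
A = np.array([[0.7, 0.3], [0.0, 1.0]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([1.0, 0.0])
print(brute_force_best_path(A, B, pi, [0, 0, 1, 1, 1]))
```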

27 Viterbi Search: Formula
Use an inductive procedure.
Question 1: What is the best score along a single path, up to time t, ending in state i?
The best sequence score is defined as:
  δ_t(i) = max over q_1 … q_t-1 of P(q_1 q_2 … q_t-1, q_t = i, o_1 o_2 … o_t | λ)
First iteration (t = 1):
  δ_1(i) = π_i · b_i(o_1)

28 Viterbi Search: Formula
Second iteration (t = 2):
  δ_2(j) = max over q_1 of P(q_1, q_2 = j, o_1 o_2 | λ)

29 Viterbi Search: Formula
Second iteration (t = 2), continued:
  δ_2(j) = max over i of [δ_1(i) · a_ij] · b_j(o_2)
since P(o_2) is independent of o_1 and q_1, and P(q_2) is independent of o_1.

30 Viterbi Search: Formula
In general, for any value of t:
  δ_t(j) = max over i of [δ_t-1(i) · a_ij] · b_j(o_t)
The best path through {1, 2, … t} does not depend on future times {t+1, t+2, … T} (this follows from the definition).
The best path through {1, 2, … t} is not necessarily the same as the best path through {1, 2, … (t-1)} concatenated with the best single transition (t-1) → t.

31 Viterbi Search: Formula
Keep in memory only δ_t-1(i) for all i.
For each time t and state j, we need (N multiplies and compares) + (1 multiply).
For each time t, we need N × ((N multiplies and compares) + (1 multiply)).
To find the best path, we need O(N^2 T) operations. This is much better than evaluating N^T possible paths, especially for large T!
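A compact implementation of the recursion, consistent with the formulas above (my sketch; production systems work in log-probabilities to avoid numerical underflow):

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """O(N^2 T) Viterbi search over a discrete-observation HMM.
    Returns the best path probability and the best state sequence."""
    N, T = A.shape[0], len(obs)
    delta = pi * B[:, obs[0]]              # delta_1(i) = pi_i * b_i(o_1)
    psi = np.zeros((T, N), dtype=int)      # back-pointers
    for t in range(1, T):
        scores = delta[:, None] * A        # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)     # best predecessor for each state j
        delta = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta.argmax())]           # best final state
    for t in range(T - 1, 0, -1):          # backtrace
        path.append(int(psi[t, path[-1]]))
    return delta.max(), path[::-1]

# same toy model as in the brute-force sketch; the results should agree
A = np.array([[0.7, 0.3], [0.0, 1.0]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([1.0, 0.0])
print(viterbi(A, B, pi, [0, 0, 1, 1, 1]))
```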

32