Sequential Modeling with the Hidden Markov Model Lecture 9 Spoken Language Processing Prof. Andrew Rosenberg

Markov Assumption If we can represent all of the information available in the present state, encoding the past is unnecessary. The future is independent of the past given the present.
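Written as an equation (a standard statement of the first-order Markov assumption, added here for reference):
P(x_t \mid x_1, x_2, \ldots, x_{t-1}) = P(x_t \mid x_{t-1})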

Markov Assumption in Speech: Word Sequences, Phone Sequences, Part of Speech Tags, Syntactic Constituents, Phrase Sequences, Discourse Acts, Intonation

Markov Chain The probability of a sequence can be decomposed into a product of probabilities of sequential events. [Diagram: a chain of states x1 → x2 → x3]
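Under the Markov assumption, this decomposition takes the standard form (x_1, \ldots, x_T are the states in the diagram):
P(x_1, x_2, \ldots, x_T) = P(x_1) \prod_{t=2}^{T} P(x_t \mid x_{t-1})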

Hidden Markov Model In a Hidden Markov Model the state sequence is unobserved; only an observation sequence is available. [Diagram: hidden states q1 → q2 → q3, each emitting an observation x1, x2, x3]
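The joint probability of hidden states and observations factorizes in the standard way (notation follows the diagram above):
P(x_1, \ldots, x_T,\, q_1, \ldots, q_T) = P(q_1)\, P(x_1 \mid q_1) \prod_{t=2}^{T} P(q_t \mid q_{t-1})\, P(x_t \mid q_t)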

Hidden Markov Model Observations are MFCC vectors. States are phone labels. Each state (phone) has an associated GMM modeling the MFCC likelihood.
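Concretely, the emission likelihood of an MFCC vector x in state (phone) q is a Gaussian mixture of the standard form (the weights c_{q,m}, means \mu_{q,m}, and covariances \Sigma_{q,m} are the per-state GMM parameters):
b_q(x) = P(x \mid q) = \sum_{m=1}^{M} c_{q,m}\, \mathcal{N}(x;\, \mu_{q,m}, \Sigma_{q,m}), \qquad \sum_{m=1}^{M} c_{q,m} = 1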

Forward-Backward Algorithm HMMs are trained by collecting and distributing information from observations to states. The Forward-Backward algorithm is a specific example of EM. In the HMM topology (the way the variables are related), this collect/distribute computation converges in one forward pass and one backward pass –hence the name.

Forward-Backward Algorithm Forward step: –Collect up from the observations to the states. –Collect from the left state to the right state. "Collect" – update parameters to correctly model the observations. –Observation collection gives a distribution over states, given the initial state. –State collection also gives a distribution over states. –The new q distribution reflects the combination of these two.

Forward-Backward Algorithm Backward step: –Distribute down to the observations from the states. –Distribute from the right state to the left state. "Distribute" – update parameters to correctly model the observations. –Distributing to the observations updates the state-observation relationship. –Distributing between states updates the state-state transition matrix. Forward-backward can be shown to converge in one pass.
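For reference, the forward and backward passes compute the standard quantities below (notation added here, following the usual HMM conventions: a_{ij} are transition probabilities, b_j(x) the emission likelihoods, \pi_j the initial-state probabilities):
\alpha_t(j) = P(x_1, \ldots, x_t,\, q_t = j) = \Big[\sum_i \alpha_{t-1}(i)\, a_{ij}\Big] b_j(x_t), \qquad \alpha_1(j) = \pi_j\, b_j(x_1)
\beta_t(i) = P(x_{t+1}, \ldots, x_T \mid q_t = i) = \sum_j a_{ij}\, b_j(x_{t+1})\, \beta_{t+1}(j), \qquad \beta_T(i) = 1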

Finite State Automata "Start" and "Accept" states; epsilon transitions; relationship to Regular Expressions. Operations on FSAs: –Addition –Inversion –Node expansion –Determinization. Weighted automata allow probabilities to be assigned to transitions.

State transitions as FSA [Diagram: an FSA over the phones /d/ /t/ /ey/ /ax/ /ae/ /dx/]

Word FSA to phone FSA [Diagram: a word-level FSA with arcs MORE and DATA, alongside the phone arcs /m/ /ao/ /r/ and /d/ /t/ /ey/ /ax/ /ae/ /dx/]

Word FSA to phone FSA [Diagram: the same network with the word arcs expanded into phone arcs /m/ /ao/ /r/ and /d/ /t/ /ey/ /ax/ /ae/ /dx/]

Decoding a Hidden Markov Model Decoding is finding the most likely state sequence. How many state sequences are there in an HMM with N observations and k states?

Viterbi Decoding Dynamic programming can make this a lot faster. Idea: any optimal sequence between x_0 and x_n must include the optimal sequence between x_0 and x_{n-1}. –Based on the Markov Assumption.

Viterbi Decoding Probability of the most likely state sequence. Recovering the optimal sequence involves storing pointers as decisions are made.
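The quantity described on this slide is conventionally written as the Viterbi recursion (notation as above; the back-pointers \psi record the decisions the slide refers to):
\delta_t(j) = \max_{q_1, \ldots, q_{t-1}} P(q_1, \ldots, q_{t-1},\, q_t = j,\, x_1, \ldots, x_t) = \Big[\max_i \delta_{t-1}(i)\, a_{ij}\Big] b_j(x_t), \qquad \delta_1(j) = \pi_j\, b_j(x_1)
\psi_t(j) = \arg\max_i\, \delta_{t-1}(i)\, a_{ij}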

Example (from Wikipedia)
states = ('Rainy', 'Sunny')
observations = ('walk', 'shop', 'clean')
start_probability = {'Rainy': 0.6, 'Sunny': 0.4}
transition_probability = {
    'Rainy': {'Rainy': 0.7, 'Sunny': 0.3},
    'Sunny': {'Rainy': 0.4, 'Sunny': 0.6},
}
emission_probability = {
    'Rainy': {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
    'Sunny': {'walk': 0.6, 'shop': 0.3, 'clean': 0.1},
}
What is the most likely state sequence?
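A minimal Viterbi implementation over these dictionaries (a sketch added here, not part of the original slides; variable names are illustrative):

def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s] = probability of the best state sequence for obs[0..t] that ends in state s
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    path = {s: [s] for s in states}
    for t in range(1, len(obs)):
        V.append({})
        new_path = {}
        for s in states:
            # pick the best predecessor state for s at time t (this is the stored pointer)
            prob, prev = max((V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p) for p in states)
            V[t][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    # the best final state gives the most likely full sequence
    prob, last = max((V[-1][s], s) for s in states)
    return prob, path[last]

prob, seq = viterbi(observations, states, start_probability, transition_probability, emission_probability)
# On this example: seq == ['Sunny', 'Rainy', 'Rainy'], prob ≈ 0.01344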

HMM Topology for Training Rather than having one GMM per phone, it is common for acoustic models to represent each phone as 3 triphones. [Diagram: states S1–S5 for the phone /r/]

Flat Start In Flat Start training, GMM parameters are initialized to global means and variances. Viterbi is used to perform forced alignment between observations and the phone sequence. –The phone sequence is derived from the lexical transcription and the pronunciation model.

Forced Alignment Given a phone sequence and observations, assign each observation to a phone. Uses: –Identifying which observations belong to each phone label for later training. –Getting time boundaries for phone or word labels.

Flat Start In Flat Start training, GMM parameters are initialized to global means and variances. Viterbi is used to perform forced alignment between observations and the phone sequence. –The phone sequence is derived from the lexical transcription and the pronunciation model. After alignment, retrain the acoustic models, and repeat.

What about silence? If there is no "silence" state, the silent frames will be assigned to either the /d/ or the /ax/. This leads to worse acoustic models. A solution: explicit training of silence models, /sp/ –Allowing /sp/ transitions at word boundaries. [Diagram: phone sequence /d/ /ey/ /dx/ /ax/]

Next Class Pronunciation Modeling Reading: J&M Chapter 2, Sections 10.5.3, 11.1,