Hidden Markov Models: Decoding & Training Natural Language Processing CMSC 35100 April 24, 2003.

Agenda
Speech Recognition
–Hidden Markov Models
  Uncertain observations
  Recognition: Viterbi, Stack/A*
  Training the model: Baum-Welch

Speech Recognition Model
Question: Given the signal, what words were said?
Problem: uncertainty
–Capture of sound by the microphone, how phones produce sounds, which words produce which phones, etc.
Solution: Probabilistic model
–P(words|signal) = P(signal|words)P(words)/P(signal)
–Idea: Maximize P(signal|words)*P(words)
  P(signal|words): acoustic model; P(words): language model
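
Since P(signal) is constant over candidate word sequences, decoding reduces to maximizing P(signal|words)*P(words). A minimal sketch of that argmax in Python, assuming hypothetical acoustic_logp and lm_logp scoring functions (log-probabilities, so the product becomes a sum):

def decode(signal, candidate_word_seqs, acoustic_logp, lm_logp):
    """Pick the word sequence maximizing log P(signal|words) + log P(words).
    acoustic_logp and lm_logp are hypothetical scoring functions standing in
    for the acoustic model and the language model."""
    return max(candidate_word_seqs,
               key=lambda words: acoustic_logp(signal, words) + lm_logp(words))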

Hidden Markov Models (HMMs)
An HMM is:
–1) A set of states: q1, ..., qN
–2) A set of transition probabilities: A = {aij}
  Where aij is the probability of the transition qi -> qj
–3) Observation probabilities: B = {bi(ot)}
  The probability of observing ot in state i
–4) An initial probability distribution over states
  The probability of starting in state i
–5) A set of accepting (final) states
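
These five components map directly onto a small data structure. A minimal Python sketch (the class and field names are my own, not from the course):

from dataclasses import dataclass, field

@dataclass
class HMM:
    states: list                    # state names q1..qN
    A: dict                         # A[i][j] = P(state q_j at t+1 | state q_i at t)
    B: dict                         # B[i][o] = P(observing o | state q_i)
    pi: dict                        # pi[i]   = P(starting in state q_i)
    final_states: set = field(default_factory=set)   # accepting states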

Acoustic Model
3-state phone model for [m]
–Use Hidden Markov Model (HMM)
–Probability of a sequence: sum of the probabilities of its paths
[Figure: states Onset -> Mid -> End -> Final, with transition probabilities on the arcs and observation probabilities per state: Onset: C1 0.5, C2 0.2, C3 0.3; Mid: C3 0.2, C4 0.7, C5 0.1; End: C4 0.1, C6 0.5, C6 0.4]
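
The figure's observation probabilities can be dropped into the HMM structure sketched above. The transition values below are hypothetical placeholders, since the arc probabilities in the original figure are not recoverable from this transcript:

# Observation probabilities come from the slide's figure; the transition
# probabilities marked HYPOTHETICAL are illustrative placeholders only.
phone_m = HMM(
    states=["Onset", "Mid", "End", "Final"],
    A={
        "Onset": {"Onset": 0.3, "Mid": 0.7},   # HYPOTHETICAL values
        "Mid":   {"Mid": 0.9, "End": 0.1},     # HYPOTHETICAL values
        "End":   {"End": 0.4, "Final": 0.6},   # HYPOTHETICAL values
        "Final": {},
    },
    B={
        "Onset": {"C1": 0.5, "C2": 0.2, "C3": 0.3},
        "Mid":   {"C3": 0.2, "C4": 0.7, "C5": 0.1},
        "End":   {"C4": 0.1, "C6": 0.9},       # slide lists C6 twice (0.5 and 0.4); merged here
        "Final": {},
    },
    pi={"Onset": 1.0},
    final_states={"Final"},
)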

Viterbi Algorithm
Find the BEST word sequence given the signal
–Best P(words|signal)
–Take the HMM & VQ sequence => word sequence (with its probability)
Dynamic programming solution
–Record the most probable path ending at each state i
  Then the most probable path from i to the end
–O(bMn)

Viterbi Code
Function Viterbi(observations of length T, state-graph) returns best-path
  num-states <- num-of-states(state-graph)
  Create a path probability matrix viterbi[num-states+2, T+2]
  viterbi[0,0] <- 1.0
  For each time step t from 0 to T do
    for each state s from 0 to num-states do
      for each transition s' from s in state-graph
        new-score <- viterbi[s,t] * a[s,s'] * b_s'(o_t)
        if ((viterbi[s',t+1] = 0) || (viterbi[s',t+1] < new-score)) then
          viterbi[s',t+1] <- new-score
          back-pointer[s',t+1] <- s
  Backtrace from the highest-probability state in the final column of viterbi[] and return the path
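
A runnable Python version of the same dynamic program, written against the dict-based HMM sketch above (my illustration, not the course's reference code); it assumes at least one nonzero-probability path exists:

import math

def viterbi(hmm, observations):
    """Return the most probable state path and its log-probability."""
    V = [{}]        # V[t][state] = best log-prob of any path ending in state at time t
    back = [{}]     # back[t][state] = predecessor state on that best path
    for s in hmm.states:
        p = hmm.pi.get(s, 0.0) * hmm.B.get(s, {}).get(observations[0], 0.0)
        V[0][s] = math.log(p) if p > 0 else float("-inf")
        back[0][s] = None
    for t in range(1, len(observations)):
        V.append({})
        back.append({})
        for s in hmm.states:
            emit = hmm.B.get(s, {}).get(observations[t], 0.0)
            best_prev, best_score = None, float("-inf")
            for prev in hmm.states:
                trans = hmm.A.get(prev, {}).get(s, 0.0)
                if trans > 0 and emit > 0 and V[t-1][prev] > float("-inf"):
                    score = V[t-1][prev] + math.log(trans) + math.log(emit)
                    if score > best_score:
                        best_prev, best_score = prev, score
            V[t][s], back[t][s] = best_score, best_prev
    # backtrace from the highest-scoring state in the final column
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path)), V[-1][last]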

Enhanced Decoding
Viterbi problems:
–The best phone sequence is not necessarily the most probable word sequence
  E.g. words with many pronunciations are less probable
–The dynamic programming invariant breaks with trigram language models
Solution 1:
–Multipass decoding:
  Phone decoding -> n-best lattice -> rescoring (e.g. with a trigram LM)

Enhanced Decoding: A*
Search for the highest-probability path
–Use the forward algorithm to compute the acoustic match
–Perform a fast match to find the next likely words
  Tree-structured lexicon matching the phone sequence
–Estimate path cost: current cost + an underestimate of the total
–Store hypotheses in a priority queue
–Search best-first
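
A minimal best-first (stack/A*) decoder sketch. The three helper functions are hypothetical stand-ins: fast_match proposes candidate next words, extend_score returns the log-probability of extending a hypothesis (acoustic match via the forward algorithm plus LM score), and remaining_estimate is an optimistic estimate for the unexplained portion of the signal:

import heapq

def stack_decode(observations, fast_match, extend_score, remaining_estimate):
    """Best-first decoding over partial word hypotheses (a sketch).
    The first complete hypothesis popped is optimal provided remaining_estimate
    never underestimates the achievable log-probability of the rest of the signal."""
    T = len(observations)
    # heap entries: (-(score + estimate), score, word tuple, frames consumed)
    heap = [(0.0, 0.0, (), 0)]
    while heap:
        _, score, words, t = heapq.heappop(heap)
        if t >= T:
            return list(words), score
        for word, t_end in fast_match(words, observations, t):          # hypothetical helper
            new_score = score + extend_score(words, word, observations, t, t_end)
            priority = -(new_score + remaining_estimate(t_end, observations))
            heapq.heappush(heap, (priority, new_score, words + (word,), t_end))
    return None, float("-inf")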

Modeling Sound, Redux
Discrete VQ codebook values
–Simple, but inadequate
–Acoustics are highly variable
Gaussian pdfs over continuous values
–Assume normally distributed observations
–Typically sum over multiple shared Gaussians
  "Gaussian mixture models"
–Trained with the HMM model
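
As an illustration of the "sum over multiple shared Gaussians", a minimal diagonal-covariance mixture log-likelihood in NumPy (a sketch of the idea, not the actual acoustic front end used in the course):

import numpy as np

def gmm_log_likelihood(o, weights, means, variances):
    """Log-likelihood of a feature vector o under a mixture of diagonal-covariance
    Gaussians; component weights are assumed positive and summing to 1."""
    o = np.asarray(o, dtype=float)
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        mu, var = np.asarray(mu, float), np.asarray(var, float)
        log_norm = -0.5 * np.sum(np.log(2 * np.pi * var))
        log_exp = -0.5 * np.sum((o - mu) ** 2 / var)
        log_terms.append(np.log(w) + log_norm + log_exp)
    # log-sum-exp over the mixture components
    m = max(log_terms)
    return m + np.log(sum(np.exp(t - m) for t in log_terms))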

Learning HMMs
Issue: Where do the probabilities come from?
Solution: Learn from data
–Trains the transition (aij) and emission (bj) probabilities
–Typically assume the model structure is given
Baum-Welch, aka the forward-backward algorithm
–Iteratively estimate expected counts of transitions and emitted observations
–Get estimated probabilities by forward computation
  Divide probability mass over the contributing paths

Forward Probability
α_t(j) = Σ_i α_{t-1}(i) a_ij b_j(o_t),  with α_1(j) = a_1j b_j(o_1) and total probability P(O) = α_T(N) = Σ_i α_T(i) a_iN
Where α is the forward probability, t is the time in the utterance, i, j are states in the HMM, a_ij is the transition probability, b_j(o_t) is the probability of observing o_t in state j, N is the final state, T is the last time, and 1 is the start state
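
The same recursion as runnable Python over the dict-based HMM sketch above (probabilities rather than logs, to mirror the formula; real systems work in log space or rescale):

def forward(hmm, observations):
    """alpha[t][j] = P(o_1..o_t, state j at time t). As a simplification this
    sketch ignores the explicit non-emitting final state, so the total
    observation probability is taken as sum(alpha[-1].values())."""
    alpha = [{}]
    for j in hmm.states:
        alpha[0][j] = hmm.pi.get(j, 0.0) * hmm.B.get(j, {}).get(observations[0], 0.0)
    for t in range(1, len(observations)):
        alpha.append({})
        for j in hmm.states:
            total = sum(alpha[t-1][i] * hmm.A.get(i, {}).get(j, 0.0) for i in hmm.states)
            alpha[t][j] = total * hmm.B.get(j, {}).get(observations[t], 0.0)
    return alpha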

Backward Probability
β_t(i) = Σ_j a_ij b_j(o_{t+1}) β_{t+1}(j),  with β_T(i) = a_iN
Where β is the backward probability, t is the time in the utterance, i, j are states in the HMM, a_ij is the transition probability, b_j(o_t) is the probability of observing o_t in state j, N is the final state, T is the last time, and 1 is the start state
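
A matching backward pass; for simplicity this sketch sets β_T(i) = 1 rather than a_iN, since the dict-based HMM above has no explicit end arc:

def backward(hmm, observations):
    """beta[t][i] = P(o_{t+1}..o_T | state i at time t), computed right to left."""
    T = len(observations)
    beta = [{} for _ in range(T)]
    for i in hmm.states:
        beta[T-1][i] = 1.0          # simplification: no explicit end transition
    for t in range(T - 2, -1, -1):
        for i in hmm.states:
            beta[t][i] = sum(
                hmm.A.get(i, {}).get(j, 0.0)
                * hmm.B.get(j, {}).get(observations[t+1], 0.0)
                * beta[t+1][j]
                for j in hmm.states
            )
    return beta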

Re-estimating
Estimate transitions from i -> j:
–a_ij <- expected number of transitions i -> j / expected number of transitions out of i
  = Σ_{t=1..T-1} ξ_t(i,j) / Σ_{t=1..T-1} γ_t(i), where ξ_t(i,j) = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / P(O) and γ_t(i) = α_t(i) β_t(i) / P(O)
Estimate observations in j:
–b_j(v_k) <- expected number of times in j observing v_k / expected number of times in j
  = Σ_{t: o_t = v_k} γ_t(j) / Σ_{t=1..T} γ_t(j)
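
Putting the pieces together: one Baum-Welch re-estimation pass over a single observation sequence, using the forward() and backward() sketches above (no smoothing, assumes P(O) > 0; an illustration, not the course's reference implementation):

from collections import defaultdict

def baum_welch_step(hmm, observations):
    """One EM pass: returns re-estimated transition (A) and emission (B) dicts."""
    T = len(observations)
    alpha, beta = forward(hmm, observations), backward(hmm, observations)
    total = sum(alpha[T-1][s] for s in hmm.states)   # P(O), with the same final-state simplification
    gamma = [{s: alpha[t][s] * beta[t][s] / total for s in hmm.states} for t in range(T)]

    xi_sum = defaultdict(float)      # expected transition counts, keyed (i, j)
    for t in range(T - 1):
        for i in hmm.states:
            for j in hmm.states:
                xi_sum[(i, j)] += (alpha[t][i] * hmm.A.get(i, {}).get(j, 0.0)
                                   * hmm.B.get(j, {}).get(observations[t+1], 0.0)
                                   * beta[t+1][j] / total)

    new_A, new_B = {}, {}
    for i in hmm.states:
        denom = sum(gamma[t][i] for t in range(T - 1))        # expected transitions out of i
        new_A[i] = {j: xi_sum[(i, j)] / denom
                    for j in hmm.states if denom > 0 and xi_sum[(i, j)] > 0}
        occ = sum(gamma[t][i] for t in range(T))              # expected time spent in i
        new_B[i] = {}
        if occ > 0:
            for t, o in enumerate(observations):
                new_B[i][o] = new_B[i].get(o, 0.0) + gamma[t][i] / occ
    return new_A, new_B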

ASR Training
Models to train:
–Language model: typically a trigram
–Observation likelihoods: B
–Transition probabilities: A
–Pronunciation lexicon: sub-phone, word
Training materials:
–Speech files with word transcriptions
–Large text corpus
–Small phonetically transcribed speech corpus

Training
Language model:
–Uses a large text corpus to train n-grams
  e.g. 500 M words
Pronunciation model:
–HMM state graph
–Manual coding from a dictionary
  Expand to triphone context and sub-phone models
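
A toy illustration of the n-gram step: unsmoothed maximum-likelihood bigram estimates from tokenized sentences (a real system would train a smoothed trigram model on a corpus of this size):

from collections import Counter

def train_bigram_lm(sentences):
    """MLE bigram probabilities P(w2|w1), with <s> and </s> boundary markers."""
    unigram, bigram = Counter(), Counter()
    for sent in sentences:
        words = ["<s>"] + sent + ["</s>"]
        unigram.update(words[:-1])                      # context counts
        bigram.update(zip(words[:-1], words[1:]))       # bigram counts
    return {(w1, w2): c / unigram[w1] for (w1, w2), c in bigram.items()}

# Example: train_bigram_lm([["speech", "recognition", "works"]])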

HMM Training
Training the observations:
–E.g. Gaussian: set uniform initial mean/variance
–Train on the contents of a small (e.g. 4-hour) phonetically labeled speech set (e.g. Switchboard)
Training A & B:
–Forward-Backward algorithm training

Does it work?
Yes:
–99% on isolated single digits
–95% on restricted short utterances (air travel)
–80+% on professional news broadcasts
No:
–55% on conversational English
–35% on conversational Mandarin
–?? at noisy cocktail parties

Speech Synthesis
Text-to-speech produces:
–A sequence of phones, phone durations, and phone pitch
Most common approach:
–Concatenative synthesis
  Glue waveforms together
Issue: phones depend heavily on context
–Diphone models: mid-point to mid-point
  Captures transitions; few enough contexts to collect (1-2K)

Speech Synthesis: Prosody
Concatenation is intelligible but unnatural
Model duration and pitch variation
–Could extract the pitch contour directly
–Common approach: TD-PSOLA
  Time-domain pitch-synchronous overlap and add
  –Center frames around pitchmarks, extending to the next pitch period
  –Adjust prosody by combining frames at pitchmarks for the desired pitch and duration
  –Increase pitch by shrinking the distance between pitchmarks
  –Can sound squeaky

Speech Recognition as Modern AI
Draws on a wide range of AI techniques:
–Knowledge representation & manipulation
  Optimal search: Viterbi decoding
–Machine learning
  Baum-Welch for HMMs
  Nearest neighbor & k-means clustering for signal identification
–Probabilistic reasoning / Bayes' rule
  Manage uncertainty in the signal, phone, word mapping
Enables real-world applications