
HIDDEN MARKOV MODELS Prof. Navneet Goyal Department of Computer Science BITS, Pilani Presentation based on: … & on a presentation on HMM by Jianfeng Tang, Old Dominion University

Topics  Markov Models  Hidden Markov Models  HMM Problems

Markov Analysis  A technique that deals with the probabilities of future occurrences by analyzing presently known probabilities  Founder of the concept was A.A. Markov whose 1905 studies of sequence of experiments conducted in a chain were used to describe the principle of Brownian motion

Markov Analysis Applications:  Market share analysis  Bad debt prediction  Speech recognition  University enrollment prediction  …

Markov Analysis  Two competing manufacturers might have 40% & 60% market share today. May be in two months time, their market shares would become 45% & 55% respectively  Predicting these future states involve knowing the system’s probabilities of changing from one state to another  Matrix of transition probabilities  This is Markov Process

Markov Analysis 1. A finite number of possible states. 2. The probability of change remains the same over time. 3. The future state is predictable from the current state. 4. The size of the system remains the same. 5. States are collectively exhaustive. 6. States are mutually exclusive.

The Markov Process  New state = Current state × Matrix of transition probabilities:  π(i+1) = π(i) P

Markov Process Equations  Matrix of transition probabilities: P = [P_11 P_12 P_13 … P_1n; P_21 P_22 P_23 … P_2n; … ; P_m1 … P_mn]  State probabilities: π(i) = [π_1 π_2 π_3 … π_n]  π(i+1) = π(i) P

Predicting Future States  Market Share of Grocery Stores  AMERICAN FOOD STORE: 40%  FOOD MART: 30%  ATLAS FOODS: 30%  π(1) = [0.4, 0.3, 0.3]

Predicting Future States

Will this trend continue in the future? Is it an equilibrium state? Will Atlas Foods lose all of its market share?

Markov Analysis: Machine Operations  State 1: machine functioning correctly  State 2: machine functioning incorrectly  P_11 = 0.8 = probability that the machine will be functioning correctly this month, given it was functioning correctly last month  P = [0.8 0.2; 0.1 0.9]  π(2) = π(1)P = [1, 0]P = [0.8, 0.2]  π(3) = π(2)P = [0.8, 0.2]P = [0.66, 0.34]
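
A minimal sketch of this propagation in code, using the transition matrix implied by P_11 = 0.8 and the computed vectors above:

```python
import numpy as np

# Transition matrix implied by the slide's numbers: P11 = 0.8, and the
# step [0.8, 0.2] -> [0.66, 0.34] forces P21 = 0.1 (rows sum to 1).
P = np.array([[0.8, 0.2],
              [0.1, 0.9]])

pi = np.array([1.0, 0.0])        # month 1: machine known to work correctly
for month in range(2, 4):
    pi = pi @ P                  # pi(i+1) = pi(i) P
    print(f"month {month}: {pi.round(4)}")
# month 2: [0.8  0.2 ]
# month 3: [0.66 0.34]
```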

Machine Example: Periods to Reach Equilibrium (table: Period, State 1 probability, State 2 probability)

Equilibrium Equations
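
The equilibrium equations themselves are not reproduced in this transcript. The steady state is the vector π with π = πP and Σ π_i = 1; a sketch of solving this for the machine example's transition matrix (the same reconstructed values assumed above):

```python
import numpy as np

# Same transition matrix as in the machine-example sketch (an assumption).
P = np.array([[0.8, 0.2],
              [0.1, 0.9]])

# Equilibrium: solve (P^T - I) pi = 0 together with the normalisation
# constraint sum(pi) = 1, appended as an extra equation.
n = P.shape[0]
A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
b = np.concatenate([np.zeros(n), [1.0]])
pi_eq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi_eq.round(4))   # [0.3333 0.6667]: long-run shares of the two states
```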

Markov System

At regularly spaced discrete times, the system undergoes a change of state (possibly back to the same state)  Discrete first-order Markov chain: P[q_t = S_j | q_{t-1} = S_i, q_{t-2} = S_k, …] = P[q_t = S_j | q_{t-1} = S_i]  Consider only those processes in which the RHS is independent of time  State transition probabilities are given by a_ij = P[q_t = S_j | q_{t-1} = S_i], 1 ≤ i, j ≤ N

Markov Models  A model of sequences of events where the probability of an event occurring depends upon the fact that a preceding event occurred. Observable states: 1, 2, …, N Observed sequences: O 1, O 2, …, O l, …, O T P(O l =j|O 1 =a,…,O l-1 =b,O l+1 =c,…)=P(O l =j|O 1 =a,…,O l- 1 =b)  Order n model  A Markov process is a process which moves from state to state depending (only) on the previous n states.

Markov Models  First Order Model (n=1) P(O l =j|O l-1 =a,O l-2 =b,…)=P(O l =j|O l-1 =a) The state of model depends only on its previous state. Components: States, initial probabilities & state transition probabilities

Markov Models  Consider a simple 3-state Markov model of weather  Assume that once a day (eg at noon), the weather is observed as one of the folowiing:  State 1 Rain or (snow)  State 2 Cloudy  State 3 Sunny  Transition Probabilities:  Given that on day 1 the weather is sunny  What is the probability that the weather for the next 7 days will be S S R R S C S?

Hidden Markov Model  HMMs allow you to estimate probabilities of unobserved events  E.g., in speech recognition, the observed data is the acoustic signal and the words are the hidden parameters

HMMs and their Usage  HMMs are very common in Computational Linguistics: Speech recognition (observed: acoustic signal, hidden: words) Handwriting recognition (observed: image, hidden: words) Machine translation (observed: foreign words, hidden: words in target language)

A Markov model is used to predict what will come next based on previous observations.  However, sometimes what we want to predict is not what we observed.  Example: someone trying to deduce the weather from a piece of seaweed  For some reason, he cannot access weather information (sun, cloud, rain) directly  But he can observe the dampness of a piece of seaweed (soggy, damp, dryish, dry)  And the state of the seaweed is probabilistically related to the state of the weather  Hidden Markov Models

Hidden Markov Models are used to solve this kind of problem.  A Hidden Markov Model is an extension of the First-Order Markov Model:  The "true" states are not directly observable (hidden)  Observable states are probabilistic functions of the hidden states  The hidden system is first-order Markov

Hidden Markov Models  A Hidden Markov Model is consist of two sets of states and three sets of probabilities: hidden states : the (TRUE) states of a system that may be described by a Markov process (e.g. weather states in our example). observable states : the states of the process that are `visible‘ (e.g. dampness of the seaweed). Initial probabilities for hidden states Transition probabilities for hidden states Confusion probabilities from hidden states to observable states

Hidden Markov Models

Initial matrix Transition matrix Confusion matrix
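
The three matrices on this slide are shown as images and are not reproduced in the transcript. A minimal sketch of what they look like for the weather/seaweed example; the numbers below are illustrative assumptions, not the slide's actual values:

```python
import numpy as np

# Hidden states: 0 = Sunny, 1 = Cloudy, 2 = Rainy
# Observations:  0 = Dry, 1 = Dryish, 2 = Damp, 3 = Soggy
pi = np.array([0.63, 0.17, 0.20])                 # initial matrix (assumed)
A  = np.array([[0.50, 0.375, 0.125],              # transition matrix (assumed)
               [0.25, 0.125, 0.625],
               [0.25, 0.375, 0.375]])
B  = np.array([[0.60, 0.20, 0.15, 0.05],          # confusion matrix (assumed)
               [0.25, 0.25, 0.25, 0.25],
               [0.05, 0.10, 0.35, 0.50]])

# Every row of A and B is a probability distribution.
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
```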

The Trellis

Coin Toss Problem  Observed Sequence: HHTTTHTTH….H  How do we build an HMM to explain (model) the observed sequence? What the states in the model correspond to? How many states should be there in the model?  Single biased coin is tossed 2 state model Each state corresponds to a side of the coin ( H or T) Resultant Markov model is observable Only unknown in the value of the bias  2 biased coins are tossed 2 states in the model Each state corresponds to a different, biased coin being tossed Each state characterized by prob. dist. Of Hs & Ts  3 biased coins are tossed

Coin Toss Problem  Single biased coin is tossed 2 state model Each state corresponds to a side of the coin ( H or T) Resultant Markov model is observable Only unknown in the value of the bias

Coin Toss Problem  2 biased coins are tossed 2 states in the model Each state corresponds to a different, biased coin being tossed Each state characterized by prob. dist. Of Hs & Ts

Coin Toss Problem  3 biased coins are tossed

Coin Toss Problem  Which model best matches the actual observation?  1coin model has only one unknown parameter – the bias  2 coin model has 4 unknown parameters  3 coin model has 9 unknown parameters  Degrees of freedom  Larger HMMs more capable of modeling a series of coin tossing experiments??  Theoretically correct, but not practically  Practical considerations impose limitations on the size of the HMM  It might be the case that only 1 coin is being tossed

Urn & Coloured Balls Model

Each state corresponds to a specific urn, and a (ball) colour probability distribution is defined for each state  The choice of urns is dictated by the state transition matrix of the HMM

Elements of an HMM  N, number of states in the model, which are hidden  Physical significance attached to the states  Coin tossing experiment: Each state corresponds to a distinct biased coin  Urn ball model State corresponds to urns  Generally the states are interconnected  Ergodic Model

Elements of an HMM  M, number of distinct observation symbols per state  Coin tossing experiment: Hs or Ts  Urn ball model Colors of the balls

Elements of an HMM  A, state transition probability distribution a ij = P[q t+1 =S j |q t =S i ] 1<=i,j<=N

Elements of an HMM  Observation symbol probability distribution in state j B={bj(k)}, where

Elements of an HMM  The initial state distribution

HMM

HMM problems  HMMs are used to solve three kinds of problems Finding the probability of an observed sequence given a HMM (evaluation); Finding the sequence of hidden states that most probably generated an observed sequence (decoding). The third problem is generating a HMM given a sequence of observations (learning). –learning the probabilities from training data.

HMM problems

HMM Problems 1. Evaluation Problem:  We have a number of HMMs and a sequence of observations. We may want to know which HMM most probably generated the given sequence.  Solution:  Compute the probability of the observed sequence for each HMM  Choose the one that produced the highest probability  The Forward algorithm can be used to reduce the complexity.

HMM problems  Pr(dry, damp, soggy | HMM) = sum over all hidden state sequences of Pr(dry, damp, soggy, hidden sequence | HMM) = Pr(dry, damp, soggy, sunny, sunny, sunny | HMM) + Pr(dry, damp, soggy, sunny, sunny, cloudy | HMM) + Pr(dry, damp, soggy, sunny, sunny, rainy | HMM) + … + Pr(dry, damp, soggy, rainy, rainy, rainy | HMM)
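
Enumerating every hidden path as above costs N^T terms; the Forward algorithm computes the same likelihood in O(N²T) time. A minimal sketch, reusing the illustrative weather/seaweed matrices assumed earlier:

```python
import numpy as np

def forward(obs, pi, A, B):
    """P(obs | model) via the Forward algorithm.
    alpha[t, i] = P(o_1 .. o_t, q_t = S_i | model)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                       # initialisation
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]   # induction
    return alpha[-1].sum()                             # termination

# Illustrative (assumed) weather/seaweed matrices from the earlier sketch:
pi = np.array([0.63, 0.17, 0.20])
A  = np.array([[0.50, 0.375, 0.125],
               [0.25, 0.125, 0.625],
               [0.25, 0.375, 0.375]])
B  = np.array([[0.60, 0.20, 0.15, 0.05],
               [0.25, 0.25, 0.25, 0.25],
               [0.05, 0.10, 0.35, 0.50]])
print(forward([0, 2, 3], pi, A, B))   # P(dry, damp, soggy | model)
```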

HMM problems 2. Decoding Problem:  Given a particular HMM and an observation sequence, we want to know the most likely sequence of underlying hidden states that might have generated the observation sequence.  Solution:  Compute the probability of the observed sequence for each possible sequence of underlying hidden states  Choose the one that produced the highest probability  The Viterbi algorithm can be used to reduce the complexity.

HMM Problems  The most probable sequence of hidden states is the sequence that maximizes Pr(dry, damp, soggy, hidden sequence | HMM) over the candidates: Pr(dry, damp, soggy, sunny, sunny, sunny), Pr(dry, damp, soggy, sunny, sunny, cloudy), Pr(dry, damp, soggy, sunny, sunny, rainy), …, Pr(dry, damp, soggy, rainy, rainy, rainy)
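
A minimal sketch of the Viterbi dynamic program that performs this maximisation without enumerating every path, again using the illustrative matrices assumed earlier:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most probable hidden state sequence for obs (log-space Viterbi)."""
    T, N = len(obs), len(pi)
    logA, logB = np.log(A), np.log(B)
    delta = np.log(pi) + logB[:, obs[0]]        # best log-prob ending in each state
    psi = np.zeros((T, N), dtype=int)           # best predecessor pointers
    for t in range(1, T):
        scores = delta[:, None] + logA          # scores[i, j]: best path ending i -> j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta.argmax())]                # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Illustrative (assumed) weather/seaweed matrices from the earlier sketch:
pi = np.array([0.63, 0.17, 0.20])
A  = np.array([[0.50, 0.375, 0.125],
               [0.25, 0.125, 0.625],
               [0.25, 0.375, 0.375]])
B  = np.array([[0.60, 0.20, 0.15, 0.05],
               [0.25, 0.25, 0.25, 0.25],
               [0.05, 0.10, 0.35, 0.50]])
print(viterbi([0, 2, 3], pi, A, B))   # [0, 1, 2] = sunny, cloudy, rainy with these numbers
```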

HMM problems (cont.) 3. Learning Problem:  Estimate the probabilities of the HMM from training data  Solution:  Training with labeled data:  Transition probability P(a, b) = (number of transitions from a to b) / (total number of transitions out of a)  Confusion probability P(a, o) = (number of occurrences of symbol o in state a) / (number of all symbol occurrences in state a)  Training with unlabeled data:  Baum-Welch algorithm  The basic idea:  Randomly generate an HMM at the beginning  Re-estimate a new HMM from the previous one, repeating until P(observations | current HMM) − P(observations | previous HMM) < ε (a small number)
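
A sketch of the labeled-data (counting) estimate described above; the function and variable names are mine, and the tiny data set is made up purely for illustration:

```python
from collections import defaultdict

def estimate_from_labeled(pairs):
    """Count-based estimates from labeled sequences, as described above.
    pairs: list of (hidden_state_sequence, observation_sequence) tuples.
    Returns (transition_probs, confusion_probs) as nested dicts."""
    trans_counts = defaultdict(lambda: defaultdict(int))
    emit_counts = defaultdict(lambda: defaultdict(int))
    for states, obs in pairs:
        for a, b in zip(states, states[1:]):
            trans_counts[a][b] += 1            # transitions a -> b
        for a, o in zip(states, obs):
            emit_counts[a][o] += 1             # symbol o emitted in state a
    trans = {a: {b: c / sum(row.values()) for b, c in row.items()}
             for a, row in trans_counts.items()}
    emit = {a: {o: c / sum(row.values()) for o, c in row.items()}
            for a, row in emit_counts.items()}
    return trans, emit

# Made-up labeled example: one weather sequence with its seaweed readings.
data = [(["sunny", "sunny", "rainy"], ["dry", "damp", "soggy"])]
print(estimate_from_labeled(data))
```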

Designing an HMM for an Isolated Word Recognizer  Vocabulary of V words  Each word is to be modeled by a distinct HMM  For each word, we have a training set of K occurrences of the word (spoken by one or more talkers)  Each occurrence of the word constitutes an observation sequence  Observations are some appropriate representation of the characteristics of the word

Designing HMM for an Isolated Word Recognizer  To do isolated word speech recognition:
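
The recognition procedure itself appears as a figure that is not reproduced in this transcript. A minimal sketch under the usual assumption that the unknown word's observation sequence is scored against every word's HMM and the best-scoring word is chosen; the names forward, recognize and word_models are mine:

```python
import numpy as np

def forward(obs, pi, A, B):
    """P(obs | model), the same Forward recursion as in the earlier sketch."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def recognize(obs, word_models):
    """Pick the vocabulary word whose HMM scores the observation sequence highest.
    word_models: dict mapping word -> (pi, A, B); the models are assumed given."""
    scores = {w: forward(obs, *model) for w, model in word_models.items()}
    return max(scores, key=scores.get)
```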

Designing HMM for an Isolated Word Recognizer

HMM Application: Parsing a reference string into fields  Problem: parsing a reference string into fields (author, journal, volume, page, year, etc.)  Model as an HMM:  Hidden states: fields (author, journal, volume, etc.) and some special characters (",", "and", etc.)  Observable states: words  Probability matrices: learned from training data  Reference parsing: use the Viterbi algorithm to find the most probable sequence of hidden states for an observation sequence

Conclusions  HMM is used to model What we want to predict is not what we observed The underlying system can be model as first order Markov  HMM assumption The next state is independent of all states but its previous state The probability matrixes learned from samples are the actual probability matrixes. After learning, the probability matrixes will keep unchanged