Ch-9: Markov Models Prepared by Qaiser Abbas (07-0906)

Outline: Markov Models; Hidden Markov Models (HMM); Three problems in HMM and their solutions

Credits and References
Materials used in this presentation are taken from the following textbooks and web resources:
1. "Foundations of Statistical Natural Language Processing" by Manning & Schütze. Chapter 9, "Markov Models".
2. "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition" by D. Jurafsky and J.H. Martin; updated chapters are available on the authors' website. Chapter 9, "Automatic Speech Recognition".
3. "Spoken Language Processing: A Guide to Theory, Algorithm, and System Development" by X. Huang, A. Acero, and H.W. Hon. Chapter 8, "Hidden Markov Models"; Chapter 12, "Basic Search Algorithms".
4. Dr. Andrew W. Moore, Carnegie Mellon University, http://www.cs.cmu.edu/~awm/tutorials
5. Larry Rabiner's tutorial on HMMs.

A Markov System
Has N states, called s1, s2, ..., sN.
There are discrete timesteps, t=0, t=1, ...
On the t'th timestep the system is in exactly one of the available states. Call it qt. Note: qt ∈ {s1, s2, ..., sN}.
Between each timestep, the next state is chosen randomly.
Example (figure, N = 3): at t=0 the current state is q0 = s3; at t=1 the current state is q1 = s2.

A Markov System (continued)
The current state determines the probability distribution for the next state. For the three-state example (N = 3, t=1, q1 = s2):
From s1: P(qt+1=s1|qt=s1) = 0, P(qt+1=s2|qt=s1) = 0, P(qt+1=s3|qt=s1) = 1
From s2: P(qt+1=s1|qt=s2) = 1/2, P(qt+1=s2|qt=s2) = 1/2, P(qt+1=s3|qt=s2) = 0
From s3: P(qt+1=s1|qt=s3) = 1/3, P(qt+1=s2|qt=s3) = 2/3, P(qt+1=s3|qt=s3) = 0

Markov Property
qt+1 is conditionally independent of {qt-1, qt-2, ..., q1, q0} given qt. In other words:
P(qt+1 = sj | qt = si) = P(qt+1 = sj | qt = si, any earlier history)
The sequence of q is said to be a Markov chain, or to have the Markov property, if the next state depends only upon the current state and not on any past states.
(Figure: the same three-state transition diagram, N = 3, t=1, q1 = s2.)

Transition Matrix Question: What is the probability of states sequence of
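
The transition probabilities of the three-state system can be collected into a matrix and multiplied along a path. The sketch below assumes the probabilities listed on the earlier slide; the query sequence itself is not preserved in the transcript, so the sequence used here is a made-up example, and the names `TRANSITIONS` and `sequence_probability` are illustrative.

```python
from fractions import Fraction

# Transition probabilities of the three-state example system:
# TRANSITIONS[current][nxt] = P(q_{t+1} = nxt | q_t = current).
TRANSITIONS = {
    "s1": {"s1": Fraction(0), "s2": Fraction(0), "s3": Fraction(1)},
    "s2": {"s1": Fraction(1, 2), "s2": Fraction(1, 2), "s3": Fraction(0)},
    "s3": {"s1": Fraction(1, 3), "s2": Fraction(2, 3), "s3": Fraction(0)},
}


def sequence_probability(states):
    """Probability of following `states`, given that we start in states[0]."""
    prob = Fraction(1)
    # By the Markov property, each step depends only on the current state.
    for current, nxt in zip(states, states[1:]):
        prob *= TRANSITIONS[current][nxt]
    return prob


# Hypothetical query sequence (not from the slides): s3 -> s2 -> s2 -> s1 -> s3.
example = ["s3", "s2", "s2", "s1", "s3"]
print(sequence_probability(example))  # prints 1/6
```

Exact fractions are used only to keep the arithmetic readable; floats would work equally well.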

Example: A Simple Markov Model for Weather Prediction
On any given day, the weather can be described as being in one of three states:
State 1: rainy or snowy
State 2: cloudy
State 3: sunny
Transitions between the states are described by a transition matrix.

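Question
Given that the weather on day 1 (t=1) is sunny (state 3), what is the probability that the weather for eight consecutive days is "sun-sun-sun-rain-rain-sun-cloudy-sun"?
Solution: write the weather sequence O as a sequence of state indices:
O = sun sun sun rain rain sun cloudy sun
  =  3   3   3   1    1    3    2     3
By the Markov property, the probability of the sequence is the probability of the first state (here 1, since day 1 is known to be sunny) times the transition probabilities along the sequence:
$P(O) = 1 \cdot a_{33}\, a_{33}\, a_{31}\, a_{11}\, a_{13}\, a_{32}\, a_{23}$
where $a_{ij}$ is the probability of moving from state i to state j.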
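
The product along the sequence can be computed as in the sketch below. The transition matrix did not survive in the transcript; the values used here are an assumption, taken from the version of this weather example in Rabiner's tutorial (listed in the references), and the function name is illustrative.

```python
# ASSUMPTION: the slide's transition matrix is not preserved in the transcript.
# These values are the ones used for this example in Rabiner's tutorial.
# WEATHER_A[i][j] = P(state j+1 tomorrow | state i+1 today),
# with states 1 = rain/snow, 2 = cloudy, 3 = sunny.
WEATHER_A = [
    [0.4, 0.3, 0.3],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
]


def weather_sequence_probability(states, first_state_prob=1.0):
    """Probability of a sequence of 1-based state indices.

    `first_state_prob` is 1.0 here because day 1 is given to be sunny.
    """
    prob = first_state_prob
    for today, tomorrow in zip(states, states[1:]):
        prob *= WEATHER_A[today - 1][tomorrow - 1]
    return prob


# sun sun sun rain rain sun cloudy sun
observed = [3, 3, 3, 1, 1, 3, 2, 3]
print(weather_sequence_probability(observed))
```

Under the assumed matrix the product is 0.8 · 0.8 · 0.1 · 0.4 · 0.3 · 0.1 · 0.2, roughly 1.536 × 10^-4.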

From Markov to Hidden Markov
The previous model assumes that each state can be uniquely associated with an observable event. Once an observation is made, the state of the system is then trivially retrieved.
This model, however, is too restrictive to be of practical use for most realistic problems.
To make the model more flexible, we assume that the outcomes or observations of the model are a probabilistic function of each state. Each state can produce a number of outputs according to a probability distribution, and each distinct output can potentially be generated at any state.
These are known as Hidden Markov Models (HMMs), because the state sequence is not directly observable; it can only be approximated from the sequence of observations produced by the system.

Example: A Crazy Soft Drink Machine
Suppose you have a crazy soft drink machine: it can be in two states, cola preferring (CP) and iced tea preferring (IP), but it switches between them randomly after each purchase. The state is NOT OBSERVABLE.
If, when you put in your coin, the machine always put out a cola when it was in the cola preferring state and an iced tea when it was in the iced tea preferring state, then we would have a visible Markov model. But instead, it only has a tendency to do this.
Three possible outputs (observations): cola, iced tea, lemonade.
(Figure: state transition diagram and output probability matrix.)

Question
What is the probability of seeing the output sequence {lem, ice_t} if the machine always starts off in the cola preferring state?
Solution: We need to consider all paths that might be taken through the HMM, and then sum over them. We know that the machine starts in state CP. There are then four possible paths that produce the observations:
CP -> CP -> CP
CP -> CP -> IP
CP -> IP -> CP
CP -> IP -> IP
The total probability is the sum of the probabilities of these four paths, each computed from the transition probabilities and the output probability matrix.
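
The sum over the four paths can be sketched as below. The transition diagram and output probability matrix are not preserved in the transcript; the numbers here are an assumption, taken from the soft drink machine example in Manning & Schütze, Chapter 9 (listed in the references). The sketch also assumes, as in that textbook, that each output is emitted by the state the machine is in before it switches. All names are illustrative.

```python
from itertools import product

STATES = ["CP", "IP"]

# ASSUMPTION: values from the textbook version of this example,
# not recovered from the slides themselves.
TRANS = {
    "CP": {"CP": 0.7, "IP": 0.3},
    "IP": {"CP": 0.5, "IP": 0.5},
}
EMIT = {
    "CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
    "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2},
}


def output_sequence_probability(observations, start="CP"):
    """Sum, over every state path, the probability of emitting `observations`.

    At each step the current state emits one symbol and then the machine
    moves to the next state on the path.
    """
    total = 0.0
    for path in product(STATES, repeat=len(observations)):
        prob = 1.0
        current = start
        for symbol, nxt in zip(observations, path):
            prob *= EMIT[current][symbol] * TRANS[current][nxt]
            current = nxt
        total += prob
    return total


print(output_sequence_probability(["lem", "ice_t"]))
```

With the assumed numbers the four path probabilities are 0.0147, 0.0063, 0.0315 and 0.0315, which sum to 0.084.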

A Crazy Soft Drink Machine (Continued)
(Figure: the hidden states and the observations they emit: cola, iced tea, lemonade.)

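General Form of an HMM
An HMM is specified by a five-tuple:
1) Set of hidden states $\{1, 2, \dots, N\}$. N: the number of states; $s_t$: the state at time t.
2) Set of observation symbols $\{o_1, o_2, \dots, o_M\}$. M: the number of observation symbols.
3) The initial state distribution $\pi = \{\pi_i\}$, where $\pi_i$ is the probability that the initial state is $i$.
4) State transition probability distribution $A = \{a_{ij}\}$, where $a_{ij} = P(s_t = j \mid s_{t-1} = i)$.
5) Observation symbol probability distribution in state $i$: $B = \{b_i(k)\}$, where $b_i(k) = P(X_t = o_k \mid s_t = i)$ and $X_t$ is the observation at time t.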

General Form of an HMM (Continued)
To sum up, a complete specification of an HMM includes:
two constant-size parameters, N and M (the total number of states and the number of observation symbols);
three sets of probability distributions: $A$, $B$, and $\pi$.
Two assumptions:
1. Markov assumption: $P(s_t \mid s_1^{t-1}) = P(s_t \mid s_{t-1})$, where $s_1^{t-1}$ represents the state sequence $s_1, s_2, \dots, s_{t-1}$.
2. Output independence assumption: $P(X_t \mid X_1^{t-1}, s_1^{t}) = P(X_t \mid s_t)$, where $X_1^{t-1}$ represents the output sequence $X_1, X_2, \dots, X_{t-1}$.
The output-independence assumption states that the probability that a particular symbol is emitted at time t depends only on the state $s_t$ and is conditionally independent of the past observations.
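
One way to see the two assumptions at work is to generate data from a model: the next state is drawn using only the current state, and each output is drawn using only the state that emits it. The sketch below is a minimal illustration; the `HMM` container, the `sample` function, and the two-state parameter values are hypothetical, not part of the original slides.

```python
import random
from typing import List, NamedTuple, Tuple


class HMM(NamedTuple):
    """Container for the three probability distributions (illustrative)."""
    pi: List[float]        # pi[i]   = P(initial state is i)
    A: List[List[float]]   # A[i][j] = P(next state j | current state i)
    B: List[List[float]]   # B[i][k] = P(symbol k | state i)


def sample(model: HMM, length: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Draw a state sequence and an output sequence of the given length."""
    rng = random.Random(seed)
    states, outputs = [], []
    state = rng.choices(range(len(model.pi)), weights=model.pi)[0]
    for _ in range(length):
        states.append(state)
        # Output independence: the symbol depends only on the current state.
        symbols = range(len(model.B[state]))
        outputs.append(rng.choices(symbols, weights=model.B[state])[0])
        # Markov assumption: the next state depends only on the current state.
        state = rng.choices(range(len(model.A[state])), weights=model.A[state])[0]
    return states, outputs


# Hypothetical model with N = 2 states and M = 2 symbols.
toy = HMM(pi=[0.6, 0.4], A=[[0.7, 0.3], [0.4, 0.6]], B=[[0.5, 0.5], [0.1, 0.9]])
hidden, observed = sample(toy, length=10)
print(hidden)
print(observed)
```

In an actual HMM application only `observed` would be available; `hidden` is what the decoding problem tries to recover.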

Three Basic Problems in HMM
1. The Evaluation Problem (How to evaluate an HMM? Forward algorithm): Given a model $\Phi$ and a sequence of observations $X = (X_1, X_2, \dots, X_T)$, what is the probability $P(X \mid \Phi)$, i.e., the probability that the model generates the observations?
2. The Decoding Problem (How to decode an HMM? Viterbi algorithm): Given a model $\Phi$ and a sequence of observations $X = (X_1, X_2, \dots, X_T)$, what is the most likely state sequence $S = (s_1, s_2, \dots, s_T)$ in the model that produces the observations?
3. The Learning Problem (How to train an HMM? Baum-Welch algorithm): Given a model $\Phi$ and a set of observations, how can we adjust the model parameters to maximize the joint probability $\prod_X P(X \mid \Phi)$?
If we could solve the evaluation problem, we would have a way of evaluating how well a given HMM matches a given observation sequence. If we are trying to choose among several competing models, the solution to Problem 1 allows us to choose the model that best matches the observations.
Problem 2 is the one in which we attempt to uncover the hidden part of the model, i.e., to find the "correct" state sequence. Typical uses might be to learn about the structure of the model, to find optimal state sequences for continuous speech recognition, or to get average statistics of individual states.
Problem 3 is the one in which we attempt to optimize the model parameters so as to best describe how a given observation sequence comes about. The observation sequence used to adjust the model parameters is called a training sequence, since it is used to "train" the HMM.

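How to Evaluate an HMM: A Straightforward Method
To calculate the probability (likelihood) $P(X \mid \Phi)$ of the observation sequence $X = (X_1, X_2, \dots, X_T)$, given the HMM $\Phi$, the most intuitive way is to sum up the probabilities of all possible state sequences:
$$P(X \mid \Phi) = \sum_{\text{all } S} P(S \mid \Phi)\, P(X \mid S, \Phi)$$
Applying the Markov assumption:
$$P(S \mid \Phi) = P(s_1 \mid \Phi) \prod_{t=2}^{T} P(s_t \mid s_{t-1}, \Phi) = \pi_{s_1}\, a_{s_1 s_2} \cdots a_{s_{T-1} s_T}$$
Applying the output-independence assumption:
$$P(X \mid S, \Phi) = \prod_{t=1}^{T} P(X_t \mid s_t, \Phi) = b_{s_1}(X_1)\, b_{s_2}(X_2) \cdots b_{s_T}(X_T)$$
In other words, to compute $P(X \mid \Phi)$, we first enumerate all possible state sequences S of length T that generate the observation sequence X, and then sum all the probabilities.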

How to Evaluate an HMM: A Straightforward Method (complexity)
For any given state sequence, we start from the initial state $s_1$ with probability $\pi_{s_1}$ (or $a_{s_0 s_1}$). We take a transition from $s_{t-1}$ to $s_t$ with probability $a_{s_{t-1} s_t}$ and generate the observation $X_t$ with probability $b_{s_t}(X_t)$, until we reach the last transition.
It needs $(2T-1)N^T$ multiplications and $N^T - 1$ additions. Total calculations: about $2T \cdot N^T$. For N=5, T=100, it needs about $2 \cdot 100 \cdot 5^{100} \approx 10^{72}$ computations.
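
The straightforward method can be written directly as a loop over all N^T state sequences. The following is a minimal sketch, assuming a state-emission HMM in which the first state is drawn from the initial distribution and emits the first observation; the two-state model (`PI`, `A`, `B`) is hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical two-state, two-symbol model (values chosen for illustration).
PI = [0.6, 0.4]                  # PI[i]   = P(s_1 = i)
A = [[0.7, 0.3], [0.4, 0.6]]     # A[i][j] = P(s_t = j | s_{t-1} = i)
B = [[0.5, 0.5], [0.1, 0.9]]     # B[i][k] = P(X_t = k | s_t = i)


def brute_force_likelihood(pi, a, b, obs):
    """P(X | model), summing over every state sequence of length len(obs)."""
    n = len(pi)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):   # N**T sequences
        # Initial state probability times the first emission.
        prob = pi[path[0]] * b[path[0]][obs[0]]
        # One transition and one emission for each later time step.
        for t in range(1, len(obs)):
            prob *= a[path[t - 1]][path[t]] * b[path[t]][obs[t]]
        total += prob
    return total


print(brute_force_likelihood(PI, A, B, [0, 1]))
```

The outer loop runs N^T times, which is why this approach is only usable for very short sequences.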

How to Evaluate an HMM: The Forward Algorithm
Define the forward probability: $\alpha_t(i) = P(X_1^t, s_t = i \mid \Phi)$ is the probability that the HMM is in state $i$ having generated the partial observation $X_1^t$ (namely $X_1 X_2 \dots X_t$).
At t=0, the cells contain exactly the initial probabilities.
The computation is done in a time-synchronous fashion from left to right, where each cell for time t is completely computed before proceeding to time t+1.
When the states in the last column have been computed, the sum of all probabilities in the final column is the probability of generating the observation sequence.

How to Evaluate an HMM: The Forward Algorithm (complexity)
It needs exactly N(N+1)(T-1)+N multiplications and N(N-1)(T-1) additions, so the complexity of this algorithm is $O(N^2 T)$. For N=5, T=100, we need about 3000 computations for the forward algorithm, versus $10^{72}$ computations for the straightforward method.
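
The column-by-column computation can be sketched as follows. The recursion itself is not preserved in the transcript, so this sketch uses the standard form, in which the first column is $\pi_i\, b_i(X_1)$ and each later cell is $\alpha_t(j) = \big[\sum_i \alpha_{t-1}(i)\, a_{ij}\big]\, b_j(X_t)$; that indexing convention and the two-state model are assumptions made for illustration.

```python
from itertools import product

# Hypothetical two-state, two-symbol model (values chosen for illustration).
PI = [0.6, 0.4]                  # PI[i]   = P(first state is i)
A = [[0.7, 0.3], [0.4, 0.6]]     # A[i][j] = P(next state j | current state i)
B = [[0.5, 0.5], [0.1, 0.9]]     # B[i][k] = P(symbol k | state i)


def forward_likelihood(pi, a, b, obs):
    """P(X | model) computed with the forward recursion in O(N^2 T) time."""
    n = len(pi)
    # First column: start in state i and emit the first observation.
    alpha = [pi[i] * b[i][obs[0]] for i in range(n)]
    # Each later column is fully computed from the previous one.
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * a[i][j] for i in range(n)) * b[j][symbol]
            for j in range(n)
        ]
    # The sum over the final column is the likelihood of the whole sequence.
    return sum(alpha)


print(forward_likelihood(PI, A, B, [0, 1]))
```

For the two-symbol sequence used here the columns are [0.3, 0.04] and then [0.113, 0.1026], which sum to 0.2156, the same value that enumerating all four state paths gives.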

How to Decode an HMM: The Viterbi Algorithm
Instead of summing up probabilities from different paths coming to the same destination state, the Viterbi algorithm picks and remembers the best path.
Define the best-path probability: $V_t(i) = \max_{s_1^{t-1}} P(X_1^t, s_1^{t-1}, s_t = i \mid \Phi)$ is the probability of the most likely state sequence at time t, which has generated the observation $X_1^t$ (until time t) and ends in state $i$.

How to Decode an HMM: The Viterbi Algorithm (complexity)
The computation is done in a time-synchronous fashion from left to right. The complexity is also $O(N^2 T)$.
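
A minimal sketch of the Viterbi computation follows: it has the same left-to-right structure as the forward algorithm, but takes a maximum instead of a sum and keeps a back-pointer for each cell so the best path can be read off at the end. The two-state model and all names are hypothetical, and the sketch assumes the first state emits the first observation.

```python
# Hypothetical two-state, two-symbol model (values chosen for illustration).
PI = [0.6, 0.4]                  # PI[i]   = P(first state is i)
A = [[0.7, 0.3], [0.4, 0.6]]     # A[i][j] = P(next state j | current state i)
B = [[0.5, 0.5], [0.1, 0.9]]     # B[i][k] = P(symbol k | state i)


def viterbi(pi, a, b, obs):
    """Return (most likely state path, its joint probability with obs)."""
    n = len(pi)
    # best[i]: probability of the best path so far that ends in state i.
    best = [pi[i] * b[i][obs[0]] for i in range(n)]
    backpointers = []            # backpointers[t][j]: best predecessor of j
    for symbol in obs[1:]:
        step_best, step_back = [], []
        for j in range(n):
            # Pick (and remember) the best predecessor instead of summing.
            prev = max(range(n), key=lambda i: best[i] * a[i][j])
            step_back.append(prev)
            step_best.append(best[prev] * a[prev][j] * b[j][symbol])
        best = step_best
        backpointers.append(step_back)
    # Trace back from the best final state.
    last = max(range(n), key=lambda i: best[i])
    path = [last]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    path.reverse()
    return path, best[last]


print(viterbi(PI, A, B, [0, 1]))
```

In practice the products are usually replaced by sums of log probabilities to avoid underflow on long sequences; that refinement is left out here to keep the sketch close to the definition.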

HMM Training Using the Baum-Welch Algorithm
This is by far the most difficult of the three problems, because there is no known analytical method that maximizes the joint probability of the training data in closed form.
A Hidden Markov Model is a probabilistic model of the joint probability of a collection of random variables {O1, ..., OT, Q1, ..., QT}. The Ot variables are discrete observations and the Qt variables are "hidden" discrete states. Under an HMM, there are two conditional independence assumptions:
1. The t-th hidden variable, given the (t-1)-st hidden variable, is independent of previous variables: P(Qt | Qt-1, Ot-1, ..., Q1, O1) = P(Qt | Qt-1).
2. The t-th observation depends only on the t-th state: P(Ot | Qt, Qt-1, Ot-1, ..., Q1, O1) = P(Ot | Qt).
The Baum-Welch algorithm is the EM algorithm for finding the maximum-likelihood estimate (MLE) of the parameters of an HMM given a set of observed feature vectors.
Qt is a discrete random variable with N possible values {1, ..., N}. We further assume that the underlying "hidden" Markov chain defined by P(Qt | Qt-1) is time-homogeneous (i.e., independent of the time t). Therefore, we can represent P(Qt | Qt-1) as a time-independent stochastic transition matrix A = {aij}, with aij = P(Qt = j | Qt-1 = i). The special case of time t=1 is described by the initial state distribution πi = P(Q1 = i). We say that we are in state j at time t if Qt = j. A particular sequence of states is described by q = (q1, ..., qT), where qt ∈ {1, ..., N} is the state at time t.
The observation is one of L possible observation symbols, Ot ∈ {o1, ..., oL}. The probability of a particular observation at time t for state j is described by bj(ot) = P(Ot = ot | Qt = j) (B = {bij} is an L by N matrix). A particular observation sequence O is described as O = (O1 = o1, ..., OT = oT).

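Therefore, we can describe an HMM by λ = (A, B, π).
Given an observation O, the Baum-Welch algorithm finds
$$\lambda^* = \arg\max_{\lambda} P(O \mid \lambda),$$
that is, the HMM λ that maximizes the probability of the observation O.
The Baum-Welch algorithm
Initialization: set λ = (A, B, π) with random initial conditions. The algorithm updates the parameters of λ iteratively until convergence, following the procedure below.
The forward procedure: We define $\alpha_i(t) = P(O_1 = o_1, \dots, O_t = o_t, Q_t = i \mid \lambda)$, which is the probability of seeing the partial sequence $o_1, \dots, o_t$ and ending up in state $i$ at time t. We can efficiently calculate $\alpha_i(t)$ recursively as:
$$\alpha_i(1) = \pi_i\, b_i(o_1), \qquad \alpha_j(t+1) = b_j(o_{t+1}) \sum_{i=1}^{N} \alpha_i(t)\, a_{ij}$$
The backward procedure: We define $\beta_i(t) = P(O_{t+1} = o_{t+1}, \dots, O_T = o_T \mid Q_t = i, \lambda)$. This is the probability of the ending partial sequence $o_{t+1}, \dots, o_T$ given that we started at state $i$ at time t. We can efficiently calculate $\beta_i(t)$ as:
$$\beta_i(T) = 1, \qquad \beta_i(t) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_j(t+1)$$
Using α and β, we can calculate the following variables:
$$\gamma_i(t) = P(Q_t = i \mid O, \lambda) = \frac{\alpha_i(t)\, \beta_i(t)}{\sum_{j=1}^{N} \alpha_j(t)\, \beta_j(t)}$$
$$\xi_{ij}(t) = P(Q_t = i, Q_{t+1} = j \mid O, \lambda) = \frac{\alpha_i(t)\, a_{ij}\, b_j(o_{t+1})\, \beta_j(t+1)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i(t)\, a_{ij}\, b_j(o_{t+1})\, \beta_j(t+1)}$$

Having γ and ξ, one can define the update rules as follows:
$$\pi_i^* = \gamma_i(1)$$
$$a_{ij}^* = \frac{\sum_{t=1}^{T-1} \xi_{ij}(t)}{\sum_{t=1}^{T-1} \gamma_i(t)}$$
$$b_i^*(k) = \frac{\sum_{t=1}^{T} \mathbf{1}[O_t = o_k]\, \gamma_i(t)}{\sum_{t=1}^{T} \gamma_i(t)}$$
where $\mathbf{1}[O_t = o_k]$ is 1 if the observation at time t is the symbol $o_k$ and 0 otherwise.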
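
One re-estimation pass can be sketched as below for a single observation sequence. The update formulas are not preserved in the transcript, so this sketch uses the standard Baum-Welch re-estimation: the new initial distribution is γ at the first time step, each new transition probability is the expected number of i-to-j transitions divided by the expected number of visits to i, and each new output probability is the expected number of times state i emits symbol k divided by the expected number of visits to i. The starting model and function names are hypothetical, and there is no scaling, so it is only suitable for short sequences.

```python
# Hypothetical starting model (two states, two symbols).
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]     # B[i][k] = P(symbol k | state i)


def forward(pi, a, b, obs):
    """alpha[t][i] = P(o_1..o_t, Q_t = i | model), with 0-based t."""
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * a[i][j] for i in range(n)) * b[j][o]
                      for j in range(n)])
    return alpha


def backward(a, b, obs):
    """beta[t][i] = P(o_{t+1}..o_T | Q_t = i, model), with 0-based t."""
    n = len(a)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(a[i][j] * b[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def baum_welch_step(pi, a, b, obs):
    """One EM update; returns (new_pi, new_a, new_b, likelihood before update)."""
    n, m, T = len(pi), len(b[0]), len(obs)
    alpha, beta = forward(pi, a, b, obs), backward(a, b, obs)
    likelihood = sum(alpha[-1])
    # gamma[t][i]: probability of being in state i at time t, given obs.
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    # xi[t][i][j]: probability of the transition i -> j at time t, given obs.
    xi = [[[alpha[t][i] * a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_a = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_b = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_a, new_b, likelihood


training = [0, 1, 1, 0, 1, 1, 1, 0]
pi1, a1, b1, before = baum_welch_step(PI, A, B, training)
_, _, _, after = baum_welch_step(pi1, a1, b1, training)
print(before, after)
```

Calling `baum_welch_step` repeatedly until the likelihood stops improving gives the iterative procedure; because this is an EM algorithm, the likelihood should not decrease from one pass to the next, though it may converge to a local maximum.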

Toolkits for HMM
Hidden Markov Model Toolkit (HTK): http://htk.eng.cam.ac.uk/
Hidden Markov Model (HMM) Toolbox for Matlab: http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html