Hidden Markov Models: Fundamentals and applications to bioinformatics.

Markov Chains Given a finite discrete set S of possible states, a Markov chain process occupies one of these states at each unit of time. The process either stays in the same state or moves to some other state in S, and it does so stochastically rather than deterministically. The process is memoryless (the next state depends only on the current state, not on the earlier history) and time-homogeneous (the transition probabilities do not change over time).

Transition Matrix Let S = {S1, S2, S3}. A Markov chain is described by a table of transition probabilities such as the following, where the entry in row Si and column Sj is the probability of moving from Si to Sj (each row sums to 1):

        S1     S2     S3
S1       0      1      0
S2     1/3    2/3      0
S3     1/2    1/3    1/6

(The slide also shows the equivalent state-transition diagram, with these probabilities as edge labels.)
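
As a small illustration (not part of the original slides), the table above can be encoded and sampled in Python. The state names, dictionary layout, and the simulate helper below are choices made here for the sketch, not notation from the slides.

```python
import random

# Transition matrix from the slide: rows are the current state,
# columns the next state; each row sums to 1.
STATES = ["S1", "S2", "S3"]
P = {
    "S1": {"S1": 0.0, "S2": 1.0, "S3": 0.0},
    "S2": {"S1": 1/3, "S2": 2/3, "S3": 0.0},
    "S3": {"S1": 1/2, "S2": 1/3, "S3": 1/6},
}

def simulate(start, n_steps):
    """Run the chain for n_steps, returning the sequence of visited states."""
    path = [start]
    for _ in range(n_steps):
        current = path[-1]
        weights = [P[current][s] for s in STATES]
        path.append(random.choices(STATES, weights=weights)[0])
    return path

print(simulate("S3", 10))
```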

A simple example Consider a 3-state Markov model of the weather. We assume that once a day the weather is observed as being one of the following: rainy or snowy, cloudy, or sunny. We postulate that on day t the weather is characterized by exactly one of the three states above, and we give ourselves a transition probability matrix A (its entries are not reproduced in this transcript).

- 2 - Given that the weather on day 1 is sunny, what is the probability that the weather for the next 7 days will be "sun-sun-rain-rain-sun-cloudy-sun"?
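
For a Markov chain, the probability of a specific state sequence is just the product of the successive transition probabilities. The sketch below illustrates this; note that the numeric values in A are assumptions made here for illustration, since the slide's actual matrix is not reproduced in the transcript.

```python
# Transition probabilities A[today][tomorrow]; the numbers are ASSUMED,
# purely to make the example runnable -- they are not the slide's values.
A = {
    "rain":   {"rain": 0.4, "cloudy": 0.3, "sunny": 0.3},
    "cloudy": {"rain": 0.2, "cloudy": 0.6, "sunny": 0.2},
    "sunny":  {"rain": 0.1, "cloudy": 0.1, "sunny": 0.8},
}

def path_probability(path, A):
    """P(path[1], ..., path[-1] | path[0]) for a first-order Markov chain."""
    p = 1.0
    for today, tomorrow in zip(path, path[1:]):
        p *= A[today][tomorrow]
    return p

# Day 1 is sunny (given), followed by the 7 days asked about in the slide.
days = ["sunny", "sunny", "sunny", "rain", "rain", "sunny", "cloudy", "sunny"]
print(path_probability(days, A))   # product of 7 transition probabilities
```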

- 3 - Given that the model is in a known state, what is the probability that it stays in that state for exactly d days? The answer is the geometric distribution p_i(d) = (a_ii)^(d-1) (1 - a_ii), where a_ii is the state's self-transition probability. Thus the expected number of consecutive days in the same state is E[d] = 1 / (1 - a_ii). So, with a self-transition probability of 0.8 for the sunny state, the expected number of consecutive sunny days according to the model is 1 / (1 - 0.8) = 5.
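
The geometric-duration formula can be checked numerically. The value a_ii = 0.8 used below is not stated explicitly in the transcript; it is the value implied by the quoted answer of 5 days.

```python
# Expected run length in a state with self-transition probability a_ii:
# p(d) = a_ii**(d - 1) * (1 - a_ii) is geometric, so E[d] = 1 / (1 - a_ii).
a_ii = 0.8   # assumed self-transition probability for "sunny" (implied by the answer 5)

expected = sum(d * a_ii**(d - 1) * (1 - a_ii) for d in range(1, 10_000))
print(expected, 1 / (1 - a_ii))    # both approximately 5.0
```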

Elements of an HMM What if each state does not correspond to an observable (physical) event? What if the observation is a probabilistic function of the state? An HMM is characterized by the following:
1) N, the number of states in the model.
2) M, the number of distinct observation symbols per state.
3) The state transition probability distribution A = {a_ij}, where a_ij = P(q_{t+1} = E_j | q_t = E_i).
4) The observation symbol probability distribution in state E_j, B = {b_j(k)}, where b_j(k) is the probability that the k-th observation symbol is emitted at time t, given that the model is in state E_j.
5) The initial state distribution π = {p_i}, where p_i = P(q_1 = E_i).
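
As a concrete (and purely illustrative) way of holding these elements, the sketch below bundles π, A, and B into one Python object. The class name, the helper properties, and the toy parameter values are assumptions made here, not the slides' notation; the toy model is reused in the sketches that follow.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class HMM:
    """A discrete HMM with N states and M observation symbols."""
    pi: List[float]         # initial state distribution, length N
    A: List[List[float]]    # transition probabilities, N x N, rows sum to 1
    B: List[List[float]]    # emission probabilities, N x M, rows sum to 1

    @property
    def N(self):            # number of states
        return len(self.pi)

    @property
    def M(self):            # number of observation symbols
        return len(self.B[0])

# A toy two-state, two-symbol model (values are illustrative only).
toy = HMM(pi=[0.6, 0.4],
          A=[[0.7, 0.3], [0.4, 0.6]],
          B=[[0.9, 0.1], [0.2, 0.8]])
```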

Three Basic Problems for HMMs
1) Given the observation sequence O = O_1 O_2 … O_T and a model m = (A, B, π), how do we efficiently compute P(O | m)?
2) Given the observation sequence O and a model m, how do we choose a corresponding state sequence Q = q_1 q_2 … q_T which is optimal in some meaningful sense?
3) How do we adjust the model parameters to maximize P(O | m)?

Solution to Problem (1) Given an observed output sequence O, we have that P(O | m) = Σ over all state sequences Q = q_1 … q_T of P(O, Q | m) = Σ_Q p_{q_1} b_{q_1}(O_1) a_{q_1 q_2} b_{q_2}(O_2) … a_{q_{T-1} q_T} b_{q_T}(O_T). This calculation is a sum of N^T terms, each being a product of roughly 2T factors, so the total number of operations is on the order of 2T N^T. Fortunately, there is a much more efficient algorithm, called the forward algorithm.
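
A direct, exponential-time implementation of this sum, using the toy HMM object sketched earlier, might look as follows. It assumes observations are encoded as integer symbol indices and is for exposition only.

```python
from itertools import product

def likelihood_brute_force(hmm, obs):
    """P(O | model) by summing P(O, Q) over all N**T state sequences.
    Exponential in T -- for exposition only; the forward algorithm is the practical choice."""
    total = 0.0
    T = len(obs)
    for q in product(range(hmm.N), repeat=T):
        p = hmm.pi[q[0]] * hmm.B[q[0]][obs[0]]
        for t in range(1, T):
            p *= hmm.A[q[t - 1]][q[t]] * hmm.B[q[t]][obs[t]]
        total += p
    return total

print(likelihood_brute_force(toy, [0, 1, 1, 0]))
```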

The Forward Algorithm It focuses on the calculation of the quantity α(t, i) = P(O_1, …, O_t, q_t = E_i | m), which is the joint probability that the sequence of observations seen up to and including time t is O_1, …, O_t and that the state of the HMM at time t is E_i. Once these quantities are known, P(O | m) = Σ_i α(T, i).

… continuation The calculation of the α(t, i)'s is by induction on t. From the definition of α(t, i) we get the recursion α(t+1, j) = [ Σ_i α(t, i) a_ij ] b_j(O_{t+1}), with base case α(1, i) = p_i b_i(O_1).
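
A minimal sketch of the forward recursion, again using the toy HMM object from above (indexing is 0-based, so alpha[0] plays the role of α(1, ·)):

```python
def forward(hmm, obs):
    """alpha[t][i] = joint probability of O_1..O_{t+1} and being in state i (0-indexed t)."""
    T = len(obs)
    alpha = [[0.0] * hmm.N for _ in range(T)]
    for i in range(hmm.N):                      # base case: alpha(1, i) = p_i * b_i(O_1)
        alpha[0][i] = hmm.pi[i] * hmm.B[i][obs[0]]
    for t in range(1, T):                       # induction on t
        for j in range(hmm.N):
            s = sum(alpha[t - 1][i] * hmm.A[i][j] for i in range(hmm.N))
            alpha[t][j] = s * hmm.B[j][obs[t]]
    return alpha

def likelihood_forward(hmm, obs):
    """P(O | model) = sum_i alpha(T, i); O(N^2 T) work instead of O(2 T N^T)."""
    return sum(forward(hmm, obs)[-1])

print(likelihood_forward(toy, [0, 1, 1, 0]))    # matches the brute-force value
```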

Backward Algorithm Another approach is the backward algorithm. Specifically, we calculate β(t, i) = P(O_{t+1}, …, O_T | q_t = E_i, m) by the formula β(t, i) = Σ_j a_ij b_j(O_{t+1}) β(t+1, j), with β(T, i) = 1. Again, by induction one can find the β(t, i)'s, starting with the value t = T - 1, then the value t = T - 2, and so on, eventually working back to t = 1.
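
A corresponding sketch of the backward recursion (same toy HMM and 0-based indexing as before):

```python
def backward(hmm, obs):
    """beta[t][i] = probability of the observations after time t+1, given state i (0-indexed t)."""
    T = len(obs)
    beta = [[0.0] * hmm.N for _ in range(T)]
    beta[T - 1] = [1.0] * hmm.N                 # beta(T, i) = 1 by convention
    for t in range(T - 2, -1, -1):              # work back from t = T-1 down to t = 1
        for i in range(hmm.N):
            beta[t][i] = sum(hmm.A[i][j] * hmm.B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(hmm.N))
    return beta

# Sanity check: sum_i p_i * b_i(O_1) * beta(1, i) also equals P(O | model).
obs = [0, 1, 1, 0]
beta = backward(toy, obs)
print(sum(toy.pi[i] * toy.B[i][obs[0]] * beta[0][i] for i in range(toy.N)))
```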

Solution to Problem (2) Given an observed sequence O = O_1, …, O_T of outputs, we want to compute efficiently a state sequence Q = q_1, …, q_T that has the highest conditional probability given O. In other words, we want to find a Q that makes P[Q | O] maximal. There may be many Q's that make P[Q | O] maximal. We give an algorithm to find one of them.

The Viterbi Algorithm It is divided into two steps. First it finds max_Q P[Q | O], and then it backtracks to find a Q that realizes this maximum. First define, for arbitrary t and i, δ(t, i) to be the maximum probability over all ways to end in state S_i at time t and have observed the sequence O_1 O_2 … O_t. Then max_Q P[Q and O] = max_i δ(T, i).

- 2 - But P[Q | O] = P[Q and O] / P[O]. Since the denominator on the RHS does not depend on Q, we have argmax_Q P[Q | O] = argmax_Q P[Q and O]. We calculate the δ(t, i)'s inductively: δ(1, i) = p_i b_i(O_1) and δ(t+1, j) = [ max_i δ(t, i) a_ij ] b_j(O_{t+1}).

- 3 - Finally, we recover the q_t's as follows. Define P* = max_i δ(T, i) and put q_T = argmax_i δ(T, i). This is the last state in the state sequence desired. The remaining q_t for t < T are found recursively by defining ψ(t+1, j) = argmax_i δ(t, i) a_ij and putting q_t = ψ(t+1, q_{t+1}).
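
Putting the recursion and the backtracking together gives the following sketch (toy HMM and 0-based indexing as in the earlier code; psi holds the back-pointers ψ):

```python
def viterbi(hmm, obs):
    """Return (best_prob, best_path), where best_path maximizes P(Q, O | model)."""
    T = len(obs)
    delta = [[0.0] * hmm.N for _ in range(T)]   # delta(t, i) from the slides
    psi = [[0] * hmm.N for _ in range(T)]       # back-pointers for the backtracking step
    for i in range(hmm.N):
        delta[0][i] = hmm.pi[i] * hmm.B[i][obs[0]]
    for t in range(1, T):
        for j in range(hmm.N):
            best_i = max(range(hmm.N), key=lambda i: delta[t - 1][i] * hmm.A[i][j])
            psi[t][j] = best_i
            delta[t][j] = delta[t - 1][best_i] * hmm.A[best_i][j] * hmm.B[j][obs[t]]
    # Backtrack: the last state maximizes delta(T, i); the rest follow the back-pointers.
    q = [max(range(hmm.N), key=lambda i: delta[T - 1][i])]
    for t in range(T - 1, 0, -1):
        q.append(psi[t][q[-1]])
    q.reverse()
    return max(delta[T - 1]), q

print(viterbi(toy, [0, 1, 1, 0]))
```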

Solution to Problem (3) We are given a set of observed data from an HMM for which the topology is known. We wish to estimate the parameters of that HMM. We briefly describe the intuition behind the Baum-Welch method of parameter estimation. Assume that the alphabet (of size M) and the number of states N are fixed at the outset. The data we use to estimate the parameters constitute a set of observed sequences {O^(d)}.

The Baum-Welch Algorithm We start by setting the parameters p_i, a_ij, b_i(k) at some initial values. We then calculate, using these initial parameter values: 1) p_i* = the expected proportion of times in state S_i at the first time point, given {O^(d)}.

2) a_ij* = E[N_ij | {O^(d)}] / E[N_i | {O^(d)}], and 3) b_i*(k) = E[N_i(k) | {O^(d)}] / E[N_i | {O^(d)}], where N_ij is the random number of times q_t^(d) = S_i and q_{t+1}^(d) = S_j for some d and t; N_i is the random number of times q_t^(d) = S_i for some d and t; and N_i(k) equals the random number of times q_t^(d) = S_i and it emits symbol k, for some d and t.
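
These expectations can be computed from the forward and backward quantities. The sketch below reuses the forward() and backward() helpers and the toy HMM from the earlier code and, for brevity, handles a single observed sequence rather than a set {O^(d)}; it is an illustration under those assumptions, not the slides' own derivation.

```python
def baum_welch_step(hmm, obs):
    """One Baum-Welch re-estimation step for a single observed sequence."""
    T, N, M = len(obs), hmm.N, hmm.M
    alpha, beta = forward(hmm, obs), backward(hmm, obs)
    p_obs = sum(alpha[T - 1])                   # P(O | current model)

    # gamma[t][i] = P(q_t = S_i | O, model)
    # xi[t][i][j] = P(q_t = S_i, q_{t+1} = S_j | O, model)
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * hmm.A[i][j] * hmm.B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]

    new_pi = gamma[0][:]                        # expected proportion of time in S_i at t = 1
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(M)] for i in range(N)]
    return HMM(pi=new_pi, A=new_A, B=new_B)

# Each step should not decrease P(O | model):
obs = [0, 1, 1, 0, 0, 1]
better = baum_welch_step(toy, obs)
print(likelihood_forward(toy, obs), likelihood_forward(better, obs))
```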

Upshot It can be shown that if m = (p_i, a_ij, b_i(k)) is substituted by m* = (p_i*, a_ij*, b_i*(k)), then P[{O^(d)} | m*] ≥ P[{O^(d)} | m], with equality holding if and only if m* = m. Thus successive iterations continually increase the probability of the data, given the model. Iterations continue until a local maximum of the probability is reached.