Hidden Markov Models Lecture 5, Tuesday April 15, 2003.

Presentation transcript:

Hidden Markov Models Lecture 5, Tuesday April 15, 2003

Definition of a hidden Markov model

Definition: A hidden Markov model (HMM) consists of
  An alphabet Σ = { b_1, b_2, …, b_M }
  A set of states Q = { 1, …, K }
  Transition probabilities between any two states: a_ij = transition probability from state i to state j, with a_i1 + … + a_iK = 1 for all states i = 1…K
  Start probabilities a_0i, with a_01 + … + a_0K = 1
  Emission probabilities within each state: e_k(b) = P( x_i = b | π_i = k ), with e_k(b_1) + … + e_k(b_M) = 1 for all states k = 1…K
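
To make the definition concrete, here is a minimal sketch of how such a model can be stored in code, using the fair/loaded ("dishonest casino") die HMM that appears later in the lecture. The 0.95/0.05 switching probabilities and the loaded-die emissions (P(6) = 1/2, P(1…5) = 1/10) are the values used in the example slides; the uniform start probabilities and all variable names are assumptions for illustration.

```python
# Minimal HMM representation: states, start, transition and emission probabilities.
states = ["F", "L"]                                   # fair die, loaded die
alphabet = ["1", "2", "3", "4", "5", "6"]

start = {"F": 0.5, "L": 0.5}                          # a_0k (assumed uniform here)
trans = {"F": {"F": 0.95, "L": 0.05},                 # a_kl
         "L": {"F": 0.05, "L": 0.95}}
emit = {"F": {b: 1/6 for b in alphabet},              # e_F(b): fair die
        "L": {**{b: 0.1 for b in "12345"}, "6": 0.5}} # e_L(b): loaded die, P(6) = 1/2
```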

The three main questions on HMMs

1. Evaluation
   GIVEN  a HMM M, and a sequence x,
   FIND   Prob[ x | M ]

2. Decoding
   GIVEN  a HMM M, and a sequence x,
   FIND   the sequence π of states that maximizes P[ x, π | M ]

3. Learning
   GIVEN  a HMM M, with unspecified transition/emission probabilities, and a sequence x,
   FIND   parameters θ = (e_i(.), a_ij) that maximize P[ x | θ ]

Today: Decoding and Evaluation

Problem 1: Decoding
Find the best parse of a sequence

Decoding

GIVEN x = x_1 x_2 …… x_N
We want to find π = π_1, ……, π_N such that P[ x, π ] is maximized:
  π* = argmax_π P[ x, π ]

We can use dynamic programming!

Let V_k(i) = max_{π_1,…,π_(i-1)} P[ x_1…x_(i-1), π_1,…,π_(i-1), x_i, π_i = k ]
           = probability of the most likely sequence of states ending at state π_i = k

[Slide figure: HMM trellis with states 1…K at each position x_1, x_2, x_3, …]

Decoding – main idea

Given that, for all states k and a fixed position i,
  V_k(i) = max_{π_1,…,π_(i-1)} P[ x_1…x_(i-1), π_1,…,π_(i-1), x_i, π_i = k ],
what is V_l(i+1)?

From the definition,
  V_l(i+1) = max_{π_1,…,π_i} P[ x_1…x_i, π_1,…,π_i, x_(i+1), π_(i+1) = l ]
           = max_{π_1,…,π_i} P( x_(i+1), π_(i+1) = l | x_1…x_i, π_1,…,π_i ) P[ x_1…x_i, π_1,…,π_i ]
           = max_{π_1,…,π_i} P( x_(i+1), π_(i+1) = l | π_i ) P[ x_1…x_(i-1), π_1,…,π_(i-1), x_i, π_i ]
           = max_k P( x_(i+1), π_(i+1) = l | π_i = k ) max_{π_1,…,π_(i-1)} P[ x_1…x_(i-1), π_1,…,π_(i-1), x_i, π_i = k ]
           = e_l(x_(i+1)) max_k a_kl V_k(i)

The Viterbi Algorithm

Input: x = x_1 …… x_N

Initialization:
  V_0(0) = 1            (0 is the imaginary first position)
  V_k(0) = 0, for all k > 0

Iteration:
  V_j(i)   = e_j(x_i) × max_k a_kj V_k(i-1)
  Ptr_j(i) = argmax_k a_kj V_k(i-1)

Termination:
  P(x, π*) = max_k V_k(N)

Traceback:
  π_N*     = argmax_k V_k(N)
  π_(i-1)* = Ptr_(π_i*)(i)
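
A short, self-contained sketch of this recurrence in Python (not code from the lecture; the dictionary-based model format follows the earlier sketch, and the start probabilities stand in for a_0k):

```python
def viterbi(x, states, start, trans, emit):
    """Most probable state path for sequence x; a sketch of the slide's recurrence."""
    V = [{k: start[k] * emit[k][x[0]] for k in states}]   # V_k at position 1
    ptr = [{}]                                            # no backpointers at position 1
    for i in range(1, len(x)):
        V.append({})
        ptr.append({})
        for j in states:
            # V_j(i) = e_j(x_i) * max_k a_kj V_k(i-1); remember the argmax for traceback
            best_k = max(states, key=lambda k: V[i-1][k] * trans[k][j])
            ptr[i][j] = best_k
            V[i][j] = emit[j][x[i]] * V[i-1][best_k] * trans[best_k][j]
    last = max(states, key=lambda k: V[-1][k])            # termination
    path = [last]                                         # traceback
    for i in range(len(x) - 1, 0, -1):
        path.append(ptr[i][path[-1]])
    return list(reversed(path)), V[-1][last]
```

With the casino model sketched earlier, `viterbi("123456162636", states, start, trans, emit)` returns the most likely F/L parse and its joint probability.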

The Viterbi Algorithm

Similar to “aligning” a set of states to a sequence.

Time:  O(K^2 N)
Space: O(KN)

[Slide figure: the K × N dynamic-programming matrix of values V_j(i), with states 1…K as rows and positions x_1 x_2 x_3 … x_N as columns]

Viterbi Algorithm – a practical detail

Underflows are a significant problem:
  P[ x_1,…,x_i, π_1,…,π_i ] = a_(0 π_1) a_(π_1 π_2) …… a_(π_(i-1) π_i) e_(π_1)(x_1) …… e_(π_i)(x_i)
These numbers become extremely small – underflow.

Solution: take the logs of all values:
  V_l(i) = log e_l(x_i) + max_k [ V_k(i-1) + log a_kl ]
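
A minimal log-space variant of the iteration above (a sketch, not lecture code; it reuses the dictionary model format assumed earlier and ignores zero probabilities for brevity):

```python
import math

def viterbi_log(x, states, start, trans, emit):
    """Same recurrence as viterbi(), but with sums of logs to avoid underflow."""
    V = [{k: math.log(start[k]) + math.log(emit[k][x[0]]) for k in states}]
    for i in range(1, len(x)):
        V.append({})
        for l in states:
            # V_l(i) = log e_l(x_i) + max_k [ V_k(i-1) + log a_kl ]
            V[i][l] = math.log(emit[l][x[i]]) + max(
                V[i-1][k] + math.log(trans[k][l]) for k in states)
    return max(V[-1].values())        # log P(x, pi*)
```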

Example

Let x be a sequence with a portion of ~1/6 6’s, followed by a portion of ~1/2 6’s:
  x = … …

Then it is not hard to show that the optimal parse is (exercise):
  FFF…………………...F LLL………………………...L

Six rolls “123456”
  parsed as F contribute 0.95^6 × (1/6)^6            = 1.6 × 10^-5
  parsed as L contribute 0.95^6 × (1/2)^1 × (1/10)^5 = 0.4 × 10^-5

Six rolls “162636”
  parsed as F contribute 0.95^6 × (1/6)^6            = 1.6 × 10^-5
  parsed as L contribute 0.95^6 × (1/2)^3 × (1/10)^3 = 9.0 × 10^-5
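
A quick numerical check of those four contributions (plain arithmetic, not lecture code):

```python
p_stay = 0.95 ** 6                           # six state transitions at 0.95 each
print(p_stay * (1/6) ** 6)                   # "123456" as F  -> ~1.6e-05
print(p_stay * (1/2) * (1/10) ** 5)          # "123456" as L  -> ~0.4e-05
print(p_stay * (1/6) ** 6)                   # "162636" as F  -> ~1.6e-05
print(p_stay * (1/2) ** 3 * (1/10) ** 3)     # "162636" as L  -> ~9.0e-05
```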

Problem 2: Evaluation
Find the likelihood that a sequence is generated by the model

Generating a sequence by the model

Given a HMM, we can generate a sequence of length n as follows:
1. Start at state π_1 according to probability a_(0 π_1)
2. Emit letter x_1 according to probability e_(π_1)(x_1)
3. Go to state π_2 according to probability a_(π_1 π_2)
4. … until emitting x_n

[Slide figure: the trellis of states 1…K over positions x_1, x_2, x_3, …, x_n, with the start state 0, the transition a_02 and the emission e_2(x_1) highlighted]
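
A minimal sketch of this generative procedure (not lecture code; `random.choices` and the dictionary model format from the earlier sketch are assumptions):

```python
import random

def generate(n, states, start, trans, emit):
    """Sample a state path pi and a sequence x of length n from the HMM."""
    pi, x = [], []
    state = random.choices(states, weights=[start[k] for k in states])[0]
    for _ in range(n):
        pi.append(state)
        symbols = list(emit[state])
        x.append(random.choices(symbols, weights=[emit[state][b] for b in symbols])[0])
        state = random.choices(states, weights=[trans[state][k] for k in states])[0]
    return x, pi
```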

A couple of questions

Given a sequence x, what is the probability that x was generated by the model?
Given a position i, what is the most likely state that emitted x_i?

Example: the dishonest casino
Say x = …   (roll sequence shown on the slide, with some letters marked)
Most likely path: π = FF……F
However: marked letters are more likely to be L than unmarked letters

Evaluation

We will develop algorithms that allow us to compute:
  P(x)            Probability of x given the model
  P(x_i…x_j)      Probability of a substring of x given the model
  P(π_i = k | x)  Probability that the i-th state is k, given x

A more refined measure of which states x may be in.

The Forward Algorithm

We want to calculate P(x) = probability of x, given the HMM.

Sum over all possible ways of generating x:
  P(x) = Σ_π P(x, π) = Σ_π P(x | π) P(π)

To avoid summing over an exponential number of paths π, define
  f_k(i) = P(x_1…x_i, π_i = k)      (the forward probability)

The Forward Algorithm – derivation

Define the forward probability:
  f_l(i) = P(x_1…x_i, π_i = l)
         = Σ_(π_1…π_(i-1)) P(x_1…x_(i-1), π_1,…,π_(i-1), π_i = l) e_l(x_i)
         = Σ_k Σ_(π_1…π_(i-2)) P(x_1…x_(i-1), π_1,…,π_(i-2), π_(i-1) = k) a_kl e_l(x_i)
         = e_l(x_i) Σ_k f_k(i-1) a_kl

The Forward Algorithm

We can compute f_k(i) for all k, i, using dynamic programming!

Initialization:
  f_0(0) = 1
  f_k(0) = 0, for all k > 0

Iteration:
  f_l(i) = e_l(x_i) Σ_k f_k(i-1) a_kl

Termination:
  P(x) = Σ_k f_k(N) a_k0

where a_k0 is the probability that the terminating state is k (usually = a_0k).
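
A sketch of the forward recursion in the same style as the Viterbi sketch above. For simplicity it takes a_k0 = 1 for every k in the termination step (no modelled end state); that choice, like the dictionary model format, is an assumption rather than something fixed by the slides:

```python
def forward(x, states, start, trans, emit):
    """Forward probabilities f_k(i) and the total probability P(x)."""
    f = [{k: start[k] * emit[k][x[0]] for k in states}]           # f_k at position 1
    for i in range(1, len(x)):
        # f_l(i) = e_l(x_i) * sum_k f_k(i-1) a_kl
        f.append({l: emit[l][x[i]] * sum(f[i-1][k] * trans[k][l] for k in states)
                  for l in states})
    return f, sum(f[-1].values())                                 # termination, a_k0 = 1
```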

Relation between Forward and Viterbi

VITERBI
  Initialization: V_0(0) = 1;  V_k(0) = 0, for all k > 0
  Iteration:      V_j(i) = e_j(x_i) max_k V_k(i-1) a_kj
  Termination:    P(x, π*) = max_k V_k(N)

FORWARD
  Initialization: f_0(0) = 1;  f_k(0) = 0, for all k > 0
  Iteration:      f_l(i) = e_l(x_i) Σ_k f_k(i-1) a_kl
  Termination:    P(x) = Σ_k f_k(N) a_k0

Motivation for the Backward Algorithm

We want to compute P(π_i = k | x), the probability distribution on the i-th position, given x.

We start by computing
  P(π_i = k, x) = P(x_1…x_i, π_i = k, x_(i+1)…x_N)
                = P(x_1…x_i, π_i = k) P(x_(i+1)…x_N | x_1…x_i, π_i = k)
                = P(x_1…x_i, π_i = k) P(x_(i+1)…x_N | π_i = k)

The first factor is the forward probability f_k(i); the second is the backward probability b_k(i).

The Backward Algorithm – derivation

Define the backward probability:
  b_k(i) = P(x_(i+1)…x_N | π_i = k)
         = Σ_(π_(i+1)…π_N) P(x_(i+1), x_(i+2), …, x_N, π_(i+1), …, π_N | π_i = k)
         = Σ_l Σ_(π_(i+2)…π_N) P(x_(i+1), x_(i+2), …, x_N, π_(i+1) = l, π_(i+2), …, π_N | π_i = k)
         = Σ_l e_l(x_(i+1)) a_kl Σ_(π_(i+2)…π_N) P(x_(i+2), …, x_N, π_(i+2), …, π_N | π_(i+1) = l)
         = Σ_l e_l(x_(i+1)) a_kl b_l(i+1)

The Backward Algorithm

We can compute b_k(i) for all k, i, using dynamic programming.

Initialization:
  b_k(N) = a_k0, for all k

Iteration:
  b_k(i) = Σ_l e_l(x_(i+1)) a_kl b_l(i+1)

Termination:
  P(x) = Σ_l a_0l e_l(x_1) b_l(1)
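
A matching sketch of the backward recursion, again taking a_k0 = 1 in the initialization so it is consistent with the forward sketch above:

```python
def backward(x, states, start, trans, emit):
    """Backward probabilities b_k(i); also returns P(x) from the termination step."""
    N = len(x)
    b = [dict() for _ in range(N)]
    b[N-1] = {k: 1.0 for k in states}               # b_k(N) = a_k0, taken to be 1 here
    for i in range(N - 2, -1, -1):
        # b_k(i) = sum_l e_l(x_{i+1}) a_kl b_l(i+1)
        b[i] = {k: sum(emit[l][x[i+1]] * trans[k][l] * b[i+1][l] for l in states)
                for k in states}
    px = sum(start[l] * emit[l][x[0]] * b[0][l] for l in states)   # termination
    return b, px
```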

Computational Complexity

What is the running time and space required for Forward and Backward?
  Time:  O(K^2 N)
  Space: O(KN)

Useful implementation techniques to avoid underflows:
  Viterbi:          sum of logs
  Forward/Backward: rescaling at each position by multiplying by a constant
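
One common way to realize the rescaling idea for the forward pass is to divide each column by its total mass and accumulate the logs of those scale factors; this particular scheme is an assumption for illustration, not something prescribed by the slides:

```python
import math

def forward_scaled(x, states, start, trans, emit):
    """Scaled forward pass: returns log P(x) without underflow."""
    log_px = 0.0
    f = {k: start[k] * emit[k][x[0]] for k in states}          # unscaled column 1
    for i in range(1, len(x) + 1):
        scale = sum(f.values())                                # mass of current column
        log_px += math.log(scale)
        f = {k: v / scale for k, v in f.items()}               # rescale column to sum to 1
        if i < len(x):
            f = {l: emit[l][x[i]] * sum(f[k] * trans[k][l] for k in states)
                 for l in states}                              # next (scaled) column
    return log_px                                              # sum of log scales = log P(x)
```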

Posterior Decoding

We can now calculate
  P(π_i = k | x) = f_k(i) b_k(i) / P(x)

Then we can ask: what is the most likely state at position i of sequence x?

Define π^ by Posterior Decoding:
  π^_i = argmax_k P(π_i = k | x)
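
Combining the forward and backward sketches above gives posterior decoding directly (a sketch under the same a_k0 = 1 assumption, relying on the `forward` and `backward` functions sketched earlier):

```python
def posterior_decode(x, states, start, trans, emit):
    """pi^_i = argmax_k P(pi_i = k | x) = argmax_k f_k(i) b_k(i) / P(x)."""
    f, px = forward(x, states, start, trans, emit)
    b, _ = backward(x, states, start, trans, emit)
    return [max(states, key=lambda k: f[i][k] * b[i][k] / px)
            for i in range(len(x))]
```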

Posterior Decoding

For each state, Posterior Decoding gives us a curve of the likelihood of that state at each position.
That is sometimes more informative than the Viterbi path π*.
Posterior Decoding may give an invalid sequence of states. Why?

Maximum Weight Trace

Another approach is to find a sequence of states under some constraint, maximizing the expected accuracy of state assignments:
  A_j(i) = max_{k such that Condition(k, j)} A_k(i-1) + P(π_i = j | x)

We will revisit this notion again.

A modeling example: CpG islands in DNA sequences

[Slide figure: two groups of states, A+ C+ G+ T+ and A- C- G- T-]

Example: CpG Islands

CpG dinucleotides in the genome are frequently methylated (write CpG so as not to confuse it with a CG base pair):
  C → methyl-C → T
Methylation is often suppressed around genes and promoters → CpG islands

Example: CpG Islands

In CpG islands, CG is more frequent.
Other pairs (AA, AG, AT, …) have different frequencies.

Question: detect CpG islands computationally.

A model of CpG Islands – (1) Architecture

[Slide figure: two groups of states, A+ C+ G+ T+ modelling “CpG island” and A- C- G- T- modelling “not CpG island”]