Hidden Markov Models.


Hidden Markov Models

The three main questions on HMMs
Evaluation: GIVEN an HMM $M$ and a sequence $x$, FIND $\mathrm{Prob}[x \mid M]$
Decoding: GIVEN an HMM $M$ and a sequence $x$, FIND the sequence $\pi$ of states that maximizes $P[x, \pi \mid M]$
Learning: GIVEN an HMM $M$ with unspecified transition/emission probabilities and a sequence $x$, FIND parameters $\theta = (e_i(\cdot), a_{ij})$ that maximize $P[x \mid \theta]$

Decoding
[Figure: trellis of states 1 … K at each position, emitting x_1, x_2, x_3, …]
GIVEN $x = x_1 x_2 \dots x_N$
We want to find $\pi = \pi_1, \dots, \pi_N$ such that $P[x, \pi]$ is maximized:
$\pi^* = \arg\max_{\pi} P[x, \pi]$
We can use dynamic programming!
Let $V_k(i) = \max_{\{\pi_1, \dots, \pi_{i-1}\}} P[x_1 \dots x_{i-1}, \pi_1, \dots, \pi_{i-1}, x_i, \pi_i = k]$
= probability of the most likely sequence of states ending at state $\pi_i = k$

The Viterbi Algorithm
[Figure: dynamic programming matrix with states 1 … K as rows and positions x_1 … x_N as columns; entry V_j(i)]
Similar to "aligning" a set of states to a sequence
Time: $O(K^2 N)$
Space: $O(KN)$
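The dynamic program above can be sketched in a few lines of Python. This is a minimal illustration rather than the lecture's own code: it assumes an explicit initial distribution `init` in place of a begin state, has no end state, works in log space, and uses a made-up two-state casino model (`F` fair, `L` loaded) whose numbers are purely hypothetical.

```python
import math

def viterbi(x, states, init, trans, emit):
    """Return (log P(x, pi*), pi*) for the observation sequence x."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # V[i][k]: log-probability of the best state path for x[0..i] ending in k.
    V = [{k: log(init[k]) + log(emit[k][x[0]]) for k in states}]
    back = []  # back[i][l]: best predecessor of state l at position i + 1
    for sym in x[1:]:
        row, ptr = {}, {}
        for l in states:
            best = max(states, key=lambda k: V[-1][k] + log(trans[k][l]))
            ptr[l] = best
            row[l] = log(emit[l][sym]) + V[-1][best] + log(trans[best][l])
        V.append(row)
        back.append(ptr)

    # Trace back from the best final state.
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return V[-1][last], path

# Hypothetical two-state casino model (assumed numbers, not from the slides).
states = ["F", "L"]
init = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.05, "L": 0.95}}
emit = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    "L": {**{r: 0.1 for r in range(1, 6)}, 6: 0.5},
}

log_p, path = viterbi([6, 6, 6, 6, 6, 6], states, init, trans, emit)
print(path, log_p)
```

The two nested loops over states inside the loop over positions give the O(K^2 N) running time, and the stored back pointers account for the O(KN) space.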

Evaluation
We demonstrated algorithms that allow us to compute:
$P(x)$: probability of $x$ given the model
$P(x_i \dots x_j)$: probability of a substring of $x$ given the model
$P(\pi_i = k \mid x)$: probability that the $i$th state is $k$, given $x$
The last is a more refined measure of which states $x$ may be in

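Motivation for the Backward Algorithm
We want to compute $P(\pi_i = k \mid x)$, the probability distribution on the $i$th position, given $x$
We start by computing
$P(\pi_i = k, x) = P(x_1 \dots x_i, \pi_i = k, x_{i+1} \dots x_N)$
$= P(x_1 \dots x_i, \pi_i = k) \, P(x_{i+1} \dots x_N \mid x_1 \dots x_i, \pi_i = k)$
$= P(x_1 \dots x_i, \pi_i = k) \, P(x_{i+1} \dots x_N \mid \pi_i = k)$
The first factor is the forward probability $f_k(i)$; the second is the backward probability $b_k(i)$
Then, $P(\pi_i = k \mid x) = P(\pi_i = k, x) / P(x)$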

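The Backward Algorithm – derivation
Define the backward probability:
$b_k(i) = P(x_{i+1} \dots x_N \mid \pi_i = k)$
$= \sum_{\pi_{i+1} \dots \pi_N} P(x_{i+1}, x_{i+2}, \dots, x_N, \pi_{i+1}, \dots, \pi_N \mid \pi_i = k)$
$= \sum_l \sum_{\pi_{i+2} \dots \pi_N} P(x_{i+1}, x_{i+2}, \dots, x_N, \pi_{i+1} = l, \pi_{i+2}, \dots, \pi_N \mid \pi_i = k)$
$= \sum_l e_l(x_{i+1}) \, a_{kl} \sum_{\pi_{i+2} \dots \pi_N} P(x_{i+2}, \dots, x_N, \pi_{i+2}, \dots, \pi_N \mid \pi_{i+1} = l)$
$= \sum_l e_l(x_{i+1}) \, a_{kl} \, b_l(i+1)$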

The Backward Algorithm
We can compute $b_k(i)$ for all $k$, $i$, using dynamic programming
Initialization: $b_k(N) = a_{k0}$, for all $k$
Iteration: $b_k(i) = \sum_l e_l(x_{i+1}) \, a_{kl} \, b_l(i+1)$
Termination: $P(x) = \sum_l a_{0l} \, e_l(x_1) \, b_l(1)$
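As a rough sketch of the three steps above, the following Python function fills the backward table from right to left. The names `init` (standing in for a_0k) and `end` (standing in for a_k0) are assumptions chosen for this illustration, and the small casino model is hypothetical.

```python
def backward(x, states, init, trans, emit, end):
    """Return (b, P(x)); b[i][k] is b_k(i + 1) in the slides' 1-based indexing."""
    b = [dict(end)]                  # initialization: b_k(N) = a_k0
    for sym in reversed(x[1:]):      # sym plays the role of x_{i+1}
        nxt = b[0]
        b.insert(0, {
            k: sum(emit[l][sym] * trans[k][l] * nxt[l] for l in states)
            for k in states
        })
    # Termination: P(x) = sum_l a_0l e_l(x_1) b_l(1)
    px = sum(init[l] * emit[l][x[0]] * b[0][l] for l in states)
    return b, px

# Hypothetical casino model with an explicit end probability of 0.01 per state.
states = ["F", "L"]
init = {"F": 0.5, "L": 0.5}
end = {"F": 0.01, "L": 0.01}
trans = {"F": {"F": 0.94, "L": 0.05}, "L": {"F": 0.05, "L": 0.94}}
emit = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    "L": {**{r: 0.1 for r in range(1, 6)}, 6: 0.5},
}

b, px = backward([6, 1, 6, 6, 2], states, init, trans, emit, end)
print(px)
```

If the model has no end state, setting every `end[k]` to 1 gives the common variant in which the sequence may stop in any state.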

Computational Complexity
What is the running time, and space required, for Forward and Backward?
Time: $O(K^2 N)$
Space: $O(KN)$
Useful implementation techniques to avoid underflows:
Viterbi: sum of logs
Forward/Backward: rescaling at each position by multiplying by a constant
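One common way to realize the rescaling idea is to normalize the forward column at every position and accumulate the logs of the normalizers. The sketch below takes that approach; it assumes an initial distribution `init`, no end state (so P(x) is the sum of the last forward column), and a hypothetical casino model. It is one possible scaling scheme, not necessarily the one used in the lecture.

```python
import math

def forward_log_likelihood(x, states, init, trans, emit):
    """Return log P(x) from the forward recursion with per-position rescaling."""
    log_px = 0.0
    f = {k: init[k] * emit[k][x[0]] for k in states}
    for i in range(len(x)):
        if i > 0:
            f = {
                l: emit[l][x[i]] * sum(f[k] * trans[k][l] for k in states)
                for l in states
            }
        # scale equals P(x_i | x_1 .. x_{i-1}); it is zero only if x is impossible.
        scale = sum(f.values())
        log_px += math.log(scale)
        f = {k: v / scale for k, v in f.items()}
    return log_px

# Hypothetical casino model (assumed numbers).
states = ["F", "L"]
init = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.05, "L": 0.95}}
emit = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    "L": {**{r: 0.1 for r in range(1, 6)}, 6: 0.5},
}

rolls = [1, 2, 3, 4, 5, 6] * 1000   # long enough that unscaled products underflow
print(forward_log_likelihood(rolls, states, init, trans, emit))
```

Because each column is renormalized to sum to 1, no intermediate value gets smaller than the smallest single-step probability, regardless of sequence length.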

Viterbi, Forward, Backward

VITERBI
Initialization: $V_0(0) = 1$; $V_k(0) = 0$, for all $k > 0$
Iteration: $V_l(i) = e_l(x_i) \max_k V_k(i-1) \, a_{kl}$
Termination: $P(x, \pi^*) = \max_k V_k(N)$

FORWARD
Initialization: $f_0(0) = 1$; $f_k(0) = 0$, for all $k > 0$
Iteration: $f_l(i) = e_l(x_i) \sum_k f_k(i-1) \, a_{kl}$
Termination: $P(x) = \sum_k f_k(N) \, a_{k0}$

BACKWARD
Initialization: $b_k(N) = a_{k0}$, for all $k$
Iteration: $b_k(i) = \sum_l e_l(x_{i+1}) \, a_{kl} \, b_l(i+1)$
Termination: $P(x) = \sum_k a_{0k} \, e_k(x_1) \, b_k(1)$

Posterior Decoding
We can now calculate
$P(\pi_i = k \mid x) = \dfrac{f_k(i) \, b_k(i)}{P(x)}$
Then, we can ask: what is the most likely state at position $i$ of sequence $x$?
Define $\hat{\pi}$ by Posterior Decoding:
$\hat{\pi}_i = \arg\max_k P(\pi_i = k \mid x)$
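A compact way to see posterior decoding in action is to compute both tables and combine them position by position. The sketch below assumes an initial distribution `init`, no end state (so the backward table starts from 1 at the last position), and a hypothetical casino model; for long sequences the unscaled products would need rescaling or logs.

```python
def posterior_decode(x, states, init, trans, emit):
    """Return (post, path): post[i][k] = P(pi_i = k | x) and the argmax path."""
    n = len(x)
    # Forward table.
    f = [{k: init[k] * emit[k][x[0]] for k in states}]
    for i in range(1, n):
        f.append({
            l: emit[l][x[i]] * sum(f[i - 1][k] * trans[k][l] for k in states)
            for l in states
        })
    # Backward table (assumption: no end state, so the last column is all ones).
    b = [None] * n
    b[n - 1] = {k: 1.0 for k in states}
    for i in range(n - 2, -1, -1):
        b[i] = {
            k: sum(emit[l][x[i + 1]] * trans[k][l] * b[i + 1][l] for l in states)
            for k in states
        }
    px = sum(f[n - 1].values())
    post = [{k: f[i][k] * b[i][k] / px for k in states} for i in range(n)]
    path = [max(states, key=lambda k: p[k]) for p in post]
    return post, path

# Hypothetical casino model (assumed numbers).
states = ["F", "L"]
init = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.05, "L": 0.95}}
emit = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    "L": {**{r: 0.1 for r in range(1, 6)}, 6: 0.5},
}

post, path = posterior_decode([6, 6, 1, 2, 6], states, init, trans, emit)
for i, p in enumerate(post):
    print(i + 1, path[i], round(p["L"], 3))
```

Each position is decided independently, so nothing in this procedure checks that consecutive states in `path` are connected by a nonzero transition.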

Posterior Decoding
For each state, Posterior Decoding gives us a curve of the likelihood of that state at each position
That is sometimes more informative than the Viterbi path $\pi^*$
Posterior Decoding may give an invalid sequence of states
Why?

Posterior Decoding
[Figure: state-by-position matrix over x_1 … x_N, marking P(π_i = l | x) for each state l at position i]
$P(\pi_i = k \mid x) = \sum_{\pi} P(\pi \mid x) \, 1(\pi_i = k) = \sum_{\{\pi : \pi[i] = k\}} P(\pi \mid x)$
where $1(\psi) = 1$ if $\psi$ is true, and $0$ otherwise

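Posterior Decoding
[Figure: state-by-position matrix over x_1 … x_N, marking P(π_i = l | x) and P(π_j = l' | x)]
Example: How do we compute $P(\pi_i = l, \pi_j = l' \mid x)$ for $i < j$?
$P(\pi_i = l, \pi_j = l' \mid x) = \dfrac{f_l(i) \, P(x_{i+1} \dots x_j, \pi_j = l' \mid \pi_i = l) \, b_{l'}(j)}{P(x)}$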

CpG islands in DNA sequences
A modeling example
[Figure: eight-state model with states A+, C+, G+, T+ and A-, C-, G-, T-]

Example: CpG Islands
CpG nucleotides in the genome are frequently methylated (we write CpG so as not to confuse it with the CG base pair)
C → methyl-C → T
Methylation is often suppressed around genes and promoters → CpG islands

Example: CpG Islands
In CpG islands, CG is more frequent
Other pairs (AA, AG, AT…) have different frequencies
Question: Detect CpG islands computationally

A model of CpG Islands – (1) Architecture
[Figure: model architecture, with a region labeled "Not CpG Island"]

A model of CpG Islands – (2) Transitions
How do we estimate the parameters of the model?
Emission probabilities: 1/0 (each state emits its own nucleotide with probability 1)
Transition probabilities within CpG islands: established from many known (experimentally verified) CpG islands (training set)
Transition probabilities within other regions: established from many known non-CpG islands

Rows give the current nucleotide, columns the next nucleotide.

  +     A     C     G     T
  A   .180  .274  .426  .120
  C   .171  .368  .274  .188
  G   .161  .339  .375  .125
  T   .079  .355  .384  .182

  -     A     C     G     T
  A   .300  .205  .285  .210
  C   .322  .298  .078  .302
  G   .248  .246  .298  .208
  T   .177  .239  .292  .292

Log Likelihoods: Telling "Prediction" from "Random"
Another way to see the effects of transitions: log likelihoods
$L(u, v) = \log[\, P(uv \mid +) / P(uv \mid -) \,]$
Given a region $x = x_1 \dots x_N$, a quick-and-dirty way to decide whether the entire $x$ is CpG:
$P(x \text{ is CpG}) > P(x \text{ is not CpG}) \iff \sum_i L(x_i, x_{i+1}) > 0$

Rows give $u$, columns give $v$.

        A       C       G       T
  A  -0.740  +0.419  +0.580  -0.803
  C  -0.913  +0.302  +1.812  -0.685
  G  -0.624  +0.461  +0.331  -0.730
  T  -1.169  +0.573  +0.393  -0.679
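The quick-and-dirty test amounts to summing table entries over adjacent nucleotide pairs. A small sketch, using the 16 log-likelihood values from the slide (rows are the first nucleotide u, columns the second nucleotide v); the function name and the two example strings are made up for illustration.

```python
# Log-likelihood ratios L(u, v) as listed on the slide: LOG_ODDS[u][v].
LOG_ODDS = {
    "A": {"A": -0.740, "C": 0.419, "G": 0.580, "T": -0.803},
    "C": {"A": -0.913, "C": 0.302, "G": 1.812, "T": -0.685},
    "G": {"A": -0.624, "C": 0.461, "G": 0.331, "T": -0.730},
    "T": {"A": -1.169, "C": 0.573, "G": 0.393, "T": -0.679},
}

def cpg_score(seq):
    """Sum L(x_i, x_{i+1}) over all adjacent pairs; positive suggests CpG island."""
    return sum(LOG_ODDS[u][v] for u, v in zip(seq, seq[1:]))

for seq in ("CGCGCG", "ATATAT"):
    score = cpg_score(seq)
    print(seq, round(score, 3), "CpG-like" if score > 0 else "not CpG-like")
```

This scores the whole region at once, so it cannot locate island boundaries inside a longer sequence; that is what the HMM formulation adds.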

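A model of CpG Islands – (2) Transitions
What about transitions between (+) and (-) states?
They affect:
the average length of a CpG island
the average separation between two CpG islands
[Figure: two states X and Y; X returns to itself with probability p and leaves with probability 1-p; Y returns to itself with probability q and leaves with probability 1-q]
Length distribution of region X:
$P[l_X = 1] = 1-p$
$P[l_X = 2] = p(1-p)$
…
$P[l_X = k] = p^{k-1}(1-p)$
$E[l_X] = 1/(1-p)$
Geometric distribution, with mean $1/(1-p)$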
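The stated mean can be checked numerically. The sketch below assumes region lengths start at 1, so that a length of k means k - 1 self-transitions followed by one exit, consistent with P[l_X = 1] = 1 - p; the value p = 0.9 is an arbitrary choice for illustration.

```python
def length_pmf(k, p):
    """P[l_X = k]: stay in X for k - 1 steps, then leave."""
    return p ** (k - 1) * (1 - p)

p = 0.9                      # arbitrary self-transition probability
ks = range(1, 2000)          # the tail beyond this is negligible for p = 0.9
total = sum(length_pmf(k, p) for k in ks)
mean = sum(k * length_pmf(k, p) for k in ks)
print(total, mean, 1 / (1 - p))
```

The probabilities decay by a constant factor p per extra position, which is why a single state with a self-loop can only produce this one family of length distributions.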

A model of CpG Islands – (2) Transitions
[Figure: eight-state model with states A+, C+, G+, T+ and A-, C-, G-, T-; each (+) state leaves the (+) region with total probability 1-p]
There is no reason to favor exiting/entering (+) and (-) regions at a particular nucleotide
To determine transition probabilities between (+) and (-) states:
Estimate the average length of a CpG island: $l_{CPG} = 1/(1-p) \Rightarrow p = 1 - 1/l_{CPG}$
For each pair of (+) states $k, l$, let $a_{kl} \leftarrow p \times a_{kl}$
For each (+) state $k$ and (-) state $l$, let $a_{kl} = (1-p)/4$ (better: take the frequency of $l$ in the (-) regions into account)
Do the same for (-) states
A problem with this model: CpG islands don't have an exponential length distribution
This is a defect of HMMs, compensated for by ease of analysis and computation
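The recipe above can be written as a short function that assembles the full 8-state transition table. This is a sketch under stated assumptions: the function and parameter names are invented here, the within-region tables are passed in by the caller, and the uniform 0.25 tables and mean lengths in the usage example are placeholders, not estimates from data.

```python
def combine_transitions(plus, minus, mean_len_plus, mean_len_minus):
    """Build the 8-state transition table from within-region tables and mean lengths."""
    nucs = "ACGT"
    p = 1 - 1 / mean_len_plus    # probability of staying in the (+) region
    q = 1 - 1 / mean_len_minus   # probability of staying in the (-) region
    a = {}
    for u in nucs:
        a[u + "+"], a[u + "-"] = {}, {}
        for v in nucs:
            a[u + "+"][v + "+"] = p * plus[u][v]     # stay inside the island
            a[u + "+"][v + "-"] = (1 - p) / 4        # leave the island
            a[u + "-"][v + "-"] = q * minus[u][v]    # stay outside
            a[u + "-"][v + "+"] = (1 - q) / 4        # enter an island
    return a

# Placeholder within-region tables (uniform), for illustration only.
uniform = {u: {v: 0.25 for v in "ACGT"} for u in "ACGT"}
a = combine_transitions(uniform, uniform, mean_len_plus=1000, mean_len_minus=100000)
print(sum(a["C+"].values()), a["C+"]["G-"])
```

Each row still sums to 1 because the within-region row is scaled by the stay probability and the remaining mass is split evenly over the four states of the other region.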

Applications of the model
Given a DNA region $x$, the Viterbi algorithm predicts locations of CpG islands
Given a nucleotide $x_i$ (say $x_i$ = A), the Viterbi parse tells whether $x_i$ is in a CpG island in the most likely general scenario
The Forward/Backward algorithms can calculate $P(x_i \text{ is in CpG island}) = P(\pi_i = A^+ \mid x)$
Posterior Decoding can assign locally optimal predictions of CpG islands: $\hat{\pi}_i = \arg\max_k P(\pi_i = k \mid x)$

What if a new genome comes?
We just sequenced the porcupine genome
We know CpG islands play the same role in this genome
However, we have no known CpG islands for porcupines
We suspect the frequency and characteristics of CpG islands are quite different in porcupines
How do we adjust the parameters in our model? LEARNING

Problem 3: Learning
Re-estimate the parameters of the model based on training data

Two learning scenarios
1. Estimation when the "right answer" is known
Examples:
GIVEN: a genomic region $x = x_1 \dots x_{1,000,000}$ where we have good (experimental) annotations of the CpG islands
GIVEN: the casino player allows us to observe him one evening, as he changes dice and produces 10,000 rolls
2. Estimation when the "right answer" is unknown
Examples:
GIVEN: the porcupine genome; we don't know how frequent the CpG islands are there, nor do we know their composition
GIVEN: 10,000 rolls of the casino player, but we don't see when he changes dice
QUESTION: Update the parameters $\theta$ of the model to maximize $P(x \mid \theta)$

1. When the right answer is known
Given $x = x_1 \dots x_N$ for which the true $\pi = \pi_1 \dots \pi_N$ is known, define:
$A_{kl}$ = # times the $k \to l$ transition occurs in $\pi$
$E_k(b)$ = # times state $k$ in $\pi$ emits $b$ in $x$
We can show that the maximum likelihood parameters $\theta$ (those that maximize $P(x \mid \theta)$) are:
$a_{kl} = \dfrac{A_{kl}}{\sum_i A_{ki}}$, $e_k(b) = \dfrac{E_k(b)}{\sum_c E_k(c)}$

1. When the right answer is known
Intuition: when we know the underlying states, the best estimate is the average frequency of transitions and emissions that occur in the training data
Drawback: given little data, there may be overfitting: $P(x \mid \theta)$ is maximized, but $\theta$ is unreasonable
0 probabilities are VERY BAD
Example: given 10 casino rolls, we observe
x = 2, 1, 5, 6, 1, 2, 3, 6, 2, 3
$\pi$ = F, F, F, F, F, F, F, F, F, F
Then:
$a_{FF} = 1$; $a_{FL} = 0$
$e_F(1) = e_F(3) = e_F(6) = .2$; $e_F(2) = .3$; $e_F(4) = 0$; $e_F(5) = .1$
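The counting estimate is easy to reproduce on the ten rolls above. The sketch below is an illustration with invented function and variable names; it leaves a state's row at zero when that state never appears in the labeled path, since the ratio is undefined there.

```python
from collections import Counter

def ml_estimate(x, pi, states, alphabet):
    """Maximum likelihood a_kl and e_k(b) from a labeled sequence (no pseudocounts)."""
    A = Counter(zip(pi, pi[1:]))   # A[(k, l)]: number of k -> l transitions
    E = Counter(zip(pi, x))        # E[(k, b)]: number of times state k emits b
    a, e = {}, {}
    for k in states:
        a_tot = sum(A[(k, l)] for l in states)
        e_tot = sum(E[(k, c)] for c in alphabet)
        # A state that never occurs has no data; its rows are left at zero here.
        a[k] = {l: A[(k, l)] / a_tot if a_tot else 0.0 for l in states}
        e[k] = {b: E[(k, b)] / e_tot if e_tot else 0.0 for b in alphabet}
    return a, e

x = [2, 1, 5, 6, 1, 2, 3, 6, 2, 3]
pi = ["F"] * 10
a, e = ml_estimate(x, pi, states=["F", "L"], alphabet=range(1, 7))
print(a["F"], e["F"])
```

With only ten rolls, the estimate assigns probability 0 both to ever switching to the loaded die and to the fair die rolling a 4, which is the overfitting problem described above.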

Pseudocounts
Solution for small training sets: add pseudocounts
$A_{kl}$ = # times the $k \to l$ transition occurs in $\pi$ + $r_{kl}$
$E_k(b)$ = # times state $k$ in $\pi$ emits $b$ in $x$ + $r_k(b)$
$r_{kl}$, $r_k(b)$ are pseudocounts representing our prior belief
Larger pseudocounts ⇒ strong prior belief
Small pseudocounts ($< 1$): just to avoid 0 probabilities

Pseudocounts
Example: dishonest casino
We will observe the player for one day, 600 rolls
Reasonable pseudocounts:
$r_{0F} = r_{0L} = r_{F0} = r_{L0} = 1$
$r_{FL} = r_{LF} = r_{FF} = r_{LL} = 1$
$r_F(1) = r_F(2) = \dots = r_F(6) = 20$ (strong belief that fair is fair)
$r_L(1) = r_L(2) = \dots = r_L(6) = 5$ (wait and see for loaded)
The above numbers are pretty arbitrary; assigning priors is an art
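To see what pseudocounts do to the estimates, the sketch below smooths the fair-die emission counts from the earlier ten-roll example. It applies the fair-die pseudocount of 20 per face from this slide and, for contrast, a small pseudocount of 0.5 per face; the 0.5 value and the function name are choices made for this illustration.

```python
from collections import Counter

def smoothed_emissions(rolls, pseudo):
    """e(b) = (count of b + r(b)) / sum over c of (count of c + r(c))."""
    counts = Counter(rolls)   # faces that were never rolled count as 0
    total = sum(counts[b] + r for b, r in pseudo.items())
    return {b: (counts[b] + r) / total for b, r in pseudo.items()}

rolls = [2, 1, 5, 6, 1, 2, 3, 6, 2, 3]   # ten rolls, all labeled fair
strong = smoothed_emissions(rolls, {b: 20 for b in range(1, 7)})
weak = smoothed_emissions(rolls, {b: 0.5 for b in range(1, 7)})
print(round(strong[4], 4), round(weak[4], 4))
```

With 20 pseudocounts per face the ten observations barely move the estimate away from 1/6, while 0.5 per face mainly removes the zero for the unseen face and otherwise follows the data.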

2. When the right answer is unknown
We don't know the true $A_{kl}$, $E_k(b)$
Idea:
We estimate our "best guess" on what $A_{kl}$, $E_k(b)$ are
We update the parameters of the model, based on our guess
We repeat