
1 Hidden Markov Models

2 Two learning scenarios

1. Estimation when the "right answer" is known
   Examples:
   GIVEN: a genomic region x = x_1…x_1,000,000 where we have good (experimental) annotations of the CpG islands
   GIVEN: the casino player allows us to observe him one evening, as he changes dice and produces 10,000 rolls

2. Estimation when the "right answer" is unknown
   Examples:
   GIVEN: the porcupine genome; we don't know how frequent the CpG islands are there, nor do we know their composition
   GIVEN: 10,000 rolls of the casino player, but we don't see when he changes dice

QUESTION: Update the parameters θ of the model to maximize P(x | θ)

3 1. When the right answer is known

Given x = x_1…x_N for which the true path π = π_1…π_N is known,

Define:
  A_kl   = # times the k→l transition occurs in π
  E_k(b) = # times state k in π emits b in x

We can show that the maximum likelihood parameters θ are:

  a_kl = A_kl / Σ_i A_ki        e_k(b) = E_k(b) / Σ_c E_k(c)
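A minimal Python sketch of this counting-and-normalizing step (not from the slides; the function name, the dictionary-based parameter layout, and the toy "+"/"-" CpG states are illustrative choices): count transitions and emissions along the known path π, then normalize each row, optionally with pseudocounts.

```python
def ml_estimate(x, pi, states, alphabet, pseudocount=0.0):
    """Maximum-likelihood HMM parameters when the state path pi is known.

    x  : observed sequence, e.g. "ACGT..."
    pi : state path of the same length, e.g. "-++-..."
    Returns (a, e): a[k][l] = transition prob k->l, e[k][b] = emission prob of b in state k.
    """
    A = {k: {l: pseudocount for l in states} for k in states}    # A_kl counts
    E = {k: {b: pseudocount for b in alphabet} for k in states}  # E_k(b) counts

    for i in range(len(x)):
        E[pi[i]][x[i]] += 1                  # state pi[i] emitted x[i]
        if i + 1 < len(x):
            A[pi[i]][pi[i + 1]] += 1         # transition pi[i] -> pi[i+1]

    # Normalize row by row: a_kl = A_kl / sum_i A_ki, e_k(b) = E_k(b) / sum_c E_k(c)
    a = {k: {l: A[k][l] / max(sum(A[k].values()), 1e-300) for l in states} for k in states}
    e = {k: {b: E[k][b] / max(sum(E[k].values()), 1e-300) for b in alphabet} for k in states}
    return a, e

# Toy usage with a two-state CpG-island model ("+" = island, "-" = background):
a, e = ml_estimate("ACGCGT", "-++++-", states="+-", alphabet="ACGT", pseudocount=1.0)
```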

4 2. When the right answer is unknown

We don't know the true A_kl, E_k(b)

Idea:
  We estimate our "best guess" on what A_kl, E_k(b) are
  We update the parameters of the model, based on our guess
  We repeat

5 2. When the right answer is unknown

Starting with our best guess of a model M, parameters θ:
Given x = x_1…x_N for which the true π = π_1…π_N is unknown,
we can get to a provably more likely parameter set θ

Principle: EXPECTATION MAXIMIZATION
  1. Estimate A_kl, E_k(b) in the training data
  2. Update θ according to A_kl, E_k(b)
  3. Repeat 1 & 2, until convergence

6 Estimating new parameters

To estimate A_kl: at each position i of sequence x, find the probability that transition k→l is used:

  P(π_i = k, π_{i+1} = l | x) = [1/P(x)] · P(π_i = k, π_{i+1} = l, x_1…x_N) = Q / P(x)

where Q = P(x_1…x_i, π_i = k, π_{i+1} = l, x_{i+1}…x_N)
        = P(π_{i+1} = l, x_{i+1}…x_N | π_i = k) · P(x_1…x_i, π_i = k)
        = P(π_{i+1} = l, x_{i+1} x_{i+2}…x_N | π_i = k) · f_k(i)
        = P(x_{i+2}…x_N | π_{i+1} = l) · P(x_{i+1} | π_{i+1} = l) · P(π_{i+1} = l | π_i = k) · f_k(i)
        = b_l(i+1) · e_l(x_{i+1}) · a_kl · f_k(i)

So:
  P(π_i = k, π_{i+1} = l | x, θ) = f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x | θ)

7 Estimating new parameters

So,
  A_kl = Σ_i P(π_i = k, π_{i+1} = l | x, θ) = Σ_i f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x | θ)

Similarly,
  E_k(b) = [1/P(x)] Σ_{i | x_i = b} f_k(i) b_k(i)

[Diagram: the sequence x_1 … x_{i-1} x_i x_{i+1} x_{i+2} … x_N, with state k at position i and state l at position i+1, annotated with f_k(i), a_kl, e_l(x_{i+1}), b_l(i+1)]
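A possible Python rendering of these two formulas (illustrative only; it assumes the forward values f_k(i) and backward values b_k(i) have already been computed and are stored as f[k][i] and bwd[k][i] with 1-based positions, and that px holds P(x | θ)):

```python
def expected_counts(x, states, alphabet, a, e, f, bwd, px):
    """Expected transition counts A_kl and emission counts E_k(b) for one sequence.

    f[k][i]   : forward value  f_k(i) = P(x_1..x_i, pi_i = k)       (i = 1..N)
    bwd[k][i] : backward value b_k(i) = P(x_{i+1}..x_N | pi_i = k)  (i = 1..N)
    px        : P(x | theta), e.g. sum over k of f[k][N]
    """
    N = len(x)
    A = {k: {l: 0.0 for l in states} for k in states}
    E = {k: {b: 0.0 for b in alphabet} for k in states}

    for i in range(1, N + 1):
        for k in states:
            # E_k(b) = (1/P(x)) * sum over positions i with x_i = b of f_k(i) b_k(i)
            E[k][x[i - 1]] += f[k][i] * bwd[k][i] / px
            if i < N:
                for l in states:
                    # A_kl = sum_i f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x)
                    A[k][l] += f[k][i] * a[k][l] * e[l][x[i]] * bwd[l][i + 1] / px
    return A, E
```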

8 The Baum-Welch Algorithm

Initialization: pick the best-guess model parameters (or arbitrary ones)

Iteration:
  1. Forward
  2. Backward
  3. Calculate A_kl, E_k(b)
  4. Calculate new model parameters a_kl, e_k(b)
  5. Calculate new log-likelihood P(x | θ)
     GUARANTEED TO BE HIGHER BY EXPECTATION-MAXIMIZATION

Until P(x | θ) does not change much
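Putting the pieces together, a hedged sketch of one possible Baum-Welch implementation in Python. It follows the five iteration steps above for a single training sequence; the function names, the fixed initial-state distribution start, the pseudocounts, and the tolerance are assumptions, and the code is unscaled, so a real implementation would rescale f and b or work in log space to avoid underflow. The E-step repeats the count computation from the previous slide's sketch so this block stands alone.

```python
import math

def forward(x, states, start, a, e):
    """f[k][i] = P(x_1..x_i, pi_i = k); returns (f, P(x | theta))."""
    N = len(x)
    f = {k: [0.0] * (N + 1) for k in states}          # index 0 unused, positions are 1..N
    for k in states:
        f[k][1] = start[k] * e[k][x[0]]
    for i in range(2, N + 1):
        for l in states:
            f[l][i] = e[l][x[i - 1]] * sum(f[k][i - 1] * a[k][l] for k in states)
    return f, sum(f[k][N] for k in states)

def backward(x, states, a, e):
    """b[k][i] = P(x_{i+1}..x_N | pi_i = k)."""
    N = len(x)
    b = {k: [0.0] * (N + 2) for k in states}
    for k in states:
        b[k][N] = 1.0
    for i in range(N - 1, 0, -1):
        for k in states:
            b[k][i] = sum(a[k][l] * e[l][x[i]] * b[l][i + 1] for l in states)
    return b

def baum_welch(x, states, alphabet, start, a, e, n_iter=100, tol=1e-6):
    """Baum-Welch on a single sequence (unscaled sketch; start is kept fixed)."""
    old_ll = -math.inf
    for _ in range(n_iter):
        f, px = forward(x, states, start, a, e)          # step 1
        b = backward(x, states, a, e)                    # step 2
        # Step 3, E-step: expected counts (small pseudocounts keep rows nonzero)
        A = {k: {l: 1e-6 for l in states} for k in states}
        E = {k: {c: 1e-6 for c in alphabet} for k in states}
        for i in range(1, len(x) + 1):
            for k in states:
                E[k][x[i - 1]] += f[k][i] * b[k][i] / px
                if i < len(x):
                    for l in states:
                        A[k][l] += f[k][i] * a[k][l] * e[l][x[i]] * b[l][i + 1] / px
        # Step 4, M-step: renormalize counts into new parameters
        a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
        e = {k: {c: E[k][c] / sum(E[k].values()) for c in alphabet} for k in states}
        # Step 5: new log-likelihood; stop when it no longer improves much
        ll = math.log(px)
        if ll - old_ll < tol:
            break
        old_ll = ll
    return a, e
```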

9 The Baum-Welch Algorithm

Time complexity: # iterations × O(K²N)

Guaranteed to increase the log likelihood of the model
  P(θ | x) = P(x, θ) / P(x) = P(x | θ) P(θ) / P(x)

Not guaranteed to find globally best parameters: converges to a local optimum, depending on initial conditions

Too many parameters / too large a model: overtraining

10 Alternative: Viterbi Training

Initialization: same

Iteration:
  1. Perform Viterbi, to find π*
  2. Calculate A_kl, E_k(b) according to π* + pseudocounts
  3. Calculate the new parameters a_kl, e_k(b)

Until convergence

Notes:
  - Convergence is guaranteed – Why?
  - Does not maximize P(x | θ)
  - In general, worse performance than Baum-Welch
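For step 1, a small Viterbi sketch in Python (assumptions: single-character state names such as "+"/"-", a fixed initial distribution start, and tiny constants added before taking logs to avoid log(0); none of this comes from the slides). Step 2 would then count transitions and emissions along the returned π*, exactly as in the known-path sketch earlier, plus pseudocounts.

```python
import math

def viterbi(x, states, start, a, e):
    """Most likely state path pi* = argmax_pi P(x, pi | theta), computed in log space."""
    N = len(x)
    V = [{k: -math.inf for k in states} for _ in range(N)]   # V[i][k]: best log-prob ending in k at position i
    ptr = [{k: None for k in states} for _ in range(N)]      # back-pointers for the traceback
    for k in states:
        V[0][k] = math.log(start[k] * e[k][x[0]] + 1e-300)
    for i in range(1, N):
        for l in states:
            best_k = max(states, key=lambda k: V[i - 1][k] + math.log(a[k][l] + 1e-300))
            V[i][l] = V[i - 1][best_k] + math.log(a[best_k][l] + 1e-300) + math.log(e[l][x[i]] + 1e-300)
            ptr[i][l] = best_k
    # Traceback from the best final state
    last = max(states, key=lambda k: V[N - 1][k])
    path = [last]
    for i in range(N - 1, 0, -1):
        path.append(ptr[i][path[-1]])
    return "".join(reversed(path))
```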

11 Variants of HMMs

12 Higher-order HMMs

The Genetic Code: 3 nucleotides make 1 amino acid
Statistical dependencies appear in triplets

Question: recognize protein-coding segments with an HMM

13 One way to model protein regions

Goal: model the triplet statistics P(x_i x_{i+1} x_{i+2} | previous triplet)

Every state of the HMM emits 3 nucleotides; there is one state per triplet (AAA, AAC, AAT, …, TTT)

Transition probabilities: probability of one triplet, given the previous triplet
  P(π_i | π_{i-1})

Emission probabilities are deterministic, each state emitting exactly its own triplet:
  P(x_i x_{i+1} x_{i+2} | π_i) = 1 or 0
  P(x_{i-3} x_{i-2} x_{i-1} | π_{i-1}) = 1 or 0

14 A more elegant way

Every state of the HMM emits 1 nucleotide

Transition probabilities: probability of the next state, given the previous 3 states
  P(π_i | π_{i-1}, π_{i-2}, π_{i-3})

Emission probabilities: P(x_i | π_i)

Algorithms extend with small modifications (see the sketch below)

States: A, C, G, T
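One standard way to see why the algorithms carry over: a higher-order chain over states can be re-encoded as an ordinary first-order HMM whose states remember the recent history. The sketch below is illustrative only (the nested-dict format of the 3rd-order table a3 and the function name are assumptions, not anything from the slides); it builds the composite states and their first-order transition matrix, after which forward, backward and Viterbi run unchanged, at the cost of a state space that grows from K to K³.

```python
from itertools import product

def expand_to_first_order(states, a3):
    """Convert a 3rd-order transition model P(pi_i | pi_{i-1}, pi_{i-2}, pi_{i-3})
    into a first-order HMM over composite states (pi_{i-2}, pi_{i-1}, pi_i).

    a3[(s3, s2, s1)][s0] = P(next state = s0 | previous three states = s3, s2, s1)
    Each composite state emits one nucleotide: the last component of its triple.
    """
    composite = list(product(states, repeat=3))
    a = {u: {v: 0.0 for v in composite} for u in composite}
    for (s3, s2, s1) in composite:
        for s0 in states:
            # A first-order transition is allowed only when the histories overlap:
            # (s3, s2, s1) -> (s2, s1, s0)
            a[(s3, s2, s1)][(s2, s1, s0)] = a3[(s3, s2, s1)][s0]
    return composite, a
```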

15 Modeling the Duration of States

[Diagram: two states X and Y; X has self-loop probability p and transition 1−p to Y; Y has self-loop probability q and transition 1−q to X]

Length distribution of region X: E[l_X] = 1/(1−p)
  Geometric distribution, with mean 1/(1−p)

This is a significant disadvantage of HMMs
Several solutions exist for modeling different length distributions
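A quick numeric check of the claimed mean (p = 0.9 is just an illustrative value): with self-loop probability p, the chance of staying in X exactly k steps is p^(k−1)(1−p), and the mean of that geometric distribution comes out as 1/(1−p).

```python
# Length distribution implied by a single state X with self-loop probability p:
# P(l_X = k) = p**(k-1) * (1-p), truncated here at 2000 steps for the numeric check.
p = 0.9
ks = range(1, 2000)
probs = [(p ** (k - 1)) * (1 - p) for k in ks]
mean_length = sum(k * pk for k, pk in zip(ks, probs))
print(round(mean_length, 3))   # ~ 1/(1-p) = 10; the distribution itself decays monotonically
```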

16 Sol'n 1: Chain several states

[Diagram: several copies of state X chained in a row before state Y]

Disadvantage: still very inflexible
  l_X = C + geometric with mean 1/(1−p)

17 Sol'n 2: Negative binomial distribution

[Diagram: n copies of state X in a row, each with self-loop probability p and probability 1−p of moving on, followed by Y]

Duration in X: m turns, where
  - During the first m − 1 turns, exactly n − 1 arrows to the next state are followed
  - During the m-th turn, an arrow to the next state is followed

  P(l_X = m) = C(m−1, n−1) (1−p)^{(n−1)+1} p^{(m−1)−(n−1)} = C(m−1, n−1) (1−p)^n p^{m−n}
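A small sketch that evaluates this formula (math.comb is the standard-library binomial coefficient; the values n = 3 and p = 0.9 are only illustrative). It checks that the probabilities sum to about 1 and that, unlike the geometric, the distribution has an interior mode.

```python
from math import comb

def neg_binom_duration(m, n, p):
    """P(l_X = m) for a chain of n copies of X, each with self-loop probability p:
    choose which n-1 of the first m-1 turns leave a copy of X, then leave on turn m."""
    return comb(m - 1, n - 1) * (1 - p) ** n * p ** (m - n)

n, p = 3, 0.9
dist = [neg_binom_duration(m, n, p) for m in range(n, 500)]
print(round(sum(dist), 4))                             # ~1.0: it is a proper distribution
print(max(range(n, 500), key=lambda m: dist[m - n]))   # mode > n: not monotone like the geometric
```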

18 Example: genes in prokaryotes

EasyGene: a prokaryotic gene-finder (Larsen TS, Krogh A)
Negative binomial with n = 3

19 Solution 3: Duration modeling

Upon entering a state:
  1. Choose duration d, according to a probability distribution
  2. Generate d letters according to the emission probabilities
  3. Take a transition to the next state according to the transition probabilities

Disadvantage: increase in complexity
  Time: O(D²)
  Space: O(D)
  where D = maximum duration of a state
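A minimal sketch of how decoding might look with explicit durations (this is not code from the slides; duration_viterbi, the callable log_emit, the duration table log_dur, and the start distribution are all assumed interfaces). For every end position i it tries each possible duration d of the last segment, which is where the extra dependence on the maximum duration D in running time comes from.

```python
import math

def duration_viterbi(x, states, start, a, log_emit, log_dur, D):
    """Viterbi for an HMM with explicit state durations (unoptimized sketch).

    log_emit(l, x, s, t): log prob that state l emits x[s:t] (product of per-letter probs)
    log_dur[l][d]       : log prob that state l lasts exactly d letters (d = 1..D)
    V[l][i]             : best log prob of x[:i] with a segment in state l ending at position i
    """
    N = len(x)
    V = {l: [-math.inf] * (N + 1) for l in states}
    for i in range(1, N + 1):
        for l in states:
            for d in range(1, min(D, i) + 1):            # try every duration d of the last segment
                emit = log_emit(l, x, i - d, i)          # letters x[i-d:i] emitted inside the segment
                if i == d:                               # first segment of the sequence
                    prev = math.log(start[l] + 1e-300)
                else:
                    prev = max(V[k][i - d] + math.log(a[k][l] + 1e-300) for k in states)
                V[l][i] = max(V[l][i], prev + log_dur[l][d] + emit)
    return max(V[l][N] for l in states)   # best log P(x, segmentation)
```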

20 Connection Between Alignment and HMMs

21 A state model for alignment

  -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
  TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC
  IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII

States (symbols consumed from x, y):
  M (+1, +1)    I (+1, 0)    J (0, +1)

Alignments correspond 1-to-1 with sequences of states M, I, J

22 Let's score the transitions

  -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
  TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC
  IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII

States: M (+1, +1), I (+1, 0), J (0, +1)
Scores on the state diagram: s(x_i, y_j) for M, −d for opening a gap (entering I or J), −e for extending a gap (staying in I or J)

Alignments correspond 1-to-1 with sequences of states M, I, J

23 How do we find the optimal alignment according to this model?

Dynamic programming:
  M(i, j): optimal alignment of x_1…x_i to y_1…y_j ending in M
  I(i, j): optimal alignment of x_1…x_i to y_1…y_j ending in I
  J(i, j): optimal alignment of x_1…x_i to y_1…y_j ending in J

The score is additive, therefore we can apply the DP recurrence formulas

24 Needleman-Wunsch with affine gaps – state version

Initialization:
  M(0, 0) = 0;  M(i, 0) = M(0, j) = −∞, for i, j > 0
  I(i, 0) = d + i × e;   J(0, j) = d + j × e

Iteration:
  M(i, j) = s(x_i, y_j) + max { M(i−1, j−1), I(i−1, j−1), J(i−1, j−1) }
  I(i, j) = max { e + I(i−1, j), d + M(i−1, j) }
  J(i, j) = max { e + J(i, j−1), d + M(i, j−1) }

Termination:
  Optimal alignment given by max { M(m, n), I(m, n), J(m, n) }
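The same recurrences in runnable form: a hedged Python sketch (score only, no traceback; the function name and the toy scoring values in the usage line are illustrative, and d, e are taken to be negative gap-open and gap-extension scores, as on the previous slides).

```python
def affine_nw_score(x, y, s, d, e):
    """Best alignment score under the three-state (M / I / J) affine-gap model.

    s(a, b): substitution score; d: gap-open score; e: gap-extension score (d, e < 0).
    M[i][j], I[i][j], J[i][j] follow the recurrences on this slide.
    """
    NEG = float("-inf")
    n, m = len(x), len(y)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    I = [[NEG] * (m + 1) for _ in range(n + 1)]   # gap in y (x_i aligned to '-')
    J = [[NEG] * (m + 1) for _ in range(n + 1)]   # gap in x (y_j aligned to '-')
    M[0][0] = 0.0
    for i in range(1, n + 1):
        I[i][0] = d + i * e
    for j in range(1, m + 1):
        J[0][j] = d + j * e
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = s(x[i - 1], y[j - 1]) + max(M[i - 1][j - 1], I[i - 1][j - 1], J[i - 1][j - 1])
            I[i][j] = max(e + I[i - 1][j], d + M[i - 1][j])
            J[i][j] = max(e + J[i][j - 1], d + M[i][j - 1])
    return max(M[n][m], I[n][m], J[n][m])

# Toy usage: +1 match, -1 mismatch, gap open -2, gap extend -1 (illustrative values)
score = affine_nw_score("AGGCTATC", "AGCTATC", lambda a, b: 1 if a == b else -1, -2, -1)
```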

25 Probabilistic interpretation of an alignment

An alignment is a hypothesis that the two sequences are related by evolution

Goal:
  Produce the most likely alignment
  Assert the likelihood that the sequences are indeed related

