
Hidden Markov Models.


1 Hidden Markov Models

2 The three main questions on HMMs
Evaluation: GIVEN an HMM M and a sequence x, FIND Prob[ x | M ]
Decoding: GIVEN an HMM M and a sequence x, FIND the sequence π of states that maximizes P[ x, π | M ]
Learning: GIVEN an HMM M with unspecified transition/emission probabilities, and a sequence x, FIND parameters θ = (ei(.), aij) that maximize P[ x | θ ]
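The three problems above all refer to the same model object. As a shared reference for the code sketches on the following slides, here is a minimal Python container for an HMM; the dishonest-casino parameters used to fill it in are illustrative assumptions, not values taken from these slides.

```python
# Minimal HMM container used by the sketches that follow.
# a[k][l] is the transition probability a_kl, e[k][b] the emission
# probability e_k(b), and pi0[k] the initial probability a_0k.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class HMM:
    states: List[str]          # names of the K hidden states
    a: List[List[float]]       # K x K transition matrix
    e: List[Dict[str, float]]  # per-state emission distributions
    pi0: List[float]           # initial probabilities a_0k

# Illustrative dishonest-casino parameters (assumed, for demonstration only):
# a Fair die and a Loaded die that favors sixes.
casino = HMM(
    states=["F", "L"],
    a=[[0.95, 0.05],
       [0.10, 0.90]],
    e=[{str(i): 1 / 6 for i in range(1, 7)},
       {**{str(i): 0.1 for i in range(1, 6)}, "6": 0.5}],
    pi0=[0.5, 0.5],
)
```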

3 Decoding
GIVEN x = x1x2……xN
We want to find π = π1, ……, πN, such that P[ x, π ] is maximized:
π* = argmaxπ P[ x, π ]
We can use dynamic programming!
Let Vk(i) = max{π1,…,πi-1} P[ x1…xi-1, π1, …, πi-1, xi, πi = k ]
= probability of the most likely sequence of states ending at state πi = k

4 The Viterbi Algorithm
[DP matrix Vj(i): states 1, 2, …, K (rows) against sequence positions x1 x2 x3 ………..xN (columns)]
Similar to “aligning” a set of states to a sequence
Time: O(K²N) Space: O(KN)
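A direct transcription of this recursion into Python, as a sketch: it assumes the HMM container defined after slide 2, ignores the end transition ak0, and works with raw probabilities (slide 9 explains why a real implementation would use logs).

```python
def viterbi(hmm, x):
    """V_l(i) = e_l(x_i) * max_k V_k(i-1) * a_kl, with traceback pointers."""
    K, N = len(hmm.states), len(x)
    V = [[0.0] * K for _ in range(N)]
    ptr = [[0] * K for _ in range(N)]

    # Initialization: first column uses the initial probabilities a_0k.
    for k in range(K):
        V[0][k] = hmm.pi0[k] * hmm.e[k].get(x[0], 0.0)

    # Iteration: O(K^2 N) time, O(KN) space.
    for i in range(1, N):
        for l in range(K):
            best_k = max(range(K), key=lambda k: V[i - 1][k] * hmm.a[k][l])
            ptr[i][l] = best_k
            V[i][l] = hmm.e[l].get(x[i], 0.0) * V[i - 1][best_k] * hmm.a[best_k][l]

    # Termination and traceback of the most likely path pi*.
    last = max(range(K), key=lambda k: V[N - 1][k])
    path = [last]
    for i in range(N - 1, 0, -1):
        path.append(ptr[i][path[-1]])
    path.reverse()
    return [hmm.states[k] for k in path], V[N - 1][last]
```

For example, viterbi(casino, "45126666645") would return a Fair/Loaded labeling of those rolls under the assumed casino parameters.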

5 Evaluation We demonstrated algorithms that allow us to compute:
P(x) – probability of x given the model
P(xi…xj) – probability of a substring of x given the model
P(πi = k | x) – probability that the ith state is k, given x
A more refined measure of which states x may be in

6 Motivation for the Backward Algorithm
We want to compute P(πi = k | x), the probability distribution on the ith position, given x
We start by computing
P(πi = k, x) = P(x1…xi, πi = k, xi+1…xN)
= P(x1…xi, πi = k) P(xi+1…xN | x1…xi, πi = k)
= P(x1…xi, πi = k) P(xi+1…xN | πi = k)
Then, P(πi = k | x) = P(πi = k, x) / P(x)
The first factor is the Forward probability, fk(i); the second is the Backward probability, bk(i)

7 The Backward Algorithm – derivation
Define the backward probability:
bk(i) = P(xi+1…xN | πi = k)
= Σπi+1…πN P(xi+1, xi+2, …, xN, πi+1, …, πN | πi = k)
= Σl Σπi+2…πN P(xi+1, xi+2, …, xN, πi+1 = l, πi+2, …, πN | πi = k)
= Σl el(xi+1) akl Σπi+2…πN P(xi+2, …, xN, πi+2, …, πN | πi+1 = l)
= Σl el(xi+1) akl bl(i+1)

8 The Backward Algorithm
We can compute bk(i) for all k, i using dynamic programming
Initialization: bk(N) = ak0, for all k
Iteration: bk(i) = Σl el(xi+1) akl bl(i+1)
Termination: P(x) = Σl a0l el(x1) bl(1)
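A sketch of this DP in Python, assuming the HMM container from slide 2; the a_end argument stands in for the end transitions ak0 (taken as 1 for every state if not supplied).

```python
def backward(hmm, x, a_end=None):
    """b_k(i) = sum_l e_l(x_{i+1}) * a_kl * b_l(i+1), filled right to left."""
    K, N = len(hmm.states), len(x)
    a_end = a_end or [1.0] * K
    b = [[0.0] * K for _ in range(N)]

    # Initialization: b_k(N) = a_k0.
    for k in range(K):
        b[N - 1][k] = a_end[k]

    # Iteration.
    for i in range(N - 2, -1, -1):
        for k in range(K):
            b[i][k] = sum(hmm.e[l].get(x[i + 1], 0.0) * hmm.a[k][l] * b[i + 1][l]
                          for l in range(K))

    # Termination: P(x) = sum_l a_0l * e_l(x_1) * b_l(1).
    px = sum(hmm.pi0[l] * hmm.e[l].get(x[0], 0.0) * b[0][l] for l in range(K))
    return b, px
```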

9 Computational Complexity
What are the running time and space required for Forward and Backward?
Time: O(K²N) Space: O(KN)
Useful implementation techniques to avoid underflows:
Viterbi: sum of logs
Forward/Backward: rescaling at each position by multiplying by a constant
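A sketch of the rescaling technique applied to the Forward algorithm (same HMM container as before, end transitions ignored): every column of the DP table is divided by its sum, and the log of each scaling factor is accumulated so that log P(x) is still recovered without underflow.

```python
import math

def forward_scaled(hmm, x):
    """Forward DP with per-position rescaling; returns scaled table and log P(x)."""
    K, N = len(hmm.states), len(x)
    f = [[0.0] * K for _ in range(N)]
    log_px = 0.0

    for i in range(N):
        for l in range(K):
            if i == 0:
                f[i][l] = hmm.pi0[l] * hmm.e[l].get(x[0], 0.0)
            else:
                f[i][l] = hmm.e[l].get(x[i], 0.0) * sum(
                    f[i - 1][k] * hmm.a[k][l] for k in range(K))
        # Rescale the column so it sums to 1; remember the factor in log space.
        s = sum(f[i])
        log_px += math.log(s)
        f[i] = [v / s for v in f[i]]

    return f, log_px
```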

10 Viterbi, Forward, Backward
VITERBI
Initialization: V0(0) = 1; Vk(0) = 0, for all k > 0
Iteration: Vl(i) = el(xi) maxk Vk(i-1) akl
Termination: P(x, π*) = maxk Vk(N)
FORWARD
Initialization: f0(0) = 1; fk(0) = 0, for all k > 0
Iteration: fl(i) = el(xi) Σk fk(i-1) akl
Termination: P(x) = Σk fk(N) ak0
BACKWARD
Initialization: bk(N) = ak0, for all k
Iteration: bl(i) = Σk ek(xi+1) alk bk(i+1)
Termination: P(x) = Σk a0k ek(x1) bk(1)

11 Posterior Decoding
We can now calculate
P(πi = k | x) = fk(i) bk(i) / P(x)
Then, we can ask: what is the most likely state at position i of sequence x?
Define π̂ by Posterior Decoding: π̂i = argmaxk P(πi = k | x)

12 Posterior Decoding
For each state, Posterior Decoding gives us a curve of the likelihood of that state at each position
That is sometimes more informative than the Viterbi path π*
Posterior Decoding may give an invalid sequence of states. Why? (Each position is decoded independently, so adjacent states in π̂ may be connected by a transition of probability 0.)

13 Posterior Decoding
[Trellis: positions x1 x2 x3 …… xN against states, with P(πi = l | x) marked for each state l]
P(πi = k | x) = Σπ P(π | x) 1(πi = k) = Σ{π : π[i] = k} P(π | x)
where 1(A) = 1 if A is true, 0 otherwise

14 Posterior Decoding
[Trellis: positions x1 x2 x3 …… xN against states, with P(πi = l | x) and P(πj = l' | x) marked]
Example: How do we compute P(πi = l, πj = l' | x)?
P(πi = l, πj = l' | x) = fl(i) P(xi+1…xj, πj = l' | πi = l) bl'(j) / P(x)

15 CpG islands in DNA sequences
A modeling example: CpG islands in DNA sequences
[Model states: A+ C+ G+ T+ (CpG island) and A- C- G- T- (background)]

16 Example: CpG Islands
CpG dinucleotides in the genome are frequently methylated (we write CpG so as not to confuse the dinucleotide with a C–G base pair)
C → methyl-C → T
Methylation is often suppressed around genes and promoters → CpG islands

17 Example: CpG Islands In CpG islands,
CG is more frequent Other pairs (AA, AG, AT…) have different frequencies Question: Detect CpG islands computationally

18 A model of CpG Islands – (1) Architecture
[Architecture: two connected sub-models, one over states A+ C+ G+ T+ (CpG island) and one over states A- C- G- T- (not CpG island)]

19 A model of CpG Islands – (2) Transitions
How do we estimate the parameters of the model?
Emission probabilities: 1/0 (each ± state emits its own nucleotide)
Transition probabilities within CpG islands: established from many known (experimentally verified) CpG islands (training set)
Transition probabilities within other regions: established from many known non-CpG islands

+    A     C     G     T
A  .180  .274  .426  .120
C  .171  .368  .274  .188
G  .161  .339  .375  .125
T  .079  .355  .384  .182

-    A     C     G     T
A  .300  .205  .285  .210
C  .322  .298  .078  .302
G  .248  .246  .298  .208
T  .177  .239  .292  .292
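The within-region transition tables above are simply normalized dinucleotide counts from labeled training sequences. A minimal counting sketch; cpg_islands and background in the usage comment are hypothetical lists of annotated sequences, not real data.

```python
from collections import defaultdict

def dinucleotide_transitions(sequences):
    """Estimate P(next nucleotide | current nucleotide) by counting dinucleotides."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for u, v in zip(seq, seq[1:]):
            counts[u][v] += 1
    table = {}
    for u, row in counts.items():
        total = sum(row.values())
        table[u] = {v: row[v] / total for v in "ACGT"}
    return table

# Hypothetical usage:
# plus_table  = dinucleotide_transitions(cpg_islands)   # verified CpG islands
# minus_table = dinucleotide_transitions(background)    # known non-island regions
```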

20 Log Likelihoods – Telling “Prediction” from “Random”
Another way to see the effect of transitions: log likelihoods
L(u, v) = log[ P(uv | +) / P(uv | -) ]
Given a region x = x1…xN, a quick-&-dirty way to decide whether the entire x is a CpG island:
P(x is CpG) > P(x is not CpG) ⇔ Σi L(xi, xi+1) > 0

      A       C       G       T
A  -0.740  +0.419  +0.580  -0.803
C  -0.913  +0.302  +1.812  -0.685
G  -0.624  +0.461  +0.331  -0.730
T  -1.169  +0.573  +0.393  -0.679
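A sketch of this quick-&-dirty test, assuming the plus/minus transition tables from the previous slide are available as nested dicts (for instance, from the counting sketch above):

```python
import math

def log_likelihood_ratio(x, plus_table, minus_table):
    """Sum of L(x_i, x_{i+1}) = log[ P(x_i x_{i+1} | +) / P(x_i x_{i+1} | -) ]."""
    return sum(math.log(plus_table[u][v] / minus_table[u][v])
               for u, v in zip(x, x[1:]))

def looks_like_cpg_island(x, plus_table, minus_table):
    # Positive total log-likelihood ratio => the whole region looks CpG-like.
    return log_likelihood_ratio(x, plus_table, minus_table) > 0
```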

21 A model of CpG Islands – (2) Transitions
What about transitions between (+) and (-) states? They affect:
Avg. length of a CpG island
Avg. separation between two CpG islands
[Two-state diagram: state X with self-transition probability p, state Y with self-transition probability q; X→Y has probability 1-p, Y→X has probability 1-q]
Length distribution of a run of X states:
P[lX = 1] = 1-p
P[lX = 2] = p(1-p)
…
P[lX = k] = p^(k-1) (1-p)
E[lX] = 1/(1-p): a geometric distribution, with mean 1/(1-p)

22 A model of CpG Islands – (2) Transitions
There is no reason to favor exiting/entering the (+) and (-) regions at a particular nucleotide
To determine transition probabilities between (+) and (-) states:
Estimate the average length of a CpG island: lCpG = 1/(1-p) ⇒ p = 1 – 1/lCpG
For each pair of (+) states k, l, let akl ← p × akl
For each (+) state k and (-) state l, let akl = (1-p)/4 (better: take the frequency of l in the (-) regions into account)
Do the same for the (-) states
A problem with this model: CpG islands don’t have a geometric (exponential) length distribution
This is a defect of HMMs – compensated for by ease of analysis & computation
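A sketch of this rescaling step under the slide's simplifying assumption that a region is exited to each of the four opposite states with equal probability; the plus/minus tables and the two mean lengths are assumed inputs.

```python
def combined_transitions(plus_table, minus_table, mean_island_len, mean_other_len):
    """Build the 8-state transition probabilities for the CpG model.

    Within-(+) transitions are scaled by p = 1 - 1/mean_island_len and each
    (+) state exits to each (-) state with probability (1-p)/4; the (-) states
    are treated symmetrically with q = 1 - 1/mean_other_len.
    """
    p = 1 - 1 / mean_island_len
    q = 1 - 1 / mean_other_len
    a = {}
    for u in "ACGT":
        a[u + "+"] = {v + "+": p * plus_table[u][v] for v in "ACGT"}
        a[u + "+"].update({v + "-": (1 - p) / 4 for v in "ACGT"})
        a[u + "-"] = {v + "-": q * minus_table[u][v] for v in "ACGT"}
        a[u + "-"].update({v + "+": (1 - q) / 4 for v in "ACGT"})
    return a
```

Each row then sums to p·1 + 4·(1-p)/4 = 1 (and likewise with q), so the result is a valid transition matrix.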

23 Applications of the model
Given a DNA region x, the Viterbi algorithm predicts locations of CpG islands
Given a nucleotide xi (say xi = A):
The Viterbi parse tells whether xi is in a CpG island in the most likely general scenario
The Forward/Backward algorithms can calculate P(xi is in a CpG island) = P(πi = A+ | x)
Posterior Decoding can assign locally optimal predictions of CpG islands: π̂i = argmaxk P(πi = k | x)

24 What if a new genome comes?
We just sequenced the porcupine genome We know CpG islands play the same role in this genome However, we have no known CpG islands for porcupines We suspect the frequency and characteristics of CpG islands are quite different in porcupines How do we adjust the parameters in our model? LEARNING

25 Problem 3: Learning
Re-estimate the parameters of the model based on training data

26 Two learning scenarios
Estimation when the “right answer” is known. Examples:
GIVEN: a genomic region x = x1…x1,000,000 where we have good (experimental) annotations of the CpG islands
GIVEN: the casino player lets us observe him one evening as he changes dice and produces 10,000 rolls
Estimation when the “right answer” is unknown. Examples:
GIVEN: the porcupine genome; we don’t know how frequent the CpG islands are there, nor do we know their composition
GIVEN: 10,000 rolls of the casino player, but we don’t see when he changes dice
QUESTION: Update the parameters θ of the model to maximize P(x | θ)

27 1. When the right answer is known
Given x = x1…xN for which the true π = π1…πN is known,
Define:
Akl = # of times the k→l transition occurs in π
Ek(b) = # of times state k in π emits b in x
We can show that the maximum likelihood parameters θ (those maximizing P(x | θ)) are:
akl = Akl / Σi Aki
ek(b) = Ek(b) / Σc Ek(c)

28 1. When the right answer is known
Intuition: when we know the underlying states, the best estimate is the observed frequency of transitions & emissions in the training data
Drawback: given little data, there may be overfitting: P(x | θ) is maximized, but θ is unreasonable
0 probabilities – VERY BAD
Example: given 10 casino rolls, we observe
x = 2, 1, 5, 6, 1, 2, 3, 6, 2, 3
π = F, F, F, F, F, F, F, F, F, F
Then: aFF = 1; aFL = 0
eF(1) = eF(3) = eF(6) = .2; eF(2) = .3; eF(4) = 0; eF(5) = .1
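A counting sketch of these maximum-likelihood estimates applied to the 10-roll example; note that it reproduces the overfitting problem (eF(4) = 0 and aFL = 0).

```python
from collections import Counter, defaultdict

def ml_estimate(x, pi, states, alphabet):
    """a_kl = A_kl / sum_i A_ki and e_k(b) = E_k(b) / sum_c E_k(c) from labeled data."""
    A = defaultdict(Counter)   # A[k][l] = # of k -> l transitions in pi
    E = defaultdict(Counter)   # E[k][b] = # of times state k emits b
    for k, l in zip(pi, pi[1:]):
        A[k][l] += 1
    for b, k in zip(x, pi):
        E[k][b] += 1
    a = {k: ({l: A[k][l] / sum(A[k].values()) for l in states} if A[k] else {})
         for k in states}
    e = {k: ({b: E[k][b] / sum(E[k].values()) for b in alphabet} if E[k] else {})
         for k in states}
    return a, e

x  = list("2156123623")          # the 10 rolls from the slide
pi = ["F"] * 10                  # all labeled Fair
a, e = ml_estimate(x, pi, ["F", "L"], [str(i) for i in range(1, 7)])
# a["F"]["F"] == 1.0, a["F"]["L"] == 0.0, e["F"]["4"] == 0.0  -> overfitting
```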

29 Pseudocounts Solution for small training sets: Add pseudocounts
Akl = (# of times the k→l transition occurs in π) + rkl
Ek(b) = (# of times state k in π emits b in x) + rk(b)
rkl, rk(b) are pseudocounts representing our prior belief
Larger pseudocounts → strong prior belief
Small pseudocounts (< 1): just to avoid 0 probabilities

30 Pseudocounts – Example: the dishonest casino
We will observe the player for one day: 600 rolls
Reasonable pseudocounts:
r0F = r0L = rF0 = rL0 = 1;
rFL = rLF = rFF = rLL = 1;
rF(1) = rF(2) = … = rF(6) = 20 (strong belief that fair is fair)
rL(1) = rL(2) = … = rL(6) = 5 (wait and see for loaded)
The numbers above are pretty arbitrary – assigning priors is an art
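A sketch extending the counting estimator from slide 27 with these pseudocounts; the start/end pseudocounts (r0F, rF0, …) are omitted here for brevity.

```python
def ml_estimate_with_pseudocounts(x, pi, states, alphabet, r_trans, r_emit):
    """Same counts as before, but A_kl and E_k(b) start from the pseudocounts."""
    A = {k: {l: float(r_trans[k][l]) for l in states} for k in states}
    E = {k: {b: float(r_emit[k][b]) for b in alphabet} for k in states}
    for k, l in zip(pi, pi[1:]):
        A[k][l] += 1
    for b, k in zip(x, pi):
        E[k][b] += 1
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return a, e

# The slide's priors for the casino:
r_trans = {"F": {"F": 1, "L": 1}, "L": {"F": 1, "L": 1}}
r_emit  = {"F": {str(i): 20 for i in range(1, 7)},   # strong belief fair is fair
           "L": {str(i): 5 for i in range(1, 7)}}    # wait and see for loaded
```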

31 2. When the right answer is unknown
We don’t know the true Akl, Ek(b) Idea: We estimate our “best guess” on what Akl, Ek(b) are We update the parameters of the model, based on our guess We repeat
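The “best guess” counts are posterior expectations computed with Forward/Backward, and iterating guess-then-re-estimate in this way is the Baum–Welch algorithm. A minimal sketch of one round of expected counts, assuming the unscaled forward/backward sketches above (a real implementation would use the rescaled versions); normalizing A and E exactly as on slide 27 then gives the updated parameters, and the loop repeats until P(x | θ) stops improving.

```python
def expected_counts(hmm, x):
    """One E-step: expected A_kl and E_k(b) under the current parameters."""
    K, N = len(hmm.states), len(x)
    f, px = forward(hmm, x)
    b, _ = backward(hmm, x)

    A = [[0.0] * K for _ in range(K)]
    E = [dict() for _ in range(K)]
    for i in range(N):
        for k in range(K):
            # Expected emission count: P(pi_i = k | x), credited to symbol x[i].
            E[k][x[i]] = E[k].get(x[i], 0.0) + f[i][k] * b[i][k] / px
            if i + 1 < N:
                for l in range(K):
                    # Expected k -> l transitions: P(pi_i = k, pi_{i+1} = l | x).
                    A[k][l] += (f[i][k] * hmm.a[k][l]
                                * hmm.e[l].get(x[i + 1], 0.0) * b[i + 1][l]) / px
    return A, E
```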

