1 Hidden Markov Models Lecture 5, Tuesday April 15, 2003

2 Definition of a hidden Markov model
Definition: A hidden Markov model (HMM) consists of:
Alphabet Σ = { b_1, b_2, …, b_M }
Set of states Q = { 1, …, K }
Transition probabilities between any two states: a_kl = transition probability from state k to state l, with a_k1 + … + a_kK = 1 for all states k = 1…K
Start probabilities a_0k, with a_01 + … + a_0K = 1
Emission probabilities within each state: e_k(b) = P( x_i = b | π_i = k ), with e_k(b_1) + … + e_k(b_M) = 1 for all states k = 1…K
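As a concrete illustration, here is a minimal sketch of such a model in Python. The dictionary layout and the variable name hmm are my own, not from the lecture; the numbers assume the dishonest-casino model used later in these slides (fair die F, loaded die L with P(6) = 1/2 and P(other) = 1/10, stay probability 0.95), and the uniform start probabilities are an added assumption.

```python
# A minimal HMM container (a sketch, not from the slides).
# Casino parameters from slide 11 are assumed; uniform starts are a guess.
hmm = {
    "alphabet": list("123456"),                      # Σ
    "states":   ["F", "L"],                          # Q
    "start":    {"F": 0.5, "L": 0.5},                # a_0k (assumed uniform)
    "trans":    {"F": {"F": 0.95, "L": 0.05},        # a_kl
                 "L": {"F": 0.05, "L": 0.95}},
    "emit":     {"F": {s: 1 / 6 for s in "123456"},  # e_k(b): fair die
                 "L": {**{s: 0.1 for s in "12345"}, "6": 0.5}},  # loaded die
}
```

The later code sketches in this transcript all assume this layout.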

3 The three main questions on HMMs
1. Evaluation
GIVEN an HMM M and a sequence x,
FIND P[ x | M ]
2. Decoding
GIVEN an HMM M and a sequence x,
FIND the sequence π of states that maximizes P[ x, π | M ]
3. Learning
GIVEN an HMM M with unspecified transition/emission probabilities, and a sequence x,
FIND parameters θ = (e_k(·), a_kl) that maximize P[ x | θ ]

4 Today
Decoding
Evaluation

5 Problem 1: Decoding
Find the best parse of a sequence

6 Decoding
GIVEN x = x_1 x_2 … x_N,
we want to find π = π_1, …, π_N such that P[ x, π ] is maximized:
π* = argmax_π P[ x, π ]
We can use dynamic programming!
Let V_k(i) = max_{π_1,…,π_{i-1}} P[ x_1…x_{i-1}, π_1, …, π_{i-1}, x_i, π_i = k ]
= probability of the most likely sequence of states ending at state π_i = k
[Trellis figure: K states at each of the positions x_1, x_2, x_3, …, x_N]

7 Decoding – main idea
Given that for all states k, and for a fixed position i,
V_k(i) = max_{π_1,…,π_{i-1}} P[ x_1…x_{i-1}, π_1, …, π_{i-1}, x_i, π_i = k ],
what is V_l(i+1)?
From the definition,
V_l(i+1) = max_{π_1,…,π_i} P[ x_1…x_i, π_1, …, π_i, x_{i+1}, π_{i+1} = l ]
= max_{π_1,…,π_i} P( x_{i+1}, π_{i+1} = l | x_1…x_i, π_1,…,π_i ) P[ x_1…x_i, π_1,…,π_i ]
= max_{π_1,…,π_i} P( x_{i+1}, π_{i+1} = l | π_i ) P[ x_1…x_{i-1}, π_1, …, π_{i-1}, x_i, π_i ]
= max_k P( x_{i+1}, π_{i+1} = l | π_i = k ) max_{π_1,…,π_{i-1}} P[ x_1…x_{i-1}, π_1,…,π_{i-1}, x_i, π_i = k ]
= e_l(x_{i+1}) max_k a_kl V_k(i)

8 The Viterbi Algorithm
Input: x = x_1 … x_N
Initialization:
V_0(0) = 1   (0 is the imaginary first position)
V_k(0) = 0, for all k > 0
Iteration:
V_j(i) = e_j(x_i) × max_k a_kj V_k(i-1)
Ptr_j(i) = argmax_k a_kj V_k(i-1)
Termination:
P(x, π*) = max_k V_k(N)
Traceback:
π*_N = argmax_k V_k(N)
π*_{i-1} = Ptr_{π*_i}(i)
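As a companion to the recursion, here is a short Python sketch (mine, not from the lecture). It assumes the hmm dictionary from the slide-2 example, folds the imaginary position 0 into the initialization (V_k(1) = a_0k e_k(x_1)), and multiplies raw probabilities, so it underflows on long sequences; slide 10's log transform fixes that.

```python
def viterbi(hmm, x):
    """Return (P(x, pi*), pi*) for observed sequence x: a sketch of
    V_j(i) = e_j(x_i) * max_k a_kj V_k(i-1) with pointer traceback."""
    states = hmm["states"]
    # Initialization: V_k(1) = a_0k * e_k(x_1).
    V = [{k: hmm["start"][k] * hmm["emit"][k][x[0]] for k in states}]
    ptr = []
    for i in range(1, len(x)):
        Vi, ptr_i = {}, {}
        for j in states:
            # Best predecessor state k for state j at position i+1.
            k_best = max(states, key=lambda k: V[-1][k] * hmm["trans"][k][j])
            ptr_i[j] = k_best
            Vi[j] = hmm["emit"][j][x[i]] * V[-1][k_best] * hmm["trans"][k_best][j]
        V.append(Vi)
        ptr.append(ptr_i)
    # Termination and traceback.
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for ptr_i in reversed(ptr):
        path.append(ptr_i[path[-1]])
    path.reverse()
    return V[-1][last], path
```

For instance, viterbi(hmm, "123456") returns the probability of the best parse together with the state path as a list.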

9 The Viterbi Algorithm
Similar to "aligning" a set of states to a sequence.
Time: O(K²N)
Space: O(KN)
[Figure: the K × N dynamic-programming matrix of values V_j(i), states 1…K down the rows, x_1 … x_N across the columns]

10 Viterbi Algorithm – a practical detail
Underflows are a significant problem:
P[ x_1, …, x_i, π_1, …, π_i ] = a_{0π_1} a_{π_1π_2} … a_{π_{i-1}π_i} e_{π_1}(x_1) … e_{π_i}(x_i)
These products become extremely small – underflow.
Solution: take the logs of all values:
V_l(i) = log e_l(x_i) + max_k [ V_k(i-1) + log a_kl ]
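A quick illustration of why the log transform is needed (plain Python; the per-step factor 0.95 × 1/6 is one a_kl × e_k(x_i) term of the casino model above):

```python
import math

# Multiplying many small probabilities underflows a double;
# summing their logs stays in a perfectly safe range.
p, logp = 1.0, 0.0
for _ in range(1000):
    p *= 0.95 * (1 / 6)                        # one a_kl * e_k(x_i) factor
    logp += math.log(0.95) + math.log(1 / 6)
print(p)     # 0.0        -- underflowed to zero
print(logp)  # ~ -1843.1  -- still representable
```

The Viterbi sketch above becomes log-space by replacing every product with a sum of logs in the initialization and iteration.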

11 Example
Let x be a sequence with a portion of ~1/6 6's, followed by a portion of ~1/2 6's:
x = 123456123456…123456 1626364656…1626364656
Then it is not hard to show that the optimal parse is (exercise):
FFF…………………...F LLL………………………...L
6 rolls "123456":
parsed as F, contribute 0.95^6 × (1/6)^6 ≈ 1.6 × 10^-5
parsed as L, contribute 0.95^6 × (1/2)^1 × (1/10)^5 ≈ 0.4 × 10^-5
6 rolls "162636":
parsed as F, contribute 0.95^6 × (1/6)^6 ≈ 1.6 × 10^-5
parsed as L, contribute 0.95^6 × (1/2)^3 × (1/10)^3 ≈ 9.2 × 10^-5
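These contributions are easy to verify directly (plain arithmetic; the 0.95 stay probability and the loaded-die probabilities are those of the casino model above):

```python
# Checking the slide-11 contributions numerically.
stay = 0.95 ** 6                          # six self-transitions
print(stay * (1 / 6) ** 6)                # ~1.6e-05: "123456" as FFFFFF
print(stay * (1 / 2) * (1 / 10) ** 5)     # ~3.7e-06: "123456" as LLLLLL
print(stay * (1 / 2) ** 3 * (1 / 10) ** 3)  # ~9.2e-05: "162636" as LLLLLL
```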

12 Problem 2: Evaluation
Find the likelihood that a sequence was generated by the model

13 Generating a sequence by the model
Given an HMM, we can generate a sequence of length n as follows:
1. Start at state π_1 according to prob a_{0π_1}
2. Emit letter x_1 according to prob e_{π_1}(x_1)
3. Go to state π_2 according to prob a_{π_1π_2}
4. … until emitting x_n
[Trellis figure: start state 0, a transition such as a_02 into state 2, an emission such as e_2(x_1), and so on through x_1, x_2, x_3, …, x_n]
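This generative procedure translates directly into code. A sketch (the function and helper names are mine; the hmm layout from the slide-2 example is assumed):

```python
import random

def generate(hmm, n):
    """Sample (x, pi) of length n from the model: draw pi_1 from a_0k,
    then alternately emit with e_k(.) and step with a_kl."""
    def draw(dist):
        # Sample a key of `dist` proportionally to its probability.
        r, acc = random.random(), 0.0
        for key, p in dist.items():
            acc += p
            if r < acc:
                return key
        return key  # guard against floating-point round-off

    x, pi = [], []
    state = draw(hmm["start"])            # step 1: pi_1 ~ a_0k
    for _ in range(n):
        pi.append(state)
        x.append(draw(hmm["emit"][state]))   # step 2: x_i ~ e_k(.)
        state = draw(hmm["trans"][state])    # step 3: pi_{i+1} ~ a_kl
    return "".join(x), "".join(pi)
```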

14 A couple of questions
Given a sequence x, what is the probability that x was generated by the model?
Given a position i, what is the most likely state that emitted x_i?
Example: the dishonest casino
Say x = 12341623162616364616234161221341
Most likely path: π = FF……F
However: the marked letters (the 6's, highlighted on the original slide) are individually more likely to have come from L than the unmarked letters.

15 Evaluation
We will develop algorithms that allow us to compute:
P(x) – probability of x given the model
P(x_i…x_j) – probability of a substring of x given the model
P(π_i = k | x) – probability that the i-th state is k, given x
(a more refined measure of which states x may be in)

16 The Forward Algorithm
We want to calculate P(x) = probability of x, given the HMM.
Sum over all possible ways of generating x:
P(x) = Σ_π P(x, π) = Σ_π P(x | π) P(π)
To avoid summing over an exponential number of paths π, define
f_k(i) = P(x_1…x_i, π_i = k)   (the forward probability)

17 The Forward Algorithm – derivation
Define the forward probability:
f_l(i) = P(x_1…x_i, π_i = l)
= Σ_{π_1…π_{i-1}} P(x_1…x_{i-1}, π_1,…,π_{i-1}, π_i = l) e_l(x_i)
= Σ_k Σ_{π_1…π_{i-2}} P(x_1…x_{i-1}, π_1,…,π_{i-2}, π_{i-1} = k) a_kl e_l(x_i)
= e_l(x_i) Σ_k f_k(i-1) a_kl

18 The Forward Algorithm
We can compute f_k(i) for all k, i, using dynamic programming!
Initialization:
f_0(0) = 1
f_k(0) = 0, for all k > 0
Iteration:
f_l(i) = e_l(x_i) Σ_k f_k(i-1) a_kl
Termination:
P(x) = Σ_k f_k(N) a_k0
where a_k0 is the probability that the terminating state is k (usually = a_0k)
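A sketch in the same style as the Viterbi code above: max becomes a sum. The hmm layout from slide 2's example is assumed, and since that example defines no end probabilities, a_k0 = 1 for every k (all states may terminate) is an added assumption.

```python
def forward(hmm, x):
    """Return P(x) via f_l(i) = e_l(x_i) * sum_k f_k(i-1) * a_kl.
    a_k0 = 1 for all k (no explicit end state) is assumed."""
    states = hmm["states"]
    # Initialization folded into position 1: f_k(1) = a_0k * e_k(x_1).
    f = {k: hmm["start"][k] * hmm["emit"][k][x[0]] for k in states}
    for i in range(1, len(x)):
        f = {l: hmm["emit"][l][x[i]] *
                sum(f[k] * hmm["trans"][k][l] for k in states)
             for l in states}
    return sum(f.values())   # P(x) = sum_k f_k(N) * a_k0, with a_k0 = 1
```

Like the plain Viterbi sketch, this multiplies raw probabilities and will underflow on long sequences; slide 23 notes the standard remedy (rescaling).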

19 Relation between Forward and Viterbi
VITERBI
Initialization:
V_0(0) = 1
V_k(0) = 0, for all k > 0
Iteration:
V_j(i) = e_j(x_i) max_k a_kj V_k(i-1)
Termination:
P(x, π*) = max_k V_k(N)
FORWARD
Initialization:
f_0(0) = 1
f_k(0) = 0, for all k > 0
Iteration:
f_l(i) = e_l(x_i) Σ_k f_k(i-1) a_kl
Termination:
P(x) = Σ_k f_k(N) a_k0

20 Motivation for the Backward Algorithm
We want to compute P(π_i = k | x), the probability distribution on the i-th position, given x.
We start by computing
P(π_i = k, x) = P(x_1…x_i, π_i = k, x_{i+1}…x_N)
= P(x_1…x_i, π_i = k) P(x_{i+1}…x_N | x_1…x_i, π_i = k)
= P(x_1…x_i, π_i = k) P(x_{i+1}…x_N | π_i = k)
The first factor is the forward probability f_k(i); the second is the backward probability b_k(i).

21 The Backward Algorithm – derivation
Define the backward probability:
b_k(i) = P(x_{i+1}…x_N | π_i = k)
= Σ_{π_{i+1}…π_N} P(x_{i+1}, x_{i+2}, …, x_N, π_{i+1}, …, π_N | π_i = k)
= Σ_l Σ_{π_{i+2}…π_N} P(x_{i+1}, x_{i+2}, …, x_N, π_{i+1} = l, π_{i+2}, …, π_N | π_i = k)
= Σ_l e_l(x_{i+1}) a_kl Σ_{π_{i+2}…π_N} P(x_{i+2}, …, x_N, π_{i+2}, …, π_N | π_{i+1} = l)
= Σ_l e_l(x_{i+1}) a_kl b_l(i+1)

22 The Backward Algorithm
We can compute b_k(i) for all k, i, using dynamic programming.
Initialization:
b_k(N) = a_k0, for all k
Iteration:
b_k(i) = Σ_l e_l(x_{i+1}) a_kl b_l(i+1)
Termination:
P(x) = Σ_l a_0l e_l(x_1) b_l(1)
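A matching sketch (same hmm layout and the same a_k0 = 1 assumption as the forward() sketch; this one keeps the whole table because posterior decoding needs every column):

```python
def backward(hmm, x):
    """Return (b, P(x)) where b[i][k] = P(x_{i+2}..x_N.. | pi at position
    i+1 equals k), i.e. the slide's b_k(i) with 0-based indexing.
    b_k(N) = a_k0 = 1 for all k is assumed."""
    states, N = hmm["states"], len(x)
    b = [dict.fromkeys(states, 0.0) for _ in range(N)]
    for k in states:
        b[N - 1][k] = 1.0                       # initialization: b_k(N) = a_k0
    for i in range(N - 2, -1, -1):              # fill positions N-1 down to 1
        for k in states:
            b[i][k] = sum(hmm["emit"][l][x[i + 1]] * hmm["trans"][k][l]
                          * b[i + 1][l] for l in states)
    # Termination: P(x) = sum_l a_0l e_l(x_1) b_l(1); should match forward().
    px = sum(hmm["start"][l] * hmm["emit"][l][x[0]] * b[0][l] for l in states)
    return b, px
```

Comparing px here against forward(hmm, x) is a handy correctness check for both implementations.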

23 Computational Complexity
What is the running time, and space required, for Forward and Backward?
Time: O(K²N)
Space: O(KN)
Useful implementation techniques to avoid underflows:
Viterbi: sum of logs
Forward/Backward: rescaling at each position by multiplying by a constant
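The rescaling idea, sketched under the same assumptions as forward() above (the exact scaling scheme is my choice, not spelled out on the slide): divide each column by its sum s_i, so the working values stay near 1, and recover log P(x) as the sum of the log scale factors.

```python
import math

def forward_scaled(hmm, x):
    """Forward with per-position rescaling. Each column f_.(i) is divided
    by its sum s_i; then log P(x) = sum_i log s_i."""
    states = hmm["states"]
    f = {k: hmm["start"][k] * hmm["emit"][k][x[0]] for k in states}
    log_px = 0.0
    for i in range(1, len(x) + 1):
        s = sum(f.values())                     # scaling constant s_i
        log_px += math.log(s)
        f = {k: f[k] / s for k in states}       # rescaled column, sums to 1
        if i < len(x):                          # advance to column i+1
            f = {l: hmm["emit"][l][x[i]] *
                    sum(f[k] * hmm["trans"][k][l] for k in states)
                 for l in states}
    return log_px
```

By induction, the product s_1 … s_N equals Σ_k f_k(N), so the returned value is exactly log P(x), without ever forming the tiny unscaled probabilities.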

24 Posterior Decoding
We can now calculate
P(π_i = k | x) = f_k(i) b_k(i) / P(x)
Then we can ask: what is the most likely state at position i of sequence x?
Define π̂ by posterior decoding:
π̂_i = argmax_k P(π_i = k | x)
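Putting the two tables together (a sketch built on the backward() function above; it recomputes a full forward table, since the forward() sketch keeps only one column; on long sequences the rescaled variants should be used instead):

```python
def posterior_decode(hmm, x):
    """Return pi-hat with pi-hat_i = argmax_k f_k(i) b_k(i) / P(x)."""
    states = hmm["states"]
    # Full forward table f[i][k] = f_k(i+1).
    f = [{k: hmm["start"][k] * hmm["emit"][k][x[0]] for k in states}]
    for i in range(1, len(x)):
        f.append({l: hmm["emit"][l][x[i]] *
                     sum(f[-1][k] * hmm["trans"][k][l] for k in states)
                  for l in states})
    b, px = backward(hmm, x)
    # argmax_k f_k(i) b_k(i): dividing by P(x) does not change the argmax.
    return [max(states, key=lambda k: f[i][k] * b[i][k])
            for i in range(len(x))]
```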

25 Posterior Decoding
For each state k, posterior decoding gives us a curve of the likelihood of that state at each position.
That is sometimes more informative than the Viterbi path π*.
Posterior decoding may give an invalid sequence of states. Why?
(Hint: two consecutive argmax states may be joined by a transition of probability zero.)

26 Maximum Weight Trace
Another approach is to find a sequence of states under some constraint, maximizing the expected accuracy of state assignments:
A_j(i) = max_{k such that Condition(k, j)} A_k(i-1) + P(π_i = j | x)
We will revisit this notion again.

27 A modeling example
CpG islands in DNA sequences
[Figure: states A+ C+ G+ T+ and A- C- G- T-]

28 Example: CpG Islands
CpG dinucleotides in the genome are frequently methylated.
(We write CpG so as not to confuse the dinucleotide with a C·G base pair.)
C → methyl-C → T
Methylation is often suppressed around genes and promoters → CpG islands

29 Example: CpG Islands
In CpG islands, the pair CG is more frequent.
Other pairs (AA, AG, AT, …) also have different frequencies.
Question: detect CpG islands computationally.

30 A model of CpG Islands – (1) Architecture
[Figure: two groups of four states each, with transitions within and between the groups:
A+ C+ G+ T+ – CpG island
A- C- G- T- – not CpG island]
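A sketch of this 8-state architecture in the hmm-dict layout used throughout. All numbers here are illustrative placeholders, not from the lecture: the real within-group transition probabilities would come from dinucleotide frequency tables, and `switch` stands in for a small probability of entering or leaving an island.

```python
# Illustrative CpG architecture; the probabilities are assumptions.
switch = 0.001                         # placeholder island/background switch prob
plus  = [b + "+" for b in "ACGT"]      # CpG-island states
minus = [b + "-" for b in "ACGT"]      # background states

def row(same, other):
    """Uniform within-group transitions (placeholder for real
    dinucleotide-frequency tables); each row sums to 1."""
    out = {s: (1 - switch) / 4 for s in same}
    out.update({s: switch / 4 for s in other})
    return out

cpg_hmm = {
    "alphabet": list("ACGT"),
    "states":   plus + minus,
    "start":    {s: 1 / 8 for s in plus + minus},
    "trans":    {**{s: row(plus, minus) for s in plus},
                 **{s: row(minus, plus) for s in minus}},
    # Each state emits its own nucleotide with probability 1.
    "emit":     {s: {b: 1.0 if b == s[0] else 0.0 for b in "ACGT"}
                 for s in plus + minus},
}
```

With this architecture, running the Viterbi or posterior-decoding sketches above on a DNA string labels each position + (island) or - (background).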

