Presentation is loading. Please wait.

Presentation is loading. Please wait.

CS262 Lecture 5, Win07, Batzoglou Hidden Markov Models 1 2 K … 1 2 K … 1 2 K … … … … 1 2 K … x1x1 x2x2 x3x3 xKxK 2 1 K 2.

Similar presentations


Presentation on theme: "CS262 Lecture 5, Win07, Batzoglou Hidden Markov Models 1 2 K … 1 2 K … 1 2 K … … … … 1 2 K … x1x1 x2x2 x3x3 xKxK 2 1 K 2."— Presentation transcript:

1 CS262 Lecture 5, Win07, Batzoglou Hidden Markov Models 1 2 K … 1 2 K … 1 2 K … … … … 1 2 K … x1x1 x2x2 x3x3 xKxK 2 1 K 2

2 CS262 Lecture 5, Win07, Batzoglou Example: The Dishonest Casino A casino has two dice: Fair die P(1) = P(2) = P(3) = P(5) = P(6) = 1/6 Loaded die P(1) = P(2) = P(3) = P(5) = 1/10 P(6) = 1/2 Casino player switches back-&-forth between fair and loaded die once every 20 turns Game: 1.You bet $1 2.You roll (always with a fair die) 3.Casino player rolls (maybe with fair die, maybe with loaded die) 4.Highest number wins $2

3 CS262 Lecture 5, Win07, Batzoglou The dishonest casino model FAIRLOADED 0.05 0.95 P(1|F) = 1/6 P(2|F) = 1/6 P(3|F) = 1/6 P(4|F) = 1/6 P(5|F) = 1/6 P(6|F) = 1/6 P(1|L) = 1/10 P(2|L) = 1/10 P(3|L) = 1/10 P(4|L) = 1/10 P(5|L) = 1/10 P(6|L) = 1/2

4 CS262 Lecture 5, Win07, Batzoglou A HMM is memory-less At each time step t, the only thing that affects future states is the current state  t K 1 … 2

5 CS262 Lecture 5, Win07, Batzoglou Definition of a hidden Markov model Definition: A hidden Markov model (HMM) Alphabet  = { b 1, b 2, …, b M } Set of states Q = { 1,..., K } Transition probabilities between any two states a ij = transition prob from state i to state j a i1 + … + a iK = 1, for all states i = 1…K Start probabilities a 0i a 01 + … + a 0K = 1 Emission probabilities within each state e i (b) = P( x i = b |  i = k) e i (b 1 ) + … + e i (b M ) = 1, for all states i = 1…K K 1 … 2 End Probabilities a i0 In Durbin; not needed

6 CS262 Lecture 5, Win07, Batzoglou A HMM is memory-less At each time step t, the only thing that affects future states is the current state  t P(  t+1 = k | “whatever happened so far”) = P(  t+1 = k |  1,  2, …,  t, x 1, x 2, …, x t )= P(  t+1 = k |  t ) K 1 … 2

7 CS262 Lecture 5, Win07, Batzoglou A HMM is memory-less At each time step t, the only thing that affects x t is the current state  t P(x t = b | “whatever happened so far”) = P(x t = b |  1,  2, …,  t, x 1, x 2, …, x t-1 )= P(x t = b |  t ) K 1 … 2

8 CS262 Lecture 5, Win07, Batzoglou A parse of a sequence Given a sequence x = x 1 ……x N, A parse of x is a sequence of states  =  1, ……,  N 1 2 K … 1 2 K … 1 2 K … … … … 1 2 K … x1x1 x2x2 x3x3 xKxK 2 1 K 2

9 CS262 Lecture 5, Win07, Batzoglou Generating a sequence by the model Given a HMM, we can generate a sequence of length n as follows: 1.Start at state  1 according to prob a 0  1 2.Emit letter x 1 according to prob e  1 (x 1 ) 3.Go to state  2 according to prob a  1  2 4.… until emitting x n 1 2 K … 1 2 K … 1 2 K … … … … 1 2 K … x1x1 x2x2 x3x3 xnxn 2 1 K 2 0 e 2 (x 1 ) a 02

10 CS262 Lecture 5, Win07, Batzoglou Likelihood of a parse Given a sequence x = x 1 ……x N and a parse  =  1, ……,  N, To find how likely this scenario is: (given our HMM) P(x,  ) = P(x 1, …, x N,  1, ……,  N ) = P(x N |  N ) P(  N |  N-1 ) ……P(x 2 |  2 ) P(  2 |  1 ) P(x 1 |  1 ) P(  1 ) = a 0  1 a  1  2 ……a  N-1  N e  1 (x 1 )……e  N (x N ) 1 2 K … 1 2 K … 1 2 K … … … … 1 2 K … x1x1 x2x2 x3x3 xKxK 2 1 K 2 A compact way to write a 0  1 a  1  2 ……a  N-1  N e  1 (x 1 )……e  N (x N ) Enumerate all parameters a ij and e i (b); n params Example: a 0Fair :  1 ; a 0Loaded :  2 ; … e Loaded (6) =  18 Then, count in x and  the # of times each parameter j = 1, …, n occurs F(j, x,  ) = # parameter  j occurs in (x,  ) (call F(.,.,.) the feature counts) Then, P(x,  ) =  j=1…n  j F(j, x,  ) = = exp [  j=1…n log(  j )  F(j, x,  ) ]

11 CS262 Lecture 5, Win07, Batzoglou Example: the dishonest casino Let the sequence of rolls be: x = 1, 2, 1, 5, 6, 2, 1, 5, 2, 4 Then, what is the likelihood of  = Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair? (say initial probs a 0Fair = ½, a oLoaded = ½) ½  P(1 | Fair) P(Fair | Fair) P(2 | Fair) P(Fair | Fair) … P(4 | Fair) = ½  (1/6) 10  (0.95) 9 =.00000000521158647211 ~= 0.5  10 -9

12 CS262 Lecture 5, Win07, Batzoglou Example: the dishonest casino So, the likelihood the die is fair in this run is just 0.521  10 -9 OK, but what is the likelihood of  = Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded? ½  P(1 | Loaded) P(Loaded, Loaded) … P(4 | Loaded) = ½  (1/10) 9  (1/2) 1 (0.95) 9 =.00000000015756235243 ~= 0.16  10 -9 Therefore, it somewhat more likely that all the rolls are done with the fair die, than that they are all done with the loaded die

13 CS262 Lecture 5, Win07, Batzoglou Example: the dishonest casino Let the sequence of rolls be: x = 1, 6, 6, 5, 6, 2, 6, 6, 3, 6 Now, what is the likelihood  = F, F, …, F? ½  (1/6) 10  (0.95) 9 ~= 0.5  10 -9, same as before What is the likelihood  = L, L, …, L? ½  (1/10) 4  (1/2) 6 (0.95) 9 =.00000049238235134735 ~= 0.5  10 -7 So, it is 100 times more likely the die is loaded

14 CS262 Lecture 5, Win07, Batzoglou Question # 1 – Evaluation GIVEN A sequence of rolls by the casino player 1245526462146146136136661664661636616366163616515615115146123562344 QUESTION How likely is this sequence, given our model of how the casino works? This is the EVALUATION problem in HMMs Prob = 1.3 x 10 -35

15 CS262 Lecture 5, Win07, Batzoglou Question # 2 – Decoding GIVEN A sequence of rolls by the casino player 1245526462146146136136661664661636616366163616515615115146123562344 QUESTION What portion of the sequence was generated with the fair die, and what portion with the loaded die? This is the DECODING question in HMMs FAIRLOADEDFAIR

16 CS262 Lecture 5, Win07, Batzoglou Question # 3 – Learning GIVEN A sequence of rolls by the casino player 1245526462146146136136661664661636616366163616515615115146123562344 QUESTION How “loaded” is the loaded die? How “fair” is the fair die? How often does the casino player change from fair to loaded, and back? This is the LEARNING question in HMMs Prob(6) = 64%

17 CS262 Lecture 5, Win07, Batzoglou The three main questions on HMMs 1.Evaluation GIVEN a HMM M, and a sequence x, FIND Prob[ x | M ] 2.Decoding GIVENa HMM M, and a sequence x, FINDthe sequence  of states that maximizes P[ x,  | M ] 3.Learning GIVENa HMM M, with unspecified transition/emission probs., and a sequence x, FINDparameters  = (e i (.), a ij ) that maximize P[ x |  ]

18 CS262 Lecture 5, Win07, Batzoglou Let’s not be confused by notation P[ x | M ]: The probability that sequence x was generated by the model The model is: architecture (#states, etc) + parameters  = a ij, e i (.) So, P[x | M] is the same with P[ x |  ], and P[ x ], when the architecture, and the parameters, respectively, are implied Similarly, P[ x,  | M ], P[ x,  |  ] and P[ x,  ] are the same when the architecture, and the parameters, are implied In the LEARNING problem we always write P[ x |  ] to emphasize that we are seeking the  * that maximizes P[ x |  ]

19 CS262 Lecture 5, Win07, Batzoglou Problem 1: Decoding Find the most likely parse of a sequence

20 CS262 Lecture 5, Win07, Batzoglou Decoding GIVEN x = x 1 x 2 ……x N Find  =  1, ……,  N, to maximize P[ x,  ]  * = argmax  P[ x,  ] Maximizes a 0  1 e  1 (x 1 ) a  1  2 ……a  N-1  N e  N (x N ) Dynamic Programming! V k (i) = max {  1…  i-1} P[x 1 …x i-1,  1, …,  i-1, x i,  i = k] = Prob. of most likely sequence of states ending at state  i = k 1 2 K … 1 2 K … 1 2 K … … … … 1 2 K … x1x1 x2x2 x3x3 xKxK 2 1 K 2 Given that we end up in state k at step i, maximize product to the left and right

21 CS262 Lecture 5, Win07, Batzoglou Decoding – main idea Inductive assumption: Given that for all states k, and for a fixed position i, V k (i) = max {  1…  i-1} P[x 1 …x i-1,  1, …,  i-1, x i,  i = k] What is V l (i+1)? From definition, V l (i+1) = max {  1…  i} P[ x 1 …x i,  1, …,  i, x i+1,  i+1 = l ] = max {  1…  i} P(x i+1,  i+1 = l | x 1 …x i,  1,…,  i ) P[x 1 …x i,  1,…,  i ] = max {  1…  i} P(x i+1,  i+1 = l |  i ) P[x 1 …x i-1,  1, …,  i-1, x i,  i ] = max k [ P(x i+1,  i+1 = l |  i =k) max {  1…  i-1} P[x 1 …x i-1,  1,…,  i-1, x i,  i =k] ] = max k [ P(x i+1 |  i+1 = l ) P(  i+1 = l |  i =k) V k (i) ] = e l (x i+1 ) max k a kl V k (i)

22 CS262 Lecture 5, Win07, Batzoglou The Viterbi Algorithm Input: x = x 1 ……x N Initialization: V 0 (0) = 1(0 is the imaginary first position) V k (0) = 0, for all k > 0 Iteration: V j (i) = e j (x i )  max k a kj V k (i – 1) Ptr j (i) = argmax k a kj V k (i – 1) Termination: P(x,  *) = max k V k (N) Traceback:  N * = argmax k V k (N)  i-1 * = Ptr  i (i)

23 CS262 Lecture 5, Win07, Batzoglou The Viterbi Algorithm Similar to “aligning” a set of states to a sequence Time: O(K 2 N) Space: O(KN) x 1 x 2 x 3 ………………………………………..x N State 1 2 K V j (i)

24 CS262 Lecture 5, Win07, Batzoglou Viterbi Algorithm – a practical detail Underflows are a significant problem P[ x 1,…., x i,  1, …,  i ] = a 0  1 a  1  2 ……a  i e  1 (x 1 )……e  i (x i ) These numbers become extremely small – underflow Solution: Take the logs of all values V l (i) = log e k (x i ) + max k [ V k (i-1) + log a kl ]

25 CS262 Lecture 5, Win07, Batzoglou Example Let x be a long sequence with a portion of ~ 1/6 6’s, followed by a portion of ~ ½ 6’s… x = 123456123456…12345 6626364656…1626364656 Then, it is not hard to show that optimal parse is (exercise): FFF…………………...F LLL………………………...L 6 characters “123456” parsed as F, contribute.95 6  (1/6) 6 = 1.6  10 -5 parsed as L, contribute.95 6  (1/2) 1  (1/10) 5 = 0.4  10 -5 “162636” parsed as F, contribute.95 6  (1/6) 6 = 1.6  10 -5 parsed as L, contribute.95 6  (1/2) 3  (1/10) 3 = 9.0  10 -5

26 CS262 Lecture 5, Win07, Batzoglou Problem 2: Evaluation Find the likelihood a sequence is generated by the model

27 CS262 Lecture 5, Win07, Batzoglou Generating a sequence by the model Given a HMM, we can generate a sequence of length n as follows: 1.Start at state  1 according to prob a 0  1 2.Emit letter x 1 according to prob e  1 (x 1 ) 3.Go to state  2 according to prob a  1  2 4.… until emitting x n 1 2 K … 1 2 K … 1 2 K … … … … 1 2 K … x1x1 x2x2 x3x3 xnxn 2 1 K 2 0 e 2 (x 1 ) a 02

28 CS262 Lecture 5, Win07, Batzoglou A couple of questions Given a sequence x, What is the probability that x was generated by the model? Given a position i, what is the most likely state that emitted x i ? Example: the dishonest casino Say x = 12341…23162616364616234112…21341 Most likely path:  = FF……F (too “unlikely” to transition F  L  F) However: marked letters more likely to be L than unmarked letters P(box: FFFFFFFFFFF) = (1/6) 11 * 0.95 12 = 2.76 -9 * 0.54 = 1.49 -9 P(box: LLLLLLLLLLL) = [ (1/2) 6 * (1/10) 5 ] * 0.95 10 * 0.05 2 = 1.56*10 -7 * 1.5 -3 = 0.23 -9 FF

29 CS262 Lecture 5, Win07, Batzoglou Evaluation We will develop algorithms that allow us to compute: P(x)Probability of x given the model P(x i …x j )Probability of a substring of x given the model P(  i = k | x)“Posterior” probability that the i th state is k, given x A more refined measure of which states x may be in

30 CS262 Lecture 5, Win07, Batzoglou The Forward Algorithm We want to calculate P(x) = probability of x, given the HMM Sum over all possible ways of generating x: P(x) =   P(x,  ) =   P(x |  ) P(  ) To avoid summing over an exponential number of paths , define f k (i) = P(x 1 …x i,  i = k) (the forward probability) “generate i first characters of x and end up in state k”

31 CS262 Lecture 5, Win07, Batzoglou The Forward Algorithm – derivation Define the forward probability: f k (i) = P(x 1 …x i,  i = k) =   1…  i-1 P(x 1 …x i-1,  1,…,  i-1,  i = k) e k (x i ) =  l   1…  i-2 P(x 1 …x i-1,  1,…,  i-2,  i-1 = l) a lk e k (x i ) =  l P(x 1 …x i-1,  i-1 = l) a lk e k (x i ) = e k (x i )  l f l (i – 1 ) a lk

32 CS262 Lecture 5, Win07, Batzoglou The Forward Algorithm We can compute f k (i) for all k, i, using dynamic programming! Initialization: f 0 (0) = 1 f k (0) = 0, for all k > 0 Iteration: f k (i) = e k (x i )  l f l (i – 1) a lk Termination: P(x) =  k f k (N)

33 CS262 Lecture 5, Win07, Batzoglou Relation between Forward and Viterbi VITERBI Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Iteration: V j (i) = e j (x i ) max k V k (i – 1) a kj Termination: P(x,  *) = max k V k (N) FORWARD Initialization: f 0 (0) = 1 f k (0) = 0, for all k > 0 Iteration: f l (i) = e l (x i )  k f k (i – 1) a kl Termination: P(x) =  k f k (N)

34 CS262 Lecture 5, Win07, Batzoglou Motivation for the Backward Algorithm We want to compute P(  i = k | x), the probability distribution on the i th position, given x We start by computing P(  i = k, x) = P(x 1 …x i,  i = k, x i+1 …x N ) = P(x 1 …x i,  i = k) P(x i+1 …x N | x 1 …x i,  i = k) = P(x 1 …x i,  i = k) P(x i+1 …x N |  i = k) Then, P(  i = k | x) = P(  i = k, x) / P(x) Forward, f k (i)Backward, b k (i)

35 CS262 Lecture 5, Win07, Batzoglou The Backward Algorithm – derivation Define the backward probability: b k (i) = P(x i+1 …x N |  i = k) “starting from i th state = k, generate rest of x” =   i+1…  N P(x i+1,x i+2, …, x N,  i+1, …,  N |  i = k) =  l   i+1…  N P(x i+1,x i+2, …, x N,  i+1 = l,  i+2, …,  N |  i = k) =  l e l (x i+1 ) a kl   i+1…  N P(x i+2, …, x N,  i+2, …,  N |  i+1 = l) =  l e l (x i+1 ) a kl b l (i+1)

36 CS262 Lecture 5, Win07, Batzoglou The Backward Algorithm We can compute b k (i) for all k, i, using dynamic programming Initialization: b k (N) = 1, for all k Iteration: b k (i) =  l e l (x i+1 ) a kl b l (i+1) Termination: P(x) =  l a 0l e l (x 1 ) b l (1)

37 CS262 Lecture 5, Win07, Batzoglou Computational Complexity What is the running time, and space required, for Forward, and Backward? Time: O(K 2 N) Space: O(KN) Useful implementation technique to avoid underflows Viterbi: sum of logs Forward/Backward: rescaling at each few positions by multiplying by a constant

38 CS262 Lecture 5, Win07, Batzoglou Posterior Decoding We can now calculate f k (i) b k (i) P(  i = k | x) = ––––––– P(x) Then, we can ask What is the most likely state at position i of sequence x: Define  ^ by Posterior Decoding:  ^ i = argmax k P(  i = k | x) P(  i = k | x) = P(  i = k, x)/P(x) = P(x 1, …, x i,  i = k, x i+1, … x n ) / P(x) = P(x 1, …, x i,  i = k) P(x i+1, … x n |  i = k) / P(x) = f k (i) b k (i) / P(x)

39 CS262 Lecture 5, Win07, Batzoglou Posterior Decoding For each state,  Posterior Decoding gives us a curve of likelihood of state for each position  That is sometimes more informative than Viterbi path  * Posterior Decoding may give an invalid sequence of states (of prob 0)  Why?

40 CS262 Lecture 5, Win07, Batzoglou Posterior Decoding P(  i = k | x) =   P(  | x) 1(  i = k) =  {  :  [i] = k} P(  | x) x 1 x 2 x 3 …………………………………………… x N State 1 l P(  i =l|x) k 1(  ) = 1, if  is true 0, otherwise

41 CS262 Lecture 5, Win07, Batzoglou Viterbi, Forward, Backward VITERBI Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Iteration: V l (i) = e l (x i ) max k V k (i-1) a kl Termination: P(x,  *) = max k V k (N) FORWARD Initialization: f 0 (0) = 1 f k (0) = 0, for all k > 0 Iteration: f l (i) = e l (x i )  k f k (i-1) a kl Termination: P(x) =  k f k (N) BACKWARD Initialization: b k (N) = 1, for all k Iteration: b l (i) =  k e l (x i +1) a kl b k (i+1) Termination: P(x) =  k a 0k e k (x 1 ) b k (1)


Download ppt "CS262 Lecture 5, Win07, Batzoglou Hidden Markov Models 1 2 K … 1 2 K … 1 2 K … … … … 1 2 K … x1x1 x2x2 x3x3 xKxK 2 1 K 2."

Similar presentations


Ads by Google