
Hidden Markov Models CBB 231 / COMPSCI 261 part 2.


1 Hidden Markov Models CBB 231 / COMPSCI 261 part 2

2 Recall: Training Unambiguous Models

Sequence:  CGATATTCGATTCTACGCGCGTATACTAGCTTATCTGATC
Path:     011111112222222111111222211111112222111110
(the path includes the silent start/end state 0 before and after the 40 emitting positions)

Transitions (counts, with row percentages):
  from \ to     0          1           2
  0             0 (0%)     1 (100%)    0 (0%)
  1             1 (4%)     21 (84%)    3 (12%)
  2             0 (0%)     3 (20%)     12 (80%)

Emissions (counts, with row percentages):
  state \ symbol   A          C          G          T
  1                6 (24%)    7 (28%)    5 (20%)    7 (28%)
  2                3 (20%)    3 (20%)    2 (13%)    7 (47%)
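
As a concrete illustration (not part of the original slides), here is a minimal Python sketch that reproduces these tables by counting transition and emission events along the labeled path above:

    from collections import Counter

    seq  = "CGATATTCGATTCTACGCGCGTATACTAGCTTATCTGATC"
    path = "011111112222222111111222211111112222111110"   # includes the start/end state 0

    # transition counts: pairs of consecutive states along the path
    trans_counts = Counter(zip(path, path[1:]))
    # emission counts: (state, symbol) pairs; path[1:-1] aligns states with symbols
    emit_counts = Counter(zip(path[1:-1], seq))

    def normalize(counts):
        # convert joint counts into conditional frequencies, e.g. P(to | from)
        totals = Counter()
        for (a, _), n in counts.items():
            totals[a] += n
        return {(a, b): n / totals[a] for (a, b), n in counts.items()}

    # e.g. trans_counts[('1', '2')] == 3, i.e. 3/25 = 12%, matching the transition
    # table above; emit_counts[('2', 'T')] == 7, i.e. 7/15 = 47%.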

3 Training Ambiguous Models

The Problem: We have training sequences, but not the associated paths (state labels).

Two Solutions:
1. Viterbi Training: Start with random HMM parameters. Use Viterbi to find the most probable path for each training sequence, and then label the sequence with that path. Use labeled-sequence training on the resulting set of sequences and paths.
2. Baum-Welch Training: Sum over all possible paths (rather than the single most probable one) to estimate expected counts A(i,j) and E(i,k); then use the same formulas as for labeled-sequence training on these expected counts:

   P_t(q_j | q_i) = A(i,j) / Σ_j' A(i,j')        P_e(s_k | q_i) = E(i,k) / Σ_k' E(i,k')

4 Viterbi Training

(repeat n times, iterating over the training sequences)
  training features → Viterbi Decoder → paths {π_i}    (find the most probable path for each sequence)
  paths {π_i} → Sequence Labeler → labeled features    (label each sequence with that path)
  labeled features → Labeled Sequence Trainer → new submodel M
Start from an initial submodel M; the model after the final iteration is the final submodel M*.
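
A compact Python sketch of this loop, under an assumed model layout (silent start/end state 0, emitting states 1..N-1, row-stochastic tables trans[i][j] and emit[i][s], sequences given as lists of symbol indices); it is an illustration, not the course's reference implementation:

    def viterbi(x, trans, emit, N):
        """Most probable emitting-state path for sequence x (list of symbol indices)."""
        L = len(x)
        V  = [[0.0] * (L + 1) for _ in range(N)]
        tb = [[0]   * (L + 1) for _ in range(N)]      # traceback pointers
        V[0][0] = 1.0                                 # silent start state 0
        for k in range(1, L + 1):
            for i in range(1, N):
                best = max(range(N), key=lambda j: V[j][k - 1] * trans[j][i])
                V[i][k]  = emit[i][x[k - 1]] * V[best][k - 1] * trans[best][i]
                tb[i][k] = best
        cur = max(range(1, N), key=lambda i: V[i][L] * trans[i][0])   # best final state
        path = [cur]
        for k in range(L, 1, -1):
            cur = tb[cur][k]
            path.append(cur)
        return path[::-1]

    def viterbi_train(seqs, trans, emit, N, iterations=10):
        """Repeat: decode, relabel, then re-estimate by labeled-sequence training."""
        for _ in range(iterations):
            A = [[0.0] * N for _ in range(N)]
            E = [[0.0] * len(emit[1]) for _ in range(N)]
            for x in seqs:
                labels = [0] + viterbi(x, trans, emit, N) + [0]   # add start/end state
                for a, b in zip(labels, labels[1:]):
                    A[a][b] += 1                                  # transition counts
                for k, s in enumerate(x):
                    E[labels[k + 1]][s] += 1                      # emission counts
            trans = [[c / (sum(row) or 1) for c in row] for row in A]
            emit  = [[c / (sum(row) or 1) for c in row] for row in E]
        return trans, emit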

5 Recall: The Forward Algorithm

F(i,k) represents the probability P(x_0...x_{k-1}, q_i) that the machine emits the subsequence x_0...x_{k-1} by any path ending in state q_i, i.e., so that symbol x_{k-1} is emitted by state q_i.
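
A Python sketch of the corresponding recursion (same assumed layout as the Viterbi sketch above: silent start/end state 0, emitting states 1..N-1):

    def forward(x, trans, emit, N):
        """F[i][k] = P(x[0..k-1], x[k-1] emitted by state i); also returns P(S)."""
        L = len(x)
        F = [[0.0] * (L + 1) for _ in range(N)]
        F[0][0] = 1.0                                  # start in the silent state 0
        for k in range(1, L + 1):
            for i in range(1, N):
                # emit x[k-1] in state i, having arrived from any state j
                F[i][k] = emit[i][x[k - 1]] * sum(F[j][k - 1] * trans[j][i] for j in range(N))
        P_S = sum(F[i][L] * trans[i][0] for i in range(1, N))   # terminate in state 0
        return F, P_S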

6 The Backward Algorithm

B(i,k) = probability that the machine M will emit the subsequence x_k...x_{L-1} and then terminate, given that M is currently in state q_i (which has already emitted x_{k-1}).

THEREFORE: B(0,0) = P(x_0...x_{L-1}) = P(S).

7 Backward Algorithm: Pseudocode
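
A Python sketch of the recursion defined on the previous slide (a stand-in for the slide's pseudocode, using the same assumed conventions as the forward sketch above):

    def backward(x, trans, emit, N):
        """B[i][k] = P(x[k..L-1], then terminate | state i has just emitted x[k-1])."""
        L = len(x)
        B = [[0.0] * (L + 1) for _ in range(N)]
        for i in range(1, N):
            B[i][L] = trans[i][0]                      # nothing left to emit: just terminate
        for k in range(L - 1, -1, -1):
            for i in range(N):
                # the next symbol x[k] must be emitted by some state j reachable from i
                B[i][k] = sum(trans[i][j] * emit[j][x[k]] * B[j][k + 1] for j in range(1, N))
        return B                                       # note: B[0][0] equals P(S)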

8 Baum-Welch: Summing over All Paths

F(i,k)B(i,k) = P(M emits x_0...x_{k-1}...x_{L-1}, with x_{k-1} being emitted by state q_i).

F(i,k) P_t(q_j|q_i) P_e(x_k|q_j) B(j,k+1) = P(M emits x_0...x_{k-1} x_k...x_{L-1} and transitions from state q_i to q_j at time k-1 → k).

9 Combining Forward & Backward

F(i,k) = P(x_0...x_{k-1}, q_i) = P(M emits x_0...x_{k-1} by any path ending in state q_i, with x_{k-1} emitted by q_i).

B(i,k) = P(x_k...x_{L-1} | q_i) = P(M emits x_k...x_{L-1} and then terminates, given that M is in state q_i, which has emitted x_{k-1}).

F(i,k)B(i,k) = P(x_0...x_{k-1}, q_i) P(x_k...x_{L-1} | q_i) = P(x_0...x_{L-1}, q_i)*

F(i,k)B(i,k) / P(S) = P(q_i at position k-1 | S), the posterior probability that state q_i emitted symbol x_{k-1}.

From these, the expected counts used in Baum-Welch training are:

E(q,s) = Σ_i Σ_k C(q_i,k) F(i,k) B(i,k) / P(S), where C(q_i,k) = 1 if q_i = q and x_{k-1} = s; otherwise 0.

A(q_i,q_j) = Σ_m Σ_n Σ_k C(q_m,q_n,k) F(m,k) P_t(q_n|q_m) P_e(x_k|q_n) B(n,k+1) / P(S), where C(q_m,q_n,k) = 1 if q_m = q_i and q_n = q_j; otherwise 0.

* assuming P(x_k...x_{L-1}) is conditionally independent of P(x_0...x_{k-1}), given q_i.

10 Baum-Welch Training

For each training sequence: compute the Fwd & Bkwd DP matrices, then accumulate the expected counts for E & A.
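
A Python sketch of one Baum-Welch iteration, reusing the forward() and backward() sketches above; an illustration of the expected-count accumulation, not the course's reference code:

    def baum_welch_iteration(seqs, trans, emit, N, n_symbols):
        A = [[0.0] * N for _ in range(N)]              # expected transition counts
        E = [[0.0] * n_symbols for _ in range(N)]      # expected emission counts
        for x in seqs:
            F, P_S = forward(x, trans, emit, N)
            B = backward(x, trans, emit, N)
            L = len(x)
            for k in range(1, L + 1):                  # emissions: F(i,k)B(i,k)/P(S)
                for i in range(1, N):
                    E[i][x[k - 1]] += F[i][k] * B[i][k] / P_S
            for k in range(L):                         # transitions used between k-1 and k
                for i in range(N):
                    for j in range(1, N):
                        A[i][j] += F[i][k] * trans[i][j] * emit[j][x[k]] * B[j][k + 1] / P_S
            for i in range(1, N):                      # transitions into the end state 0
                A[i][0] += F[i][L] * trans[i][0] / P_S
        # M-step: same normalization as labeled-sequence training, on expected counts
        new_trans = [[c / (sum(row) or 1) for c in row] for row in A]
        new_emit  = [[c / (sum(row) or 1) for c in row] for row in E]
        return new_trans, new_emit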

11 Using Logarithms in Forward & Backward

In the log-space version of these algorithms, we can replace the raw probabilities p_i with their logarithmic counterparts, log p_i, and apply Equation (6.43) whenever the probabilities are to be summed (due to Kingsbury & Rayner, 1971):

   log(p_0 + p_1 + ... + p_{n-1}) = log p_0 + log(1 + Σ_{i≥1} e^(log p_i - log p_0))    (6.43)

Evaluation of the e^(log p_i - log p_0) term should generally not result in numerical underflow in practice, since this term evaluates to p_i / p_0, which for probabilities of similar events should not deviate too far from unity.
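
A small Python sketch of this log-space addition (math.log1p is used for the log(1 + ...) term; swapping so that the larger probability is factored out keeps the exponent non-positive):

    import math

    def log_add(log_p0, log_p1):
        """Return log(p0 + p1) given log p0 and log p1, without leaving log space."""
        if log_p0 < log_p1:                      # factor out the larger probability
            log_p0, log_p1 = log_p1, log_p0
        if log_p1 == float("-inf"):              # adding a zero probability
            return log_p0
        return log_p0 + math.log1p(math.exp(log_p1 - log_p0))

    # e.g. a log-space Forward step would combine its predecessor terms with
    # functools.reduce(log_add, terms) instead of sum(terms).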

12 Monotonic Convergence Behavior

13 Posterior Decoding

Instead of committing to a single fixed path, run the Forward algorithm and the Backward algorithm and use them to compute, for each position, the posterior probability that a given state emitted that symbol.
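
A Python sketch of posterior decoding built on the forward() and backward() sketches above (illustrative only): each position is labeled with its individually most probable state.

    def posterior_decode(x, trans, emit, N):
        F, P_S = forward(x, trans, emit, N)
        B = backward(x, trans, emit, N)
        labels = []
        for k in range(1, len(x) + 1):
            # posterior P(state i emitted x[k-1] | S) = F(i,k) * B(i,k) / P(S)
            labels.append(max(range(1, N), key=lambda i: F[i][k] * B[i][k] / P_S))
        return labels

Note that, unlike the single fixed path produced by Viterbi decoding, the resulting label sequence need not itself be a legal path through the model.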

14

15 Summary

Training of ambiguous HMMs can be accomplished using Viterbi training or the Baum-Welch algorithm.
Viterbi training performs labeling using the single most probable path.
Baum-Welch training instead estimates transition & emission events by computing expectations via Forward-Backward, summing over all paths containing a given event.
Posterior decoding can be used to estimate the probability that a given symbol or substring was generated by a particular state.

