
1 Class 5: Hidden Markov Models

2 Sequence Models
- So far we have examined several probabilistic sequence models.
- These models, however, assumed that positions are independent.
  - This means that the order of elements in the sequence did not play a role.
- In this class we study probabilistic models of sequences in which order matters.

3 Probability of Sequences
- Fix an alphabet Σ.
- Let X_1,…,X_n be a sequence of random variables over Σ.
- We want to model P(X_1,…,X_n).

4 Markov Chains
Assumption: X_{i+1} is independent of the past once we know X_i:
  P(X_{i+1} | X_1,…,X_i) = P(X_{i+1} | X_i)
This allows us to write:
  P(X_1,…,X_n) = P(X_1) · ∏_{i=1}^{n-1} P(X_{i+1} | X_i)

5 Markov Chains (cont.)
Assumption: P(X_{i+1} | X_i) is the same for all i.
Notation: P(X_{i+1}=b | X_i=a) = A_{ab}
- By specifying the matrix A and the initial probabilities, we define P(X_1,…,X_n).
- To avoid the special case of P(X_1), we can use a special start state s and denote P(X_1=a) = A_{sa}.
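To make the notation concrete, here is a minimal sketch of a homogeneous Markov chain over the DNA alphabet. The transition matrix and start distribution below are invented for illustration only; they are not the models estimated on the following slides.

```python
import numpy as np

alphabet = "ACGT"
idx = {c: i for i, c in enumerate(alphabet)}

# A[a, b] = P(X_{i+1} = b | X_i = a); each row sums to 1.
A = np.array([
    [0.30, 0.20, 0.25, 0.25],
    [0.20, 0.30, 0.05, 0.45],   # low C -> G probability, as in CpG-poor DNA
    [0.25, 0.25, 0.30, 0.20],
    [0.20, 0.25, 0.25, 0.30],
])
start = np.array([0.25, 0.25, 0.25, 0.25])   # P(X_1 = a), via the special start state

def log_prob(seq):
    """log P(x_1,...,x_n) under the chain (log-space avoids numerical underflow)."""
    lp = np.log(start[idx[seq[0]]])
    for a, b in zip(seq, seq[1:]):
        lp += np.log(A[idx[a], idx[b]])
    return lp

print(log_prob("ACGTCG"))
```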

6 Example: CpG islands
- In the human genome, CpG dinucleotides are relatively rare.
  - CpG pairs undergo a process called methylation that modifies the C nucleotide.
  - A methylated C can (with relatively high probability) mutate to a T.
- Promoter regions are CpG rich.
  - These regions are not methylated, and thus mutate less often.
  - These are called CpG islands.

7 CpG Islands
- We construct a Markov chain for CpG-rich ("+") regions and one for CpG-poor ("-") regions.
- Using maximum likelihood estimates from 60K nucleotides, we get two models.

8 Ratio Test for CpG islands
Given a sequence x_1,…,x_n we compute the (log-)likelihood ratio
  S(x) = log [ P(x_1,…,x_n | +) / P(x_1,…,x_n | -) ] = Σ_i log ( A+_{x_i x_{i+1}} / A-_{x_i x_{i+1}} )
A positive score favors the "+" (CpG island) model.
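A minimal sketch of the ratio test, assuming two fixed transition matrices. The placeholder values below stand in for the maximum-likelihood estimates mentioned on slide 7 and are not real estimates.

```python
import numpy as np

alphabet = "ACGT"
idx = {c: i for i, c in enumerate(alphabet)}

# Placeholder transition matrices for the "+" and "-" models.
A_plus = np.full((4, 4), 0.25)                  # "+" model: uniform placeholder
A_minus = np.full((4, 4), 0.25)                 # "-" model: CpG transitions made rare
A_minus[idx["C"], idx["G"]] = 0.05
A_minus[idx["C"]] /= A_minus[idx["C"]].sum()    # renormalize the C row

def log_odds(seq):
    """S(x) = sum_i log( A+_{x_i x_{i+1}} / A-_{x_i x_{i+1}} ); positive favors a CpG island."""
    return sum(np.log(A_plus[idx[a], idx[b]] / A_minus[idx[a], idx[b]])
               for a, b in zip(seq, seq[1:]))

print(log_odds("CGCGCGCG"), log_odds("ATATATAT"))
```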

9 Empirical Evaluation

10 Finding CpG islands
Simple-minded approach:
- Pick a window of size N (N = 100, for example).
- Compute the log-ratio for the sequence in the window, and classify based on that (see the sketch below).
Problems:
- How do we select N?
- What do we do when the window intersects the boundary of a CpG island?
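A minimal sketch of the windowed classification. The scoring function `log_ratio` is assumed to be a per-window log-likelihood ratio like the one sketched under slide 8; calling a window an island when the score is positive is one simple decision rule.

```python
def classify_windows(seq, log_ratio, N=100):
    """Return (start, is_island) for every window of length N in seq."""
    return [(start, log_ratio(seq[start:start + N]) > 0)
            for start in range(len(seq) - N + 1)]
```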

11 Alternative Approach
- Build a single model that includes "+" states and "-" states.
- A state "remembers" the last nucleotide and the type of region.
- A transition from a "-" state to a "+" state marks the start of a CpG island.

12 Hidden Markov Models
Two components:
- A Markov chain of hidden states H_1,…,H_n with L values:
    P(H_{i+1}=l | H_i=k) = A_{kl}
- Observations X_1,…,X_n.
  Assumption: X_i depends only on the hidden state H_i:
    P(X_i=a | H_i=k) = B_{ka}

13 Semantics
The joint distribution factors along the chain of hidden states:
  P(x_1,…,x_n, h_1,…,h_n) = P(h_1) B_{h_1 x_1} · ∏_{i=1}^{n-1} A_{h_i h_{i+1}} B_{h_{i+1} x_{i+1}}

14 Example: Dishonest Casino
A casino switches between a fair die and a loaded die. The rolls x_1,…,x_n are observed; the hidden state h_i indicates which die was used for roll i.
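A minimal sketch of HMM parameters for the dishonest casino. The numbers are commonly used textbook values, not necessarily the ones shown on the slide.

```python
import numpy as np

states = ["fair", "loaded"]
pi = np.array([0.5, 0.5])              # P(H_1 = k)

# A[k, l] = P(H_{i+1}=l | H_i=k): the casino rarely switches dice.
A = np.array([[0.95, 0.05],
              [0.10, 0.90]])

# B[k, a] = P(X_i=a | H_i=k) for faces 1..6; the loaded die favors 6.
B = np.array([[1/6] * 6,
              [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]])
```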

15 Computing Most Probable Sequence
Given: x_1,…,x_n
Output: h*_1,…,h*_n such that
  (h*_1,…,h*_n) = argmax_{h_1,…,h_n} P(h_1,…,h_n | x_1,…,x_n)

16 Idea:
- If we know the value of h_i, then the most probable sequence on positions i+1,…,n does not depend on observations before time i.
- Let V_i(l) be the probability of the best sequence h_1,…,h_i such that h_i = l:
    V_i(l) = max_{h_1,…,h_{i-1}} P(x_1,…,x_i, h_1,…,h_{i-1}, H_i = l)

17 Dynamic Programming Rule
  V_{i+1}(l) = B_{l x_{i+1}} · max_k [ A_{kl} V_i(k) ]
so
  max_{h_1,…,h_n} P(x_1,…,x_n, h_1,…,h_n) = max_l V_n(l)

18 Viterbi Algorithm
- Set V_0(0) = 1, V_0(l) = 0 for l > 0
- for i = 1,…,n
  - for l = 1,…,L set
      V_i(l) = B_{l x_i} · max_k [ A_{kl} V_{i-1}(k) ]
      P_i(l) = argmax_k [ A_{kl} V_{i-1}(k) ]
- Let h*_n = argmax_l V_n(l)
- for i = n-1,…,1 set h*_i = P_{i+1}(h*_{i+1})
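A minimal sketch of the Viterbi algorithm in log-space; this is my own illustration, not code from the slides, and it starts directly from an initial distribution pi rather than the special start state used in the pseudocode above.

```python
import numpy as np

def viterbi(x, pi, A, B):
    """x: observation indices; pi[k] = P(H_1=k); A[k, l] = P(H_{i+1}=l | H_i=k);
    B[k, a] = P(X_i=a | H_i=k). Returns the most probable state sequence and its log-probability."""
    n, L = len(x), len(pi)
    logV = np.empty((n, L))              # logV[i, l]: log of the best path probability ending in state l
    ptr = np.empty((n, L), dtype=int)    # back-pointers P_i(l)

    logV[0] = np.log(pi) + np.log(B[:, x[0]])
    for i in range(1, n):
        scores = logV[i - 1][:, None] + np.log(A)   # scores[k, l] = log(V_{i-1}(k) A_{kl})
        ptr[i] = scores.argmax(axis=0)
        logV[i] = scores.max(axis=0) + np.log(B[:, x[i]])

    h = np.empty(n, dtype=int)
    h[-1] = logV[-1].argmax()
    for i in range(n - 2, -1, -1):       # trace back: h*_i = P_{i+1}(h*_{i+1})
        h[i] = ptr[i + 1, h[i + 1]]
    return h, logV[-1].max()
```

Working in log-space avoids numerical underflow on long sequences; the backtracking loop follows the stored pointers exactly as in the pseudocode above.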

19 Computing Probabilities
Given: x_1,…,x_n
Output: P(x_1,…,x_n) = Σ_{h_1,…,h_n} P(x_1,…,x_n, h_1,…,h_n)
How do we sum over an exponential number of hidden sequences?

20 Forward Algorithm
- Perform dynamic programming on prefixes of the sequence.
- Let f_i(l) = P(x_1,…,x_i, H_i=l).
- Recursion rule:
    f_{i+1}(l) = B_{l x_{i+1}} · Σ_k f_i(k) A_{kl}
- Conclusion:
    P(x_1,…,x_n) = Σ_l f_n(l)

21 Backward Algorithm
- Perform dynamic programming on suffixes of the sequence.
- Let b_i(l) = P(x_{i+1},…,x_n | H_i=l).
- Recursion rule:
    b_i(l) = Σ_k A_{lk} B_{k x_{i+1}} b_{i+1}(k)
- Conclusion:
    P(x_1,…,x_n) = Σ_l P(H_1=l) B_{l x_1} b_1(l)

22 Computing Posteriors
How do we compute P(H_i | x_1,…,x_n)? Combine the forward and backward messages:
  P(H_i=l | x_1,…,x_n) = f_i(l) b_i(l) / P(x_1,…,x_n)
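A minimal sketch covering the forward recursion, the backward recursion, and the posterior formula from the last three slides; my own illustration, using an initial distribution pi[k] = P(H_1=k) in place of the special start state.

```python
import numpy as np

def forward(x, pi, A, B):
    """f[i, l] = P(x_1,...,x_{i+1}, H_{i+1}=l) (0-based indexing)."""
    n, L = len(x), len(pi)
    f = np.empty((n, L))
    f[0] = pi * B[:, x[0]]                      # f_1(l) = P(H_1=l) B_{l x_1}
    for i in range(1, n):
        f[i] = (f[i - 1] @ A) * B[:, x[i]]      # f_{i+1}(l) = B_{l x_{i+1}} sum_k f_i(k) A_{kl}
    return f

def backward(x, pi, A, B):
    """b[i, l] = P(x_{i+2},...,x_n | H_{i+1}=l) (0-based indexing)."""
    n, L = len(x), len(pi)
    b = np.empty((n, L))
    b[-1] = 1.0                                  # b_n(l) = 1
    for i in range(n - 2, -1, -1):
        b[i] = A @ (B[:, x[i + 1]] * b[i + 1])   # b_i(l) = sum_k A_{lk} B_{k x_{i+1}} b_{i+1}(k)
    return b

def posteriors(x, pi, A, B):
    f, b = forward(x, pi, A, B), backward(x, pi, A, B)
    px = f[-1].sum()                             # P(x_1,...,x_n) = sum_l f_n(l)
    return f * b / px                            # P(H_i=l | x) = f_i(l) b_i(l) / P(x)
```

This plain version underflows on long sequences; in practice one scales the messages or works in log-space, which is omitted here for brevity.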

23 Dishonest Casino (again)
- Computing the posterior probability of the "fair" state at each position in a long sequence of rolls.

24 Learning (complete data)
Given a sequence x_1,…,x_n together with its hidden states h_1,…,h_n:
- How do we learn A_{kl} and B_{ka}?
- We want to find parameters that maximize the likelihood P(x_1,…,x_n, h_1,…,h_n).
We simply count:
- N_{kl} - number of times h_i=k and h_{i+1}=l
- N_{ka} - number of times h_i=k and x_i=a
and set A_{kl} = N_{kl} / Σ_{l'} N_{kl'} and B_{ka} = N_{ka} / Σ_{a'} N_{ka'}.
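A minimal sketch of this counting estimator for a fully observed sequence; my own illustration. The optional pseudo-count argument is an assumption on my part (a common way to avoid zero probabilities), not something stated on the slide.

```python
import numpy as np

def ml_estimate(h, x, L, M, pseudo=0.0):
    """h: hidden-state indices, x: observation indices, L: #hidden values, M: #symbols."""
    N_trans = np.full((L, L), pseudo)   # N_kl: count of h_i=k, h_{i+1}=l
    N_emit = np.full((L, M), pseudo)    # N_ka: count of h_i=k, x_i=a
    for k, l in zip(h, h[1:]):
        N_trans[k, l] += 1
    for k, a in zip(h, x):
        N_emit[k, a] += 1
    A = N_trans / N_trans.sum(axis=1, keepdims=True)   # A_kl = N_kl / sum_l' N_kl'
    B = N_emit / N_emit.sum(axis=1, keepdims=True)     # B_ka = N_ka / sum_a' N_ka'
    return A, B
```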

25 Learning (observations only)
Given only the sequence x_1,…,x_n:
- How do we learn A_{kl} and B_{ka}?
- We want to find parameters that maximize the likelihood P(x_1,…,x_n).
Problem:
- The counts are inaccessible, since we do not observe the h_i.

26 If we have A_{kl} and B_{ka}, we can compute the posterior probabilities of the hidden states from the forward and backward messages:
  P(H_i=k | x_1,…,x_n) = f_i(k) b_i(k) / P(x_1,…,x_n)
  P(H_i=k, H_{i+1}=l | x_1,…,x_n) = f_i(k) A_{kl} B_{l x_{i+1}} b_{i+1}(l) / P(x_1,…,x_n)

27 Expected Counts
- We can compute the expected number of times h_i=k and h_{i+1}=l:
    E[N_{kl}] = Σ_i P(H_i=k, H_{i+1}=l | x_1,…,x_n)
- Similarly:
    E[N_{ka}] = Σ_{i: x_i=a} P(H_i=k | x_1,…,x_n)

28 Expectation Maximization (EM)
- Choose initial values for A_{kl} and B_{ka}.
E-step:
- Compute the expected counts E[N_{kl}], E[N_{ka}].
M-step:
- Re-estimate:
    A_{kl} = E[N_{kl}] / Σ_{l'} E[N_{kl'}],  B_{ka} = E[N_{ka}] / Σ_{a'} E[N_{ka'}]
- Reiterate until convergence. (A sketch of one iteration follows.)
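A minimal sketch of one EM (Baum-Welch) iteration, combining the expected counts of slide 27 with the re-estimation step above; my own unscaled implementation, suitable only for short sequences. Re-estimating the initial distribution from the posterior of H_1 is a standard convention and an assumption here, not something stated on the slides.

```python
import numpy as np

def em_step(x, pi, A, B):
    """One E-step + M-step. x: observation indices; pi[k] = P(H_1=k);
    A[k, l] = P(H_{i+1}=l | H_i=k); B[k, a] = P(X_i=a | H_i=k)."""
    n, L = len(x), len(pi)

    # E-step: forward and backward messages.
    f = np.empty((n, L)); b = np.empty((n, L))
    f[0] = pi * B[:, x[0]]
    for i in range(1, n):
        f[i] = (f[i - 1] @ A) * B[:, x[i]]
    b[-1] = 1.0
    for i in range(n - 2, -1, -1):
        b[i] = A @ (B[:, x[i + 1]] * b[i + 1])
    px = f[-1].sum()                              # P(x_1,...,x_n)

    # Expected counts E[N_kl] and E[N_ka].
    EN_trans = np.zeros_like(A)
    for i in range(n - 1):                        # P(H_i=k, H_{i+1}=l | x) summed over i
        EN_trans += np.outer(f[i], B[:, x[i + 1]] * b[i + 1]) * A / px
    gamma = f * b / px                            # gamma[i, k] = P(H_i=k | x)
    EN_emit = np.zeros_like(B)
    for i in range(n):
        EN_emit[:, x[i]] += gamma[i]

    # M-step: re-estimate parameters from expected counts.
    A_new = EN_trans / EN_trans.sum(axis=1, keepdims=True)
    B_new = EN_emit / EN_emit.sum(axis=1, keepdims=True)
    return gamma[0], A_new, B_new, np.log(px)     # new pi, A, B, and the log-likelihood
```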

29 EM - basic properties
- P(x_1,…,x_n ; A_{kl}, B_{ka}) ≤ P(x_1,…,x_n ; A'_{kl}, B'_{ka}), where A', B' are the re-estimated parameters.
  - The likelihood grows in each iteration.
- If P(x_1,…,x_n ; A_{kl}, B_{ka}) = P(x_1,…,x_n ; A'_{kl}, B'_{ka}), then A_{kl}, B_{ka} is a stationary point of the likelihood:
  - either a local maximum, a minimum, or a saddle point.

30 Complexity of E-step
- Compute forward and backward messages.
  - Time complexity O(nL^2), space complexity O(nL).
- Accumulate expected counts.
  - Time complexity O(nL^2), space complexity O(L^2).

31 EM - problems
Local maxima:
- Learning can get stuck in local maxima.
- It is sensitive to initialization.
- It requires some method for escaping such maxima.
Choosing L:
- We often do not know how many hidden values we should have or can learn.

