Presentation transcript: Markov Chains and Hidden Markov Models

1 Markov Chains

2 Hidden Markov Models

3 Review
A Markov chain can solve the CpG island finding problem using a positive model and a negative model. But what about the length of the islands and where they start and end? Solution: use a combined model.

4 Hidden Markov Models
The essential difference between a Markov chain and a hidden Markov model is that in a hidden Markov model there is no one-to-one correspondence between the states and the symbols (hence "hidden"). It is no longer possible to tell which state the model was in when x_i was generated just by looking at x_i. In the previous example, there is no way to tell by looking at a single symbol C in isolation whether it was emitted by state C+ or C-. Many states can emit the same letter, and one state can emit many letters. We now have to distinguish the sequence of states from the sequence of symbols.

5 Hidden Markov Models
States: a path of states π = π_1, π_2, ..., π_n
Observable symbols: A, C, G, T; X = x_1, x_2, ..., x_n
Transition probabilities: a_kl = P(π_i = l | π_{i-1} = k)
Emission probabilities: e_k(b) = P(x_i = b | π_i = k)
This decouples the states from the observable symbols.
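To make these four ingredients concrete, here is a minimal Python sketch of a hypothetical two-state model; all of the numbers are invented for illustration (the actual CpG model on the following slides has one state per nucleotide in each of the + and - regions).

```python
# Hypothetical two-state HMM; all numbers are illustrative only.
states = ["+", "-"]                 # hidden states
symbols = ["A", "C", "G", "T"]      # observable symbols

# Transition probabilities a_kl = P(pi_i = l | pi_{i-1} = k); each row sums to 1.
a = {"+": {"+": 0.8, "-": 0.2},
     "-": {"+": 0.1, "-": 0.9}}

# Emission probabilities e_k(b) = P(x_i = b | pi_i = k); each row sums to 1.
e = {"+": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
     "-": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}}
```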

6 Hidden Markov Models
We can think of an HMM as a generative model that generates, or emits, sequences. First a state π_1 is selected (either randomly or according to some prior probabilities), then symbol x_1 is emitted at state π_1 with probability e_{π_1}(x_1). The model then transitions to state π_2 with probability a_{π_1,π_2}, and so on.
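A minimal sketch of this generative view, reusing the hypothetical two-state model above and assuming a uniform prior over the first state:

```python
import random

random.seed(0)   # fixed seed so the example is reproducible

states = ["+", "-"]
start = {"+": 0.5, "-": 0.5}            # assumed uniform prior over the first state
a = {"+": {"+": 0.8, "-": 0.2},         # transition probabilities (illustrative)
     "-": {"+": 0.1, "-": 0.9}}
e = {"+": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},   # emission probabilities
     "-": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}}

def draw(dist):
    """Sample one key from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    return random.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]

def generate(n):
    """Emit a hidden-state path pi and an observed sequence x of length n."""
    state = draw(start)                 # pick the first state from the prior
    path, x = [], []
    for _ in range(n):
        path.append(state)
        x.append(draw(e[state]))        # emit a symbol at the current state
        state = draw(a[state])          # then transition to the next state
    return path, x

path, x = generate(10)
print("".join(path))
print("".join(x))
```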

7 Hidden Markov Models
X: G C A T A G C G G C T A G C T G A A T A G G A ...
π: G+ C+ A+ T+ A+ G+ C+ G+ G+ C+ T+ A+ G+ C+ T- G- A- A- T- A- G- G- A- ...
Now it is the path of hidden states that we want to find out. Many paths can be used to generate X; we want to find the most likely one. There are several ways to do this:
1. Brute force method
2. Dynamic programming
We will talk about them later.

8 The occasionally dishonest casino
A casino uses a fair die most of the time, but occasionally switches to a loaded one.
Fair die: Prob(1) = Prob(2) = ... = Prob(6) = 1/6
Loaded die: Prob(1) = Prob(2) = ... = Prob(5) = 1/10, Prob(6) = 1/2
These are the emission probabilities at the two states, loaded and fair.
Transition probabilities: Prob(Fair → Loaded) = 0.01, Prob(Loaded → Fair) = 0.2
Transitions between states obey a Markov process.
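The parameters on this slide translate directly into Python; only the state labels "F" and "L" are a naming choice made here.

```python
states = ["F", "L"]                  # Fair, Loaded
symbols = [1, 2, 3, 4, 5, 6]

# Emission probabilities for each die.
emit = {"F": {s: 1 / 6 for s in symbols},                        # fair die
        "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}   # loaded die

# Transition probabilities between the two states.
trans = {"F": {"F": 0.99, "L": 0.01},   # Prob(Fair -> Loaded) = 0.01
         "L": {"F": 0.20, "L": 0.80}}   # Prob(Loaded -> Fair) = 0.2
```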

9 An HMM for the occasionally dishonest casino

10 The occasionally dishonest casino
The casino won't tell you when they use the fair or the loaded die.
Known: the structure of the model and the transition probabilities.
Hidden: what the casino did, e.g. FFFFFLLLLLLLFFFF...
Observable: the series of die tosses, e.g. 3415256664666153...
What we must infer: when was the fair die used, and when was the loaded die used? The answer is a sequence of states, e.g. FFFFFFFLLLLLLFFF...

11 Making the inference
The model assigns a probability to each explanation of the observation:
P(326 | FFL) = P(3|F) · P(F→F) · P(2|F) · P(F→L) · P(6|L) = 1/6 · 0.99 · 1/6 · 0.01 · 1/2
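A small sketch of the same calculation; as on the slide, it conditions on the given path and includes no term for how the first state was chosen.

```python
emit = {"F": {k: 1 / 6 for k in range(1, 7)},
        "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}
trans = {"F": {"F": 0.99, "L": 0.01}, "L": {"F": 0.20, "L": 0.80}}

def prob_given_path(x, path):
    """P(x | path): emission at the first position, then transition * emission."""
    p = emit[path[0]][x[0]]
    for i in range(1, len(x)):
        p *= trans[path[i - 1]][path[i]] * emit[path[i]][x[i]]
    return p

print(prob_given_path([3, 2, 6], ["F", "F", "L"]))
# 1/6 * 0.99 * 1/6 * 0.01 * 1/2 = approx. 0.0001375
```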

12 Notation
x is the sequence of symbols emitted by the model; x_i is the symbol emitted at time i.
A path, π, is a sequence of states; the i-th state in π is π_i.
a_kr is the probability of making a transition from state k to state r: a_kr = P(π_i = r | π_{i-1} = k).
e_k(b) is the probability that symbol b is emitted when in state k: e_k(b) = P(x_i = b | π_i = k).

13 A path of a sequence (trellis diagram: states 1, 2, ..., K at each position x_1, x_2, x_3, ..., x_L, with begin and end state 0)

14 The occasionally dishonest casino

15 The most probable path
The most likely path π* satisfies π* = argmax_π P(x, π).
To find π*, consider all possible ways the last symbol of x could have been emitted.
Let v_k(i) = probability of the most probable path ending in state k with observation x_i.
Then v_k(i) = e_k(x_i) · max_r { v_r(i-1) · a_rk }.

16 The Viterbi Algorithm
The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states, called the Viterbi path, that results in a sequence of observed symbols.
Assumptions:
1. Both the observed symbols and the hidden states must be in a sequence.
2. These two sequences need to be aligned, and an observed symbol needs to correspond to exactly one hidden state.
3. Computing the most likely sequence of hidden states (path) up to a certain point t must depend only on the observed symbol at point t and the most likely sequence of hidden states (path) up to point t - 1.
These assumptions are all satisfied in a first-order hidden Markov model.

17 The Viterbi Algorithm
Initialization (i = 0): v_0(0) = 1, v_k(0) = 0 for k > 0
Recursion (i = 1, ..., L): for each state k,
  v_k(i) = e_k(x_i) · max_r { v_r(i-1) · a_rk }
  ptr_i(k) = argmax_r { v_r(i-1) · a_rk }
Termination: P(x, π*) = max_k { v_k(L) · a_k0 }, π*_L = argmax_k { v_k(L) · a_k0 }
To find π*, use trace-back (i = L, ..., 1), as in dynamic programming: π*_{i-1} = ptr_i(π*_i).
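A sketch of the recursion and trace-back in Python. It works in probability space for readability, assumes a 1/2-1/2 start distribution over Fair and Loaded, and omits the explicit end state a_k0, so the termination step is simply a max over the last column; production code would use log probabilities to avoid underflow.

```python
def viterbi(x, states, start, trans, emit):
    """Most probable state path for observations x.
    Probability space for readability; real code should use log probabilities."""
    v = [{k: start[k] * emit[k][x[0]] for k in states}]      # initialization
    ptr = [dict() for _ in range(len(x) - 1)]                # back-pointers
    for i in range(1, len(x)):                               # recursion
        v.append({})
        for k in states:
            best = max(states, key=lambda r: v[i - 1][r] * trans[r][k])
            ptr[i - 1][k] = best
            v[i][k] = emit[k][x[i]] * v[i - 1][best] * trans[best][k]
    last = max(states, key=lambda k: v[-1][k])               # termination (no end state)
    path = [last]
    for i in range(len(x) - 2, -1, -1):                      # trace-back
        path.append(ptr[i][path[-1]])
    path.reverse()
    return path, v[-1][last]

# The casino model from slide 8, with an assumed 1/2-1/2 start distribution.
states = ["F", "L"]
start = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.99, "L": 0.01}, "L": {"F": 0.20, "L": 0.80}}
emit = {"F": {i: 1 / 6 for i in range(1, 7)},
        "L": {**{i: 0.1 for i in range(1, 6)}, 6: 0.5}}

path, p = viterbi([6, 2, 6], states, start, trans, emit)
print(path, p)   # ['L', 'L', 'L'], approx. 0.008 -- the values worked out on the next slide
```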

18 Viterbi: Example
Starting from the begin state B (value 1, with F and L at 0), for x = 6, 2, 6:
v_F(1) = (1/2)·(1/6) = 1/12                              v_L(1) = (1/2)·(1/2) = 1/4
v_F(2) = (1/6)·max{(1/12)·0.99, (1/4)·0.2} = 0.01375     v_L(2) = (1/10)·max{(1/12)·0.01, (1/4)·0.8} = 0.02
v_F(3) = (1/6)·max{0.01375·0.99, 0.02·0.2} = 0.00226875  v_L(3) = (1/2)·max{0.01375·0.01, 0.02·0.8} = 0.008
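To double-check the last column by hand (carrying over 0.01375 and 0.02 from the middle column):

```python
# Values carried over from the column for the second observation (the 2).
vF2, vL2 = 0.01375, 0.02

vF3 = (1 / 6) * max(vF2 * 0.99, vL2 * 0.2)    # fair die emits 6 with prob 1/6
vL3 = (1 / 2) * max(vF2 * 0.01, vL2 * 0.8)    # loaded die emits 6 with prob 1/2

print(vF3)   # approx. 0.00226875
print(vL3)   # approx. 0.008 -> the loaded die is the more probable final state
```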

19 Viterbi gets it right more often than not

20 Hidden Markov Models

21 Total probability
Many different paths can result in the observation x. The probability that our model will emit x is the total probability:
P(x) = Σ_π P(x, π)
If the HMM models a family of objects, we want the total probability to peak at members of the family (training).

22 Total probability
Let f_k(i) = P(x_1 ... x_i, π_i = k).
Then f_k(i) = e_k(x_i) · Σ_r f_r(i-1) · a_rk
and P(x) = Σ_k f_k(L) · a_k0.
P(x) can be computed in the same way as the probability of the most likely path.

23 The Forward Algorithm
Initialization (i = 0): f_0(0) = 1, f_k(0) = 0 for k > 0
Recursion (i = 1, ..., L): for each state k, f_k(i) = e_k(x_i) · Σ_r f_r(i-1) · a_rk
Termination: P(x) = Σ_k f_k(L) · a_k0
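A sketch of the forward algorithm on the casino model, again assuming a 1/2-1/2 start distribution and omitting the end state a_k0, so termination is a plain sum over the last column.

```python
def forward(x, states, start, trans, emit):
    """Total probability P(x) summed over all paths (no explicit end state)."""
    f = [{k: start[k] * emit[k][x[0]] for k in states}]       # initialization
    for i in range(1, len(x)):                                # recursion
        f.append({k: emit[k][x[i]] * sum(f[i - 1][r] * trans[r][k] for r in states)
                  for k in states})
    return sum(f[-1][k] for k in states), f                   # termination

states = ["F", "L"]
start = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.99, "L": 0.01}, "L": {"F": 0.20, "L": 0.80}}
emit = {"F": {i: 1 / 6 for i in range(1, 7)},
        "L": {**{i: 0.1 for i in range(1, 6)}, 6: 0.5}}

p_x, f = forward([6, 2, 6], states, start, trans, emit)
print(p_x)   # P(x) over all paths; larger than the single best-path value from Viterbi
```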

24 Hidden Markov Models: Decoding
Viterbi (maximum likelihood): determine which explanation is most likely, i.e. find the path most likely to have produced the observed sequence.
Forward (total probability): determine the probability that the observed sequence was produced by the HMM, considering all paths that could have produced it.
Forward and backward together: the probability that x_i came from state k given the observed sequence, i.e. P(π_i = k | x).

25 The Backward Algorithm
Let b_k(i) = P(x_{i+1} ... x_L | π_i = k).
Then b_k(i) = Σ_r a_kr · e_r(x_{i+1}) · b_r(i+1), for i = L-1, ..., 1,
and P(x) = Σ_r a_0r · e_r(x_1) · b_r(1).
P(x) can be computed in the same way as the probability of the most likely path.

26 The Backward Algorithm
Initialization (i = L): b_k(L) = a_k0 for all k
Recursion (i = L-1, ..., 1): for each state k, b_k(i) = Σ_r a_kr · e_r(x_{i+1}) · b_r(i+1)
Termination: P(x) = Σ_r a_0r · e_r(x_1) · b_r(1)
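A matching sketch of the backward recursion. With the end state omitted, the initialization becomes b_k(L) = 1 instead of a_k0; the terminated value should agree with the forward algorithm's P(x) for the same sequence.

```python
states = ["F", "L"]
start = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.99, "L": 0.01}, "L": {"F": 0.20, "L": 0.80}}
emit = {"F": {i: 1 / 6 for i in range(1, 7)},
        "L": {**{i: 0.1 for i in range(1, 6)}, 6: 0.5}}

def backward(x):
    """b_k(i) = P(x_{i+1} ... x_L | pi_i = k), computed right to left."""
    L = len(x)
    b = [None] * L
    b[L - 1] = {k: 1.0 for k in states}                  # initialization (no end state)
    for i in range(L - 2, -1, -1):                       # recursion
        b[i] = {k: sum(trans[k][r] * emit[r][x[i + 1]] * b[i + 1][r] for r in states)
                for k in states}
    p_x = sum(start[r] * emit[r][x[0]] * b[0][r] for r in states)   # termination
    return p_x, b

p_x, b = backward([6, 2, 6])
print(p_x)   # should match the forward algorithm's P(x) for the same sequence
```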

27 Posterior state probabilities
The probability that x_i came from state k given the observed sequence, i.e. P(π_i = k | x):
P(x, π_i = k) = P(x_1 ... x_i, π_i = k) · P(x_{i+1} ... x_L | x_1 ... x_i, π_i = k)
             = P(x_1 ... x_i, π_i = k) · P(x_{i+1} ... x_L | π_i = k)
             = f_k(i) · b_k(i)
Therefore P(π_i = k | x) = f_k(i) · b_k(i) / P(x).
Posterior decoding: assign x_i the state k that maximizes P(π_i = k | x) = f_k(i) · b_k(i) / P(x).
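Combining the two gives a posterior-decoding sketch: run forward and backward, then assign each position the state with the largest f_k(i)·b_k(i)/P(x). The parameters, start distribution, and example sequence below are the same assumed casino ones used above.

```python
states = ["F", "L"]
start = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.99, "L": 0.01}, "L": {"F": 0.20, "L": 0.80}}
emit = {"F": {i: 1 / 6 for i in range(1, 7)},
        "L": {**{i: 0.1 for i in range(1, 6)}, 6: 0.5}}

def posterior_decode(x):
    """Assign each position the state maximizing P(pi_i = k | x) = f_k(i) b_k(i) / P(x)."""
    L = len(x)
    # Forward: f_k(i) = e_k(x_i) * sum_r f_r(i-1) * a_rk
    f = [{k: start[k] * emit[k][x[0]] for k in states}]
    for i in range(1, L):
        f.append({k: emit[k][x[i]] * sum(f[i - 1][r] * trans[r][k] for r in states)
                  for k in states})
    # Backward: b_k(i) = sum_r a_kr * e_r(x_{i+1}) * b_r(i+1), with b_k(L) = 1
    b = [None] * L
    b[L - 1] = {k: 1.0 for k in states}
    for i in range(L - 2, -1, -1):
        b[i] = {k: sum(trans[k][r] * emit[r][x[i + 1]] * b[i + 1][r] for r in states)
                for k in states}
    p_x = sum(f[L - 1][k] for k in states)
    return [max(states, key=lambda k: f[i][k] * b[i][k] / p_x) for i in range(L)]

print(posterior_decode([6, 2, 6, 6, 6, 1, 2, 3]))
```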

28 Estimating the probabilities

29 Estimating the probabilities ("training")
Baum-Welch algorithm:
- Start with an initial guess at the transition (and emission) probabilities.
- Refine the guess to improve the total probability of the training data at each step.
- May get stuck at a local optimum.
- A special case of the expectation-maximization (EM) algorithm.
Viterbi training:
- Derive probable paths for the training data using the Viterbi algorithm.
- Re-estimate the transition (and emission) probabilities based on the Viterbi paths (a sketch of this counting step follows below).
- Iterate until the paths stop changing.
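Below is a sketch of just the counting step of Viterbi training; the decoded paths are assumed to come from a Viterbi routine like the one sketched at slide 17, and the +1 pseudocounts are an extra assumption to keep unseen transitions and emissions from getting probability zero. Baum-Welch would replace these hard counts with expected counts computed from the forward and backward variables.

```python
from collections import Counter

def reestimate(sequences, paths, states, symbols, pseudocount=1.0):
    """Turn transition and emission counts along the given paths into probabilities."""
    trans_counts = {k: Counter() for k in states}
    emit_counts = {k: Counter() for k in states}
    for x, pi in zip(sequences, paths):
        for i, (symbol, state) in enumerate(zip(x, pi)):
            emit_counts[state][symbol] += 1
            if i > 0:
                trans_counts[pi[i - 1]][state] += 1

    def normalize(counts, keys):
        total = sum(counts[key] + pseudocount for key in keys)
        return {key: (counts[key] + pseudocount) / total for key in keys}

    trans = {k: normalize(trans_counts[k], states) for k in states}
    emit = {k: normalize(emit_counts[k], symbols) for k in states}
    return trans, emit

# One toy training sequence with a path assumed to come from Viterbi decoding:
x = [6, 6, 6, 2, 3, 1, 6, 6]
pi = ["L", "L", "L", "F", "F", "F", "L", "L"]
trans, emit = reestimate([x], [pi], ["F", "L"], list(range(1, 7)))
print(trans)
print(emit["L"][6])
```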


