Published by Irvin Wann. Modified about 1 year ago.

1
Learning HMM parameters
Sushmita Roy
BMI/CS 576
Oct 31st, 2013

2
Recall the three questions in HMMs
Given a sequence of observations, how likely is it that an HMM generated it?
– Forward algorithm
What is the most likely sequence of states that generated a sequence of observations?
– Viterbi
How can we learn an HMM from a set of sequences?
– Forward-backward or Baum-Welch (an EM algorithm)

3
Learning HMMs from data
Parameter estimation
If we knew the state sequence, it would be easy to estimate the parameters
But we need to work with hidden state sequences
Use “expected” counts of state transitions

4
Learning without hidden information
Learning is simple if we know the correct path for each sequence in our training set
Estimate parameters by counting the number of times each parameter is used across the training set
[Figure: HMM state diagram with begin and end states and a known path for the sequence C A G T]

5
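Learning without hidden information
Transition probabilities: the number of transitions from k to l, divided by the total number of transitions out of k
Emission probabilities: the number of times b is emitted from k, divided by the total number of emissions from k
k, l are states; b is an emitted symbol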
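The counting procedure can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name `estimate_from_paths`, the state labels 1 and 2, and the example path are all assumptions, and transitions out of begin and into end are ignored to keep the sketch short.

```python
from collections import defaultdict


def estimate_from_paths(sequences, paths):
    """Estimate transition and emission probabilities from labeled paths."""
    trans_counts = defaultdict(lambda: defaultdict(float))  # k -> l counts
    emit_counts = defaultdict(lambda: defaultdict(float))   # counts of k emitting b
    for x, path in zip(sequences, paths):
        for t, (symbol, state) in enumerate(zip(x, path)):
            emit_counts[state][symbol] += 1
            if t + 1 < len(path):
                trans_counts[state][path[t + 1]] += 1

    def normalize(counts):
        # Divide each count by its row total so every row sums to 1.
        return {k: {key: n / sum(row.values()) for key, n in row.items()}
                for k, row in counts.items()}

    return normalize(trans_counts), normalize(emit_counts)


# Hypothetical training example: sequence CAGT with a known path 1, 1, 2, 2.
a, e = estimate_from_paths(["CAGT"], [[1, 1, 2, 2]])
print(a[1])  # transition estimates out of state 1
print(e[2])  # emission estimates for state 2
```

With a single short sequence many parameters get no counts at all, which is one reason smoothing with pseudocounts is common in practice.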

6
Learning with hidden information
[Figure: HMM state diagram with begin and end states for the sequence C A G T, with the path through the states unknown (????)]
If we don’t know the correct path for each sequence in our training set, consider all possible paths for the sequence
Estimate parameters through a procedure that counts the expected number of times each parameter is used across the training set

7
The Baum-Welch algorithm
Also known as the forward-backward algorithm
An Expectation Maximization algorithm
– Expectation: estimate the “expected” number of times there are transitions and emissions (using current values of parameters)
– Maximization: estimate parameters given the expected values of the hidden variables
Hidden variables are the state transitions and emission counts

8
The expectation step
We need to know the probability of the t-th symbol being produced by state k, given sequence x (the posterior probability of state k at time t)
We also need to know the probability of the t-th and (t+1)-th symbols being produced by states k and l, respectively, given sequence x
Given these, we can compute our expected counts for state transitions and character emissions

9
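Computing the posterior probability of state k at time t
We will do this in a somewhat indirect manner
First we compute the probability of the entire observed sequence with the t-th symbol being generated by state k
This probability is the product of two terms: f_k(t) from the forward algorithm and b_k(t) from the backward algorithm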

10
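Computing the posterior probability of state k at time t
If we can compute the joint probability of x with the t-th symbol generated by state k, how can we get the posterior probability of state k at time t given x?
Divide by the probability of x, which the forward step computes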
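The forward step that supplies the probability of x might look like the sketch below. It makes two simplifying assumptions that differ from the lecture's diagrams: the begin state is replaced by an initial distribution `init`, and there is no end state. The emission tables are the two from the lecture's example HMM, but the initial and transition probabilities are hypothetical values chosen for illustration. Positions are 0-based in the code, so `f[t][k]` corresponds to f_k(t+1).

```python
def forward(x, states, init, trans, emit):
    """Return (f, px) where f[t][k] = P(x[0..t], state at position t is k)."""
    # Initialization: start in k and emit the first symbol.
    f = [{k: init[k] * emit[k][x[0]] for k in states}]
    # Recursion: sum over every predecessor state k of the current state l.
    for t in range(1, len(x)):
        f.append({l: emit[l][x[t]] * sum(f[t - 1][k] * trans[k][l] for k in states)
                  for l in states})
    # Termination: with no end state, P(x) is the sum of the last column.
    return f, sum(f[-1].values())


# Hypothetical initial and transition probabilities (assumptions, see above).
states = [1, 2]
init = {1: 0.5, 2: 0.5}
trans = {1: {1: 0.8, 2: 0.2}, 2: {1: 0.3, 2: 0.7}}
emit = {1: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        2: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}

f, px = forward("TAG", states, init, trans, emit)
print(px)  # P(TAG) under the assumed parameters
```

For long sequences the products underflow, so practical implementations usually work in log space or rescale each column.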

11
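The backward algorithm
The backward algorithm gives us b_k(t), the probability of observing the rest of x, given that we’re in state k after t characters
[Figure: HMM with begin and end states and four emitting states with emission probabilities (A 0.4, C 0.1, G 0.2, T 0.3), (A 0.2, C 0.3, G 0.3, T 0.2), (A 0.1, C 0.4, G 0.4, T 0.1) and (A 0.4, C 0.1, G 0.1, T 0.4), shown with the sequence C A G T]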

12
Steps of the backward algorithm
Initialization (t = T)
Recursion (t = T-1 to 1)
Termination
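The three steps can be sketched as follows, under the same simplifying assumptions as before: an initial distribution `init` stands in for the begin state and there is no end state, so the initialization sets every value to 1. With an explicit end state, the initialization would use the transition probability into end instead. The initial and transition probabilities are hypothetical; positions are 0-based in the code.

```python
def backward(x, states, init, trans, emit):
    """Return (b, px) where b[t][k] = P(x[t+1..] | state at position t is k)."""
    T = len(x)
    b = [None] * T
    # Initialization (last position): nothing is left to emit.
    b[T - 1] = {k: 1.0 for k in states}
    # Recursion, moving from the second-to-last position back to the first.
    for t in range(T - 2, -1, -1):
        b[t] = {k: sum(trans[k][l] * emit[l][x[t + 1]] * b[t + 1][l] for l in states)
                for k in states}
    # Termination: combine the first column with the initial probabilities.
    px = sum(init[k] * emit[k][x[0]] * b[0][k] for k in states)
    return b, px


# Hypothetical initial and transition probabilities (assumptions, see above).
states = [1, 2]
init = {1: 0.5, 2: 0.5}
trans = {1: {1: 0.8, 2: 0.2}, 2: {1: 0.3, 2: 0.7}}
emit = {1: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        2: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}

b, px = backward("TAG", states, init, trans, emit)
print(px)  # should match the forward algorithm's P(TAG)
```

The termination step yields the same probability of x as the forward algorithm, which makes a convenient consistency check.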

13
Computing the posterior probability of state k at time t
This is f_k(t) b_k(t) divided by the probability of x
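Written out, the posterior follows from splitting the joint probability at position t. The notation below is an assumption: \pi_t denotes the state at time t and T the length of x, which may differ from the symbols on the original slides.

```latex
\begin{align*}
P(\pi_t = k, x) &= P(x_1 \dots x_t, \pi_t = k)\; P(x_{t+1} \dots x_T \mid \pi_t = k)
                 = f_k(t)\, b_k(t) \\
P(\pi_t = k \mid x) &= \frac{P(\pi_t = k, x)}{P(x)} = \frac{f_k(t)\, b_k(t)}{P(x)}
\end{align*}
```

The split in the first line relies on the Markov property: given the state at time t, the remaining symbols are independent of the earlier ones.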

14
Putting it all together
We need the expected number of times c is emitted by state k
And the expected number of times k transitions to l
Both are sums over the training sequences
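One standard way to write these two expected counts is shown below. The symbols are assumptions rather than the lecture's own: E_k(c) and A_{kl} name the expected emission and transition counts, x^j is the j-th training sequence, a_{kl} and e_l are the current transition and emission probabilities, and f^j, b^j are the forward and backward values computed for sequence j.

```latex
\begin{align*}
E_k(c) &= \sum_{j} \frac{1}{P(x^j)} \sum_{t \,:\, x^j_t = c} f^j_k(t)\, b^j_k(t) \\
A_{kl} &= \sum_{j} \frac{1}{P(x^j)} \sum_{t} f^j_k(t)\, a_{kl}\, e_l(x^j_{t+1})\, b^j_l(t+1)
\end{align*}
```

Each inner term is a posterior probability: of being in state k at time t for emissions, and of moving from k at time t to l at time t+1 for transitions.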

15
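The maximization step
Estimate new emission parameters by normalizing the expected emission counts for each state
Just like in the simple case, but typically we’ll do some “smoothing” (e.g. add pseudocounts)
Estimate new transition parameters by normalizing the expected transition counts out of each state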
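A minimal sketch of the normalization with pseudocounts, assuming the expected counts are stored as one dictionary row per state. The function name `m_step`, the default pseudocount of 1, and the example counts are illustrative assumptions, not values from the lecture.

```python
def m_step(expected_counts, pseudocount=1.0):
    """Turn each row of expected counts into a probability distribution."""
    new_params = {}
    for k, row in expected_counts.items():
        # Add the pseudocount to every entry, then divide by the row total.
        total = sum(row.values()) + pseudocount * len(row)
        new_params[k] = {key: (n + pseudocount) / total for key, n in row.items()}
    return new_params


# Hypothetical expected emission counts for one state (not from the lecture).
counts = {1: {"A": 0.5, "C": 2.5, "G": 1.0, "T": 0.0}}
e_new = m_step(counts)
print(e_new[1])
```

The same function works for transition counts, since both updates are row normalizations. The pseudocount keeps a symbol with zero expected count, such as T here, from getting probability zero.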

16
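The Baum-Welch algorithm
Initialize the parameters of the HMM
Iterate until convergence
– Initialize the expected emission and transition counts with pseudocounts
– E-step: for each training set sequence j = 1…n
  calculate the forward and backward values for sequence j
  add the contribution of sequence j to the expected emission and transition counts
– M-step: update the HMM parameters using the expected emission and transition counts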
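The whole loop can be sketched compactly in Python. As a simplification that differs from the lecture's diagrams, the begin state is replaced by an initial distribution and there is no end state. All function names, the convergence test on the log-likelihood, and the initial and transition probabilities are assumptions; the emission tables and the training sequences TAG and ACG are the ones used in the lecture's example.

```python
import math


def forward(x, states, init, trans, emit):
    f = [{k: init[k] * emit[k][x[0]] for k in states}]
    for t in range(1, len(x)):
        f.append({l: emit[l][x[t]] * sum(f[t - 1][k] * trans[k][l] for k in states)
                  for l in states})
    return f


def backward(x, states, trans, emit):
    b = [{k: 1.0 for k in states}]  # last position: nothing left to emit
    for t in range(len(x) - 2, -1, -1):
        b.insert(0, {k: sum(trans[k][l] * emit[l][x[t + 1]] * b[0][l] for l in states)
                     for k in states})
    return b


def normalize(row):
    total = sum(row.values())
    return {key: value / total for key, value in row.items()}


def baum_welch_iteration(seqs, states, alphabet, init, trans, emit, pseudocount=1.0):
    """One E-step and M-step; also returns the log-likelihood of the input parameters."""
    # Initialize the expected counts with pseudocounts.
    init_counts = {k: pseudocount for k in states}
    trans_counts = {k: {l: pseudocount for l in states} for k in states}
    emit_counts = {k: {c: pseudocount for c in alphabet} for k in states}
    log_likelihood = 0.0
    # E-step: add each sequence's contribution to the expected counts.
    for x in seqs:
        f = forward(x, states, init, trans, emit)
        b = backward(x, states, trans, emit)
        px = sum(f[-1].values())
        log_likelihood += math.log(px)
        for k in states:
            init_counts[k] += f[0][k] * b[0][k] / px
            for t in range(len(x)):
                emit_counts[k][x[t]] += f[t][k] * b[t][k] / px
                if t + 1 < len(x):
                    for l in states:
                        trans_counts[k][l] += (f[t][k] * trans[k][l]
                                               * emit[l][x[t + 1]] * b[t + 1][l] / px)
    # M-step: normalize the expected counts into new parameters.
    return (normalize(init_counts),
            {k: normalize(trans_counts[k]) for k in states},
            {k: normalize(emit_counts[k]) for k in states},
            log_likelihood)


def baum_welch(seqs, states, alphabet, init, trans, emit, tol=1e-6, max_iter=100):
    previous = -math.inf
    for _ in range(max_iter):
        init, trans, emit, ll = baum_welch_iteration(
            seqs, states, alphabet, init, trans, emit)
        if abs(ll - previous) < tol:  # stop when the log-likelihood levels off
            break
        previous = ll
    return init, trans, emit


# Hypothetical initial and transition probabilities (assumptions, see above).
states, alphabet, seqs = [1, 2], "ACGT", ["TAG", "ACG"]
init = {1: 0.5, 2: 0.5}
trans = {1: {1: 0.8, 2: 0.2}, 2: {1: 0.3, 2: 0.7}}
emit = {1: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        2: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}
init_hat, trans_hat, emit_hat = baum_welch(seqs, states, alphabet, init, trans, emit)
print(trans_hat)
```

Baum-Welch converges to a local optimum, so results depend on the starting parameters. Running it from several initializations is a common precaution.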

17
Baum-Welch algorithm example
Given
– the HMM with the parameters initialized as shown
– the training sequences TAG, ACG
we’ll work through one iteration of Baum-Welch
[Figure: HMM with begin and end states and two emitting states with emission probabilities (A 0.1, C 0.4, G 0.4, T 0.1) and (A 0.4, C 0.1, G 0.1, T 0.4)]

18
Baum-Welch example (cont)
Determining the forward values for TAG
Here we compute just the values that are needed for computing successive values. For example, there is no point in calculating f_1(3)
In a similar way, we also compute forward values for ACG

19
Baum-Welch example (cont)
Determining the backward values for TAG
Again, here we compute just the values that are needed
In a similar way, we also compute backward values for ACG

20
Baum-Welch example (cont)
Determining the expected emission counts for state 1
Each count is the sum of three terms: the contribution of TAG, the contribution of ACG, and a pseudocount
*Note that the forward/backward values in the TAG and ACG columns differ; in each column they are computed for the sequence associated with the column

21
Baum-Welch example (cont)
Determining the expected transition counts for state 1 (not using pseudocounts)
Each count is the sum of the contribution of TAG and the contribution of ACG
In a similar way, we also determine the expected emission/transition counts for state 2

22
Baum-Welch example (cont)
Determining probabilities for state 1

23
Summary
Three problems in HMMs
Probability of an observed sequence
– Forward algorithm
Most likely path for an observed sequence
– Viterbi
– Can be used for segmentation of the observed sequence
Parameter estimation
– Baum-Welch
– The backward algorithm is used to compute a quantity needed to estimate the posterior of a state given the entire observed sequence
