1
**Learning HMM parameters**

Sushmita Roy BMI/CS 576 Oct 31st, 2013

2
**Recall the three questions in HMMs**

Given a sequence of observations, how likely is it that an HMM generated it? Forward algorithm.

What is the most likely sequence of states that generated a sequence of observations? Viterbi.

How can we learn an HMM from a set of sequences? Forward-backward or Baum-Welch (an EM algorithm).

3
**Learning HMMs from data**

Parameter estimation: if we knew the state sequence, it would be easy to estimate the parameters. But we need to work with hidden state sequences, so we use “expected” counts of state transitions.

4
**Learning without hidden information**

Learning is simple if we know the correct path for each sequence in our training set.

[Figure: an HMM with begin and end states and state labels 1 to 5; the sequence C A G T is shown with the known path 2, 2, 4, 4, 5.]

Estimate parameters by counting the number of times each parameter is used across the training set.

5
**Learning without hidden information**

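Transition probabilities:

$$a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}$$

where $A_{kl}$ is the number of transitions from $k$ to $l$, and $k$, $l$ are states.

Emission probabilities:

$$e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$$

where $E_k(b)$ is the number of times $b$ is emitted from $k$.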
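The counting estimate can be sketched in a few lines of Python. This is a minimal illustration, not the course's code: the function names `count_parameters` and `normalize`, the two training sequences, and their labeled paths are all hypothetical, and begin/end transitions are left out for brevity.

```python
from collections import defaultdict


def count_parameters(sequences, paths):
    """Count transitions A[k][l] and emissions E[k][b] from labeled data."""
    A = defaultdict(lambda: defaultdict(float))
    E = defaultdict(lambda: defaultdict(float))
    for x, path in zip(sequences, paths):
        # One emission count per (state, symbol) pair along the known path.
        for symbol, state in zip(x, path):
            E[state][symbol] += 1
        # One transition count per pair of consecutive states.
        for k, l in zip(path, path[1:]):
            A[k][l] += 1
    return A, E


def normalize(counts):
    """Divide each row of counts by its total to get probabilities."""
    probs = {}
    for k, row in counts.items():
        total = sum(row.values())
        probs[k] = {key: value / total for key, value in row.items()}
    return probs


# Hypothetical training set with known state paths (illustrative only).
sequences = ["CAGT", "CCGT"]
paths = [[2, 2, 4, 4], [2, 4, 4, 4]]

A, E = count_parameters(sequences, paths)
a = normalize(A)  # a[k][l]: estimated transition probability k -> l
e = normalize(E)  # e[k][b]: estimated probability that state k emits b
print(a[2], e[4])
```

Transitions or emissions that never occur in the training data get no entry at all here, which is one reason pseudocounts are usually added in practice.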

6
**Learning with hidden information**

If we don’t know the correct path for each sequence in our training set, consider all possible paths for the sequence.

[Figure: the same HMM and sequence C A G T, with the path unknown: ?, ?, ?, ?, 5.]

Estimate parameters through a procedure that counts the expected number of times each parameter is used across the training set.

7
**The Baum-Welch algorithm**

Also known as the forward-backward algorithm. It is an Expectation Maximization (EM) algorithm.

Expectation: estimate the “expected” number of times there are transitions and emissions (using the current values of the parameters).

Maximization: estimate the parameters given the hidden variables.

The hidden variables are the state transitions and emission counts.

8
**The expectation step**

We need to know the probability of the $t$th symbol being produced by state $k$, given sequence $x$: $P(\pi_t = k \mid x)$, where $\pi_t$ is the state at position $t$ (this is the posterior probability of state $k$ at time $t$).

We also need to know the probability of the $t$th and $(t+1)$th symbols being produced by states $k$ and $l$, given sequence $x$: $P(\pi_t = k, \pi_{t+1} = l \mid x)$.

Given these, we can compute our expected counts for state transitions and character emissions.

9
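**Computing $P(\pi_t = k \mid x)$**

We will do this in a somewhat indirect manner. First we compute the probability of the entire observed sequence with the $t$th symbol being generated by state $k$, written $\pi_t = k$:

$$P(\pi_t = k, x) = P(x_1 \ldots x_t, \pi_t = k)\, P(x_{t+1} \ldots x_T \mid \pi_t = k) = f_k(t)\, b_k(t)$$

The first factor is computed by the forward algorithm, $f_k(t)$, and the second by the backward algorithm, $b_k(t)$.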
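For reference, the forward values $f_k(t)$ follow the standard three-step recursion below. This deck does not restate it, so the notation is an assumption: $a_{kl}$ is the transition probability from $k$ to $l$, $e_k(c)$ is the probability that state $k$ emits $c$, and the model has explicit begin and end states.

```latex
\begin{aligned}
\text{Initialization } (t = 1):\quad & f_k(1) = a_{\text{begin},k}\, e_k(x_1) \\
\text{Recursion } (t = 2 \text{ to } T):\quad & f_l(t) = e_l(x_t) \sum_{k} f_k(t-1)\, a_{kl} \\
\text{Termination}:\quad & P(x) = \sum_{k} f_k(T)\, a_{k,\text{end}}
\end{aligned}
```

The termination step is what supplies $P(x)$ when joint probabilities are turned into posteriors.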

10
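**Computing $P(\pi_t = k \mid x)$**

If we can compute $P(\pi_t = k, x)$, how can we get $P(\pi_t = k \mid x)$?

$$P(\pi_t = k \mid x) = \frac{P(\pi_t = k, x)}{P(x)} = \frac{f_k(t)\, b_k(t)}{P(x)}$$

The denominator $P(x)$ comes from the forward step.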

11
**The backward algorithm**

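The backward algorithm gives us $b_k(t)$, the probability of observing the rest of $x$, given that we’re in state $k$ after $t$ characters:

$$b_k(t) = P(x_{t+1} \ldots x_T \mid \pi_t = k)$$

[Figure: an HMM with begin and end states and state labels 1 to 5, shown with the sequence C A G T. Values recovered from the figure, in extraction order: emission distributions (A 0.4, C 0.1, G 0.2, T 0.3), (A 0.2, C 0.3, G 0.3, T 0.2), (A 0.4, C 0.1, G 0.1, T 0.4), (A 0.1, C 0.4, G 0.4, T 0.1); transition probabilities 0.4, 0.2, 0.8, 0.6, 0.5, 0.5, 0.9, 0.2, 0.1, 0.8.]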

12
**Steps of the backward algorithm**

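Here $a_{kl}$ is the transition probability from state $k$ to state $l$, and $e_l(c)$ is the probability that state $l$ emits $c$.

Initialization ($t = T$):

$$b_k(T) = a_{k,\text{end}}$$

Recursion ($t = T-1$ to $1$):

$$b_k(t) = \sum_{l} a_{kl}\, e_l(x_{t+1})\, b_l(t+1)$$

Termination:

$$P(x) = \sum_{l} a_{\text{begin},l}\, e_l(x_1)\, b_l(1)$$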

13
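**Computing $P(\pi_t = k, \pi_{t+1} = l \mid x)$**

This is

$$P(\pi_t = k, \pi_{t+1} = l \mid x) = \frac{f_k(t)\, a_{kl}\, e_l(x_{t+1})\, b_l(t+1)}{P(x)}$$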
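A short derivation shows where the four factors come from. It is a sketch in assumed notation ($\pi_t$ is the state at position $t$, $a_{kl}$ a transition probability, $e_l(c)$ an emission probability), using only the Markov property of the HMM.

```latex
\begin{aligned}
P(\pi_t = k, \pi_{t+1} = l, x)
  &= P(x_1 \ldots x_t, \pi_t = k)\;
     P(\pi_{t+1} = l \mid \pi_t = k)\;
     P(x_{t+1} \mid \pi_{t+1} = l)\;
     P(x_{t+2} \ldots x_T \mid \pi_{t+1} = l) \\
  &= f_k(t)\; a_{kl}\; e_l(x_{t+1})\; b_l(t+1)
\end{aligned}
```

Dividing by $P(x)$ gives the posterior. As a sanity check, summing the right-hand side over $l$ reproduces $f_k(t)\, b_k(t)$, because $\sum_l a_{kl}\, e_l(x_{t+1})\, b_l(t+1)$ is exactly the backward recursion for $b_k(t)$.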

14
**Putting it all together**

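We need the expected number of times $c$ is emitted by state $k$:

$$E_k(c) = \sum_{j} \frac{1}{P(x^j)} \sum_{t \,:\, x^j_t = c} f^j_k(t)\, b^j_k(t)$$

And the expected number of times $k$ transitions to $l$:

$$A_{kl} = \sum_{j} \frac{1}{P(x^j)} \sum_{t} f^j_k(t)\, a_{kl}\, e_l(x^j_{t+1})\, b^j_l(t+1)$$

In both formulas the outer sum runs over the training sequences $x^j$, and $f^j$, $b^j$ are the forward and backward values computed for sequence $j$.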

15
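**The maximization step**

Estimate new emission parameters by:

$$e_k(c) = \frac{E_k(c)}{\sum_{c'} E_k(c')}$$

Estimate new transition parameters by:

$$a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}$$

This is just like in the simple case, but typically we’ll do some “smoothing” (e.g. add pseudocounts).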
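The normalization with pseudocount smoothing might look like the sketch below. The function name `m_step` and the expected counts are hypothetical, chosen only to illustrate the arithmetic; adding the pseudocount at normalization time is equivalent to initializing the counts with it.

```python
def m_step(counts, pseudocount=1.0):
    """Normalize each row of expected counts after adding a pseudocount."""
    params = {}
    for k, row in counts.items():
        # Every entry in the row receives the same pseudocount.
        total = sum(row.values()) + pseudocount * len(row)
        params[k] = {key: (value + pseudocount) / total
                     for key, value in row.items()}
    return params


# Hypothetical expected emission counts (illustrative numbers only).
E = {
    1: {"A": 1.2, "C": 0.3, "G": 0.1, "T": 0.9},
    2: {"A": 0.3, "C": 1.1, "G": 1.4, "T": 0.2},
}
e_new = m_step(E, pseudocount=1.0)
print(e_new[1])
```

With a pseudocount of 1, a symbol whose expected count is close to zero still keeps a non-zero emission probability, so it cannot be ruled out permanently by a single iteration.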

16
**The Baum-Welch algorithm**

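Initialize the parameters of the HMM, then iterate until convergence:

Initialize the expected counts $A_{kl}$ and $E_k(c)$ with pseudocounts.

E-step: for each training set sequence $j = 1 \ldots n$, calculate the $f_k(t)$ and $b_k(t)$ values for sequence $j$, then add the contribution of sequence $j$ to $A_{kl}$ and $E_k(c)$.

M-step: update the HMM parameters using $A_{kl}$ and $E_k(c)$.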
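The whole loop can be sketched in plain Python. Everything below is an assumption made for illustration: the list-based model layout (`start`, `trans`, `end`, `emit`), the function names, the convergence test on the log-likelihood, and the choice to put pseudocounts on emissions only. The initial parameters are one plausible reading of the two-state example HMM and are not confirmed by the slide text. The sketch also assumes every training sequence has non-zero probability and every state is visited.

```python
import math


def forward(x, start, trans, end, emit):
    """f[t][k] = P(x[0..t], state k at position t); also returns P(x)."""
    K = len(start)
    f = [[start[k] * emit[k][x[0]] for k in range(K)]]
    for t in range(1, len(x)):
        f.append([emit[l][x[t]] * sum(f[t - 1][k] * trans[k][l] for k in range(K))
                  for l in range(K)])
    return f, sum(f[-1][k] * end[k] for k in range(K))


def backward(x, trans, end, emit):
    """b[t][k] = P(rest of x after position t, then end | state k at t)."""
    K = len(end)
    b = [list(end)]
    for t in range(len(x) - 2, -1, -1):
        b.insert(0, [sum(trans[k][l] * emit[l][x[t + 1]] * b[0][l] for l in range(K))
                     for k in range(K)])
    return b


def baum_welch_step(seqs, start, trans, end, emit, pseudo=1.0):
    """One E-step plus M-step; pseudocounts on emissions only (assumption)."""
    K, alphabet = len(start), sorted(emit[0])
    E = [{c: pseudo for c in alphabet} for _ in range(K)]
    A = [[0.0] * K for _ in range(K)]
    A_start, A_end = [0.0] * K, [0.0] * K
    for x in seqs:  # E-step: accumulate expected counts per sequence
        f, px = forward(x, start, trans, end, emit)
        b = backward(x, trans, end, emit)
        last = len(x) - 1
        for k in range(K):
            A_start[k] += f[0][k] * b[0][k] / px
            A_end[k] += f[last][k] * end[k] / px
            for t, c in enumerate(x):
                E[k][c] += f[t][k] * b[t][k] / px
            for l in range(K):
                for t in range(last):
                    A[k][l] += (f[t][k] * trans[k][l] * emit[l][x[t + 1]]
                                * b[t + 1][l] / px)
    # M-step: normalize the expected counts row by row
    new_start = [v / sum(A_start) for v in A_start]
    new_trans, new_end = [], []
    for k in range(K):
        total = sum(A[k]) + A_end[k]
        new_trans.append([v / total for v in A[k]])
        new_end.append(A_end[k] / total)
    new_emit = [{c: E[k][c] / sum(E[k].values()) for c in alphabet}
                for k in range(K)]
    return new_start, new_trans, new_end, new_emit


def baum_welch(seqs, start, trans, end, emit, max_iter=50, tol=1e-6):
    """Repeat steps until the log-likelihood stops improving by more than tol."""
    prev_ll = -math.inf
    for _ in range(max_iter):
        ll = sum(math.log(forward(x, start, trans, end, emit)[1]) for x in seqs)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        start, trans, end, emit = baum_welch_step(seqs, start, trans, end, emit)
    return start, trans, end, emit


# ASSUMED initial parameters: index 0 is state 1, index 1 is state 2.
start0 = [1.0, 0.0]
trans0 = [[0.1, 0.9], [0.0, 0.2]]
end0 = [0.0, 0.8]
emit0 = [{"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
         {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}]
step = baum_welch_step(["TAG", "ACG"], start0, trans0, end0, emit0)
print(step[3][0])  # re-estimated emission probabilities for state 1
```

Because transitions receive no pseudocount in this sketch, a transition that starts at probability zero stays at zero, which preserves the model topology. With pseudocounts in the M-step the likelihood is not guaranteed to rise at every iteration, so the loop also stops if it falls.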

17
**Baum-Welch algorithm example**

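Given the HMM with the parameters initialized as shown, and the training sequences TAG and ACG, we’ll work through one iteration of Baum-Welch.

[Figure: an HMM with begin and end states and state labels 1, 2, 3. Values recovered from the figure, in extraction order: emission distributions (A 0.1, C 0.4, G 0.4, T 0.1) and (A 0.4, C 0.1, G 0.1, T 0.4); transition probabilities 1.0, 0.1, 0.9, 0.2, 0.8.]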

18
**Baum-Welch example (cont)**

Determining the forward values for TAG. Here we compute just the values that are needed for computing successive values. For example, there is no point in calculating $f_1(3)$. In a similar way, we also compute forward values for ACG.
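The forward values for TAG can be reproduced with a short script. The parameter assignment below is an assumption: the figure layout did not survive in the text, so which emission table belongs to which state, and which arc carries which probability, is one plausible reading (state 1 favors A/T, state 2 favors C/G, and only state 2 can reach the end state). The dictionary names and the `forward` helper are likewise hypothetical.

```python
# ASSUMED reading of the example HMM; not confirmed by the slide text.
START = {1: 1.0, 2: 0.0}                            # begin -> k
TRANS = {1: {1: 0.1, 2: 0.9}, 2: {1: 0.0, 2: 0.2}}  # k -> l
END = {1: 0.0, 2: 0.8}                              # k -> end
EMIT = {
    1: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def forward(x):
    """Return (f, px) where f[k][t] is the forward value at 0-based position t."""
    states = sorted(START)
    # Initialization: leave the begin state and emit the first symbol.
    f = {k: [START[k] * EMIT[k][x[0]]] for k in states}
    # Recursion: sum over predecessor states, then emit x[t].
    for t in range(1, len(x)):
        column = {l: EMIT[l][x[t]] * sum(f[k][t - 1] * TRANS[k][l] for k in states)
                  for l in states}
        for l in states:
            f[l].append(column[l])
    # Termination: move from the last position into the end state.
    px = sum(f[k][-1] * END[k] for k in states)
    return f, px


f, px = forward("TAG")
for k in sorted(f):
    print(k, f[k])
print("P(TAG) =", px)
```

Under these assumed parameters $P(\text{TAG})$ comes out at about 0.006912. The generic loop does compute $f_1(3)$, but the value never contributes to $P(x)$ because state 1 has no transition to the end state in this reading.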

19
**Baum-Welch example (cont)**

Determining the backward values for TAG. Again, here we compute just the values that are needed. In a similar way, we also compute backward values for ACG.
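A matching sketch for the backward values of TAG follows. As before, the parameter assignment is an assumed reading of the figure (state 1 favors A/T, state 2 favors C/G, only state 2 reaches the end state), and the names are hypothetical.

```python
# ASSUMED reading of the example HMM; not confirmed by the slide text.
START = {1: 1.0, 2: 0.0}                            # begin -> k
TRANS = {1: {1: 0.1, 2: 0.9}, 2: {1: 0.0, 2: 0.2}}  # k -> l
END = {1: 0.0, 2: 0.8}                              # k -> end
EMIT = {
    1: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def backward(x):
    """Return (b, px) where b[k][t] is the backward value at 0-based position t."""
    states = sorted(START)
    T = len(x)
    b = {k: [0.0] * T for k in states}
    # Initialization: from the last position, only the end transition remains.
    for k in states:
        b[k][T - 1] = END[k]
    # Recursion: move to a successor state, emit the next symbol, continue.
    for t in range(T - 2, -1, -1):
        for k in states:
            b[k][t] = sum(TRANS[k][l] * EMIT[l][x[t + 1]] * b[l][t + 1]
                          for l in states)
    # Termination: leave the begin state and emit the first symbol.
    px = sum(START[l] * EMIT[l][x[0]] * b[l][0] for l in states)
    return b, px


b, px = backward("TAG")
for k in sorted(b):
    print(k, b[k])
print("P(TAG) =", px)
```

The termination value should agree with the $P(x)$ obtained from the forward algorithm on the same sequence and parameters, which makes a convenient check on both implementations.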

20
**Baum-Welch example (cont)**

Determining the expected emission counts for state 1. Each count sums the contribution of TAG, the contribution of ACG, and a pseudocount.

Note that the forward/backward values in the TAG and ACG contributions differ: in each, they are computed for the sequence associated with that contribution.

21
**Baum-Welch example (cont)**

Determining the expected transition counts for state 1 (not using pseudocounts). Each count sums the contribution of TAG and the contribution of ACG.

In a similar way, we also determine the expected emission/transition counts for state 2.

22
**Baum-Welch example (cont)**

determining probabilities for state 1

23
**Summary**

Three problems in HMMs:

Probability of an observed sequence: Forward algorithm.

Most likely path for an observed sequence: Viterbi. Can be used for segmentation of the observed sequence.

Parameter estimation: Baum-Welch. The backward algorithm is used to compute a quantity needed to estimate the posterior of a state given the entire observed sequence.
