# Learning HMM parameters

Sushmita Roy BMI/CS 576 Oct 31st, 2013

Recall the three questions in HMMs
Given a sequence of observations, how likely is it that an HMM generated it? (Forward algorithm.) What is the most likely sequence of states that generated a sequence of observations? (Viterbi algorithm.) How can we learn an HMM from a set of sequences? (Forward-backward, or Baum-Welch, an EM algorithm.)

Learning HMMs from data
Parameter estimation: if we knew the state sequence, it would be easy to estimate the parameters. But we need to work with hidden state sequences, so we use “expected” counts of state transitions.

Learning without hidden information
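Learning is simple if we know the correct path for each sequence in our training set. Estimate parameters by counting the number of times each parameter is used across the training set.
[Figure: an HMM with a begin state, states 1 to 4, and an end state (5); the sequence C A G T is annotated with its known state path 2, 2, 4, 4, 5.]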

Learning without hidden information
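Transition probabilities: $a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}$, where $A_{kl}$ is the number of transitions from $k$ to $l$, and $k$, $l$ are states. Emission probabilities: $e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$, where $E_k(b)$ is the number of times $b$ is emitted from $k$.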
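The counting described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's code: the function name `estimate_from_paths` and the `"begin"`/`"end"` labels are made up here, and the path 2, 2, 4, 4 for CAGT is an assumed reading of the figure on the previous slide.

```python
from collections import Counter, defaultdict

def estimate_from_paths(seqs, paths):
    """Estimate transition and emission probabilities from known state paths."""
    A = defaultdict(Counter)  # A[k][l]: number of transitions from k to l
    E = defaultdict(Counter)  # E[k][b]: number of times b is emitted from k
    for x, path in zip(seqs, paths):
        full = ["begin", *path, "end"]
        for k, l in zip(full, full[1:]):
            A[k][l] += 1
        for k, symbol in zip(path, x):
            E[k][symbol] += 1
    # Normalize each row of counts so it sums to 1.
    a = {k: {l: n / sum(row.values()) for l, n in row.items()} for k, row in A.items()}
    e = {k: {s: n / sum(row.values()) for s, n in row.items()} for k, row in E.items()}
    return a, e

# One training sequence with an assumed known path (hypothetical data).
a, e = estimate_from_paths(["CAGT"], [[2, 2, 4, 4]])
```

With a single short sequence many parameters never get a count at all, which is one reason pseudocounts are usually added in practice.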

Learning with hidden information
If we don’t know the correct path for each sequence in our training set, consider all possible paths for the sequence. Estimate parameters through a procedure that counts the expected number of times each parameter is used across the training set.
[Figure: an HMM with a begin state, states 1 to 4, and an end state (5); the state path for the sequence C A G T is unknown (? ? ? ?) apart from the end state.]

The Baum-Welch algorithm
Also known as the forward-backward algorithm. It is an Expectation Maximization (EM) algorithm. Expectation: estimate the “expected” number of times there are transitions and emissions (using the current values of the parameters). Maximization: estimate the parameters given the hidden variables. The hidden variables are the state transition and emission counts.

The expectation step
We need to know the probability of the $t$-th symbol being produced by state $k$, given sequence $x$: $P(\pi_t = k \mid x)$, the posterior probability of state $k$ at time $t$. We also need to know the probability of the $t$-th and $(t+1)$-th symbols being produced by states $k$ and $l$, given sequence $x$: $P(\pi_t = k, \pi_{t+1} = l \mid x)$. Given these, we can compute our expected counts for state transitions and character emissions.

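Computing $P(\pi_t = k \mid x)$
We will do this in a somewhat indirect manner. First we compute the probability of the entire observed sequence with the $t$-th symbol being generated by state $k$: $P(\pi_t = k, x) = P(x_1 \ldots x_t, \pi_t = k)\, P(x_{t+1} \ldots x_T \mid \pi_t = k) = f_k(t)\, b_k(t)$. The first factor, $f_k(t)$, comes from the forward algorithm; the second, $b_k(t)$, comes from the backward algorithm.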

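Computing $P(\pi_t = k \mid x)$
If we can compute $P(\pi_t = k, x)$, how can we get $P(\pi_t = k \mid x)$? Divide by the probability of the sequence: $P(\pi_t = k \mid x) = \frac{P(\pi_t = k, x)}{P(x)} = \frac{f_k(t)\, b_k(t)}{P(x)}$, where $P(x)$ is obtained from the forward step.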
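A short derivation, added here as a sketch, of why the joint probability splits into a forward and a backward term. It assumes the usual notation, which the transcript does not spell out: $\pi_t$ is the state at position $t$, $T$ is the sequence length, $f_k(t) = P(x_1 \ldots x_t, \pi_t = k)$ and $b_k(t) = P(x_{t+1} \ldots x_T \mid \pi_t = k)$.

```latex
\begin{align*}
P(\pi_t = k, x)
  &= P(x_1 \ldots x_t, \pi_t = k)\; P(x_{t+1} \ldots x_T \mid x_1 \ldots x_t, \pi_t = k) \\
  &= P(x_1 \ldots x_t, \pi_t = k)\; P(x_{t+1} \ldots x_T \mid \pi_t = k)
     && \text{(Markov property)} \\
  &= f_k(t)\, b_k(t) \\[4pt]
P(x) &= \sum_k P(\pi_t = k, x) = \sum_k f_k(t)\, b_k(t)
     && \text{(for any position } t\text{)}
\end{align*}
```

The last line doubles as a sanity check on an implementation: summing $f_k(t)\, b_k(t)$ over states should give the same value at every position.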

The backward algorithm
The backward algorithm gives us $b_k(t)$, the probability of observing the rest of $x$, given that we’re in state $k$ after $t$ characters: $b_k(t) = P(x_{t+1} \ldots x_T \mid \pi_t = k)$.
[Figure: the example HMM with a begin state, states 1 to 4, and an end state (5), annotated with emission probabilities over A, C, G, T and with transition probabilities, shown with the sequence C A G T.]

Steps of the backward algorithm
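Initialization ($t = T$): $b_k(T) = a_{k,\mathrm{end}}$ for states $k$ with a transition to the end state. Recursion ($t = T-1$ to $1$): $b_k(t) = \sum_l a_{kl}\, e_l(x_{t+1})\, b_l(t+1)$. Termination: $P(x) = \sum_l a_{\mathrm{begin},l}\, e_l(x_1)\, b_l(1)$.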
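The three steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the lecture's implementation: the nested-dict layout with `"B"` (begin) and `"E"` (end) keys is a convention chosen here, the two-state HMM is hypothetical, and positions are 0-based in the code where the slides use 1-based $t$.

```python
def backward(x, a, e):
    """Backward values b[t][k] and P(x) for an HMM with silent begin/end states."""
    states = list(e)
    T = len(x)
    b = [dict() for _ in range(T)]
    # Initialization (t = T): probability of moving to the end state.
    for k in states:
        b[T - 1][k] = a[k]["E"]
    # Recursion (t = T-1 down to 1): sum over the next state l.
    for t in range(T - 2, -1, -1):
        for k in states:
            b[t][k] = sum(a[k][l] * e[l][x[t + 1]] * b[t + 1][l] for l in states)
    # Termination: leave the begin state and emit the first symbol.
    px = sum(a["B"][l] * e[l][x[0]] * b[0][l] for l in states)
    return b, px

# Hypothetical two-state HMM over the alphabet {A, C}; not from the slides.
a = {"B": {1: 0.5, 2: 0.5},
     1: {1: 0.2, 2: 0.6, "E": 0.2},
     2: {1: 0.3, 2: 0.4, "E": 0.3}}
e = {1: {"A": 0.7, "C": 0.3}, 2: {"A": 0.1, "C": 0.9}}

b, px = backward("ACCA", a, e)
```

The value of `px` should match what the forward algorithm returns for the same sequence, which is a convenient cross-check.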

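Computing $P(\pi_t = k, \pi_{t+1} = l \mid x)$
This is $P(\pi_t = k, \pi_{t+1} = l \mid x) = \frac{f_k(t)\, a_{kl}\, e_l(x_{t+1})\, b_l(t+1)}{P(x)}$.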

Putting it all together
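We need the expected number of times $c$ is emitted by state $k$: $E_k(c) = \sum_j \frac{1}{P(x^j)} \sum_{t:\, x^j_t = c} f^j_k(t)\, b^j_k(t)$. And the expected number of times $k$ transitions to $l$: $A_{kl} = \sum_j \frac{1}{P(x^j)} \sum_t f^j_k(t)\, a_{kl}\, e_l(x^j_{t+1})\, b^j_l(t+1)$. In both expressions the outer sum over $j$ runs over the training sequences $x^j$.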
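As a sketch of how these counts are accumulated for a single sequence, the snippet below takes forward and backward tables as given and sums the per-position posteriors. The function name `expected_counts` is made up here, and the numbers are an assumption: they are the forward and backward values for TAG under one plausible reading of the two-state example HMM used later in these slides (state 1 favors A and T, state 2 favors C and G).

```python
def expected_counts(x, f, b, px, a, e):
    """Expected emission counts E[k][c] and transition counts A[k][l] for one sequence."""
    states = list(e)
    E = {k: {c: 0.0 for c in e[k]} for k in states}
    A = {k: {l: 0.0 for l in states} for k in states}
    for t, c in enumerate(x):
        for k in states:
            # Posterior probability that state k emitted position t.
            E[k][c] += f[t][k] * b[t][k] / px
            if t + 1 < len(x):
                for l in states:
                    # Posterior probability of a k -> l transition after position t.
                    A[k][l] += f[t][k] * a[k][l] * e[l][x[t + 1]] * b[t + 1][l] / px
    return E, A

# Assumed parameters and precomputed tables for the sequence TAG (0-based positions).
a = {1: {1: 0.8, 2: 0.2}, 2: {1: 0.0, 2: 0.1}}
e = {1: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
     2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}
f = [{1: 0.4, 2: 0.0}, {1: 0.128, 2: 0.008}, {1: 0.01024, 2: 0.01056}]
b = [{1: 0.02376, 2: 0.00036}, {1: 0.072, 2: 0.036}, {1: 0.0, 2: 0.9}]
px = 0.009504

E, A = expected_counts("TAG", f, b, px, a, e)
```

For a training set, the same function would be called once per sequence and the resulting tables added together.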

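The maximization step
Estimate new emission parameters by: $e_k(c) = \frac{E_k(c)}{\sum_{c'} E_k(c')}$. Estimate new transition parameters by: $a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}$. This is just like in the simple case, but typically we’ll do some “smoothing” (e.g. add pseudocounts).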
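A minimal sketch of this normalization with pseudocount smoothing. The helper names and the count values are hypothetical, chosen only to make the arithmetic easy to follow; the pseudocount is added at normalization time here, which is equivalent to initializing the counts with it.

```python
def normalize(row, pseudocount=0.0):
    """Turn one row of expected counts into probabilities, with optional smoothing."""
    total = sum(n + pseudocount for n in row.values())
    return {key: (n + pseudocount) / total for key, n in row.items()}

def m_step(A, E, pseudo_trans=0.0, pseudo_emit=1.0):
    """New transition and emission parameters from expected counts."""
    a = {k: normalize(row, pseudo_trans) for k, row in A.items()}
    e = {k: normalize(row, pseudo_emit) for k, row in E.items()}
    return a, e

# Hypothetical expected counts for a single state 1.
A = {1: {1: 1.5, 2: 2.5}}
E = {1: {"A": 2.0, "C": 0.5, "G": 0.0, "T": 1.5}}

a, e = m_step(A, E)
```

Without the emission pseudocount, G would get probability 0 for state 1 and could never be emitted from it in later iterations; with it, `e[1]["G"]` is 1/8.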

The Baum-Welch algorithm
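Initialize the parameters of the HMM. Iterate until convergence: initialize the expected counts $A_{kl}$, $E_k(c)$ with pseudocounts. E-step: for each training sequence $j = 1 \ldots n$, calculate the forward values $f_k(t)$ and backward values $b_k(t)$ for sequence $j$, and add the contribution of sequence $j$ to $A_{kl}$, $E_k(c)$. M-step: update the HMM parameters using $A_{kl}$, $E_k(c)$.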
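The loop above can be sketched end to end as follows. This is one possible arrangement, not the lecture's code: the parameter layout (nested dicts with a `"B"` begin row and an `"E"` end column), the convergence test on the log-likelihood, and the default pseudocounts are all assumptions made for the sketch.

```python
import math

def forward(x, a, e):
    """f[t][k] for each position t, and P(x)."""
    S = list(e)
    f = [{k: a["B"][k] * e[k][x[0]] for k in S}]
    for c in x[1:]:
        f.append({l: e[l][c] * sum(f[-1][k] * a[k][l] for k in S) for l in S})
    return f, sum(f[-1][k] * a[k]["E"] for k in S)

def backward(x, a, e):
    """b[t][k] for each position t."""
    S = list(e)
    b = [{k: a[k]["E"] for k in S}]
    for c in reversed(x[1:]):
        b.insert(0, {k: sum(a[k][l] * e[l][c] * b[0][l] for l in S) for k in S})
    return b

def normalize(row, old):
    """Normalize a row of counts; keep the old row if nothing was counted."""
    total = sum(row.values())
    return {key: n / total for key, n in row.items()} if total > 0 else dict(old)

def baum_welch(seqs, a, e, pseudo_emit=1.0, pseudo_trans=0.0, tol=1e-6, max_iter=100):
    S, prev_ll = list(e), -math.inf
    for _ in range(max_iter):
        # Initialize the expected counts with pseudocounts.
        A = {k: {l: pseudo_trans for l in a[k]} for k in a}
        E = {k: {c: pseudo_emit for c in e[k]} for k in S}
        ll = 0.0
        for x in seqs:  # E-step: add the contribution of each sequence
            f, px = forward(x, a, e)
            b = backward(x, a, e)
            ll += math.log(px)
            for k in S:
                A["B"][k] += a["B"][k] * e[k][x[0]] * b[0][k] / px
                A[k]["E"] += f[-1][k] * a[k]["E"] / px
                for t, c in enumerate(x):
                    E[k][c] += f[t][k] * b[t][k] / px
                    if t + 1 < len(x):
                        for l in S:
                            A[k][l] += f[t][k] * a[k][l] * e[l][x[t + 1]] * b[t + 1][l] / px
        # M-step: update the parameters from the expected counts.
        a = {k: normalize(A[k], a[k]) for k in A}
        e = {k: normalize(E[k], e[k]) for k in E}
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return a, e, ll
```

Here `a["B"][k]` is the begin-to-`k` probability, `a[k]["E"]` the `k`-to-end probability, and the returned log-likelihood is the one computed in the final E-step. The sketch works with raw probabilities, so long sequences would underflow; scaling or log-space arithmetic would be needed there.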

Baum-Welch algorithm example
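Given the HMM with the parameters initialized as shown and the training sequences TAG and ACG, we’ll work through one iteration of Baum-Welch.
[Figure: an HMM with a begin state, emitting states 1 and 2, and an end state (3); emission distributions (A 0.1, C 0.4, G 0.4, T 0.1) and (A 0.4, C 0.1, G 0.1, T 0.4); transition probabilities 1.0, 0.1, 0.9, 0.2, 0.8.]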

Baum-Welch example (cont)
Determining the forward values for TAG. Here we compute just the values that are needed for computing successive values; for example, there is no point in calculating $f_1(3)$. In a similar way, we also compute forward values for ACG.
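The slide's worked numbers did not survive in this transcript, so the sketch below recomputes them under an assumed reading of the example figure: begin goes to state 1 with probability 1.0, state 1 stays with 0.8 or moves to state 2 with 0.2, state 2 stays with 0.1 or ends with 0.9, state 1 emits A/T with 0.4 each, and state 2 emits C/G with 0.4 each. The variable names are made up for the example.

```python
# Assumed parameters (one plausible reading of the lost figure).
a_b1, a_11, a_12, a_22, a_2e = 1.0, 0.8, 0.2, 0.1, 0.9
e1 = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
e2 = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}

def forward_values(x):
    """Only the forward values needed to reach P(x) for a length-3 sequence."""
    f1_1 = e1[x[0]] * a_b1                         # f_1(1): begin -> 1
    f1_2 = e1[x[1]] * a_11 * f1_1                  # f_1(2): 1 -> 1
    f2_2 = e2[x[1]] * a_12 * f1_1                  # f_2(2): 1 -> 2
    f2_3 = e2[x[2]] * (a_12 * f1_2 + a_22 * f2_2)  # f_2(3): 1 -> 2 or 2 -> 2
    p_x = a_2e * f2_3                              # termination: 2 -> end
    return {"f1(1)": f1_1, "f1(2)": f1_2, "f2(2)": f2_2, "f2(3)": f2_3, "P(x)": p_x}

tag = forward_values("TAG")
acg = forward_values("ACG")
```

Under these assumptions $f_1(3)$ is skipped because state 1 has no transition to the end state, so it cannot contribute to $P(x)$.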

Baum-Welch example (cont)
Determining the backward values for TAG. Again, here we compute just the values that are needed. In a similar way, we also compute backward values for ACG.
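A matching sketch for the backward values, again under an assumed reading of the example figure (begin to state 1 with 1.0; state 1 stays with 0.8 or moves to state 2 with 0.2; state 2 stays with 0.1 or ends with 0.9; state 1 favors A/T and state 2 favors C/G). The names are illustrative only.

```python
# Assumed parameters (one plausible reading of the lost figure).
a_b1, a_11, a_12, a_22, a_2e = 1.0, 0.8, 0.2, 0.1, 0.9
e1 = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
e2 = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}

def backward_values(x):
    """Only the backward values needed for a length-3 sequence."""
    b2_3 = a_2e                          # b_2(3): 2 -> end
    b2_2 = a_22 * e2[x[2]] * b2_3        # b_2(2): 2 -> 2
    b1_2 = a_12 * e2[x[2]] * b2_3        # b_1(2): 1 -> 2 (the 1 -> 1 term is 0, since b_1(3) = 0)
    b1_1 = a_11 * e1[x[1]] * b1_2 + a_12 * e2[x[1]] * b2_2  # b_1(1)
    p_x = a_b1 * e1[x[0]] * b1_1         # termination: begin -> 1
    return {"b2(3)": b2_3, "b2(2)": b2_2, "b1(2)": b1_2, "b1(1)": b1_1, "P(x)": p_x}

tag = backward_values("TAG")
acg = backward_values("ACG")
```

The termination step yields $P(x)$ again, so it should agree with the value obtained from the forward pass for the same sequence.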

Baum-Welch example (cont)
Determining the expected emission counts for state 1. Each count sums three terms: the contribution of TAG, the contribution of ACG, and a pseudocount. Note that the forward/backward values in the TAG and ACG terms differ; in each term they are computed for the sequence associated with it.

Baum-Welch example (cont)
Determining the expected transition counts for state 1 (not using pseudocounts), with one contribution from TAG and one from ACG. In a similar way, we also determine the expected emission/transition counts for state 2.

Baum-Welch example (cont)
Determining the probabilities for state 1 by normalizing its expected emission and transition counts.
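Since the numbers on these slides are missing from the transcript, the sketch below reruns the state 1 update for TAG and ACG. Everything specific in it is an assumption: the parameter assignment is one plausible reading of the example figure, the emission pseudocount is taken to be 1, and the `"B"`/`"E"` keys for begin and end are a convention chosen here.

```python
S = [1, 2]
# Assumed initial parameters (one plausible reading of the lost figure).
a = {"B": {1: 1.0, 2: 0.0},
     1: {1: 0.8, 2: 0.2, "E": 0.0},
     2: {1: 0.0, 2: 0.1, "E": 0.9}}
e = {1: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
     2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

def forward(x):
    f = [{k: a["B"][k] * e[k][x[0]] for k in S}]
    for c in x[1:]:
        f.append({l: e[l][c] * sum(f[-1][k] * a[k][l] for k in S) for l in S})
    return f, sum(f[-1][k] * a[k]["E"] for k in S)

def backward(x):
    b = [{k: a[k]["E"] for k in S}]
    for c in reversed(x[1:]):
        b.insert(0, {k: sum(a[k][l] * e[l][c] * b[0][l] for l in S) for k in S})
    return b

E1 = {c: 1.0 for c in "ACGT"}  # emission counts for state 1, assumed pseudocount of 1
A1 = {1: 0.0, 2: 0.0}          # transition counts out of state 1, no pseudocounts
for x in ["TAG", "ACG"]:
    f, px = forward(x)
    b = backward(x)
    for t, c in enumerate(x):
        E1[c] += f[t][1] * b[t][1] / px
        if t + 1 < len(x):
            for l in S:
                A1[l] += f[t][1] * a[1][l] * e[l][x[t + 1]] * b[t + 1][l] / px

# M-step for state 1: normalize the expected counts.
new_e1 = {c: n / sum(E1.values()) for c, n in E1.items()}
new_a1 = {l: n / sum(A1.values()) for l, n in A1.items()}
```

Under these assumptions the self-transition probability of state 1 drops from 0.8 to 0.45 after one iteration, and the 1-to-2 transition rises from 0.2 to 0.55; the transition from state 1 to the end state stays at 0 because it starts at 0.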

Summary
Three problems in HMMs. Probability of an observed sequence: forward algorithm. Most likely path for an observed sequence: Viterbi, which can be used for segmentation of an observed sequence. Parameter estimation: Baum-Welch. The backward algorithm is used to compute a quantity needed to estimate the posterior of a state given the entire observed sequence.