Learning HMM parameters Sushmita Roy BMI/CS 576 Oct 21st, 2014


Recall the three questions in HMMs
– How likely is an HMM to have generated a given sequence? – Forward algorithm
– What is the most likely "path" of states for generating a sequence of observations? – Viterbi algorithm
– How can we learn an HMM from a set of sequences? – Forward-backward, or Baum-Welch (an EM algorithm)

Learning HMMs from data: parameter estimation
If we knew the state sequence for each training sequence, it would be easy to estimate the parameters
But we need to work with hidden state sequences
So we use "expected" counts of state transitions and emissions

Reviewing the notation
States that emit characters are numbered 1 to K – 0 denotes the silent begin state, N the silent end state
x_t: observed character at position t
x = x_1 ... x_T: observed sequence
π = π_1 ... π_T: hidden state sequence, or path
a_kl: transition probability from state k to state l
e_k(b): emission probability – the probability of emitting symbol b from state k
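The probability definitions behind these symbols are not reproduced in the transcript; a reconstruction in the standard notation (my rendering, not verbatim from the deck) is:

```latex
\begin{align*}
a_{kl} &= P(\pi_t = l \mid \pi_{t-1} = k) && \text{transition probability from state $k$ to state $l$}\\
e_k(b) &= P(x_t = b \mid \pi_t = k)       && \text{probability of emitting symbol $b$ from state $k$}
\end{align*}
```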

Learning without hidden information
Learning is simple if we know the correct path for each sequence in our training set
Estimate parameters by counting the number of times each parameter is used across the training set
[Figure: an example HMM with begin and end states and a known path through the states for the observed sequence C A G T]

Learning without hidden information
Transition probabilities: estimated from A_kl, the number of transitions from state k to state l in the training set
Emission probabilities: estimated from E_k(c), the number of times character c is emitted from state k
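The estimation formulas on this slide are shown only graphically in the original; the maximum-likelihood estimates they correspond to are presumably:

```latex
\begin{align*}
a_{kl} &= \frac{A_{kl}}{\sum_{l'} A_{kl'}} && A_{kl}: \text{number of transitions from state $k$ to state $l$}\\
e_k(c) &= \frac{E_k(c)}{\sum_{c'} E_k(c')} && E_k(c): \text{number of times $c$ is emitted from state $k$}
\end{align*}
```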

Learning with hidden information
[Figure: the same HMM and observed sequence C A G T, but the path through the states (from begin to end) is unknown]
If we don't know the correct path for each sequence in our training set, consider all possible paths for the sequence
Estimate parameters through a procedure that counts the expected number of times each parameter is used across the training set

The Baum-Welch algorithm
Also known as the forward-backward algorithm
An Expectation Maximization (EM) algorithm – EM is a family of algorithms for learning probabilistic models in problems that involve hidden information
Expectation: estimate the "expected" number of times each transition and emission is used (using the current values of the parameters)
Maximization: re-estimate the parameters given these expected counts
The hidden variables are the state transitions and emissions

Learning parameters: the Baum-Welch algorithm
Algorithm sketch:
– initialize the parameters of the model
– iterate until convergence
   calculate the expected number of times each transition or emission is used
   adjust the parameters to maximize the likelihood of these expected values

The expectation step
We need to know the probability of the symbol at position t being produced by state k, given the entire sequence x
We also need to know the probability of the symbols at positions t and t+1 being produced by states k and l respectively, given the sequence x
Given these, we can compute our expected counts for state transitions and character emissions

Computing the posterior of a state
First we compute the probability of the entire observed sequence with the t-th symbol being generated by state k
Then our quantity of interest is obtained by dividing by the probability of the entire sequence, which is obtained from the forward algorithm

To compute this we need the forward and backward algorithms
Forward algorithm: f_k(t)
Backward algorithm: b_k(t)
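The definitions of f_k(t) and b_k(t) appear only graphically on the slide; the standard definitions they denote are:

```latex
\begin{align*}
f_k(t) &= P(x_1 \ldots x_t,\; \pi_t = k)          && \text{forward: probability of the prefix, ending in state $k$}\\
b_k(t) &= P(x_{t+1} \ldots x_T \mid \pi_t = k)    && \text{backward: probability of the suffix, given state $k$ at position $t$}
\end{align*}
```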

Using the forward and backward variables, this posterior is computed as shown below
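The expression itself is not reproduced in the transcript; the standard formula it refers to is:

```latex
P(\pi_t = k \mid x) \;=\; \frac{P(\pi_t = k,\, x)}{P(x)} \;=\; \frac{f_k(t)\, b_k(t)}{P(x)},
\qquad \text{where } P(x) \text{ is obtained from the forward algorithm.}
```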

The backward algorithm
The backward algorithm gives us b_k(t), the probability of observing the rest of x, given that we're in state k after t characters
[Figure: an example HMM with begin and end states, per-state emission probability tables over {A, C, G, T}, and the observed sequence C A G T]

Example of computing b_k(t)
[Figure: the same example HMM, with emission probability tables for each state and the observed sequence C A G T]

Steps of the backward algorithm
Initialization (t = T)
Recursion (t = T-1 down to 1)
Termination
Note: the termination step gives P(x), the same quantity that can be obtained from the forward algorithm as well
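The three update rules are not reproduced in the transcript; a standard reconstruction, assuming the explicit begin state 0 and end state N from the notation slide, is:

```latex
\begin{align*}
\text{Initialization } (t = T):\quad & b_k(T) = a_{kN} \quad \text{for all states } k\\
\text{Recursion } (t = T-1, \ldots, 1):\quad & b_k(t) = \sum_{l} a_{kl}\, e_l(x_{t+1})\, b_l(t+1)\\
\text{Termination}:\quad & P(x) = \sum_{l} a_{0l}\, e_l(x_1)\, b_l(1)
\end{align*}
```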

Computing the pairwise state posterior
This is the probability that the symbols at positions t and t+1 are emitted from states k and l respectively, given the entire sequence x; it is computed as shown below
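The corresponding formula is likewise missing from the transcript; the standard expression is:

```latex
P(\pi_t = k,\, \pi_{t+1} = l \mid x) \;=\; \frac{f_k(t)\, a_{kl}\, e_l(x_{t+1})\, b_l(t+1)}{P(x)}
```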

Putting it all together
Assume we are given J training instances x^1, .., x^j, .., x^J
Expectation step
– using the current parameter values, apply the forward and backward algorithms to each x^j
– compute the expected number of transitions between all pairs of states
– compute the expected number of emissions for all states
Maximization step
– using the current expected counts, compute the transition and emission probabilities

The expectation step: emission counts
We need the expected number of times character c is emitted by state k
x^j denotes the j-th training sequence; the count sums over the positions where c occurs in x^j
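The count formula is not reproduced; a standard reconstruction, summing over the J training sequences (the superscript j indexes the sequence), is:

```latex
E_k(c) \;=\; \sum_{j=1}^{J} \frac{1}{P(x^j)} \sum_{t \,:\, x^j_t = c} f^{\,j}_k(t)\, b^{\,j}_k(t)
```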

The expectation step: transition counts
We need the expected number of transitions from state k to state l, computed as shown below
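Again the formula is missing from the transcript; the standard reconstruction, with T_j the length of x^j, is:

```latex
A_{kl} \;=\; \sum_{j=1}^{J} \frac{1}{P(x^j)} \sum_{t=1}^{T_j - 1} f^{\,j}_k(t)\, a_{kl}\, e_l(x^j_{t+1})\, b^{\,j}_l(t+1)
```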

The maximization step
Estimate the new emission parameters from the expected counts, just like in the simple (fully observed) case, but typically with some "smoothing" (e.g. adding pseudocounts)
Estimate the new transition parameters in the same way, as shown below
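The update equations are not reproduced in the transcript; they normalize the expected counts, mirroring the fully observed case (pseudocounts, if used, are added to E_k(c) and A_kl before normalizing):

```latex
e_k(c) = \frac{E_k(c)}{\sum_{c'} E_k(c')}, \qquad
a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}
```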

The Baum-Welch algorithm
initialize the parameters of the HMM
iterate until convergence
– initialize A_kl, E_k(c) with pseudocounts
– E-step: for each training sequence j = 1…J
   calculate the forward and backward values for sequence j
   add the contribution of sequence j to A_kl, E_k(c)
– M-step: update the HMM parameters using A_kl, E_k(c)
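A minimal Python sketch of one way to implement this loop. It is an illustration under stated assumptions, not the course's reference code: it uses plain (unscaled) probabilities, which is fine for short sequences but a real implementation would work in log space or with per-position scaling; it omits the explicit end state (so b_k(T) = 1), and it holds the begin distribution fixed. The data layout (dicts keyed by state and character) is my choice.

```python
from collections import defaultdict

def forward(x, states, a_begin, a, e):
    """f[t][k] = P(x_1..x_t, pi_t = k), with positions t = 1..T."""
    T = len(x)
    f = [defaultdict(float) for _ in range(T + 1)]
    for k in states:
        f[1][k] = a_begin[k] * e[k][x[0]]
    for t in range(2, T + 1):
        for l in states:
            f[t][l] = e[l][x[t - 1]] * sum(f[t - 1][k] * a[k][l] for k in states)
    px = sum(f[T][k] for k in states)          # P(x); no explicit end state here
    return f, px

def backward(x, states, a, e):
    """b[t][k] = P(x_{t+1}..x_T | pi_t = k)."""
    T = len(x)
    b = [defaultdict(float) for _ in range(T + 1)]
    for k in states:
        b[T][k] = 1.0                          # initialization (no explicit end state)
    for t in range(T - 1, 0, -1):
        for k in states:
            b[t][k] = sum(a[k][l] * e[l][x[t]] * b[t + 1][l] for l in states)
    return b

def baum_welch(seqs, states, alphabet, a_begin, a, e, n_iter=10, pseudocount=0.0):
    """E-step: accumulate expected counts A_kl and E_k(c) over all training sequences.
    M-step: renormalize the expected counts to get new transition/emission probabilities."""
    for _ in range(n_iter):
        A = {k: {l: pseudocount for l in states} for k in states}
        E = {k: {c: pseudocount for c in alphabet} for k in states}
        for x in seqs:
            T = len(x)
            f, px = forward(x, states, a_begin, a, e)
            b = backward(x, states, a, e)
            for t in range(1, T + 1):          # expected emission counts
                for k in states:
                    E[k][x[t - 1]] += f[t][k] * b[t][k] / px
            for t in range(1, T):              # expected transition counts
                for k in states:
                    for l in states:
                        A[k][l] += f[t][k] * a[k][l] * e[l][x[t]] * b[t + 1][l] / px
        for k in states:                       # M-step: normalize the expected counts
            a_row, e_row = sum(A[k].values()), sum(E[k].values())
            for l in states:
                a[k][l] = A[k][l] / a_row
            for c in alphabet:
                e[k][c] = E[k][c] / e_row
    return a, e
```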

Baum-Welch algorithm example
Given:
– an HMM with the parameters initialized as shown
– two training sequences: TAG and ACG
[Figure: a two-state HMM with begin and end states; state 1 emits A 0.1, C 0.4, G 0.4, T 0.1 and state 2 emits A 0.4, C 0.1, G 0.1, T 0.4; the initial transition probabilities are shown in the diagram]
We'll work through one iteration of Baum-Welch
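To connect this example to the sketch above: the emission tables below are the ones on the slide, but the begin and transition probabilities are hypothetical placeholders, since the slide's actual transition values appear only in the figure.

```python
# Emission probabilities from the slide; a_begin and a are hypothetical placeholders.
states, alphabet = [1, 2], "ACGT"
e = {1: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
     2: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}
a_begin = {1: 0.5, 2: 0.5}                        # hypothetical
a = {1: {1: 0.8, 2: 0.2}, 2: {1: 0.2, 2: 0.8}}    # hypothetical

a_new, e_new = baum_welch(["TAG", "ACG"], states, alphabet, a_begin, a, e, n_iter=1)
print(a_new)   # re-estimated transition probabilities after one iteration
print(e_new)   # re-estimated emission probabilities after one iteration
```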

Baum-Welch example (cont)
Determining the forward values for TAG
Here we compute just the values that are needed for computing successive values; for example, there is no point in calculating f_2(1)
In a similar way, we also compute the forward values for ACG

Baum-Welch example (cont)
Determining the backward values for TAG
Again, we compute just the values that are needed
In a similar way, we also compute the backward values for ACG

Baum-Welch example (cont)
Determining the expected emission counts for state 1: the contribution of TAG plus the contribution of ACG
*Note that the forward/backward values in the two columns differ; in each column they are computed for the sequence associated with that column

Baum-Welch example (cont)
Determining the expected transition counts for state 1 (not using pseudocounts): the contribution of TAG plus the contribution of ACG
In a similar way, we also determine the expected emission/transition counts for state 2

Baum-Welch example (cont) Maximization step: determining probabilities for state 1

Computational complexity of HMM algorithms
Given an HMM with S states and a sequence of length L, the complexity of the forward, backward and Viterbi algorithms is O(S^2 L)
– this assumes that the states are densely interconnected
Given M sequences of length L, the complexity of Baum-Welch on each iteration is O(M S^2 L)

Baum-Welch convergence
Some convergence criteria:
– the likelihood of the training sequences changes little
– a fixed number of iterations is reached
Usually converges in a small number of iterations
Will converge to a local maximum (in the likelihood of the data given the model)

Summary: three problems in HMMs
Probability of an observed sequence – forward algorithm
Most likely path for an observed sequence – Viterbi algorithm – can be used for segmentation of the observed sequence
Parameter estimation – Baum-Welch – the backward algorithm is used to compute a quantity needed to estimate the posterior of a state given the entire observed sequence