Hidden Markov Models - Training

Parameter Estimation
How do we specify an HMM? Two things are needed:
Design the structure: the states, the allowed transitions, etc.
Assign the parameter values: the transition probabilities a_kl and the emission probabilities e_k(b).

Parameter Estimation
Suppose we have a set of sequences that we want the HMM to fit well, i.e., to generate with high probability. These are called the training sequences; call them x1, x2, ..., xn.
Assuming the sequences are independent,
logP(x1, x2, ..., xn | Θ) = ∑i logP(xi | Θ),
where Θ denotes the entire current set of parameter values of the model.
The goal of training is to maximize ∑i logP(xi | Θ).
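As a concrete illustration, here is a minimal Python/NumPy sketch (the function name and toy numbers are mine, not from the slides) of evaluating this objective: the forward algorithm gives logP(xi | Θ) for each training sequence, and the objective is their sum.

```python
import numpy as np

def forward_log_likelihood(seq, init, trans, emit):
    """log P(x | Theta) for one observation sequence, via the forward algorithm.

    seq   : list of symbol indices x_1..x_L
    init  : initial state distribution, shape (K,)
    trans : transition probabilities a_kl, shape (K, K)
    emit  : emission probabilities e_k(b), shape (K, B)
    """
    f = init * emit[:, seq[0]]          # f_k(1) = init_k * e_k(x_1)
    c = f.sum()                         # rescale at each step to avoid underflow
    f = f / c
    log_p = np.log(c)
    for x in seq[1:]:
        f = (f @ trans) * emit[:, x]    # f_l(i+1) = e_l(x_{i+1}) * sum_k f_k(i) a_kl
        c = f.sum()
        f = f / c
        log_p += np.log(c)
    return log_p

# toy 2-state, 2-symbol model; the training objective is the sum over sequences
init  = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
emit  = np.array([[0.7, 0.3],
                  [0.1, 0.9]])
training_seqs = [[0, 0, 1, 1], [1, 0, 1]]
total = sum(forward_log_likelihood(x, init, trans, emit) for x in training_seqs)
print(total)                            # sum_i logP(xi | Theta)
```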

Parameter Estimation When the State Paths Are Known
Count the number of times each particular transition or emission is used in the set of training sequences; call these counts A_kl and E_k(b). The maximum likelihood estimators for a_kl and e_k(b) are then given by (formula 1):
a_kl = A_kl / ∑l' A_kl'          e_k(b) = E_k(b) / ∑b' E_k(b')
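A short sketch of this counting estimator in Python/NumPy (hypothetical names and toy labelled data) might look like:

```python
import numpy as np

def ml_estimate_from_paths(seqs, paths, n_states, n_symbols):
    """ML estimates of a_kl and e_k(b) when the state path of every sequence is known.

    seqs  : list of symbol-index sequences x^1..x^n
    paths : matching list of state-index sequences (the known paths)
    """
    A = np.zeros((n_states, n_states))   # A_kl: transition counts
    E = np.zeros((n_states, n_symbols))  # E_k(b): emission counts
    for x, pi in zip(seqs, paths):
        for i in range(len(x)):
            E[pi[i], x[i]] += 1
            if i + 1 < len(x):
                A[pi[i], pi[i + 1]] += 1
    # formula (1): normalise each row of counts into probabilities
    a = A / A.sum(axis=1, keepdims=True)
    e = E / E.sum(axis=1, keepdims=True)
    return a, e

# two labelled toy sequences over symbols {0, 1} and states {0, 1}
seqs  = [[0, 0, 1, 1, 1], [1, 0, 0]]
paths = [[0, 0, 1, 1, 1], [1, 0, 0]]
a, e = ml_estimate_from_paths(seqs, paths, n_states=2, n_symbols=2)
print(a)
print(e)
```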

Parameter Estimation When the State Paths Are Known
Problem: this estimator is vulnerable to overfitting if there are insufficient training data. For example, if some state k is never used in the set of example sequences, the estimation equations are undefined for that state (0/0).
Solution: add predetermined pseudocounts to the A_kl and E_k(b).

Parameter Estimation When the State Paths Are Known
A_kl = (number of transitions from k to l in the training data) + r_kl
E_k(b) = (number of emissions of symbol b from state k in the training data) + r_k(b)
The pseudocounts r_kl and r_k(b) should reflect our prior beliefs about the probability values: small prior values indicate weak prior knowledge, and larger values indicate more definite prior knowledge.
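Continuing the previous sketch, the pseudocounts are simply added to the raw counts before normalizing; the numbers below are invented for illustration:

```python
import numpy as np

# raw counts from the labelled training data (hypothetical values)
A = np.array([[8., 2.],
              [0., 0.]])       # state 1 never visited: its row would be undefined (0/0)
E = np.array([[6., 4.],
              [0., 0.]])

r_trans = 1.0                  # pseudocounts r_kl; small = weak prior, large = strong prior
r_emit  = 1.0                  # pseudocounts r_k(b)

a = (A + r_trans) / (A + r_trans).sum(axis=1, keepdims=True)
e = (E + r_emit)  / (E + r_emit).sum(axis=1, keepdims=True)
print(a)   # the unused state now gets a uniform, prior-driven distribution instead of 0/0
print(e)
```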

Parameter Estimation When the State Paths Are Unknown
When the state paths of the training sequences are unknown, the previous method cannot be applied directly. Instead we use iterative training procedures:
Baum-Welch training
Viterbi training

Iterative Process
Assign a set of initial values to the parameters a_kl and e_k(b).
Repeat until some stopping criterion is reached:
Find the most probable state path for each training sequence based on the current parameters.
Treat these most probable paths as the actual paths and use formula (1) to derive new values for a_kl and e_k(b).

Iterative Process
Each iteration increases the overall log likelihood of the model, and hence the process converges to a local maximum. Unfortunately, there are usually many local maxima, and the starting values strongly determine which local maximum the process gets stuck in. Thus, it may be necessary to try several different starting points.

Baum-Welch
The Baum-Welch algorithm calculates A_kl and E_k(b) as the expected number of times each transition or emission is used, given the training sequences. The probability that the transition k→l is used at position i in sequence x is (formula 2):
P(πi = k, πi+1 = l | x, Θ) = f_k(i) · a_kl · e_l(xi+1) · b_l(i+1) / P(x)

Baum-Welch
From formula (2) we can derive the expected number of times that the transition k→l is used, by summing over all positions and over all training sequences xj (formula 3):
A_kl = ∑j (1 / P(xj)) ∑i f_k^j(i) · a_kl · e_l(x^j_i+1) · b_l^j(i+1)
Here f_k^j(i) is the forward variable and b_l^j(i) is the backward variable, computed for sequence xj.
Similarly, the expected number of times that letter b is emitted in state k is (formula 4):
E_k(b) = ∑j (1 / P(xj)) ∑{i : x^j_i = b} f_k^j(i) · b_k^j(i)
where the inner sum runs only over those positions i at which symbol b is emitted.

Baum-Welch
Having calculated these expectations, the new model parameters are obtained using formula (1). Based on the new parameters, we can iteratively obtain newer values of the As and Es as before. Because the process converges in a continuous-valued space, it never exactly reaches the maximum, so a stopping criterion is needed:
1) stop when the change in total log likelihood is sufficiently small, or
2) normalize the log likelihood by the number of sequences n (and perhaps also by the sequence lengths) and consider the change in the average log likelihood per residue.
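Criterion 2 might be coded as a small helper like the following (a hypothetical function; the tolerance value is arbitrary):

```python
def small_change(prev_ll, curr_ll, seq_lengths, tol=1e-4):
    """True when the average log likelihood per residue has changed by less than tol."""
    n_residues = sum(seq_lengths)
    return abs(curr_ll - prev_ll) / n_residues < tol
```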

Algorithm: Baum-Welch
Initialization: pick arbitrary model parameters.
Recurrence:
Set all the A and E variables to their pseudocount values r (or to 0).
For each sequence j = 1..n:
Calculate f_k(i) for sequence j using the forward algorithm.
Calculate b_k(i) for sequence j using the backward algorithm.
Add the contribution of sequence j to A (using formula 3) and to E (using formula 4).
Calculate the new model parameters (using formula 1).
Calculate the new log likelihood of the model.
Termination: stop when the change in log likelihood is less than some predefined threshold, or when the maximum number of iterations is exceeded.
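Below is a self-contained Python/NumPy sketch of this loop under simplifying assumptions of my own (a fixed initial state distribution instead of a begin/end state, unscaled forward/backward variables, toy data); it is meant to make formulas (1), (3) and (4) concrete, not to be a production implementation.

```python
import numpy as np

def forward(x, init, a, e):
    """Unscaled forward variables f_k(i); adequate only for short toy sequences."""
    L, K = len(x), len(init)
    f = np.zeros((L, K))
    f[0] = init * e[:, x[0]]
    for i in range(1, L):
        f[i] = (f[i - 1] @ a) * e[:, x[i]]
    return f

def backward(x, a, e):
    """Unscaled backward variables b_k(i)."""
    L, K = len(x), a.shape[0]
    b = np.zeros((L, K))
    b[L - 1] = 1.0
    for i in range(L - 2, -1, -1):
        b[i] = a @ (e[:, x[i + 1]] * b[i + 1])
    return b

def baum_welch(seqs, init, a, e, pseudo=0.0, tol=1e-6, max_iter=100):
    """A sketch of the Baum-Welch loop from the slide (real code would rescale f and b)."""
    K, B = e.shape
    prev_ll = -np.inf
    for _ in range(max_iter):
        A = np.full((K, K), pseudo)          # expected transition counts A_kl
        E = np.full((K, B), pseudo)          # expected emission counts  E_k(b)
        ll = 0.0
        for x in seqs:
            f = forward(x, init, a, e)
            b = backward(x, a, e)
            px = f[-1].sum()                 # P(x | current parameters)
            ll += np.log(px)
            # formula (3): expected transition usage
            for i in range(len(x) - 1):
                A += np.outer(f[i], e[:, x[i + 1]] * b[i + 1]) * a / px
            # formula (4): expected emission usage
            for i in range(len(x)):
                E[:, x[i]] += f[i] * b[i] / px
        # formula (1): re-estimate the parameters from the expected counts
        a = A / A.sum(axis=1, keepdims=True)
        e = E / E.sum(axis=1, keepdims=True)
        if ll - prev_ll < tol:               # stop when the log likelihood barely moves
            break
        prev_ll = ll
    return a, e, ll

# toy run with arbitrary starting parameters
init = np.array([0.5, 0.5])
a0   = np.array([[0.6, 0.4], [0.3, 0.7]])
e0   = np.array([[0.5, 0.5], [0.4, 0.6]])
seqs = [[0, 0, 1, 1, 1, 0], [1, 1, 0, 1]]
a, e, ll = baum_welch(seqs, init, a0, e0, pseudo=0.1)
print(a); print(e); print(ll)
```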

Baum-Welch
The Baum-Welch algorithm is a special case of the EM (expectation-maximization) algorithm, a very powerful general approach to probabilistic parameter estimation.

Viterbi Training
An alternative to Baum-Welch training: the most probable path for each training sequence is found with the Viterbi algorithm, and these paths are used in the re-estimation process. The process is again iterated once the new parameter values are obtained. It converges exactly, because the assignment of paths is a discrete process.
Stopping criterion: none of the paths change. At that point the parameter estimates do not change either, since they are completely determined by the paths.
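A corresponding sketch of Viterbi training (again with my own function names, a fixed initial distribution, and toy data) is given below; note the discrete stopping test described above, i.e., the loop ends when no path changes.

```python
import numpy as np

def viterbi_path(x, init, a, e):
    """Most probable state path for sequence x (log-space Viterbi with backtracking)."""
    L, K = len(x), len(init)
    log_a, log_e = np.log(a), np.log(e)
    v = np.log(init) + log_e[:, x[0]]
    ptr = np.zeros((L, K), dtype=int)
    for i in range(1, L):
        scores = v[:, None] + log_a          # scores[k, l] = v_k + log a_kl
        ptr[i] = scores.argmax(axis=0)       # best predecessor of state l at position i
        v = scores.max(axis=0) + log_e[:, x[i]]
    path = [int(v.argmax())]
    for i in range(L - 1, 0, -1):
        path.append(int(ptr[i][path[-1]]))
    return path[::-1]

def viterbi_training(seqs, init, a, e, pseudo=1.0, max_iter=100):
    """Iterate: decode best paths, then re-estimate a and e by counting (formula 1)."""
    K, B = e.shape
    old_paths = None
    for _ in range(max_iter):
        paths = [viterbi_path(x, init, a, e) for x in seqs]
        if paths == old_paths:               # stopping criterion: no path changed
            break
        old_paths = paths
        A = np.full((K, K), pseudo)          # counts plus pseudocounts
        E = np.full((K, B), pseudo)
        for x, pi in zip(seqs, paths):
            for i in range(len(x)):
                E[pi[i], x[i]] += 1
                if i + 1 < len(x):
                    A[pi[i], pi[i + 1]] += 1
        a = A / A.sum(axis=1, keepdims=True)
        e = E / E.sum(axis=1, keepdims=True)
    return a, e

init = np.array([0.5, 0.5])
a0   = np.array([[0.6, 0.4], [0.3, 0.7]])
e0   = np.array([[0.6, 0.4], [0.2, 0.8]])
seqs = [[0, 0, 1, 1, 1, 0], [1, 1, 0, 1]]
a, e = viterbi_training(seqs, init, a0, e0)
print(a); print(e)
```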

Viterbi Training
Difference between Baum-Welch and Viterbi training: total probability vs. best path. Baum-Welch maximizes logP(x1, x2, ..., xn | Θ), whereas Viterbi training maximizes the contribution to the likelihood, logP(x1, x2, ..., xn | Θ, π*(x1), ..., π*(xn)), that comes from the most probable paths of all the sequences. Thus, Viterbi training generally performs less well than Baum-Welch, but it is widely used when the primary purpose of the HMM is decoding via Viterbi alignment.