Learning Parameters of Hidden Markov Models. Prepared by Dan Geiger.

Slide 2: Nothing is hidden

[Figure: the hidden chain H_1, H_2, ..., H_i, ..., H_{L-1}, H_L, and the full HMM with emissions X_1, X_2, ..., X_i, ..., X_{L-1}, X_L.]

When every variable is observed, each local probability table is estimated independently by counting:

Maximum likelihood: P(H_1 = t) = N_t / (N_t + N_f)
Maximum likelihood: P(H_2 = t | H_1 = t) = N_{t,t} / (N_{t,t} + N_{f,t})
and so on for every edge, independently.

Equal-prior MAP: P(H_1 = t) = (a + N_t) / ((a + N_t) + (a + N_f))

How do we extend this to hidden variables?
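
Before moving to the hidden case, here is a minimal counting sketch for the fully observed setting above; the function name, the True/False encoding of the states, and the pseudo-count argument a are illustrative choices, not notation from the slides.

```python
from collections import Counter

def ml_estimates(sequences, a=0.0):
    """Estimate P(H_1 = t) and P(H_i = t | H_{i-1} = s) by counting.
    a is an optional equal-prior pseudo-count (a = 0 gives plain ML)."""
    first = Counter(seq[0] for seq in sequences)                # counts of H_1
    trans = Counter(pair for seq in sequences
                    for pair in zip(seq, seq[1:]))              # counts of edges (H_{i-1}, H_i)
    p_h1_true = (a + first[True]) / (2 * a + first[True] + first[False])

    def p_next_true(prev):                                      # P(H_i = True | H_{i-1} = prev)
        n_t, n_f = trans[(prev, True)], trans[(prev, False)]
        return (a + n_t) / (2 * a + n_t + n_f)

    return p_h1_true, p_next_true

# Example: two fully observed sequences over {True, False}
p1, p_next = ml_estimates([[True, True, False, True], [False, True, True, True]])
print(p1, p_next(True), p_next(False))          # 0.5 0.75 1.0
```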

Slide 3: Learning the parameters (the EM algorithm)

A common algorithm for learning the parameters from unlabeled sequences is Expectation-Maximization (EM). In the HMM context it reads as follows:

Start with some initial probability tables (many choices).
Iterate until convergence:
  E-step: compute p(h_i, h_{i-1} | x_1, ..., x_L) using the current probability tables ("current parameters").
  M-step: use these expected counts to update the local probability tables via the maximum-likelihood formula.
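
The E-step quantity p(h_i, h_{i-1} | x_1, ..., x_L) is exactly what the forward-backward algorithm from the earlier lectures computes. A minimal numpy sketch for a discrete HMM follows; the array names init, trans, emit and the integer encoding of the observation symbols are assumptions made for illustration.

```python
import numpy as np

def pairwise_posteriors(x, init, trans, emit):
    """E-step: return xi[i, s, t] = p(h_i = s, h_{i+1} = t | x_1..x_L)
    for a discrete HMM with initial probs init[s], transition probs
    trans[s, t] and emission probs emit[s, symbol]."""
    L, K = len(x), len(init)
    f = np.zeros((L, K)); b = np.zeros((L, K))
    f[0] = init * emit[:, x[0]]                     # forward pass (unnormalized)
    for i in range(1, L):
        f[i] = (f[i - 1] @ trans) * emit[:, x[i]]
    b[-1] = 1.0                                     # backward pass
    for i in range(L - 2, -1, -1):
        b[i] = trans @ (emit[:, x[i + 1]] * b[i + 1])
    px = f[-1].sum()                                # likelihood p(x_1..x_L)
    xi = np.zeros((L - 1, K, K))
    for i in range(L - 1):
        xi[i] = f[i][:, None] * trans * (emit[:, x[i + 1]] * b[i + 1])[None, :] / px
    return xi
```

For long sequences the forward and backward messages should be rescaled (or kept in log space) to avoid numerical underflow; the sketch omits this for brevity.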

Slide 4: Example I: Homogeneous HMM, one sample

[Figure: the HMM H_1, ..., H_L with emissions X_1, ..., X_L.]

Here the two transition parameters are θ = P(h_i = 1 | h_{i-1} = 0) and λ = P(h_i = 0 | h_{i-1} = 1).

Start with some probability tables (say θ = λ = ½).
Iterate until convergence:
  E-step: compute p_{θ,λ}(h_i | h_{i-1}, x_1, ..., x_L) from p_{θ,λ}(h_i, h_{i-1} | x_1, ..., x_L), which is computed using the forward-backward algorithm as explained earlier.
  M-step: update the parameters simultaneously:
    θ ← Σ_i p_{θ,λ}(h_i = 1 | h_{i-1} = 0, x_1, ..., x_L) / (L - 1)
    λ ← Σ_i p_{θ,λ}(h_i = 0 | h_{i-1} = 1, x_1, ..., x_L) / (L - 1)
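
Given the pairwise posteriors from the sketch above, the M-step of this example is a short average. This follows the update written on the slide, with states 0 and 1 playing the roles of the two hidden values; the function name is an illustrative choice.

```python
import numpy as np

def m_step_one_sample(xi):
    """M-step of Example I. xi[i, s, t] = p(h_i = s, h_{i+1} = t | x_1..x_L),
    as returned by pairwise_posteriors above. Returns updated (theta, lam)
    with theta = P(h_i = 1 | h_{i-1} = 0) and lam = P(h_i = 0 | h_{i-1} = 1)."""
    L_minus_1 = xi.shape[0]
    cond = xi / xi.sum(axis=2, keepdims=True)   # p(h_i | h_{i-1}, x_1..x_L)
    theta = cond[:, 0, 1].sum() / L_minus_1     # average over the L-1 transitions 0 -> 1
    lam = cond[:, 1, 0].sum() / L_minus_1       # average over the L-1 transitions 1 -> 0
    return theta, lam
```

Iterating the E-step (with the transition table rebuilt from θ and λ) and this M-step until the parameters stop changing gives the EM loop of the previous slide.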

Slide 5: Example II: Homogeneous HMM, N samples

Start with some probability tables (say θ = λ = ½).
Iterate until convergence:
  E-step: compute p_{θ,λ}(h_i | h_{i-1}, [x_1, ..., x_L]_j) for j = 1, ..., N from p_{θ,λ}(h_i, h_{i-1} | [x_1, ..., x_L]_j), which is computed using the forward-backward algorithm as explained earlier.
  M-step: update the parameters simultaneously:
    θ ← Σ_j Σ_i p_{θ,λ}(h_i = 1 | h_{i-1} = 0, [x_1, ..., x_L]_j) / (N(L - 1))
    λ ← Σ_j Σ_i p_{θ,λ}(h_i = 0 | h_{i-1} = 1, [x_1, ..., x_L]_j) / (N(L - 1))

The only changes relative to Example I (marked in bold blue in the original slide) are the extra sum over the N samples and the factor N(L - 1) in the denominator.
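
With N samples the only change is an extra average over the training sequences. A sketch, reusing m_step_one_sample from above and assuming, as on the slide, that all sequences have the same length L:

```python
def m_step_n_samples(xi_list):
    """Example II: xi_list[j] holds the pairwise posteriors of sample j.
    Averaging the per-sample averages equals the slide's sum over j and i
    divided by N(L - 1), because every sequence has the same length."""
    updates = [m_step_one_sample(xi) for xi in xi_list]
    theta = sum(t for t, _ in updates) / len(updates)
    lam = sum(l for _, l in updates) / len(updates)
    return theta, lam
```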

Slide 6: Example III: Non-homogeneous HMM, N samples

Here each position i has its own transition parameters θ_i and λ_i.

Start with some probability tables (say θ_i = λ_i = ½).
Iterate until convergence:
  E-step: compute p_{θ_i,λ_i}(h_i | h_{i-1}, [x_1, ..., x_L]_j) for j = 1, ..., N from p_{θ_i,λ_i}(h_i, h_{i-1} | [x_1, ..., x_L]_j), which is computed using the forward-backward algorithm as explained earlier.
  M-step: update the parameters simultaneously:
    θ_i ← Σ_j p_{θ_i,λ_i}(h_i = 1 | h_{i-1} = 0, [x_1, ..., x_L]_j) / N
    λ_i ← Σ_j p_{θ_i,λ_i}(h_i = 0 | h_{i-1} = 1, [x_1, ..., x_L]_j) / N

Relative to Example II, the summation over i and the factor L - 1 are dropped, since each position now has its own parameters.
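
In the non-homogeneous case each position keeps its own parameters, so the average runs over the samples only. A short sketch, assuming the pairwise posteriors (computed with the position-dependent tables) are stacked per sample:

```python
import numpy as np

def m_step_non_homogeneous(xi_list):
    """Example III: per-position parameters theta_i, lam_i,
    averaged over the N samples only (no sum over positions)."""
    xi = np.stack(xi_list)                      # shape (N, L-1, K, K)
    cond = xi / xi.sum(axis=3, keepdims=True)   # p(h_i | h_{i-1}, x) per sample
    theta_i = cond[:, :, 0, 1].mean(axis=0)     # one value per position i
    lam_i = cond[:, :, 1, 0].mean(axis=0)
    return theta_i, lam_i
```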

Slide 7: Example IV: Missing emission probabilities

Exercise: write the analogous update equations for the emission parameters. Hint: compute P(x_i, h_i | Data).

Often the learned parameters are collectively denoted by Θ. E.g., in the context of homogeneous HMMs, if all parameters are learned from data, then Θ collects the initial, transition, and emission probabilities.

Slide 8: Viterbi training

Start with some probability tables (many possible choices).
Iterate until convergence:
  E-step (new): compute the most probable assignment of the hidden states (the Viterbi path) using the current parameters.
  M-step: use these hard counts to update the local probability tables via maximum likelihood (e.g., for transitions, N_{s1→s2} / N_{s1}).

Comments:
Useful when the posterior probability is concentrated around the MAP value.
Avoids the inconsistency of adding up each link separately; e.g., one cannot have H_1 = 0, H_2 = 1 and H_2 = 0, H_3 = 1 simultaneously, as we did earlier.
Summing over all joint options is exponential.
A common variant of the EM algorithm for HMMs.
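
A minimal sketch of one Viterbi-training iteration for a discrete HMM: the Viterbi algorithm supplies a single hard state path per sequence, and the M-step counts transitions along those paths. The function and array names are assumptions for illustration; emission tables would be re-estimated from the same paths in the same way.

```python
import numpy as np

def viterbi_path(x, init, trans, emit):
    """Most probable state path argmax_h p(h, x), in log space
    (assumes strictly positive probabilities)."""
    L, K = len(x), len(init)
    logd = np.log(init) + np.log(emit[:, x[0]])
    back = np.zeros((L, K), dtype=int)
    for i in range(1, L):
        scores = logd[:, None] + np.log(trans)       # scores[s, t]: come from s, go to t
        back[i] = scores.argmax(axis=0)
        logd = scores.max(axis=0) + np.log(emit[:, x[i]])
    path = [int(logd.argmax())]
    for i in range(L - 1, 0, -1):                    # backtrack
        path.append(int(back[i][path[-1]]))
    return path[::-1]

def viterbi_training_step(sequences, init, trans, emit):
    """One iteration: hard E-step (Viterbi paths), then maximum likelihood
    from the hard counts, N_{s1 -> s2} / N_{s1}, for the transition table."""
    K = len(init)
    counts = np.zeros((K, K))
    for x in sequences:
        path = viterbi_path(x, init, trans, emit)
        for s, t in zip(path, path[1:]):
            counts[s, t] += 1
    # add pseudo-counts here if some state never occurs on any path
    return counts / counts.sum(axis=1, keepdims=True)
```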

Slide 9: Summary of HMM

1. Belief update (posterior decoding): the forward-backward algorithm.
2. Maximum a posteriori assignment: the Viterbi algorithm.
3. Learning parameters: the EM algorithm; Viterbi training.

Slide 10: Some applications of HMMs

1. Haplotyping
2. Gene mapping
3. Speech recognition, finance, ...
4. ... you name it ... everywhere

Slide 11: Haplotyping

[Figure: two hidden chains H_1, ..., H_L and the observed genotypes G_1, ..., G_L.]

Every G_i is an unordered pair of letters {aa, ab, bb}. The source of one letter is the first chain and the source of the other letter is the second chain. Which letter comes from which chain? (Is it paternal or maternal DNA?)

Slide 12: Model of inheritance

Example with two parents and one child.

[Figure: the pedigree for one locus; extension to more loci (loci i and i+1); extension to three children.]