Hidden Markov Models - An Introduction

Markov Chains [Figure: state diagram with the states A, C, G, T and transitions between them, plus a begin state B and an end state E]

Markov Chains We want a model that generates sequences in which the probability of a symbol depends only on the previous symbol. Transition probabilities: $a_{st} = P(x_i = t \mid x_{i-1} = s)$. Probability of a sequence: $P(x) = P(x_L \mid x_{L-1})\,P(x_{L-1} \mid x_{L-2}) \cdots P(x_2 \mid x_1)\,P(x_1) = P(x_1)\prod_{i=2}^{L} a_{x_{i-1} x_i}$. Note: by the chain rule any joint probability can be written as $P(x) = P(x_L \mid x_{L-1},\ldots,x_1)\,P(x_{L-1} \mid x_{L-2},\ldots,x_1)\cdots P(x_1)$; the Markov property reduces each factor to a transition probability.

Markov Chains The key property of a Markov chain is that the probability of each symbol $x_i$ depends only on the value of the preceding symbol $x_{i-1}$. Modelling the beginning and end of sequences: add a silent begin state B and end state E (as in the diagram above), with $a_{Bs} = P(x_1 = s)$ for the first symbol and $a_{tE} = P(\mathrm{end} \mid x_L = t)$ for terminating the sequence.

Markov Chains Markov chains can be used to discriminate between two options by calculating a likelihood ratio. Example: CpG islands in human DNA. Regions labelled as CpG islands give the + model; regions labelled as non-CpG islands give the - model. The maximum likelihood estimators for the transition probabilities of the + model are $a^+_{st} = \frac{c^+_{st}}{\sum_{t'} c^+_{st'}}$, and analogously for the - model, where $c^+_{st}$ is the number of times letter t followed letter s in the labelled regions.

Markov Chains From 48 putative CpG islands of human DNA one estimates the following transition probabilities (rows: preceding symbol s, columns: following symbol t; each row sums to 1). Note that the tables are asymmetric.

+ model:
     A      C      G      T
A    0.180  0.274  0.426  0.120
C    0.171  0.368  0.274  0.188
G    0.161  0.339  0.375  0.125
T    0.079  0.355  0.384  0.182

- model:
     A      C      G      T
A    0.300  0.205  0.285  0.210
C    0.322  0.298  0.078  0.302
G    0.248  0.246  0.298  0.208
T    0.177  0.239  0.292  0.292
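A minimal Python sketch of this counting estimator; the two training strings below are made-up stand-ins, not the 48 CpG islands behind the table above, and the pseudocount handling is just one simple choice:

```python
from collections import defaultdict

def estimate_transitions(sequences, alphabet="ACGT", pseudocount=1.0):
    """Maximum likelihood estimate a_st = c_st / sum_t' c_st' from labelled sequences."""
    counts = {s: defaultdict(float) for s in alphabet}
    for seq in sequences:
        for prev, curr in zip(seq, seq[1:]):      # count each observed transition s -> t
            counts[prev][curr] += 1.0
    probs = {}
    for s in alphabet:
        total = sum(counts[s][t] + pseudocount for t in alphabet)
        probs[s] = {t: (counts[s][t] + pseudocount) / total for t in alphabet}
    return probs

# toy labelled regions standing in for real CpG-island sequences
plus_model = estimate_transitions(["CGCGGCGC", "GCGCGCCG"])
print(plus_model["C"]["G"])   # estimated a+_CG
```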

Markov Chains To use the model for discrimination one calculates the log-odds ratio $S(x) = \log_2 \frac{P(x \mid \mathrm{model}^+)}{P(x \mid \mathrm{model}^-)} = \sum_{i=1}^{L} \beta_{x_{i-1} x_i}$, where $\beta_{st} = \log_2 (a^+_{st} / a^-_{st})$.

β (bits):
     A       C      G      T
A    -0.740  0.419  0.580  -0.803
C    -0.913  0.302  1.812  -0.685
G    -0.624  0.461  0.331  -0.730
T    -1.169  0.573  0.393  -0.679
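The discrimination step can be sketched as follows; the β values are taken from the table above, and the example sequences are arbitrary:

```python
# log-odds table beta_st = log2(a+_st / a-_st), in bits (values from the table above)
BETA = {
    "A": {"A": -0.740, "C": 0.419, "G": 0.580, "T": -0.803},
    "C": {"A": -0.913, "C": 0.302, "G": 1.812, "T": -0.685},
    "G": {"A": -0.624, "C": 0.461, "G": 0.331, "T": -0.730},
    "T": {"A": -1.169, "C": 0.573, "G": 0.393, "T": -0.679},
}

def log_odds_score(seq):
    """S(x) = sum_i beta_{x_{i-1} x_i}; positive scores favour the CpG (+) model."""
    return sum(BETA[s][t] for s, t in zip(seq, seq[1:]))

print(log_odds_score("CGCG"))   # clearly positive: looks like a CpG island
print(log_odds_score("TATA"))   # negative: looks like background sequence
```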

Hidden Markov Models How can one find CpG islands in a long chain of nucleotides? Merge both models into one model with small transition probabilities between the two chains; within each chain the transition probabilities should remain close to the original ones. Relabelling of the states: the states A+, C+, G+, T+ emit the symbols A, C, G, T, and likewise for A-, C-, G-, T-. The relabelling is critical, as there is no one-to-one correspondence between states and symbols: from looking at a C in isolation one cannot tell whether it was emitted from C+ or C-.

Hidden Markov Models Formal Definitions Distinguish the sequence of states from the sequence of symbols. Call the state sequence the path π; it follows a simple Markov chain with transition probabilities $a_{kl} = P(\pi_i = l \mid \pi_{i-1} = k)$. As the symbols b are decoupled from the states k, new parameters are needed giving the probability that symbol b is seen when in state k: $e_k(b) = P(x_i = b \mid \pi_i = k)$. These are known as emission probabilities.

Hidden Markov Models The Viterbi Algorithm The Viterbi algorithm is the most common decoding algorithm for HMMs; it is a dynamic programming algorithm. There may be many state sequences that give rise to any particular sequence of symbols, but the corresponding probabilities are very different. CpG islands: the paths (C+, G+, C+, G+), (C-, G-, C-, G-) and (C+, G-, C+, G-) all generate the symbol sequence CGCG, but the first has the highest probability.

Hidden Markov Models Search recursively for the most probable path. Suppose the probability $v_k(i)$ of the most probable path ending in state k with observation $x_i$ is known for all states k. Then these probabilities can be calculated for observation $x_{i+1}$ by $v_l(i+1) = e_l(x_{i+1}) \max_k \big(v_k(i)\, a_{kl}\big)$, with initial condition $v_0(0) = 1$ (all paths start in the begin state 0).

Hidden Markov Models Viterbi Algorithm
Initialisation (i=0): $v_0(0) = 1$, $v_k(0) = 0$ for $k > 0$
Recursion (i=1..L): $v_l(i) = e_l(x_i) \max_k \big(v_k(i-1)\, a_{kl}\big)$; $\mathrm{ptr}_i(l) = \arg\max_k \big(v_k(i-1)\, a_{kl}\big)$
Termination: $P(x, \pi^*) = \max_k \big(v_k(L)\, a_{k0}\big)$; $\pi^*_L = \arg\max_k \big(v_k(L)\, a_{k0}\big)$
Traceback (i=L..1): $\pi^*_{i-1} = \mathrm{ptr}_i(\pi^*_i)$
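A compact sketch of this recursion in Python, working in log space so that the products in the recursion become sums and do not underflow for long sequences. The two-state model, its transition and emission numbers, and the example sequence are invented toy values, not the CpG parameters from the earlier slides:

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state path pi* for an observed symbol sequence (log-space Viterbi)."""
    log = lambda p: math.log(p) if p > 0 else float("-inf")
    v = [{k: log(start_p[k]) + log(emit_p[k][obs[0]]) for k in states}]
    ptr = [{}]
    for i in range(1, len(obs)):
        v.append({})
        ptr.append({})
        for l in states:
            best_k = max(states, key=lambda k: v[i - 1][k] + log(trans_p[k][l]))
            ptr[i][l] = best_k
            v[i][l] = v[i - 1][best_k] + log(trans_p[best_k][l]) + log(emit_p[l][obs[i]])
    # termination and traceback
    last = max(states, key=lambda k: v[-1][k])
    path = [last]
    for i in range(len(obs) - 1, 0, -1):
        path.insert(0, ptr[i][path[0]])
    return path, v[-1][last]

# toy two-state model (hypothetical numbers, just to exercise the function)
states = ["+", "-"]
start = {"+": 0.5, "-": 0.5}
trans = {"+": {"+": 0.9, "-": 0.1}, "-": {"+": 0.1, "-": 0.9}}
emit = {"+": {"C": 0.4, "G": 0.4, "A": 0.1, "T": 0.1},
        "-": {"C": 0.2, "G": 0.2, "A": 0.3, "T": 0.3}}
print(viterbi("CGCGTATA", states, start, trans, emit))
```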

Hidden Markov Models CpG Islands and the CGCG Sequence Viterbi variables $v_l(i)$ for the symbol sequence CGCG:

          B    C      G       C       G
B         1    0      0       0       0
A+        0    0      0       0       0
C+        0    0.13   0       0.012   0
G+        0    0      0.034   0       0.0032
T+        0    0      0       0       0
A-        0    0      0       0       0
C-        0    0.13   0       0.0026  0
G-        0    0      0.010   0       0.00021
T-        0    0      0       0       0

The traceback from the largest final value (0.0032 in state G+) gives the most probable path π* = (C+, G+, C+, G+), i.e. the sequence is decoded as lying inside a CpG island.

Hidden Markov Models The Forward Algorithm As many different paths π can give rise to the same sequence, the probability of a sequence is $P(x) = \sum_{\pi} P(x, \pi)$. Brute-force enumeration is not practical, as the number of paths rises exponentially with the length of the sequence. A simple approximation is to evaluate only the most probable path: $P(x) \approx P(x, \pi^*)$.

Hidden Markov Models The full probability P(x) can be calculated recursively with dynamic programming. This is called the forward algorithm. Calculate the probability $f_k(i) = P(x_1 \ldots x_i, \pi_i = k)$ of the observed sequence up to and including $x_i$, under the constraint that $\pi_i = k$. The recursion equation is $f_l(i+1) = e_l(x_{i+1}) \sum_k f_k(i)\, a_{kl}$.

Hidden Markov Models Forward Algorithm
Initialization (i=0): $f_0(0) = 1$, $f_k(0) = 0$ for $k > 0$
Recursion (i=1..L): $f_l(i) = e_l(x_i) \sum_k f_k(i-1)\, a_{kl}$
Termination: $P(x) = \sum_k f_k(L)\, a_{k0}$
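A corresponding sketch of the forward recursion; probabilities are kept unscaled for clarity, so this is only suitable for short sequences, and the parameters are the same kind of invented two-state toy model as before:

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Total probability P(x) = sum over all paths, computed by the forward recursion."""
    f = {k: start_p[k] * emit_p[k][obs[0]] for k in states}            # f_k(1)
    for symbol in obs[1:]:
        f = {l: emit_p[l][symbol] * sum(f[k] * trans_p[k][l] for k in states)
             for l in states}
    return sum(f.values())                                              # termination: P(x)

states = ["+", "-"]
start = {"+": 0.5, "-": 0.5}
trans = {"+": {"+": 0.9, "-": 0.1}, "-": {"+": 0.1, "-": 0.9}}
emit = {"+": {"C": 0.4, "G": 0.4, "A": 0.1, "T": 0.1},
        "-": {"C": 0.2, "G": 0.2, "A": 0.3, "T": 0.3}}
print(forward("CGCG", states, start, trans, emit))
```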

Hidden Markov Models The Backward Algorithm What is the most probable state for an observation $x_i$? That is, what is the probability $P(\pi_i = k \mid x)$ that observation $x_i$ came from state k, given the observed sequence? This is the posterior probability of state k at time i when the emitted sequence is known. First calculate the probability of producing the entire observed sequence with the i-th symbol being produced by state k: $P(x, \pi_i = k) = P(x_1 \ldots x_i, \pi_i = k)\, P(x_{i+1} \ldots x_L \mid \pi_i = k) = f_k(i)\, b_k(i)$.

Hidden Markov Models The Backward Algorithm
Initialisation (i=L): $b_k(L) = a_{k0}$ for all k
Recursion (i=L-1,...,1): $b_k(i) = \sum_l a_{kl}\, e_l(x_{i+1})\, b_l(i+1)$
Termination: $P(x) = \sum_l a_{0l}\, e_l(x_1)\, b_l(1)$

Hidden Markov Models Posterior Probabilities Combining the forward and backward values, the posterior probabilities are obtained as $P(\pi_i = k \mid x) = \frac{f_k(i)\, b_k(i)}{P(x)}$, where P(x) is the result of the forward algorithm.
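The backward recursion and the posterior computation can be sketched together; again unscaled, with $b_k(L)$ initialised to 1 (i.e. no explicit end state in this sketch) and the same invented two-state toy parameters:

```python
def forward_table(obs, states, start_p, trans_p, emit_p):
    """f_k(i) = P(x_1..x_i, pi_i = k) for every position i."""
    f = [{k: start_p[k] * emit_p[k][obs[0]] for k in states}]
    for symbol in obs[1:]:
        f.append({l: emit_p[l][symbol] * sum(f[-1][k] * trans_p[k][l] for k in states)
                  for l in states})
    return f

def backward_table(obs, states, trans_p, emit_p):
    """b_k(i) = P(x_{i+1}..x_L | pi_i = k); initialised with b_k(L) = 1."""
    b = [{k: 1.0 for k in states}]
    for symbol in reversed(obs[1:]):
        b.insert(0, {k: sum(trans_p[k][l] * emit_p[l][symbol] * b[0][l] for l in states)
                     for k in states})
    return b

def posteriors(obs, states, start_p, trans_p, emit_p):
    """P(pi_i = k | x) = f_k(i) * b_k(i) / P(x) for every position i."""
    f = forward_table(obs, states, start_p, trans_p, emit_p)
    b = backward_table(obs, states, trans_p, emit_p)
    px = sum(f[-1][k] for k in states)
    return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(len(obs))]

states = ["+", "-"]
start = {"+": 0.5, "-": 0.5}
trans = {"+": {"+": 0.9, "-": 0.1}, "-": {"+": 0.1, "-": 0.9}}
emit = {"+": {"C": 0.4, "G": 0.4, "A": 0.1, "T": 0.1},
        "-": {"C": 0.2, "G": 0.2, "A": 0.3, "T": 0.3}}
for i, post in enumerate(posteriors("CGCGTATA", states, start, trans, emit)):
    print(i, post)
```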

Hidden Markov Models Parameter Estimation for HMMs Two problems remain: 1) how to choose an appropriate model architecture, and 2) how to assign the transition and emission probabilities. Assumption: independent training sequences $x^1, \ldots, x^n$ are given. Consider the log likelihood $l(x^1, \ldots, x^n \mid \theta) = \sum_{j=1}^{n} \log P(x^j \mid \theta)$, where θ represents the set of values of all parameters ($a_{kl}$, $e_k(b)$).

Hidden Markov Models Estimation with Known State Sequences Assume the paths are known for all training sequences. Count the numbers $A_{kl}$ and $E_k(b)$ of times each particular transition or emission is used in the set of training sequences, plus pseudocounts $r_{kl}$ and $r_k(b)$, respectively. The maximum likelihood estimators for $a_{kl}$ and $e_k(b)$ are then given by $a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}$ and $e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$.
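A brief sketch of this counting estimator with pseudocounts; the (symbols, path) training pairs below are invented toy data:

```python
def estimate_hmm(pairs, states, alphabet, pseudo_a=1.0, pseudo_e=1.0):
    """ML estimators a_kl = A_kl / sum_l' A_kl' and e_k(b) = E_k(b) / sum_b' E_k(b'),
    where A and E are transition/emission counts plus pseudocounts."""
    A = {k: {l: pseudo_a for l in states} for k in states}
    E = {k: {b: pseudo_e for b in alphabet} for k in states}
    for symbols, path in pairs:
        for i, (k, b) in enumerate(zip(path, symbols)):
            E[k][b] += 1
            if i + 1 < len(path):
                A[k][path[i + 1]] += 1
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return a, e

# toy training data: (emitted symbols, known state path)
pairs = [("CGCGTA", "++++--"), ("TATACG", "----++")]
a, e = estimate_hmm(pairs, states="+-", alphabet="ACGT")
print(a["+"]["-"], e["+"]["G"])
```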

Hidden Markov Models Estimation with Unknown Paths Iterative procedures must be used to estimate the parameters; all standard algorithms for the optimization of continuous functions can be used. One particular iteration method is standardly used: the Baum-Welch algorithm.
-- First estimate $A_{kl}$ and $E_k(b)$ by considering probable paths for the training sequences, using the current values of $a_{kl}$ and $e_k(b)$.
-- Second, use the maximum likelihood estimators to obtain new transition and emission parameters.
-- Iterate this process until a stopping criterion is met.
-- Many local maxima exist, particularly with large HMMs.

Hidden Markov Models Baum-Welch Algorithm The Baum-Welch algorithm calculates $A_{kl}$ and $E_k(b)$ as the expected numbers of times each transition or emission is used in the training sequences. It uses the values of the forward and backward algorithms. The probability that $a_{kl}$ is used at position i in sequence x is $P(\pi_i = k, \pi_{i+1} = l \mid x, \theta) = \frac{f_k(i)\, a_{kl}\, e_l(x_{i+1})\, b_l(i+1)}{P(x)}$.

Hidden Markov Models Baum-Welch Algorithm The expected number of times $a_{kl}$ is used can then be derived by summing over all positions and over all training sequences: $A_{kl} = \sum_j \frac{1}{P(x^j)} \sum_i f_k^j(i)\, a_{kl}\, e_l(x^j_{i+1})\, b_l^j(i+1)$. The expected number of times that letter b appears in state k is given by $E_k(b) = \sum_j \frac{1}{P(x^j)} \sum_{\{i \mid x^j_i = b\}} f_k^j(i)\, b_k^j(i)$.

Hidden Markov Models Baum-Welch Algorithm
Initialisation: pick arbitrary model parameters.
Recurrence:
-- Set all the A and E variables to their pseudocount values r or to zero.
-- For each sequence j = 1..n: calculate $f_k(i)$ for sequence j using the forward algorithm; calculate $b_k(i)$ for sequence j using the backward algorithm; add the contribution of sequence j to A and E.
-- Calculate the new model parameters with the maximum likelihood estimators.
-- Calculate the new log likelihood of the model.
Termination: stop if the change in log likelihood is less than a predefined threshold.
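A self-contained sketch of one Baum-Welch iteration wrapped in the loop described above. Scaling or log-space tricks, re-estimation of the start probabilities, and the choice of model architecture are all omitted, and every parameter and sequence below is a toy value:

```python
import math

def forward_table(obs, states, start_p, trans_p, emit_p):
    f = [{k: start_p[k] * emit_p[k][obs[0]] for k in states}]
    for s in obs[1:]:
        f.append({l: emit_p[l][s] * sum(f[-1][k] * trans_p[k][l] for k in states) for l in states})
    return f

def backward_table(obs, states, trans_p, emit_p):
    b = [{k: 1.0 for k in states}]
    for s in reversed(obs[1:]):
        b.insert(0, {k: sum(trans_p[k][l] * emit_p[l][s] * b[0][l] for l in states) for k in states})
    return b

def baum_welch_step(seqs, states, alphabet, start_p, trans_p, emit_p, pseudo=0.1):
    """One EM iteration: accumulate expected counts A_kl and E_k(b), then re-normalise.
    Start probabilities are kept fixed in this sketch."""
    A = {k: {l: pseudo for l in states} for k in states}
    E = {k: {c: pseudo for c in alphabet} for k in states}
    log_lik = 0.0
    for x in seqs:
        f = forward_table(x, states, start_p, trans_p, emit_p)
        b = backward_table(x, states, trans_p, emit_p)
        px = sum(f[-1][k] for k in states)
        log_lik += math.log(px)
        for i in range(len(x)):
            for k in states:
                # expected emission usage: posterior f_k(i) b_k(i) / P(x), binned by the symbol x_i
                E[k][x[i]] += f[i][k] * b[i][k] / px
                if i + 1 < len(x):
                    for l in states:
                        # expected transition usage: f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x)
                        A[k][l] += f[i][k] * trans_p[k][l] * emit_p[l][x[i + 1]] * b[i + 1][l] / px
    new_trans = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    new_emit = {k: {c: E[k][c] / sum(E[k].values()) for c in alphabet} for k in states}
    return new_trans, new_emit, log_lik

# toy run: iterate until the log likelihood improvement falls below a threshold
states, alphabet = ["+", "-"], "ACGT"
start = {"+": 0.5, "-": 0.5}
trans = {"+": {"+": 0.8, "-": 0.2}, "-": {"+": 0.2, "-": 0.8}}
emit = {"+": {"A": 0.2, "C": 0.25, "G": 0.3, "T": 0.25},   # slightly asymmetric start values
        "-": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}}
seqs = ["CGCGCGTATA", "TATACGCG"]
prev = float("-inf")
for _ in range(50):
    trans, emit, ll = baum_welch_step(seqs, states, alphabet, start, trans, emit)
    if ll - prev < 1e-6:
        break
    prev = ll
print(ll, trans)
```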

Hidden Markov Models Baum-Welch Algorithm The Baum-Welch algorithm is a special case of an Expectation-Maximization (EM) algorithm. As an alternative, Viterbi training can be used as well: there the most probable paths are estimated with the Viterbi algorithm and used in the iterative re-estimation process. Convergence is guaranteed, as the assignment of the paths is a discrete process. Unlike Baum-Welch, this procedure does not maximise the true likelihood $P(x^1, \ldots, x^n \mid \theta)$ regarded as a function of the model parameters θ. Instead it finds the value of θ that maximizes the contribution to the likelihood $P(x^1, \ldots, x^n \mid \theta, \pi^*(x^1), \ldots, \pi^*(x^n))$ from the most probable paths for all sequences.