Hidden Markov Models
Modified from: http://www.cs.iastate.edu/~cs544/Lectures/lectures.html

Nucleotide frequencies in the human genome
(figure: table of the frequencies of A, C, T, G in the human genome)

CpG Islands
CpG dinucleotides are rarer than would be expected from the independent probabilities of C and G.
– Reason: when CpG occurs, the C is typically chemically modified by methylation, and methyl-C has a relatively high chance of mutating into T.
A high CpG frequency may be biologically significant; e.g., it may signal a promoter region (the "start" of a gene).
A CpG island is a region where CpG dinucleotides are much more abundant than elsewhere.
(Written "CpG" to distinguish the dinucleotide from a C≡G base pair.)

Hidden Markov Models
Components:
– Observed variables: the emitted symbols
– Hidden variables: the underlying states
– Relationships between them: represented by a graph with transition probabilities
Goal: find the most likely explanation for the observed variables.

The occasionally dishonest casino
A casino uses a fair die most of the time, but occasionally switches to a loaded one.
– Fair die: Prob(1) = Prob(2) = ... = Prob(6) = 1/6
– Loaded die: Prob(1) = Prob(2) = ... = Prob(5) = 1/10, Prob(6) = 1/2
– These are the emission probabilities
Transition probabilities:
– Prob(Fair → Loaded) = 0.01
– Prob(Loaded → Fair) = 0.2
– Transitions between states obey a Markov process

An HMM for the occasionally dishonest casino
(figure: state diagram with the Fair and Loaded states, their emission probabilities, and the transition probabilities between them)
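
For concreteness in the algorithms that follow, here is one way to write the model down in Python. This is a minimal sketch: the dict layout and state labels are our own, and the 50/50 start distribution is an assumption (though it matches the 1/2 factors in the worked Viterbi example below).

    states = ["F", "L"]  # Fair, Loaded

    # Transition probabilities: trans[k][r] = P(next state r | current state k)
    trans = {
        "F": {"F": 0.99, "L": 0.01},
        "L": {"F": 0.20, "L": 0.80},
    }

    # Emission probabilities: emit[k][b] = P(symbol b | state k)
    emit = {
        "F": {b: 1 / 6 for b in "123456"},
        "L": {**{b: 1 / 10 for b in "12345"}, "6": 1 / 2},
    }

    # Start distribution (assumed 50/50; the slides use a silent begin state)
    start = {"F": 0.5, "L": 0.5}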

The occasionally dishonest casino
Known:
– The structure of the model
– The transition probabilities
Hidden: what the casino did
– e.g., FFFFFLLLLLLLFFFF...
Observable: the series of die tosses
What we must infer:
– When was a fair die used?
– When was a loaded one used?
The answer is a sequence of states, e.g. FFFFFFFLLLLLLFFF...

Making the inference
The model assigns a probability to each explanation of the observation, e.g.:
P(326 | FFL) = P(3|F) · P(F→F) · P(2|F) · P(F→L) · P(6|L)
= 1/6 · 0.99 · 1/6 · 0.01 · 1/2
Maximum likelihood: determine which explanation is most likely
– Find the path most likely to have produced the observed sequence
Total probability: determine the probability that the observed sequence was produced by the HMM
– Consider all paths that could have produced the observed sequence
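
A small sketch of this computation over the dicts defined above. Note that the slide's product leaves the probability of starting in state F implicit; the function below includes it.

    def path_probability(x, path):
        """P(x, path): probability that `path` generates the symbols `x`."""
        p = start[path[0]] * emit[path[0]][x[0]]
        for i in range(1, len(x)):
            p *= trans[path[i - 1]][path[i]] * emit[path[i]][x[i]]
        return p

    # 0.5 * 1/6 * 0.99 * 1/6 * 0.01 * 1/2, i.e. the slide's product
    # times the start probability of F
    print(path_probability("326", "FFL"))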

Notation
x is the sequence of symbols emitted by the model
– x_i is the symbol emitted at time i
A path, π, is a sequence of states
– The i-th state in π is π_i
a_kr is the probability of making a transition from state k to state r:
a_kr = P(π_i = r | π_{i-1} = k)
e_k(b) is the probability that symbol b is emitted when in state k:
e_k(b) = P(x_i = b | π_i = k)

A "parse" of a sequence
(figure: trellis with states 1, 2, ..., K at each of the L positions, a begin/end state 0, and emitted symbols x_1, x_2, ..., x_L)

The occasionally dishonest casino
(figure)

The most probable path
The most likely path π* satisfies:
π* = argmax_π P(x, π)
To find π*, consider all possible ways the last symbol of x could have been emitted.
Let v_k(i) = probability of the most probable path ending in state k after emitting x_1 ... x_i. Then:
v_r(i+1) = e_r(x_{i+1}) · max_k ( v_k(i) · a_kr )

The Viterbi Algorithm
Initialization (i = 0):
v_0(0) = 1, v_k(0) = 0 for k > 0
Recursion (i = 1, ..., L): for each state k,
v_k(i) = e_k(x_i) · max_r ( v_r(i-1) · a_rk )
Termination:
P(x, π*) = max_k v_k(L)
To find π*, use trace-back, as in dynamic programming.
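
A sketch of the algorithm for the casino model defined earlier. Probabilities are multiplied directly for readability; a practical implementation would work in log space to avoid underflow.

    def viterbi(x):
        # v[i][k]: probability of the most probable path emitting x[:i+1]
        # and ending in state k; ptr stores the argmax for trace-back
        v = [{k: start[k] * emit[k][x[0]] for k in states}]
        ptr = [{}]
        for i in range(1, len(x)):
            v.append({})
            ptr.append({})
            for k in states:
                best = max(states, key=lambda r: v[i - 1][r] * trans[r][k])
                ptr[i][k] = best
                v[i][k] = emit[k][x[i]] * v[i - 1][best] * trans[best][k]
        # Termination and trace-back
        last = max(states, key=lambda k: v[-1][k])
        path = [last]
        for i in range(len(x) - 1, 0, -1):
            path.append(ptr[i][path[-1]])
        return "".join(reversed(path)), v[-1][last]

    print(viterbi("626"))  # ('LLL', 0.008), matching the example below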

Viterbi: Example (for the rolls x = 6, 2, 6)

x:       6                     2                                            6
B:  1    0                     0                                            0
F:  0    (1/6)·(1/2) = 1/12    (1/6)·max{(1/12)·0.99, (1/4)·0.2} ≈ 0.0138   (1/6)·max{0.0138·0.99, 0.02·0.2} ≈ 0.0023
L:  0    (1/2)·(1/2) = 1/4     (1/10)·max{(1/12)·0.01, (1/4)·0.8} = 0.02    (1/2)·max{0.0138·0.01, 0.02·0.8} = 0.008

The largest final value is v_L(3) = 0.008, and trace-back gives the most probable path π* = LLL.

Viterbi gets it right more often than not
(figure: a long series of rolls with the true die states plotted against the Viterbi prediction; the two mostly agree)

An HMM for CpG islands
(figure: eight states A+, C+, G+, T+ and A-, C-, G-, T-, one per nucleotide inside and outside an island)
Emission probabilities are 0 or 1, e.g. e_{G-}(G) = 1, e_{G-}(T) = 0.
See Durbin et al., Biological Sequence Analysis, Cambridge University Press, 1998.
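
A sketch of that emission table in Python (state names like "G-" follow the slide's notation; the full model would also need the 8×8 transition matrix, estimated from data):

    # Eight states: A+, C+, G+, T+ inside an island; A-, C-, G-, T- outside
    cpg_states = [n + s for s in "+-" for n in "ACGT"]

    # Each state deterministically emits its own nucleotide
    cpg_emit = {k: {b: 1.0 if b == k[0] else 0.0 for b in "ACGT"}
                for k in cpg_states}

    print(cpg_emit["G-"]["G"], cpg_emit["G-"]["T"])  # 1.0 0.0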

Total probability
Many different paths can result in observation x. The probability that our model will emit x is the total probability:
P(x) = Σ_π P(x, π)
If the HMM models a family of objects, we want the total probability to peak at members of the family. (Training)

Total probability
Let f_k(i) = probability of emitting x_1 ... x_i, with x_i emitted from state k. Then:
f_k(i) = e_k(x_i) · Σ_r f_r(i-1) · a_rk
and
P(x) = Σ_k f_k(L) · a_k0
P(x) can be computed in the same way as the probability of the most likely path: replace max by a sum.

The Forward Algorithm
Initialization (i = 0):
f_0(0) = 1, f_k(0) = 0 for k > 0
Recursion (i = 1, ..., L): for each state k,
f_k(i) = e_k(x_i) · Σ_r f_r(i-1) · a_rk
Termination:
P(x) = Σ_k f_k(L) · a_k0
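
A matching sketch for the casino model. For simplicity it omits the explicit end state (no a_k0 factor), a common convention; log or scaled arithmetic would again be used in practice.

    def forward(x):
        # f[i][k]: total probability of emitting x[:i+1] with x[i] from state k
        f = [{k: start[k] * emit[k][x[0]] for k in states}]
        for i in range(1, len(x)):
            f.append({k: emit[k][x[i]] *
                         sum(f[i - 1][r] * trans[r][k] for r in states)
                      for k in states})
        total = sum(f[-1][k] for k in states)  # P(x), summed over end states
        return f, total

    f, px = forward("626")
    print(px)  # total probability of the rolls 6, 2, 6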

The Backward Algorithm
Initialization (i = L):
b_k(L) = a_k0 for all k
Recursion (i = L-1, ..., 1): for each state k,
b_k(i) = Σ_r a_kr · e_r(x_{i+1}) · b_r(i+1)
Termination:
P(x) = Σ_r a_0r · e_r(x_1) · b_r(1)
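
The corresponding sketch, using the same no-explicit-end-state convention as the forward code above (so b_k(L) = 1 for all k):

    def backward(x):
        # b[i][k]: probability of emitting x[i+1:] given x[i] came from state k
        b = [dict.fromkeys(states, 1.0) for _ in x]
        for i in range(len(x) - 2, -1, -1):
            for k in states:
                b[i][k] = sum(trans[k][r] * emit[r][x[i + 1]] * b[i + 1][r]
                              for r in states)
        return b

    # Termination: recovers the same P(x) as the forward algorithm
    b = backward("626")
    print(sum(start[k] * emit[k]["6"] * b[0][k] for k in states))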

Posterior Decoding
How likely is it that my observation comes from a certain state?
Like the Forward matrix, one can compute a Backward matrix.
Multiply Forward and Backward entries:
P(π_i = k | x) = f_k(i) · b_k(i) / P(x)
– P(x) is the total probability, computed by, e.g., the forward algorithm
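
Putting the two matrices together for the casino model, using the forward and backward sketches above:

    def posterior(x):
        # P(state at position i is k | x) = f_k(i) * b_k(i) / P(x)
        f, px = forward(x)
        b = backward(x)
        return [{k: f[i][k] * b[i][k] / px for k in states}
                for i in range(len(x))]

    for i, dist in enumerate(posterior("626")):
        print(i, dist)  # probability of Fair vs. Loaded at each position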

Posterior Decoding
With prob. 0.01 for switching to the loaded die: (figure: posterior probability along the sequence of rolls)
With prob. 0.05 for switching to the loaded die: (figure: posterior probability along the sequence of rolls)

Estimating the probabilities ("training")
Baum-Welch algorithm:
– Start with an initial guess at the transition probabilities
– Refine the guess to improve the total probability of the training data in each step
– May get stuck at a local optimum
– Special case of the expectation-maximization (EM) algorithm

Baum-Welch algorithm
Probability that transition s → t is used at position i (for one sequence x):
P(π_i = s, π_{i+1} = t | x) = f_s(i) · a_st · e_t(x_{i+1}) · b_t(i+1) / P(x)
Estimated number of transitions s → t:
A_st = Σ_i P(π_i = s, π_{i+1} = t | x)
Estimated number of emissions of b from s:
E_s(b) = Σ_{i: x_i = b} f_s(i) · b_s(i) / P(x)
New parameters:
a_st = A_st / Σ_t' A_st'  and  e_s(b) = E_s(b) / Σ_b' E_s(b')
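
A sketch of one Baum-Welch update for a single training sequence, built on the forward/backward functions above. A real implementation would sum the counts over many sequences, add pseudocounts, and iterate until the total probability stops improving.

    def baum_welch_step(x):
        f, px = forward(x)
        b = backward(x)
        A = {s: {t: 0.0 for t in states} for s in states}  # transition counts
        E = {s: {} for s in states}                        # emission counts
        for i in range(len(x)):
            for s in states:
                # Expected emission count: P(state at i is s | x)
                E[s][x[i]] = E[s].get(x[i], 0.0) + f[i][s] * b[i][s] / px
                if i + 1 < len(x):
                    for t in states:
                        # Expected use of transition s -> t at position i
                        A[s][t] += (f[i][s] * trans[s][t] *
                                    emit[t][x[i + 1]] * b[i + 1][t] / px)
        # Normalize the counts into new parameters
        new_trans = {s: {t: A[s][t] / sum(A[s].values()) for t in states}
                     for s in states}
        new_emit = {s: {c: E[s][c] / sum(E[s].values()) for c in E[s]}
                    for s in states}
        return new_trans, new_emit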

Profile HMMs
– Model a family of sequences
– Derived from a multiple alignment of the family
– Transition and emission probabilities are position-specific
– Set the parameters of the model so that the total probability peaks at members of the family
– Sequences can be tested for membership in the family by matching them against the profile with the Viterbi algorithm
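
A toy sketch of the "position-specific" idea: estimating match-state emission probabilities from the columns of a multiple alignment. The alignment and the +1 pseudocount are invented for illustration, and treating every column as a match state ignores the insert/delete logic of a real profile HMM.

    alignment = ["ACG-T", "ACG-T", "AAGCT", "ACGTT"]  # hypothetical family

    match_emissions = []
    for col in zip(*alignment):  # one alignment column per match state
        residues = [c for c in col if c != "-"]
        counts = {b: residues.count(b) + 1 for b in "ACGT"}  # +1 pseudocount
        total = sum(counts.values())
        match_emissions.append({b: counts[b] / total for b in "ACGT"})

    print(match_emissions[0])  # column 1 strongly favors A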

Profile HMMs
(figure: profile HMM architecture with match, insert, and delete states)

Profile HMMs: Example
Source:
Note: these sequences could lead to other paths.

Pfam
"A comprehensive collection of protein domains and families, with a range of well-established uses including genome annotation."
Each family is represented by two multiple sequence alignments and two profile hidden Markov models (profile-HMMs).
A. Bateman et al., Nucleic Acids Research (2004), Database Issue 32:D138–D141.

Lab 5
(figure: profile HMM with match states M1–M3, insert states I1–I4, and delete states D1–D3)

Some recurrences
(figure: the same profile HMM, annotated with recurrences for the match, insert, and delete states)

More recurrences
(figure: the same profile HMM, annotated with further recurrences)

(table: dynamic-programming values for matching the sequence TAG against the profile HMM; rows Begin, M1–M3, I1–I4, D1–D3, End; columns for the empty prefix and the symbols T, A, G; the Begin row starts at 1 with 0 elsewhere)