Hidden Markov Models: an Introduction by Rachel Karchin.

Outline
–Stochastic modeling
–Discrete time series
–Simple Markov models
–Hidden Markov models
–Summary of key HMM algorithms
–Modeling protein families with linear profile HMMs

Outline (continued)
–Overfitting and regularization
–To come

BME100 9/28/01 References: Lectures from David Haussler’s CMPS243 class (Winter 1998).

Stochastic Modeling. Stochastic modeling is used for phenomena that exhibit random behavior. Random does not mean arbitrary: random events can be modeled by some probability distribution, and they are represented with random variables that take on values (numeric or symbolic) according to event outcomes.

General Discrete Time Series. A chain of random variables X_1, X_2, X_3, …, X_n and a sequence of observed values x = x_1, x_2, x_3, …, x_n. When we observe x, we say that X_1 = x_1, X_2 = x_2, X_3 = x_3, …, X_n = x_n.

Simple Markov model of order k. The probability distribution for X_t depends only on the values of the previous k random variables: X_t-1, X_t-2, …, X_t-k.

Simple Markov model of order k. Example with k = 1 and X_t ∈ {a, b}. Observed sequence: x = abaaababbaa. Model:

Start probs: P(a) = 0.5, P(b) = 0.5
Transition probs (prev → next): a→a 0.7, a→b 0.3, b→a 0.5, b→b 0.5

P(x) = 0.5 * 0.3 * 0.5 * 0.7 * 0.7 * 0.3 * 0.5 * 0.3 * 0.5 * 0.5 * 0.7
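Written out, P(x) is the start probability of the first symbol times one transition probability per adjacent pair. A minimal sketch of that computation (the dictionaries mirror the slide's tables):

```python
# Probability of a sequence under the slide's order-1 Markov model.
start = {"a": 0.5, "b": 0.5}                 # start probabilities
trans = {("a", "a"): 0.7, ("a", "b"): 0.3,   # P(next | prev)
         ("b", "a"): 0.5, ("b", "b"): 0.5}

def chain_prob(x):
    """P(x) = P(x_1) * product over t of P(x_t | x_t-1)."""
    p = start[x[0]]
    for prev, cur in zip(x, x[1:]):
        p *= trans[(prev, cur)]
    return p

print(chain_prob("abaaababbaa"))  # ≈ 0.000289
```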

What is a hidden Markov model? A finite set of hidden states. At each time t, the system is in a hidden state, chosen at random depending on the state chosen at time t-1, and an observed letter is generated at random, depending only on the current hidden state.

HMM for random tosses of fair and biased coins.

Fair state: P(H) = 0.5, P(T) = 0.5
Biased state: P(H) = 0.1, P(T) = 0.9
Start: 0.5

Sequence of states: q = FFFFBBBFFFFF
Observed sequence: x = HTTHTTTTHHTH

HMM for random tosses of fair and biased coins. The sequence of states is a first-order Markov chain, but it is usually hidden from us. We observe the effect, which is statistically correlated with the state, and we use the correlations to decode the state sequence.

HMM for fair and biased coins. Sequence of states: q = FFFFBBBFFFFF. Observed sequence: x = HTTHTTTTHHTH. With complete information, we can compute the joint probability as a product of start, transition, and emission probabilities: P(x,q) = 0.5 * 0.5 * 0.8 * 0.5 * 0.8 * 0.5 * 0.8 * 0.5 * 0.2 * … Otherwise, we need the algorithms on the next slide.
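The truncated product above can be sketched as code. The 0.8-stay/0.2-switch transitions for the Fair state follow the factors shown on the slide; the Biased state's 0.8 self-transition is an assumption, since the slide's product is cut off before reaching it:

```python
# Joint probability P(x, q) for the fair/biased coin HMM.
# The Biased state's 0.8 self-transition is an assumption; the slide's
# product only shows the Fair state's transitions (0.8 stay, 0.2 switch).
start = {"F": 0.5, "B": 0.5}
trans = {("F", "F"): 0.8, ("F", "B"): 0.2,
         ("B", "B"): 0.8, ("B", "F"): 0.2}
emit = {"F": {"H": 0.5, "T": 0.5},
        "B": {"H": 0.1, "T": 0.9}}

def joint_prob(x, q):
    """P(x,q) = P(q_1) P(x_1|q_1) * product of P(q_t|q_t-1) P(x_t|q_t)."""
    p = start[q[0]] * emit[q[0]][x[0]]
    for t in range(1, len(q)):
        p *= trans[(q[t - 1], q[t])] * emit[q[t]][x[t]]
    return p

p = joint_prob("HTTHTTTTHHTH", "FFFFBBBFFFFF")
```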

Three key HMM algorithms.
–Forward algorithm: given an observed sequence x and an HMM M, calculate P(x|M).
–Viterbi algorithm: given x and M, calculate the most likely state sequence q.
–Forward-backward (Baum-Welch) algorithm: given many observed sequences, estimate the parameters of the HMM.
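The forward algorithm computes P(x|M) by summing over all state paths with dynamic programming rather than enumerating them. A sketch on the coin HMM from the previous slides (the Biased state's 0.8 self-transition is again an assumption):

```python
# Forward algorithm: P(x | M) as a sum over all hidden state paths,
# computed by dynamic programming in O(n * |states|^2) time.
states = ("F", "B")
start = {"F": 0.5, "B": 0.5}
trans = {("F", "F"): 0.8, ("F", "B"): 0.2,   # Biased self-transition of
         ("B", "B"): 0.8, ("B", "F"): 0.2}   # 0.8 is an assumption
emit = {"F": {"H": 0.5, "T": 0.5},
        "B": {"H": 0.1, "T": 0.9}}

def forward(x):
    """Return P(x | M), summing f_n(s) over all end states s."""
    f = {s: start[s] * emit[s][x[0]] for s in states}   # f_1
    for t in range(1, len(x)):
        f = {s: emit[s][x[t]] * sum(f[r] * trans[(r, s)] for r in states)
             for s in states}
    return sum(f.values())
```

In practice these products underflow for long sequences, so real implementations work in log space or rescale each column.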

Some HMM Topologies

Modeling protein families with linear profile HMMs. The observed sequence is the amino acid sequence of a protein. Typically we want to model a group of related proteins; the model's states and transitions are based on a multiple alignment of the group, with no transitions from right to left.

From multiple alignment to profile HMM. A good model of these proteins must reflect:
–highly conserved positions in the alignment
–variable regions in the alignment
–varying lengths of protein sequences

NF.....A-
DF.....SY
NYrqsanS-
NFapistAY
DFvlamrSF

From multiple alignment to profile HMM. The model has three kinds of states: match, insert, and silent.

NF.....A-
DF.....SY
NYrqsanS-
NFapistAY
DFvlamrSF

Match-state emission probabilities are the column frequencies: position 1: P(N)=0.6, P(D)=0.4; position 2: P(F)=0.8, P(Y)=0.2; position 3: P(S)=0.6, P(A)=0.4; position 4: P(Y)=0.67, P(F)=0.33. The insert state emits the lowercase residues, e.g. P(R)=0.13, P(Q)=0.07, P(A)=0.2.
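The match-state emission probabilities above are simply relative frequencies within each alignment column, with gap characters skipped. A sketch that reproduces the slide's numbers (the helper name is illustrative):

```python
# Estimating match-state emission probabilities from alignment columns
# (uppercase = match columns, lowercase = insert residues, '-'/'.' = gaps).
alignment = ["NF.....A-",
             "DF.....SY",
             "NYrqsanS-",
             "NFapistAY",
             "DFvlamrSF"]

def match_emissions(alignment, col):
    """Relative frequency of residues in one column, ignoring gaps."""
    residues = [row[col] for row in alignment if row[col] not in ".-"]
    return {a: residues.count(a) / len(residues) for a in set(residues)}

# Column 1 gives P(N)=0.6, P(D)=0.4, matching the slide.
probs = match_emissions(alignment, 0)
```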

Finding the probability of a sequence with an HMM. Once we have an HMM for a group of proteins, we are often interested in how well a new sequence fits the model: we want to compute a probability for our sequence with respect to the model.

One sequence, many paths. A protein sequence can be represented by many paths through the HMM; the slides show two alternative paths through the match and insert states for the sequence DYAF, using the emission probabilities from the previous alignment.

Finding the probability of a sequence with an HMM. Not knowing the state sequence q, we must use either the forward or the Viterbi algorithm. Basic recurrence relation for Viterbi, where v_t is defined as the probability of the most probable path ending in state q_t with observation x_t:

v_0 = 1
v_t = max v_t-1 * P(q_t | q_t-1) * P(x_t | q_t)

Compute with dynamic programming.

Most probable path: Viterbi algorithm.

for t = 1 to n: v_t = max v_t-1 * P(q_t | q_t-1) * P(x_t | q_t)

(The slide fills in a dynamic-programming table for the sequence against the profile HMM, scoring each candidate letter in each match and insert state M1/I1 through M4/I4. Most entries are zero, and the surviving products at each step, 1.0*1.0*0.4, then 0.32, then 0.051, and so on, trace out the most probable path.)

Overfitting problems. Our toy example illustrates a problem with estimating probability distributions from small samples: P(aa other than D or N) = 0 at position 1, so family members that do not begin with D or N cannot be recognized by the model. (Figure: probability distribution in Match State 1.)

Model regularization. Use pseudocounts: if an amino acid does not appear in a column of the alignment, give it a fake count.

NF.....A-
DF.....SY
NYrqsanS-
NFifistAY
DFvlpmrSF

For each amino acid (e.g. A or N in column 1), the regularized probability is its observed counts plus its pseudocounts, divided by the observed counts plus pseudocounts summed over all amino acids in the column.
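The counting rule above can be sketched as Laplace-style smoothing of one column; the uniform pseudocount of 1 and the helper name are illustrative, and as the next slide notes, practical profile HMMs use more sophisticated priors:

```python
# Column probabilities with pseudocounts: every amino acid gets a fake
# count so that no probability is exactly zero (Laplace smoothing sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_probs(column, pseudo=1.0):
    counts = {a: pseudo for a in AMINO_ACIDS}   # pseudocount for each aa
    for a in column:
        counts[a] += 1                          # add observed counts
    total = sum(counts.values())                # observed + pseudocounts
    return {a: c / total for a, c in counts.items()}

probs = column_probs("NDNND")                   # column 1 of the alignment
# P(N) = (3 + 1) / (5 + 20) = 0.16; unseen residues get 1/25 = 0.04
```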

Model regularization. Pseudocounts smooth the column probability distributions. In practice, pseudocounts are often added by fitting the column to a set of typical amino acid distributions found in the columns of protein multiple alignments. (Figure: probability distribution in Match State 1.)

To come: HMMs can be used to automatically produce a high-quality multiple alignment. Active areas of research:
–Building HMMs that can recognize very distantly related proteins
–Multi-track HMMs