Hidden Markov Models: Applications in Bioinformatics Gleb Haynatzki, Ph.D. Creighton University March 31, 2003.


Definition A Hidden Markov Model (HMM) is a discrete-time finite-state Markov chain coupled with a sequence of letters emitted when the Markov chain visits its states. States (Q): q1 q2 q3 ... Letters (O): O1 O2 O3 ...

Definition (Cont’d) The sequence O of emitted letters is called “the observed sequence” because we often know it while not knowing the state sequence Q, which in this case is called “hidden”. The triple λ = (P, B, π) represents the full set of parameters of the HMM, where P is the transition probability matrix of the Markov chain, B is the emission probability matrix, and π denotes the initial distribution vector of the Markov chain.
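In code, the parameter triple λ = (P, B, π) maps directly onto two matrices and a vector. A minimal sketch, assuming a hypothetical two-state HMM over a two-letter alphabet (all numbers invented for illustration, not from the slides):

```python
import numpy as np

# Hypothetical 2-state HMM over a 2-letter alphabet
P = np.array([[0.9, 0.1],   # P[i, j] = Pr(next state is j | current state is i)
              [0.2, 0.8]])
B = np.array([[0.8, 0.2],   # B[i, k] = Pr(emit letter k | current state is i)
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])   # pi[i] = Pr(initial state is i)

# Each row of P and B, and pi itself, must be a probability distribution
ok = (np.allclose(P.sum(axis=1), 1)
      and np.allclose(B.sum(axis=1), 1)
      and np.isclose(pi.sum(), 1.0))
```

The row-sum check is the defining constraint on the triple: every state must transition somewhere and emit something.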

Important Calculations Given any observed sequence O = (O1, ..., OT): (1) given λ, efficiently calculate P(O | λ); (2) given λ, efficiently calculate the hidden sequence Q = (q1, ..., qT) that is most likely to have occurred, i.e. find argmax_Q P(Q | O); (3) assuming a fixed graph structure of the underlying Markov chain, find the parameters λ = (P, B, π) maximizing P(O | λ).
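The second calculation (the most likely hidden path) is classically solved by the Viterbi dynamic-programming algorithm. A minimal sketch, using a hypothetical two-state, two-letter HMM; the function and the toy parameters are illustrative, not part of the original slides:

```python
import numpy as np

def viterbi(obs, P, B, pi):
    """Return the state path Q maximizing P(Q | O) for observed sequence `obs`."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))           # delta[t, j]: best prob of a path ending in state j at time t
    psi = np.zeros((T, N), dtype=int)  # back-pointers to recover the path
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] * P[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] * B[j, obs[t]]
    # Backtrack from the best final state
    q = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        q.append(int(psi[t, q[-1]]))
    return q[::-1]

# Hypothetical HMM: states 0/1, symbols 0/1 (numbers invented for illustration)
P = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.1, 0.9]])
pi = np.array([0.5, 0.5])
path = viterbi([0, 0, 1, 1, 1], P, B, pi)  # -> [0, 0, 1, 1, 1]
```

Calculation (1) uses the same recursion with a sum in place of the max (the forward algorithm), and calculation (3) is handled by Baum-Welch / EM.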

Applications of HMM Modeling protein families: (1) construct multiple sequence alignments; (2) determine the family of a query sequence. Gene finding through semi-Hidden Markov Models (semiHMM).

HMM for Sequence Alignment Consider the following Markov chain underlying a HMM, with three types of states: “match” (m), “insert” (i), and “delete” (d).

HMM for Sequence Alignment (Cont’d) The alphabet A consists of the 20 amino acids and a “delete” symbol. Delete states output only the delete symbol, with probability 1. Each insert and match state has its own emission distribution over the 20 amino acids and does not output the delete symbol.

HMM for Sequence Alignment (Cont’d) There are two extreme situations depending on the HMM parameters: (1) if the emission probabilities for the match and insert states are uniform over the 20 amino acids, the model produces random sequences; (2) if each state emits one specific amino acid with probability 1 and each match state m_i transitions to m_{i+1} with probability 1, the model always produces the same sequence.

HMM for Sequence Alignment (Cont’d) Between the two extremes, consider a “family” of somewhat similar sequences: a “tight” family of very similar sequences, or a “loose” family with little similarity. Similarity may also be confined to certain regions of the sequences, e.g. if some match states emit only a few amino acids while other match states emit all 20 uniformly/randomly.

HMM for Sequence Alignments: Procedure (A) Start with “training”, or estimating, the parameters of the model using a set of training sequences from the protein family. (B) Next, compute the path of states most likely to have produced each sequence. (C) Two amino acids are aligned if both are produced by the same match state in their paths. (D) Finally, indels are inserted appropriately for insertions and deletions.

Example Consider two sequences: CAEFDDH and CDAEFPDDH. Suppose the model has length 10, and the most likely paths for the two sequences are: m0 m1 m2 m3 m4 d5 d6 m7 m8 m9 m10 and m0 m1 i1 m2 m3 m4 d5 m6 m7 m8 m9 m10.

Example (Cont’d) The alignment induced is found by aligning positions generated by the same match state:
m0 m1 m2 m3 m4 d5 d6 m7 m8 m9 m10
C  A  E  F  D  D  H
C  D  A  E  F  P  D  D  H
m0 m1 i1 m2 m3 m4 d5 m6 m7 m8 m9 m10

Example (End) This leads to the following alignment:
C–AEF–DDH
CDAEFPDDH
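The merge step can be made concrete in code: letters emitted by the same match state line up, a delete state becomes a gap, and an insert gets its own column. The sketch below uses a self-consistent toy version of the example (model length 8, with the begin state dropped and the paths adjusted so that state and letter counts agree; the state names and helper functions are illustrative only, and it assumes at most one insertion between consecutive match positions):

```python
def letters_by_state(seq, path):
    """Map each emitting state in a path to the letter it emitted."""
    out, k = {}, 0
    for s in path:
        if s[0] in "mi":   # match and insert states emit one letter; delete states emit none
            out[s] = seq[k]
            k += 1
    return out

def align_from_paths(seq1, path1, seq2, path2, model_len):
    """Merge two profile-HMM state paths into a pairwise alignment."""
    e1, e2 = letters_by_state(seq1, path1), letters_by_state(seq2, path2)
    row1, row2 = [], []
    for pos in range(1, model_len + 1):
        m = f"m{pos}"
        row1.append(e1.get(m, "-"))    # delete state at this position -> gap
        row2.append(e2.get(m, "-"))
        i = f"i{pos}"                  # insert state sits between match pos and pos+1
        if i in e1 or i in e2:
            row1.append(e1.get(i, "-"))
            row2.append(e2.get(i, "-"))
    return "".join(row1), "".join(row2)

path1 = ["m1", "m2", "m3", "m4", "d5", "m6", "m7", "m8"]
path2 = ["m1", "i1", "m2", "m3", "m4", "m5", "m6", "m7", "m8"]
a1, a2 = align_from_paths("CAEFDDH", path1, "CDAEFPDDH", path2, 8)
# a1, a2 -> "C-AEF-DDH", "CDAEFPDDH"
```

With these toy paths the function reproduces the alignment shown on the slide: the insert i1 opens the gap after the first C, and the delete d5 opens the gap opposite P.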

HMM: Strengths & Weaknesses HMMs align many sequences with little computing power, and they allow the sequences themselves to guide the alignment. However, alignments by HMM are sometimes ambiguous, and some regions are left unaligned in the end. HMM weaknesses stem from the same assumptions as their strengths: the Markov property and stationarity.

Thank you.