CSCE555 Bioinformatics Lecture 6 Hidden Markov Models Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page:


University of South Carolina, Department of Computer Science and Engineering
HAPPY CHINESE NEW YEAR

Roadmap
◦ Probabilistic Models of Sequences
◦ Introduction to HMM
◦ Profile HMMs as MSA models
◦ Measuring Similarity between Sequence and HMM Profile model
◦ Summary
9/18/2015

Multiple Sequence Alignment
◦ Alignment containing multiple DNA / protein sequences
◦ Look for conserved regions → similar function
Example:
#Rat      ATGGTGCACCTGACTGATGCTGAGAAGGCTGCTGT
#Mouse    ATGGTGCACCTGACTGATGCTGAGAAGGCTGCTGT
#Rabbit   ATGGTGCATCTGTCCAGT---GAGGAGAAGTCTGC
#Human    ATGGTGCACCTGACTCCT---GAGGAGAAGTCTGC
#Opossum  ATGGTGCACTTGACTTTT---GAGGAGAAGAACTG
#Chicken  ATGGTGCACTGGACTGCT---GAGGAGAAGCAGCT
#Frog     ---ATGGGTTTGACAGCACATGATCGT---CAGCT

Probabilistic Model: Position-specific scoring matrices (PSSM)
Limitations of PSSM?
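A minimal sketch of how a PSSM could be built and used, assuming log-odds scores against a uniform background and a pseudocount of 1 (both are illustrative choices, not taken from the slides). The block uses the first nine ungapped columns of the Rat, Mouse, Rabbit, Human, Opossum and Chicken rows of the alignment slide; `build_pssm` and `score` are hypothetical helper names.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def build_pssm(seqs, pseudocount=1.0, background=0.25):
    """Return one {base: log2-odds score} dict per alignment column.

    Assumes equal-length, ungapped sequences; the pseudocount and the
    uniform background are illustrative assumptions.
    """
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs), "sequences must be aligned"
    total = len(seqs) + pseudocount * len(ALPHABET)
    pssm = []
    for col in range(length):
        counts = Counter(s[col] for s in seqs)
        pssm.append({
            a: math.log2(((counts[a] + pseudocount) / total) / background)
            for a in ALPHABET
        })
    return pssm

def score(pssm, window):
    """Sum the column scores for a window of the same width as the PSSM."""
    return sum(col[ch] for col, ch in zip(pssm, window))

# First nine columns of six rows from the alignment slide.
block = ["ATGGTGCAC", "ATGGTGCAC", "ATGGTGCAT",
         "ATGGTGCAC", "ATGGTGCAC", "ATGGTGCAC"]
pssm = build_pssm(block)
consensus_score = score(pssm, "ATGGTGCAC")
unrelated_score = score(pssm, "CCCCCCCCC")
```

Because every window must have exactly the width of the matrix, this kind of model has no natural way to score a sequence with an insertion or deletion, which is one answer to the question on the slide.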

Difficulty in biological sequences
Variation in a family of sequences
◦ Gaps of variable lengths
◦ Segments conserved to different degrees
◦ PSSM cannot handle variable-length gaps
◦ Need a statistical sequence model

Regular Expressions Model
Regular expressions
◦ Protein spelling is much freer than English spelling
◦ [AT] [CG] [AC] [ACGT]* A [TG] [GC]
Limitation of the regular expression model?
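The pattern on this slide can be tried directly with Python's `re` module. This is only a sketch; the three test sequences are made up for illustration and are not from the slides.

```python
import re

# The slide's pattern, written without the spaces between character classes.
pattern = re.compile(r"[AT][CG][AC][ACGT]*A[TG][GC]")

# Illustrative sequences (assumptions, not taken from the lecture).
typical = "ACAATG"     # one allowed base at every position, no insertion
unusual = "TGCTAGG"    # also built only from allowed bases, with a one-base insertion
rejected = "ACAATA"    # last base is A, which [GC] does not allow

typical_matches = pattern.fullmatch(typical) is not None
unusual_matches = pattern.fullmatch(unusual) is not None
rejected_matches = pattern.fullmatch(rejected) is not None
```

A regular expression only answers yes or no, so `typical` and `unusual` are treated identically even if one combination of bases is far more common in the family than the other. A probabilistic model can rank them.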


Hidden Markov Model (HMM)
HMM is:
◦ Statistical model
◦ Well suited for many tasks in molecular biology
Using HMM in molecular biology
◦ Probabilistic profile (profile HMM)
  ▪ From a family of proteins, for searching a database for other members of the family
  ▪ Resembles the profile and weight matrix methods
◦ Grammatical structure
  ▪ Gene finding
  ▪ Recognize signals
  ▪ Prediction (must follow the rules of a gene)

Detect Cheating in Coin Toss Game
Fair and biased coins could be used.
Question: is it possible to determine whether a biased coin has been used, based only on the observed Head/Tail sequence?
HTTTHTHTHTTHHHHTHTHTHTHHHHTHT

Example: Fair Coin Toss
Consider the single-coin scenario. We could model the process producing the sequence of H's and T's as a Markov model with two states and equal transition probabilities.
[Figure: two states, T and H, with transition probability 0.5]
Only one fair coin is used here.

Example: Fair and Biased Coins
Consider the scenario where there are two coins: a fair coin and a biased coin.
Visible states do not correspond to hidden states:
◦ Visible state: output of H or T
◦ Hidden state: which coin was tossed
HTTTHTHTHTTHHHHTHTHTHTHHHHTHT
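The two-coin scenario can be simulated to see why the coin identity is "hidden". The sketch below assumes a switching probability of 0.1 and a biased coin with P(H) = 0.8; these numbers, the function name `simulate_coins`, and the rule of always starting with the fair coin are illustrative assumptions, not values from the lecture.

```python
import random

def simulate_coins(n, p_switch=0.1, p_heads_biased=0.8, seed=0):
    """Generate n tosses from a fair coin (F) and a biased coin (B).

    Returns (hidden, visible): the coin used at each step and the H/T outcome.
    All probabilities here are hypothetical.
    """
    rng = random.Random(seed)
    coin = "F"  # assumption: the game starts with the fair coin
    hidden, visible = [], []
    for _ in range(n):
        p_heads = 0.5 if coin == "F" else p_heads_biased
        hidden.append(coin)
        visible.append("H" if rng.random() < p_heads else "T")
        # After each toss the dealer may swap coins.
        if rng.random() < p_switch:
            coin = "B" if coin == "F" else "F"
    return "".join(hidden), "".join(visible)

hidden, visible = simulate_coins(29)
```

An observer sees only `visible`; recovering `hidden` from it is the annotation problem that HMM algorithms address.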

Hidden Markov Models

Ingredients of an HMM
◦ Collection of states: $\{S_1, S_2, \ldots, S_N\}$
◦ State transition probabilities (transition matrix): $A_{ij} = P(q_{t+1} = S_i \mid q_t = S_j)$
◦ Initial state distribution: $\pi_i = P(q_1 = S_i)$
◦ Observations: $\{O_1, O_2, \ldots, O_M\}$
◦ Observation probabilities: $B_j(k) = P(v_t = O_k \mid q_t = S_j)$
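These ingredients map naturally onto a small data structure. The sketch below is one possible layout, not the lecture's own: the class name `HMM`, its field names, and the two-coin numbers are assumptions. Note that the code stores transitions as `trans[current][next]`, so each row sums to 1, whereas the slide writes $A_{ij}$ with the next state as the first index.

```python
import math
from dataclasses import dataclass

@dataclass
class HMM:
    states: list   # S_1 ... S_N
    symbols: list  # O_1 ... O_M
    start: dict    # start[s]    = P(q_1 = s)
    trans: dict    # trans[s][r] = P(q_{t+1} = r | q_t = s)
    emit: dict     # emit[s][o]  = P(v_t = o | q_t = s)

    def is_valid(self, tol=1e-9):
        """Check that every probability distribution sums to 1."""
        ok = math.isclose(sum(self.start.values()), 1.0, abs_tol=tol)
        for s in self.states:
            ok = ok and math.isclose(sum(self.trans[s].values()), 1.0, abs_tol=tol)
            ok = ok and math.isclose(sum(self.emit[s].values()), 1.0, abs_tol=tol)
        return ok

# Hypothetical two-coin model: F = fair coin, B = biased coin.
coin_hmm = HMM(
    states=["F", "B"],
    symbols=["H", "T"],
    start={"F": 0.5, "B": 0.5},
    trans={"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}},
    emit={"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.8, "T": 0.2}},
)
```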

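Ingredients of Our HMM
◦ States: $\{S_{\text{sunny}}, S_{\text{rainy}}, S_{\text{snowy}}\}$
◦ State transition probabilities (transition matrix): $A =$ [matrix shown on slide]
◦ Initial state distribution: $\pi_i =$ [vector shown on slide]
◦ Observations: $\{O_1, O_2, \ldots, O_M\}$
◦ Observation probabilities (emission matrix): $B =$ [matrix shown on slide]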

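Probability of a Sequence of Events
$P(O) = P(O_{\text{gloves}}, O_{\text{gloves}}, O_{\text{umbrella}}, \ldots, O_{\text{umbrella}}) = \sum_{\text{all } Q} P(O \mid Q)\,P(Q) = \sum_{q_1, \ldots, q_7} P(O \mid q_1, \ldots, q_7)\,P(q_1, \ldots, q_7) = 0.7 \times \ldots$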
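The sum over all state paths Q can be computed literally, or much more cheaply with the forward recursion. The sketch below checks that both give the same number. It uses a hypothetical two-coin HMM (F = fair, B = biased) rather than the weather model, because the weather slide's numbers are not available here; all probabilities and function names are assumptions.

```python
from itertools import product

# Hypothetical two-state model.
states = ("F", "B")
start = {"F": 0.5, "B": 0.5}
trans = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}  # trans[from][to]
emit = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.8, "T": 0.2}}

def brute_force(obs):
    """P(O) as the explicit sum over every state path Q of P(O | Q) P(Q)."""
    total = 0.0
    for path in product(states, repeat=len(obs)):
        p = start[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total

def forward(obs):
    """P(O) via the forward recursion, O(N^2 T) instead of O(N^T)."""
    f = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        f = {s: emit[s][o] * sum(f[r] * trans[r][s] for r in states)
             for s in states}
    return sum(f.values())

p_brute = brute_force("HTHHT")
p_forward = forward("HTHHT")
```

For a sequence of length T over N states, the brute-force sum has N^T terms, while the forward recursion reuses partial sums and needs only on the order of N²·T multiplications.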

Typical HMM Problems
◦ Annotation: given a model M and an observed string S, what is the most probable path through M generating S?
◦ Classification: given a model M and an observed string S, what is the total probability of S under M?
◦ Consensus: given a model M, what is the string having the highest probability under M?
◦ Training: given a set of strings and a model structure, find transition and emission probabilities assigning high probabilities to the strings.
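The annotation problem is usually solved with the Viterbi algorithm. Below is a minimal log-space sketch; the two-coin parameters (F = fair, B = biased) are hypothetical, and the code assumes every probability is non-zero so that `math.log` is defined.

```python
import math

def viterbi(obs, states, start, trans, emit):
    """Return (most probable state path, its log probability).

    trans[r][s] is P(next = s | current = r). Assumes no zero probabilities.
    """
    v = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        ptr, nxt = {}, {}
        for s in states:
            best = max(states, key=lambda r: v[r] + math.log(trans[r][s]))
            ptr[s] = best
            nxt[s] = v[best] + math.log(trans[best][s]) + math.log(emit[s][o])
        back.append(ptr)
        v = nxt
    last = max(states, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(back):  # trace the pointers back to the first step
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), v[last]

# Hypothetical two-coin model.
states = ("F", "B")
start = {"F": 0.5, "B": 0.5}
trans = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
emit = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.8, "T": 0.2}}

heads_path, heads_logp = viterbi("HHHHHHHH", states, start, trans, emit)
tails_path, tails_logp = viterbi("TTTTTTTT", states, start, trans, emit)
```

With these numbers a run of heads is best explained by the biased coin and a run of tails by the fair coin. The classification problem replaces the `max` over predecessors with a sum, which gives the forward algorithm.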


HMM Profiles as Sequence Models
◦ Given the multiple alignment of sequences, we can use an HMM to model the sequences
◦ Each column of the alignment may be represented by a hidden state that produced that column
◦ Insertions and deletions may be represented by other states

Profile HMMs
HMM with a structure that in a natural way allows position-dependent gap penalties
◦ Main (match) states
  ▪ model the columns of the alignment
◦ Insert states
  ▪ model highly variable regions
◦ Delete states
  ▪ to jump over one or more columns
  ▪ i.e. to model the situation when just a few of the sequences have a "-" in the multiple alignment at a position
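One way to make the match/insert/delete structure concrete is to enumerate the allowed transitions. The sketch below assumes the commonly used layout with a begin state `B`, an end state `E`, insert states `I0..In`, and match and delete states `M1..Mn`, `D1..Dn`; the state names and the function `profile_hmm_transitions` are illustrative, and the lecture's figure may differ in detail.

```python
def profile_hmm_transitions(n_match):
    """List the (source, target) edges of a profile HMM with n_match columns.

    Assumed layout: from position k, a state may move to the next match
    state, stay in or enter the insert state I_k, or skip ahead via D_{k+1}.
    """
    edges = []
    for k in range(n_match + 1):
        sources = ["B" if k == 0 else f"M{k}", f"I{k}"]
        if k > 0:
            sources.append(f"D{k}")
        next_match = "E" if k == n_match else f"M{k + 1}"
        for src in sources:
            edges.append((src, next_match))         # advance to the next column
            edges.append((src, f"I{k}"))            # emit an inserted symbol
            if k < n_match:
                edges.append((src, f"D{k + 1}"))    # skip the next column
    return edges

edges = profile_hmm_transitions(3)
```

Every transition carries its own probability, which is what makes the gap penalties position-dependent: opening a gap after column 1 can cost more or less than opening one after column 2.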

HMM Sequences Continued

Profile HMM Example
Consider the six sequences shown below. A multiple sequence alignment of these sequences is the first step towards inducing the hidden Markov model.
SEQ1  G C C C A
SEQ2  A G C
SEQ3  A A G C
SEQ4  A G A A
SEQ5  A A A C
SEQ6  A G C

Profile HMM Topology
The topology of the HMM is established using the consensus sequence. The structure of a profile HMM is shown below:
◦ Square boxes represent match states
◦ Diamonds represent insert states
◦ Circles represent delete states

Profile HMM Example Continued
◦ The aligned columns correspond either to emissions from the match state or to emissions from the insert state
◦ The consensus columns are used to define the match states $M_1, M_2, M_3$ for the HMM
◦ After defining the match states, the corresponding insert and delete states are used to define the complete HMM topology

Transition Probabilities
◦ The values of the transition probabilities are computed using the frequency of the transitions as each sequence is considered
◦ The model parameters are computed using the state transition sequences shown in the figure below:

Transition Probabilities Continued
The frequency of each of the transitions and the corresponding transition probabilities are shown below.
[Table: one row per transition type (M-M, M-D, M-I, I-M, I-D, I-I, D-M, D-D, D-I)]
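Counting transitions along each sequence's state path and normalising per source state gives the maximum-likelihood transition probabilities. The sketch below uses four made-up state paths (the actual paths from the lecture's figure are not reproduced here), and `transition_probabilities` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

def transition_probabilities(paths):
    """Estimate P(dst | src) as count(src -> dst) / count(src -> anything)."""
    counts = defaultdict(Counter)
    for path in paths:
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(c.values()) for dst, n in c.items()}
        for src, c in counts.items()
    }

# Illustrative state paths through a three-column profile HMM (assumed data).
paths = [
    ["B", "M1", "M2", "M3", "E"],
    ["B", "M1", "D2", "M3", "E"],        # this sequence has a gap in column 2
    ["B", "M1", "M2", "I2", "M3", "E"],  # this one has an insertion after column 2
    ["B", "M1", "M2", "M3", "E"],
]
probs = transition_probabilities(paths)
```

Transitions that never occur in the training paths get no entry at all here, which corresponds to probability zero. In practice pseudocounts are usually added so that unseen transitions remain possible.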

Emission Probabilities
The emission probability is computed using the formula: [formula shown on slide]
The emission probability specifies the probability of emitting each of the symbols of the alphabet Σ in the state k.

Emission Probabilities Continued The emission probability for each state is computed as shown below:
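The slide's own formula is not reproduced in this transcript. A common choice, used here only as an assumption, is the observed count of each symbol in a state plus a pseudocount, divided by the total count plus the pseudocount times the alphabet size. The column string and the name `emission_probabilities` are illustrative.

```python
from collections import Counter

def emission_probabilities(column, alphabet="ACGT", pseudocount=1.0):
    """Estimate emission probabilities for one match state.

    column: the symbols aligned to this state, with '-' marking a gap.
    Uses add-one style smoothing, which is an assumption and may not be
    the formula on the lecture slide.
    """
    counts = Counter(ch for ch in column if ch != "-")
    total = sum(counts[a] for a in alphabet) + pseudocount * len(alphabet)
    return {a: (counts[a] + pseudocount) / total for a in alphabet}

# Hypothetical alignment column: four A's, one G, one gap.
e = emission_probabilities("AAGA-A")
```

The pseudocount keeps symbols that were never observed in a column from receiving probability zero, which would otherwise rule out any new sequence containing them.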

Searching the Profile HMM
◦ Sequences can be searched against the HMM to detect whether or not they belong to the family of sequences described by the profile HMM
◦ Using a global alignment, the most probable alignment of a sequence to the model, and its probability, can be determined using the Viterbi algorithm
◦ The full probability of a sequence aligning to the profile HMM is determined using the forward algorithm

How Well Does a Sequence Fit a Model?
◦ Probability depends on the length of the sequence
◦ Not suitable to use as a score

Length-independent Score
Log-odds score
◦ The logarithm of the probability of the sequence divided by the probability according to a null model

Length-independent Score
HMM using log-odds
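A small sketch of the log-odds idea. To stay self-contained it uses a one-state model with fixed emission probabilities as a stand-in for a profile HMM; for a real profile HMM, the model term would come from the forward or Viterbi algorithm. The emission values, the uniform null model, and the function names are all assumptions.

```python
import math

# Hypothetical "family" model that prefers A and T, and a uniform null model.
model_emit = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
null_emit = {a: 0.25 for a in "ACGT"}

def log_prob(seq, emit):
    """log2 P(seq) under a model that emits each symbol independently."""
    return sum(math.log2(emit[ch]) for ch in seq)

def log_odds(seq):
    """log2 [ P(seq | model) / P(seq | null) ]."""
    return log_prob(seq, model_emit) - log_prob(seq, null_emit)

short_raw = log_prob("ATAT", model_emit)
long_raw = log_prob("ATATATAT", model_emit)
family_like = log_odds("ATAT")
unrelated = log_odds("CGCG")
```

The raw log probability falls as the sequence gets longer, even for a sequence the model likes, so raw values cannot be compared across lengths. Dividing by the null model removes that effect: a positive log-odds score means the sequence is better explained by the model than by chance, and a negative score means the opposite.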

Summary
◦ HMM
◦ How to build a profile HMM model
◦ Scoring the fit between a sequence and an HMM model

Next Lecture
Gene-finding
Reading:
◦ Textbook (CG) chapter 4
◦ Textbook (EB) chapter 8