Profile Hidden Markov Models Bioinformatics Fall-2004 Dr Webb Miller and Dr Claude Depamphilis Dhiraj Joshi Department of Computer Science and Engineering.


Profile Hidden Markov Models Bioinformatics Fall-2004 Dr Webb Miller and Dr Claude Depamphilis Dhiraj Joshi Department of Computer Science and Engineering The Pennsylvania State University

Outline
- Introduction to HMMs
- Profile HMMs
- Available resources for Profile HMMs
- Some online demonstrations

Introduction to HMMs
Hidden Markov Models: Formalism
- Statistical techniques for modeling patterns in data
- First-order Markov property (memorylessness): the next state depends only on the current state
- A state is generally a hidden entity that emits symbols or features
- The same symbol can be emitted by several states
- An HMM is characterized by its transition probabilities and its emission distributions
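As a minimal sketch of the formalism above, the following toy HMM uses two hypothetical hidden states (`AT_rich`, `GC_rich`) that can both emit every DNA symbol. All state names and probabilities are illustrative assumptions, not values from the slides. The forward algorithm sums over all hidden state paths to obtain the probability of an observed sequence.

```python
from itertools import product

# Hypothetical two-state HMM over DNA; every number here is illustrative.
states = ["AT_rich", "GC_rich"]
start = {"AT_rich": 0.5, "GC_rich": 0.5}
trans = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.2, "GC_rich": 0.8},
}
emit = {
    "AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def forward_likelihood(seq):
    """Probability of seq, summed over all hidden state paths."""
    if not seq:
        return 1.0
    # f[s] = P(symbols so far, current state = s)
    f = {s: start[s] * emit[s][seq[0]] for s in states}
    for x in seq[1:]:
        f = {
            s: emit[s][x] * sum(f[r] * trans[r][s] for r in states)
            for s in states
        }
    return sum(f.values())


# Sanity check: the probabilities of all length-3 sequences add up to 1.
total = sum(forward_likelihood("".join(p)) for p in product("ACGT", repeat=3))
print(round(total, 9))
```

Because the same symbol can come from either state, the state path cannot be read off the sequence directly, which is what makes the model "hidden".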

Introduction to HMMs
Hidden Markov Models: Parameter Estimation
- Parameters: transition probabilities and emission probabilities
- Estimated with iterative algorithms, typically the EM (Baum-Welch) algorithm or Viterbi training
- Both are based on dynamic programming to save computational cost
- The iterations usually involve variants of the following two steps:
  - estimate the state sequence that maximizes the likelihood under the current parameter set
  - update the parameter set based on the estimated state sequence
- The algorithms are only guaranteed to converge to a local optimum, so the result can depend on the initial parameters
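The two-step iteration above can be sketched as Viterbi training (a "hard" variant of EM) on a toy model. This is a sketch under stated assumptions: the state names, starting parameters, training sequences, and the constant pseudocount of 1 are all hypothetical, and the start distribution is kept fixed for brevity.

```python
import math


def viterbi_path(seq, states, start, trans, emit):
    """Most probable state path for seq, computed in log space."""
    v = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    back = []
    for x in seq[1:]:
        ptr, nv = {}, {}
        for s in states:
            best = max(states, key=lambda r: v[r] + math.log(trans[r][s]))
            ptr[s] = best
            nv[s] = v[best] + math.log(trans[best][s]) + math.log(emit[s][x])
        back.append(ptr)
        v = nv
    path = [max(states, key=lambda s: v[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def reestimate(seqs, paths, states, alphabet, pseudo=1.0):
    """Step 2: update parameters from counts along the decoded paths."""
    t = {r: {s: pseudo for s in states} for r in states}
    e = {s: {a: pseudo for a in alphabet} for s in states}
    for seq, path in zip(seqs, paths):
        for i, (x, s) in enumerate(zip(seq, path)):
            e[s][x] += 1
            if i > 0:
                t[path[i - 1]][s] += 1

    def normalize(row):
        total = sum(row.values())
        return {k: c / total for k, c in row.items()}

    return ({r: normalize(t[r]) for r in states},
            {s: normalize(e[s]) for s in states})


def viterbi_training(seqs, states, alphabet, start, trans, emit, iterations=10):
    for _ in range(iterations):
        # Step 1: best state path for each sequence under current parameters.
        paths = [viterbi_path(q, states, start, trans, emit) for q in seqs]
        trans, emit = reestimate(seqs, paths, states, alphabet)
    return trans, emit


# Hypothetical starting point and training data.
states = ["AT", "GC"]
alphabet = "ACGT"
start = {"AT": 0.5, "GC": 0.5}
trans0 = {"AT": {"AT": 0.7, "GC": 0.3}, "GC": {"AT": 0.3, "GC": 0.7}}
emit0 = {"AT": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
         "GC": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}}
seqs = ["AATTATATGCGCGGCC", "GCGCCGGCATATTAAT"]
trans, emit = viterbi_training(seqs, states, alphabet, start, trans0, emit0)
```

Starting from different `trans0` and `emit0` values can lead to a different fixed point, which is the local-optimum behavior noted above. Baum-Welch replaces the single best path with expected counts over all paths.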


Profile Hidden Markov Models
- Stochastic methods to model multiple sequence alignments of protein and DNA sequences
- Potential application domains:
  - Constructing a profile HMM: a protein family can be modeled as an HMM or a group of HMMs
  - Aligning a sequence with a stored profile HMM: new protein sequences can be aligned with stored models to detect remote homology
  - Finding statistical similarities between two profile HMMs: two or more protein family profile HMMs can be aligned to detect homology

Profile Hidden Markov Models
Constructing a profile HMM
- A multiple sequence alignment is assumed as the starting point
- Each consensus column is modeled by 3 states: match, insert and delete
- The number of states depends on the length of the alignment
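As a rough illustration of the construction step, the sketch below picks consensus (match) columns from a small alignment and labels each residue as match (M), insert (I) or delete (D). The 50% gap threshold is a common heuristic assumed here, not a rule stated in the slides. The alignment is a toy example, and the state count of 3L + 3 assumes the textbook architecture with L match states, L delete states, L + 1 insert states, plus begin and end states.

```python
def match_columns(alignment, max_gap_fraction=0.5):
    """Indices of columns treated as consensus (match) columns."""
    n_rows = len(alignment)
    cols = []
    for c in range(len(alignment[0])):
        gaps = sum(1 for row in alignment if row[c] == "-")
        if gaps / n_rows < max_gap_fraction:
            cols.append(c)
    return cols


def state_path(row, cols):
    """Label one aligned row with M, D or I states.

    A gap in a match column is a delete state, a residue in a non-match
    column is an insert state, and a gap in a non-match column is skipped.
    """
    match = set(cols)
    path = []
    for c, ch in enumerate(row):
        if c in match:
            path.append("M" if ch != "-" else "D")
        elif ch != "-":
            path.append("I")
    return "".join(path)


# Toy alignment of seven short protein fragments.
alignment = [
    "VGA--HAGEY",
    "V----NVDEV",
    "VEA--DVAGH",
    "VKG------D",
    "VYS--TYETS",
    "FNA--NIPKH",
    "IAGADNGAGV",
]

cols = match_columns(alignment)          # [0, 1, 2, 5, 6, 7, 8, 9]
paths = [state_path(row, cols) for row in alignment]
n_states = 3 * len(cols) + 3             # 27 under the assumed architecture
```

The last row yields the path `MMMIIMMMMM`: its two extra residues fall in columns where most sequences have gaps, so they are assigned to an insert state.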

Profile Hidden Markov Models
A typical profile HMM architecture
- Squares represent match states
- Diamonds represent insert states
- Circles represent delete states
- Arrows represent transitions

Profile Hidden Markov Models
A typical profile HMM architecture
- Transition between match states
- Transition from match state to insert state
- Transition within insert state
- Transition from match state to delete state
- Transition within delete state
- Emission of symbol at a state

Profile Hidden Markov Models
Estimation of parameters
- Transition probabilities are estimated as the frequency of each transition in a given alignment
- Emission probabilities are estimated as the frequency of each emission in a given alignment
- Pseudocounts are usually introduced to account for transitions and emissions that are not present in the alignment
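A minimal sketch of the frequency estimate above, for the emissions of a single match column. The function name and the constant pseudocount of 1 per residue (Laplace's rule) are assumptions chosen for illustration. Transition probabilities can be estimated the same way from transition counts.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def emission_probs(column, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Smoothed residue frequencies for one match column (gaps ignored)."""
    counts = Counter(ch for ch in column if ch != "-")
    total = sum(counts.values()) + pseudocount * len(alphabet)
    return {a: (counts[a] + pseudocount) / total for a in alphabet}


column = "VVVVVFI"                       # residues seen in one alignment column
raw = emission_probs(column, pseudocount=0.0)
smoothed = emission_probs(column, pseudocount=1.0)

print(raw["V"], raw["W"])                # 5/7 and 0.0
print(smoothed["V"], smoothed["W"])      # 6/27 and 1/27
```

Without pseudocounts, tryptophan (W) gets probability zero in this column, so any sequence with W at that position would be given zero probability by the model. The smoothed estimate avoids that.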

Profile Hidden Markov Models
Estimation of parameters with pseudocounts
- A Dirichlet prior distribution is used to determine the pseudocounts
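For reference, the standard posterior-mean estimate under a single Dirichlet prior can be written as follows. The notation is an assumption, not taken from the slides: c_j(a) is the observed count of residue a in match column j, alpha_a is the Dirichlet parameter for residue a, q_a is a background frequency and A is the total pseudocount weight.

```latex
e_{M_j}(a) \;=\; \frac{c_j(a) + \alpha_a}{\sum_{a'} \bigl( c_j(a') + \alpha_{a'} \bigr)},
\qquad \alpha_a = A \, q_a
```

With few sequences in the alignment the estimate stays close to the background q_a, and as the counts grow it approaches the observed frequencies.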

Profile Hidden Markov Models
Scoring a sequence against a profile HMM
- The Viterbi algorithm is used to find the best state path
- Simulated annealing based methods are also used
- Maximization criterion: log likelihood or log odds
- The log likelihood score depends strongly on the length of the sequence and is therefore generally not preferred
- If no alignment is given initially, the alignment can be learnt iteratively using Viterbi
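The scoring step can be sketched as a Viterbi recursion over match, insert and delete states that accumulates log-odds against a background model. This is a simplified sketch under several assumptions that are not from the slides: transition probabilities are the same at every position, insert states emit with the background distribution (so they contribute zero log-odds), the match-to-match, insert-to-match and delete-to-match probabilities are reused for the move into the end state, and the model and sequences are toy DNA examples.

```python
import math

NEG_INF = float("-inf")


def viterbi_log_odds(seq, match_emis, background, trans):
    """Best-path log-odds score of seq against a profile HMM (global)."""
    L, n = len(match_emis), len(seq)
    lt = {k: math.log(p) for k, p in trans.items()}
    # VX[j][i]: best score for seq[:i] with the path ending in state X_j.
    VM = [[NEG_INF] * (n + 1) for _ in range(L + 1)]
    VI = [[NEG_INF] * (n + 1) for _ in range(L + 1)]
    VD = [[NEG_INF] * (n + 1) for _ in range(L + 1)]
    VM[0][0] = 0.0  # the begin state is treated as M_0
    for j in range(L + 1):
        for i in range(n + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                x = seq[i - 1]
                odds = math.log(match_emis[j - 1][x] / background[x])
                VM[j][i] = odds + max(VM[j - 1][i - 1] + lt["MM"],
                                      VI[j - 1][i - 1] + lt["IM"],
                                      VD[j - 1][i - 1] + lt["DM"])
            if i > 0:
                # Insert emissions equal the background: log-odds of 0.
                VI[j][i] = max(VM[j][i - 1] + lt["MI"],
                               VI[j][i - 1] + lt["II"],
                               VD[j][i - 1] + lt["DI"])
            if j > 0:
                VD[j][i] = max(VM[j - 1][i] + lt["MD"],
                               VI[j - 1][i] + lt["ID"],
                               VD[j - 1][i] + lt["DD"])
    return max(VM[L][n] + lt["MM"], VI[L][n] + lt["IM"], VD[L][n] + lt["DM"])


# Toy profile with consensus "ACG"; all probabilities are illustrative.
background = {a: 0.25 for a in "ACGT"}
match_emis = [{a: (0.85 if a == c else 0.05) for a in "ACGT"} for c in "ACG"]
trans = {"MM": 0.90, "MI": 0.05, "MD": 0.05,
         "IM": 0.80, "II": 0.15, "ID": 0.05,
         "DM": 0.80, "DI": 0.05, "DD": 0.15}

s_good = viterbi_log_odds("ACG", match_emis, background, trans)
s_del = viterbi_log_odds("AG", match_emis, background, trans)
s_bad = viterbi_log_odds("TTT", match_emis, background, trans)
```

Dividing each emission by its background frequency is what removes most of the length dependence: residues that are no more likely under the model than under the background add roughly zero to the score, whereas a raw log likelihood keeps decreasing with every additional residue.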

Profile Hidden Markov Models
Comparing two profile HMMs
- Profile-profile comparison tool based on information theory (Yona and Levitt, 2002)
- Based on the Kullback-Leibler divergence, a criterion for comparing 2 statistical distributions
- Dynamic programming is used to compare entire profiles
- Detects weak similarities between models
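To make the divergence criterion concrete, here is a small sketch that compares two emission distributions, for example one column from each profile. The Jensen-Shannon form is shown as one common symmetric measure built from the Kullback-Leibler divergence. Whether the tool on the slide uses exactly this form is not stated there, so treat it as an assumption. The example distributions are made up.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Requires q[a] > 0 wherever p[a] > 0.
    """
    return sum(p[a] * math.log2(p[a] / q[a]) for a in p if p[a] > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, between 0 and 1 bit."""
    m = {a: 0.5 * (p[a] + q[a]) for a in p}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Two hypothetical profile columns over the DNA alphabet.
p = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}
q = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

print(kl_divergence(p, q), kl_divergence(q, p))  # not symmetric
print(js_divergence(p, q), js_divergence(q, p))  # symmetric
```

A column-to-column score of this kind can then play the role that a substitution matrix plays in sequence alignment, with dynamic programming used to align whole profiles.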


Available resources for Profile HMMs
HMMER and SAM
- Among the first available programs for profile HMMs
- HMMER: S. Eddy at Washington University
- SAM (Sequence Alignment and Modeling System): R. Hughey at University of California, Santa Cruz
- Available free for research
- SAM has online servers to perform sequence comparisons

Available resources for Profile HMMs
- The InterPro consortium in Europe has many resources for protein data
- InterPro is a database of protein families and domains
- It brings together several different databases under one umbrella
- Pfam and SUPERFAMILY are profile HMM libraries associated with InterPro
- Pfam is based on HMMER search; SUPERFAMILY is based on SAM search and modeling

Available resources for Profile HMMs
SAM’s iterative approach for building an HMM
- Find a set of close homologs using BLASTP
- Learn the alignment and build a model using the close homologs
- Use BLASTP to get more remote homologs using the first set of sequences (relax the E-value)
- Iteratively refine the HMM
- SAM uses Dirichlet priors to derive pseudocounts for the parameters
- Unlike HMMER, hand-tuned seed alignments are not required, as the alignments are learnt by the algorithm

Available resources for Profile HMMs
- The SUPERFAMILY database incorporates:
  - a library of profile HMMs representing all proteins of known structure
  - assignments to predicted proteins from all completely sequenced genomes
  - search and alignment services
  - models and domain assignments, which are freely available
- Based on the SCOP classification of protein domains
- The SAM HMM iterative procedure is used for model building and sequence alignment

Available resources for Profile HMMs
In SUPERFAMILY:
- Each SCOP superfamily is represented as an HMM
- Models are built using the SAM procedure, based on 4 variants:
  - accurate structure-based alignments
  - hand-labeled alignments
  - automatic alignments using ClustalW
  - sequence members used separately as seeds
- Assignment of superfamilies: for a given sequence, every model is scored across the whole sequence using Viterbi scoring
- The model that scores highest has its superfamily assigned to the region


Online Demonstrations

References
- Durbin, R., Eddy, S., Krogh, A., and Mitchison, G., "Biological Sequence Analysis", Cambridge University Press, 2002
- Baldi, P. and Brunak, S., "Bioinformatics, the Machine Learning Approach", The MIT Press, Cambridge, 1998
- Eddy, S., "Profile Hidden Markov Models", Bioinformatics Review, vol. 14, no. 9, 1998
- Karplus, K., Barrett, C., and Hughey, R., "Hidden Markov models for detecting remote protein homologies", Bioinformatics, vol. 14, no. 10, 1998
- Madera, M. and Gough, J., "A comparison of profile hidden Markov model procedures for remote homology detection", Nucleic Acids Research, vol. 30, no. 19, 2002
- Gough, J., Karplus, K., Hughey, R., and Chothia, C., "Assignment of Homology to Genome Sequences using a Library of Hidden Markov Models that represent all Proteins of known structure", J. Mol. Biol., vol. 313, 2001

References
- Yona, G. and Levitt, M., "Within the Twilight Zone: A sensitive Profile-Profile comparison tool based on Information Theory", J. Mol. Biol., vol. 315, 2002
- Madera, M., Vogel, C., Kummerfeld, K., Chothia, C., and Gough, J., "The SUPERFAMILY database in 2004: additions and improvements", Nucleic Acids Research, vol. 32, Database Issue, 2004
- Bateman, A., Birney, E., Durbin, R., Eddy, S., Finn, R., and Sonnhammer, E., "Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins", Nucleic Acids Research, vol. 27, no. 1, 1999
- Andreeva, A., et al., "SCOP database in 2004: refinements integrate structure and sequence family data", Nucleic Acids Research, vol. 32, Database Issue, D226-D229, 2004
- Many other online resources and tutorials