Profile Hidden Markov Models Bioinformatics Fall-2004 Dr Webb Miller and Dr Claude Depamphilis Dhiraj Joshi Department of Computer Science and Engineering.


1 Profile Hidden Markov Models
Bioinformatics, Fall 2004, Dr Webb Miller and Dr Claude Depamphilis
Dhiraj Joshi, Department of Computer Science and Engineering, The Pennsylvania State University

2 Outline
- Introduction to HMMs
- Profile HMMs
- Available resources for Profile HMMs
- Some online demonstrations

3 Introduction to HMMs
Hidden Markov Models: Formalism
- statistical techniques for modeling patterns in data
- first-order Markov property (memorylessness): the next state depends only on the current state
- a state is generally a hidden entity which spawns symbols or features
- the same symbol could be emitted by several states
- an HMM is characterized by its transition probabilities and emission distributions
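To make the formalism concrete, here is a minimal sketch of a toy HMM in Python. The two state names, the DNA alphabet, and every probability are illustrative assumptions, not values from the presentation. The function computes the joint probability of a state path and a symbol sequence under the first-order Markov assumption.

```python
# Toy two-state HMM over DNA symbols. All names and numbers are
# illustrative assumptions chosen for this sketch.
start = {"AT_rich": 0.5, "GC_rich": 0.5}
trans = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.2, "GC_rich": 0.8},
}
emit = {
    "AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def joint_probability(states, symbols):
    """P(states, symbols): each step depends only on the previous state."""
    prob = start[states[0]] * emit[states[0]][symbols[0]]
    for prev, cur, sym in zip(states, states[1:], symbols[1:]):
        prob *= trans[prev][cur] * emit[cur][sym]
    return prob


# The same symbol ("A") can be emitted by either state, which is why the
# state path is hidden when only the symbols are observed.
p_at = joint_probability(["AT_rich", "AT_rich"], "AA")
p_gc = joint_probability(["GC_rich", "GC_rich"], "AA")
```

With these numbers the all-AT_rich path explains "AA" better than the all-GC_rich path, but both have nonzero probability.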

4 Introduction to HMMs
Hidden Markov Models: Parameter Estimation
- parameters: transition probabilities and emission probabilities
- iterative computational algorithms are used, e.g. the EM algorithm and the Viterbi algorithm
- the algorithms are based on dynamic programming to save computational cost
- usually the iterations involve variants of the following two steps:
  - estimate the state sequence which maximizes the likelihood under a parameter set
  - update the parameter set based on the estimated state sequence
- the algorithms may converge to a local rather than a global optimum
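The first of the two steps, finding the most likely state sequence under a fixed parameter set, can be sketched with the Viterbi algorithm. This is a minimal log-space version that assumes all probabilities are nonzero. The toy model parameters are assumptions made up for the example, and the parameter-update step is not shown.

```python
import math

# Toy parameters (illustrative assumptions, not from the presentation).
states = ["AT_rich", "GC_rich"]
start = {"AT_rich": 0.5, "GC_rich": 0.5}
trans = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.2, "GC_rich": 0.8},
}
emit = {
    "AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def viterbi(symbols, states, start, trans, emit):
    """Return (best log-probability, best state path) by dynamic programming."""
    # score[s]: best log-probability of any path that ends in state s.
    score = {s: math.log(start[s]) + math.log(emit[s][symbols[0]]) for s in states}
    back = []  # one dict of back-pointers per position after the first
    for sym in symbols[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best predecessor of s at this position.
            prev = max(states, key=lambda p: score[p] + math.log(trans[p][s]))
            new_score[s] = (score[prev] + math.log(trans[prev][s])
                            + math.log(emit[s][sym]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


best_score, best_path = viterbi("GGCC", states, start, trans, emit)
```

Storing only one score per state and position is what keeps the cost linear in the sequence length rather than exponential in it.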

5 Outline
- Introduction to HMMs
- Profile HMMs
- Available resources for Profile HMMs
- Some online demonstrations

6 Profile Hidden Markov Models
Stochastic methods to model multiple sequence alignments of protein and DNA sequences
Potential application domains:
- constructing a profile HMM: protein families could be modeled as an HMM or a group of HMMs
- aligning a sequence with a stored profile HMM: new protein sequences could be aligned with stored models to detect remote homology
- finding statistical similarities between two profile HMMs: two or more protein family profile HMMs could be aligned to detect homology

7 Profile Hidden Markov Models
Constructing a profile HMM
- a multiple sequence alignment is assumed
- each consensus column can exist in 3 states: match, insert and delete
- the number of states depends on the length of the alignment
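As a rough sketch of how the alignment determines the model size, the Python below marks alignment columns as consensus (match) columns and counts the resulting states. The rule used here (a column is a match column when fewer than half of its entries are gaps) and the state layout (one delete state per match state, plus an insert state before, between, and after the match states) are common conventions assumed for illustration. The slides do not specify either, and the alignment is a toy example.

```python
def match_columns(alignment, max_gap_fraction=0.5):
    """Indices of columns treated as consensus (match) columns.

    Assumed heuristic: keep a column if its gap fraction is below
    max_gap_fraction; the remaining columns are modeled by insert states.
    """
    n_seqs = len(alignment)
    cols = []
    for j in range(len(alignment[0])):
        gaps = sum(1 for seq in alignment if seq[j] == "-")
        if gaps / n_seqs < max_gap_fraction:
            cols.append(j)
    return cols


def state_counts(n_match):
    """State counts for one assumed layout (begin/end states not counted)."""
    return {"match": n_match, "insert": n_match + 1, "delete": n_match}


# Toy protein alignment; "-" marks a gap.
alignment = [
    "VGA--HAGEY",
    "V----NVDEV",
    "VEA--DVAGH",
    "VKG------D",
    "VYS--TYETS",
    "FNA--NIPKH",
    "IAGADNGAGV",
]

cols = match_columns(alignment)
counts = state_counts(len(cols))
```

Here the two columns that are gaps in six of the seven sequences are excluded, so the model length is shorter than the alignment length.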

8 Profile Hidden Markov Models
A typical profile HMM architecture
- squares represent match states
- diamonds represent insert states
- circles represent delete states
- arrows represent transitions

9 Profile Hidden Markov Models
A typical profile HMM architecture
- transition between match states
- transition from match state to insert state
- transition within insert state
- transition from match state to delete state
- transition within delete state
- emission of symbol at a state

10 Profile Hidden Markov Models
Estimation of parameters
- transition probabilities are estimated as the frequency of a transition in a given alignment
- emission probabilities are estimated as the frequency of an emission in a given alignment
- pseudo counts are usually introduced to account for transitions / emissions which were not present in the alignment
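A minimal sketch of the frequency estimate for the emissions of one match column, in Python. The uniform add-one pseudo count is an assumption picked for simplicity, and the function name and toy column are hypothetical. Transition probabilities could be estimated the same way from transition counts.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def emission_probs(column, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Emission probabilities for one match column of an alignment.

    Gaps ("-") are ignored. Every symbol receives the same pseudo count,
    so symbols never observed in the column still get nonzero probability.
    """
    counts = Counter(c for c in column if c != "-")
    total = sum(counts.values()) + pseudocount * len(alphabet)
    return {a: (counts[a] + pseudocount) / total for a in alphabet}


# Toy column: five V, one F, one I.
probs = emission_probs("VVVVVFI")
```

Without the pseudo count, any sequence containing an unobserved residue at this position would score zero under the model.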

11 Profile Hidden Markov Models Estimation of parameters with pseudo counts Dirichlet prior distribution used to determine pseudo counts
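One standard way to write such an estimate, assuming a single Dirichlet prior with parameters $\alpha_a = A\,q_a$ (the symbols here are assumptions, since the slide's own notation is not available): let $c_j(a)$ be the observed count of symbol $a$ in match column $j$, $q_a$ a background frequency with $\sum_a q_a = 1$, and $A$ the total weight given to the prior. The posterior mean emission estimate is then

```latex
e_j(a) = \frac{c_j(a) + A\,q_a}{\sum_{b} c_j(b) + A}
```

With few observed sequences the estimate stays close to the background $q_a$, and as the counts grow it approaches the raw frequency. Mixtures of several Dirichlet components generalize this by weighting each component by how well it explains the column.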

12 Profile Hidden Markov Models
Scoring a sequence against a profile HMM
- the Viterbi algorithm is used to find the best state path
- simulated annealing based methods are also used
- maximization criteria: log likelihood or log odds
- the log likelihood score generally depends on the length of the sequence and hence is not preferred
- if an alignment is not given initially, the alignment could be learnt iteratively using Viterbi
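The difference between the two criteria can be sketched in Python for the simplest case: a sequence that passes through match states only, with transitions, inserts, and deletes ignored. The two-position DNA model and the uniform background are assumptions for illustration.

```python
import math


def log_likelihood(seq, match_emissions):
    """Sum of log2 e_j(x_j); becomes more negative as sequences get longer."""
    return sum(math.log2(match_emissions[j][x]) for j, x in enumerate(seq))


def log_odds(seq, match_emissions, background):
    """Sum of log2(e_j(x_j) / q(x_j)); positive when the model beats background."""
    return sum(
        math.log2(match_emissions[j][x] / background[x])
        for j, x in enumerate(seq)
    )


# Hypothetical two-column model and uniform background.
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
match_emissions = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]

good = log_odds("AC", match_emissions, background)
poor = log_odds("TT", match_emissions, background)
```

Dividing by the background probability at each position removes most of the length dependence, since a random sequence then scores around or below zero regardless of its length.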

13 Profile Hidden Markov Models
Comparing two profile HMMs
- profile-profile comparison tool based on information theory
- based on the Kullback-Leibler divergence criterion for comparing 2 statistical distributions
- dynamic programming is used to compare entire profiles
- detects weak similarities between models
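For reference, a minimal Python sketch of the Kullback-Leibler divergence between two emission distributions, such as one column from each of two profiles. The two-symbol distributions are made up for the example, and how a specific tool combines column scores in its dynamic programming is not shown here.

```python
import math


def kl_divergence(p, q):
    """D(p || q) in bits. Assumes q[a] > 0 wherever p[a] > 0."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in p if p[a] > 0)


# Hypothetical column distributions over a two-letter alphabet.
p = {"A": 0.5, "C": 0.5}
q = {"A": 0.25, "C": 0.75}

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
```

The divergence is zero only for identical distributions and is not symmetric, so `d_pq` and `d_qp` differ. A comparison method that needs a symmetric column score has to combine the two directions in some way.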

14 Outline
- Introduction to HMMs
- Profile HMMs
- Available resources for Profile HMMs
- Some online demonstrations

15 Available resources for Profile HMMs
HMMER and SAM
- among the first available programs for profile HMMs
- HMMER: S. Eddy at Washington University
- SAM: Sequence Alignment and Modeling System, R. Hughey at University of California, Santa Cruz
- available free for research
- SAM has online servers to perform sequence comparisons
- http://www.cse.ucsc.edu/research/compbio/sam.html

16 Available resources for Profile HMMs
InterPro
- consortium in Europe with many resources for protein data
- database of protein families and domains
- brings together several different databases under one umbrella
- Pfam and Superfamily are profile HMM libraries associated with InterPro
- Pfam is based on HMMER search, and Superfamily on SAM search and modeling

17 Available resources for Profile HMMs
SAM's iterative approach for building an HMM
- find a set of close homologs using BLASTP
- learn the alignment and build a model using the close homologs
- use BLASTP to get more remote homologs using the first set of sequences (relax the E value)
- iteratively refine the HMM
SAM uses Dirichlet priors as pseudo counts for parameters
Hand-tuned seed alignments are not required, as the alignments are learnt by the algorithm (unlike HMMER)

18 Available resources for Profile HMMs
The SUPERFAMILY database incorporates:
- a library of profile HMMs representing all proteins of known structure
- assignments to predicted proteins from all completely sequenced genomes
- search and alignment services
- models and domain assignments that are freely available
Based on the SCOP classification of protein domains
The SAM HMM iterative procedure is used for model building and sequence alignment

19 Available resources for Profile HMMs
In Superfamily:
- each SCOP superfamily is represented as an HMM
- models are built using the SAM procedure, based on 4 variants:
  - accurate structure-based alignments
  - hand-labeled alignments
  - automatic alignments using ClustalW
  - sequence members used separately as seeds
- assignment of superfamilies: for a given sequence, every model is scored across the whole sequence using Viterbi scoring
- the model which scores highest has its superfamily assigned to the region

20 Outline
- Introduction to HMMs
- Profile HMMs
- Available resources for Profile HMMs
- Some online demonstrations

21 Online Demonstrations http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/temp/624288710157514.html

22 References
- Durbin, R., Eddy, S., Krogh, A., and Mitchison, G., "Biological Sequence Analysis", Cambridge University Press, 2002
- Baldi, P. and Brunak, S., "Bioinformatics, the Machine Learning Approach", The MIT Press, Cambridge, 1998
- Eddy, S., "Profile Hidden Markov Models", Bioinformatics (review), vol. 14, no. 9, pp. 755-763, 1998
- Karplus, K., Barrett, C., and Hughey, R., "Hidden Markov models for detecting remote homologies", Bioinformatics, vol. 14, no. 10, pp. 846-856, 1998
- Madera, M. and Gough, J., "A comparison of profile hidden Markov model procedures for remote homology detection", Nucleic Acids Research, vol. 30, no. 19, pp. 4321-4328, 2002
- Gough, J., Karplus, K., Hughey, R., and Chothia, C., "Assignment of Homology to Genome Sequences using a Library of Hidden Markov Models that represent all Proteins of known structure", J. Mol. Biol., 313, pp. 903-919, 2001

23 References
- Yona, G. and Levitt, M., "Within the Twilight Zone: A sensitive Profile-Profile comparison tool based on Information Theory", J. Mol. Biol., 315, pp. 1257-1275, 2002
- Madera, M., Vogel, C., Kummerfeld, S., Chothia, C., and Gough, J., "The SUPERFAMILY database in 2004: additions and improvements", Nucleic Acids Research, vol. 32, Database Issue, pp. D235-D239, 2004
- Bateman, A., Birney, E., Durbin, R., Eddy, S., Finn, R., and Sonnhammer, E., "Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins", Nucleic Acids Research, vol. 27, no. 1, 1999
- Andreeva, A., et al., "SCOP database in 2004: refinements integrate structure and sequence family data", Nucleic Acids Research, vol. 32, Database Issue, pp. D226-D229, 2004
- Many other online resources and tutorials

