A Hidden Markov Model for Protein Secondary Structure Prediction

A Hidden Markov Model for Protein Secondary Structure Prediction
Wei-Mou Zheng
Institute of Theoretical Physics, Academia Sinica
PO Box 2735, Beijing 100080
zheng@itp.ac.cn

Outline
- Protein structure
- A brief review of secondary structure prediction
- Hidden Markov model: simple-minded
- Hidden Markov model: realistic
- Discussion
- References

Protein sequences are written in 20 letters (the 20 naturally occurring amino acid residues): AVCDE FGHIW KLMNY PQRST. [Figure: the residues classified as hydrophobic, charged (+/-), or polar.]

Residues form a directed chain; each peptide bond can adopt a cis or trans configuration. [Figure: cis and trans peptide-bond geometries.]

3D structure → secondary structure, written in three letters: H (helix), E (sheet), C (coil). Overall composition H : E : C = 34.9 : 21.8 : 43.3. [Figure: RasMol ribbon diagram of GB1 showing helix (pink), sheets (yellow), and coil (grey), with the hydrogen-bond network.]

Bayes formula: generally, P(x, y) = P(x|y) P(y), hence P(y|x) = P(x|y) P(y) / P(x). The probabilities are estimated from counts in a database.

Protein sequence A = {a_i}, i = 1, 2, …, n; secondary structure sequence S = {s_i}, i = 1, 2, …, n. Secondary structure prediction maps the 1D amino acid sequence to the 1D secondary structure sequence, an old problem open for more than 30 years. Inference of S from A: P(S|A).
1. The simple Chou-Fasman approach: Chou-Fasman propensities of amino acids for conformational states + an independence approximation.

Parameter training: propensities q(a, s). Counts (20×3) from a database: N(a, s); summing over a gives N(s), summing over s gives N(a), and summing over both gives N. Then
q(a, s) = [N(a, s) · N] / [N(a) · N(s)].
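As a concrete illustration, here is a minimal Python sketch of this training step applied to a 20×3 count matrix; the function name and the use of a dense NumPy array are assumptions for demonstration, not the author's code.

```python
import numpy as np

def propensities(N):
    """Chou-Fasman-style propensity q(a, s) = N(a, s) * N / (N(a) * N(s))."""
    N = np.asarray(N, dtype=float)
    N_a = N.sum(axis=1, keepdims=True)   # N(a): counts per residue (row sums)
    N_s = N.sum(axis=0, keepdims=True)   # N(s): counts per state (column sums)
    total = N.sum()                      # N: total count
    return N * total / (N_a * N_s)

# Under the independence approximation, the Chou-Fasman prediction for
# residue a is simply the state s maximizing q(a, s); q > 1 means the
# residue favors that state, q < 1 means it avoids it.
```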

2. Garnier-Osguthorpe-Robson (GOR), a window version: conditional independence within a 17-residue window; weight matrices of size (20×17)×3 give P(W|s). 3. Improved GOR (20×20×16×3 parameters, including pair correlations within the window).
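A minimal sketch of GOR-style window scoring, assuming a precomputed score table W of shape (3 states, 17 offsets, 20 residues); the table layout and names are illustrative, not from the original slides.

```python
import numpy as np

HALF = 8  # 17-residue window: positions i-8 .. i+8

def gor_predict(seq, W):
    """Assign each position the state maximizing the summed window score.

    seq: list of residue indices (0..19); W: assumed precomputed score
    table of shape (3 states, 17 offsets, 20 residues).
    """
    n = len(seq)
    pred = []
    for i in range(n):
        score = np.zeros(W.shape[0])
        for j in range(-HALF, HALF + 1):
            if 0 <= i + j < n:           # skip positions outside the chain
                score += W[:, j + HALF, seq[i + j]]
        pred.append(int(score.argmax()))
    return pred
```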

Hidden Markov Model (HMM): simple-minded. Bayes formula: P(S|A) = P(S, A)/P(A) ∝ P(S, A) = P(A|S) P(S). In the simple version, the hidden sequence S is a Markov chain, and state s_i emits residue a_i according to P(a|s), so that P(S, A) = P(s_1) P(a_1|s_1) ∏_{i≥2} t_{s_{i-1}s_i} P(a_i|s_i). Forward and backward functions are defined over the hidden sequence. [Figure: chain graphical model with hidden states s_1, s_2, s_3, … emitting observations a_1, a_2, a_3, ….]

Initial conditions and recursion relations:
A_1(s) = P(s) P(a_1|s), A_{i+1}(s') = Σ_s A_i(s) t_{ss'} P(a_{i+1}|s');
B_n(s) = 1, B_i(s) = Σ_{s'} t_{ss'} P(a_{i+1}|s') B_{i+1}(s').
Partition function Z = P(A) = Σ_s A_n(s). A linear-time algorithm by dynamic programming: Baum-Welch (sum) and Viterbi (max).
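These recursions translate directly into code. The sketch below implements the standard forward-backward pass under the notation above; the array names pi, t, e are assumptions, and nothing here is Zheng's actual implementation.

```python
import numpy as np

def forward_backward(obs, pi, t, e):
    """Forward/backward functions A_i(s), B_i(s), partition function Z,
    and site posteriors Prob(s_i = s | A) for an HMM.

    obs: residue indices; pi[s]: initial probs; t[s, s']: transitions;
    e[s, a]: emission probs P(a|s). All names are illustrative.
    """
    n, K = len(obs), len(pi)
    A = np.zeros((n, K))                 # forward:  A[i, s]
    B = np.zeros((n, K))                 # backward: B[i, s]
    A[0] = pi * e[:, obs[0]]
    for i in range(1, n):
        A[i] = (A[i - 1] @ t) * e[:, obs[i]]
    B[-1] = 1.0
    for i in range(n - 2, -1, -1):
        B[i] = t @ (e[:, obs[i + 1]] * B[i + 1])
    Z = A[-1].sum()                      # partition function P(A)
    return A, B, Z, A * B / Z            # last item: site posteriors
```

For long chains the products underflow; a production version would rescale A_i and B_i slice by slice or work in log space.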

Prob(s_i = s, s_{i+1} = s') = A_i(s) t_{ss'} P(a_{i+1}|s') B_{i+1}(s') / Z, and analogous expressions give segment probabilities Prob(s_{i:j}).
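Continuing the sketch above, the pairwise posterior is a one-line combination of the forward/backward quantities (a direct transcription of the formula, with the same assumed names):

```python
def pairwise_posterior(i, obs, A, B, Z, t, e):
    """Prob(s_i = s, s_{i+1} = s') as a K x K matrix, from the formula above."""
    return A[i][:, None] * t * e[:, obs[i + 1]][None, :] * B[i + 1][None, :] / Z
```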

Hidden Markov Model: realistic.
1) Strong correlation in conformational states: a strand requires at least two consecutive E's and a helix at least three consecutive H's; this leads to refined conformational states (243 → 75).
2) Emission probabilities → improved window scores.
Proportion of accurately predicted sites ~70% (compared with <65% for prediction based on a single sequence); no post-prediction filtering; integrated (overall) estimation of the refined conformational states; a measure of prediction confidence.
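Decoding the refined-state model still uses standard HMM machinery. Below is a minimal Viterbi (max) sketch in log space, reusing the assumed pi, t, e arrays from the earlier snippet; it illustrates the generic algorithm, not Zheng's implementation.

```python
import numpy as np

def viterbi(obs, pi, t, e):
    """Most probable hidden state path, by max-product dynamic programming."""
    n, K = len(obs), len(pi)
    logt, loge = np.log(t), np.log(e)
    V = np.log(pi) + loge[:, obs[0]]     # best log-prob of paths ending in each state
    back = np.zeros((n, K), dtype=int)   # backpointers
    for i in range(1, n):
        cand = V[:, None] + logt         # cand[s, s']: best path into s' via s
        back[i] = cand.argmax(axis=0)
        V = cand.max(axis=0) + loge[:, obs[i]]
    path = [int(V.argmax())]
    for i in range(n - 1, 0, -1):        # trace backpointers from the end
        path.append(int(back[i, path[-1]]))
    return path[::-1]
```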

Discussion
- An HMM using refined conformational states and window scores is efficient for protein secondary structure prediction.
- A better scoring system should capture more of the correlation between conformation and sequence.
- Combining homologous information will improve prediction accuracy.
- From secondary structure to 3D structure (structure codes: discretized 3D conformational states).

References
Lawrence R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE 77 (1989) 257-286.
Burkhard Rost, "Protein secondary structure prediction continues to rise," Journal of Structural Biology 134 (2001) 204-218.

The End

[Figure: Venn diagram of amino acid properties, grouping the 20 residues into tiny, small, aliphatic, aromatic, hydrophobic, polar, positive, and negative classes.]