A Hidden Markov Model for Protein Secondary Structure Prediction


1 A Hidden Markov Model for Protein Secondary Structure Prediction
Wei-Mou Zheng, Institute of Theoretical Physics, Academia Sinica, PO Box 2735, Beijing

2 Outline
Protein structure
A brief review of secondary structure prediction
Hidden Markov model: simple-minded
Hidden Markov model: realistic
Discussion
References

3 Protein sequences
Protein sequences are written in 20 letters (the 20 naturally occurring amino acid residues): AVCDE FGHIW KLMNY PQRST
Figure legend: hydrophobic, charged (+/-), polar.

4 Residues form a directed chain
Figure labels: cis, trans.

5 Rasmol ribbon diagram of GB1
Helix (pink), sheets (yellow) and coil (grey). Hydrogen-bond network.
3D structure → secondary structure written in three letters: H, E, C.
H : E : C = 34.9 : 21.8 : 43.3

6 Bayes formula
Generally, P(x, y) = P(x|y) P(y) = P(y|x) P(x), so P(x|y) = P(y|x) P(x) / P(y).

7 Protein sequence A = {a_i}, i = 1, 2, …, n
Secondary structure sequence S = {s_i}, i = 1, 2, …, n
Secondary structure prediction: 1D amino acid sequence → 1D secondary structure sequence.
An old problem, studied for more than 30 years.
Inference of S from A: P(S|A)
1. Simple Chou-Fasman approach: Chou-Fasman propensity of each amino acid for a conformational state, plus an independence approximation.

8 Parameter Training
Propensities q(a,s).
Counts (20x3) from a database: N(a,s).
Sum over a → N(s); sum over s → N(a); sum over a and s → N.
q(a,s) = [N(a,s) N] / [N(a) N(s)].
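The propensity formula can be sketched in a few lines of Python. This is only an illustration, not the author's code: the function names `propensities` and `predict_per_site`, the toy training pairs, and the per-site "pick the state with the largest propensity" rule are all assumptions made here. The original Chou-Fasman method uses more elaborate rules.

```python
from collections import Counter

STATES = "HEC"  # helix, sheet, coil

def propensities(pairs):
    """Return q[(a, s)] = N(a,s) * N / (N(a) * N(s)) from (amino acid, state) pairs."""
    n_as = Counter(pairs)            # N(a, s)
    n_a, n_s = Counter(), Counter()  # marginal counts N(a) and N(s)
    for (a, s), c in n_as.items():
        n_a[a] += c
        n_s[s] += c
    n = sum(n_as.values())           # total count N
    return {(a, s): c * n / (n_a[a] * n_s[s]) for (a, s), c in n_as.items()}

def predict_per_site(seq, q):
    """Assumed toy rule: each site independently takes the state with the largest q."""
    return "".join(max(STATES, key=lambda s: q.get((a, s), 0.0)) for a in seq)

# Invented toy "database" of aligned residues and states (not real data).
TOY_PAIRS = list(zip("AAAAGGGGVVVV", "HHHCCCCEEEEC"))
Q = propensities(TOY_PAIRS)
```

A value q(a,s) > 1 means residue a occurs in state s more often than expected if residue and state were independent.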

9 2. Garnier-Osguthorpe-Robson (GOR) window version
Conditional independence. Weight matrix (20x17)x3 for P(W|s).
3. Improved GOR (20x20x16x3, to include pair correlation).
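A possible reading of the (20x17)x3 weight matrix is one log-probability per amino acid, per window offset, per state. Under conditional independence, log P(W|s) is then a sum over the 17 window positions. The sketch below assumes that reading. The table layout `log_w[state][offset][amino_acid]`, the name `window_log_score`, and the choice to skip window positions that fall off the sequence ends are all hypothetical.

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"
STATES = "HEC"
HALF = 8  # window of 17 residues centred on site i

def window_log_score(seq, i, s, log_w):
    """log P(W|s) as a sum of per-position terms (conditional independence)."""
    total = 0.0
    for k in range(-HALF, HALF + 1):
        j = i + k
        if 0 <= j < len(seq):  # assumption: positions beyond the ends are skipped
            total += log_w[s][k + HALF][seq[j]]
    return total

# Placeholder table: every residue equally likely at every offset (not trained).
UNIFORM = {s: [{a: math.log(1 / 20) for a in AMINO} for _ in range(2 * HALF + 1)]
           for s in STATES}
```

With a trained table, the predicted state at site i would be the s that maximises this score, possibly combined with a prior P(s).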

10 Hidden Markov Model (HMM): simple-minded
Bayesian formula: P(S|A) = P(S,A)/P(A) ~ P(S,A) = P(A|S) P(S)
Simple version: a Markov chain for the hidden sequence, emitting a_i at s_i according to P(a|s).
Forward and backward functions.
Diagram: hidden states s_1, s_2, s_3 emitting a_1, a_2, a_3.

11 Initial conditions and recursion relations
Partition function Z.
Linear-time algorithm: dynamic programming.
Baum-Welch (sum) and Viterbi (max).
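The slide's equations did not survive extraction, so the following is a minimal sketch of the standard recursions rather than a reconstruction of the author's formulas. It assumes the forward function is A_1(s) = π(s) P(a_1|s) and A_{i+1}(s') = Σ_s A_i(s) t_{ss'} P(a_{i+1}|s'), with Z = Σ_s A_n(s). Viterbi replaces the sum by a max. The two-state toy model and all names (`INIT`, `TRANS`, `EMIT`, `forward_z`, `viterbi`) are invented for illustration.

```python
# Invented two-state toy model: states H and C, symbols "a" and "g".
INIT = {"H": 0.5, "C": 0.5}
TRANS = {"H": {"H": 0.9, "C": 0.1}, "C": {"H": 0.2, "C": 0.8}}
EMIT = {"H": {"a": 0.8, "g": 0.2}, "C": {"a": 0.3, "g": 0.7}}

def forward_z(obs, init, trans, emit):
    """Sum recursion: returns Z = P(A), the sum over all hidden paths."""
    states = list(init)
    A = {s: init[s] * emit[s][obs[0]] for s in states}
    for a in obs[1:]:
        A = {s2: sum(A[s1] * trans[s1][s2] for s1 in states) * emit[s2][a]
             for s2 in states}
    return sum(A.values())

def viterbi(obs, init, trans, emit):
    """Max recursion: returns the single most probable path and its joint probability."""
    states = list(init)
    V = {s: init[s] * emit[s][obs[0]] for s in states}
    back = []  # back[i][s2] = best predecessor of s2 at step i + 1
    for a in obs[1:]:
        ptr, new = {}, {}
        for s2 in states:
            best = max(states, key=lambda s1: V[s1] * trans[s1][s2])
            ptr[s2] = best
            new[s2] = V[best] * trans[best][s2] * emit[s2][a]
        back.append(ptr)
        V = new
    last = max(states, key=lambda s: V[s])
    path = [last]
    for ptr in reversed(back):  # trace the stored pointers back to the first site
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), V[last]
```

Both functions visit each site once, so the cost grows linearly with sequence length. For long sequences a practical version would work in log space or rescale at each step to avoid underflow.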

12 Prob(s_i = s, s_{i+1} = s') = A_i(s) t_{ss'} P(a_{i+1}|s') B_{i+1}(s') / Z
Prob(s_{i:j})
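The pair probability above can be checked numerically. The sketch assumes the usual conventions: A_i(s) includes the emission at site i, B_n(s) = 1, and B_i(s) = Σ_{s'} t_{ss'} P(a_{i+1}|s') B_{i+1}(s'). The toy parameters and function names are hypothetical, and indices in the code are 0-based.

```python
# Invented two-state toy model (not from the slides).
INIT = {"H": 0.5, "C": 0.5}
TRANS = {"H": {"H": 0.9, "C": 0.1}, "C": {"H": 0.2, "C": 0.8}}
EMIT = {"H": {"a": 0.8, "g": 0.2}, "C": {"a": 0.3, "g": 0.7}}

def forward_backward(obs, init, trans, emit):
    """Return forward tables A, backward tables B, and the partition function Z."""
    states, n = list(init), len(obs)
    A = [{s: init[s] * emit[s][obs[0]] for s in states}]
    for a in obs[1:]:
        prev = A[-1]
        A.append({s2: sum(prev[s1] * trans[s1][s2] for s1 in states) * emit[s2][a]
                  for s2 in states})
    B = [None] * n
    B[n - 1] = {s: 1.0 for s in states}
    for i in range(n - 2, -1, -1):
        B[i] = {s1: sum(trans[s1][s2] * emit[s2][obs[i + 1]] * B[i + 1][s2]
                        for s2 in states)
                for s1 in states}
    return A, B, sum(A[-1].values())

def pair_posterior(i, obs, init, trans, emit):
    """Prob(s_i = s, s_{i+1} = s') = A_i(s) t_ss' P(a_{i+1}|s') B_{i+1}(s') / Z."""
    A, B, Z = forward_backward(obs, init, trans, emit)
    states = list(init)
    return {(s1, s2): A[i][s1] * trans[s1][s2] * emit[s2][obs[i + 1]] * B[i + 1][s2] / Z
            for s1 in states for s2 in states}
```

For any site i the returned values sum to 1, which is a quick sanity check on the recursions.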

13 Hidden Markov Model: Realistic
1) Strong correlation in conformational states: at least two consecutive E and three consecutive H; refined conformational states (243 → 75).
2) Emission probabilities → improved window scores.
Proportion of accurately predicted sites ~ 70% (compared with < 65% for prediction based on a single sequence).
No post-prediction filtering.
Integrated (overall) estimation of refined conformation states.
Measure of prediction confidence.

14 Discussion
An HMM using refined conformational states and window scores is efficient for protein secondary structure prediction.
A better scoring system should cover more of the correlation between conformation and sequence.
Combining homologous information will improve the prediction accuracy.
From secondary structure to 3D structure (structure codes: discretized 3D conformational states).

15 References
Lawrence R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE 77 (1989).
Burkhard Rost, Protein secondary structure prediction continues to rise, Journal of Structural Biology 134, 204–218 (2001).

16 The End

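17 Amino acid property diagram (figure)
Property classes shown: small, tiny, aliphatic, aromatic, hydrophobic, polar, positive, negative.
Residue labels: P, G, I, A, V, L, C, S, N, T, D, Q, M, E, Y, K, F, H, R, W.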

