Protein secondary structure prediction


1 Protein secondary structure prediction
Why 2nd (secondary) structure prediction?
The problem:
Seq:  RPLQGLVLDTQLYGFPGAFDDWERFMRE
Pred: CCCCCHHHHHCCCCEEEECCHHHHHHCC
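The problem on this slide assigns one of three states to each residue (H, E, C in the Pred line). A minimal sketch of how such a prediction could be scored, assuming the accuracy of interest is the per-residue three-state accuracy (commonly called Q3). The function name `q3` and the observed string in the usage line are illustrative and not taken from the slides.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not predicted:
        raise ValueError("empty prediction")
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * matches / len(predicted)


# Hypothetical example: 4 of 5 residues agree.
example_score = q3("HHHCC", "HHECC")
```

A prediction string such as the Pred line above would be compared in the same way against the states derived from an experimentally solved structure.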

2 Some historical landmarks
1st generation, 1970s (~50-60% accuracy): single-residue statistics, explicit rules. Chou & Fasman 1974, GOR1 1978.
2nd generation, 1980s (~60-70% accuracy): single-residue statistics, nearest neighbors, neural networks (more local interactions taken into account). GOR3 1987, Levin et al. 1986, Qian & Sejnowski 1988, Holley & Karplus 1989.
3rd generation, 1990s (~78% accuracy): neural networks with homologous sequence information. PHD 1993, PSIPRED 1999, SSPRO 2000.

3 Chou-Fasman method
A straight statistical approach based on conformational propensity (e.g. helical propensity).
Categorize each amino acid, e.g. helix former, helix breaker, helix indifferent.
Find nucleation sites: short sequences with a high concentration of one category.
Extend the nucleation sites until a threshold is reached.
Handle overlaps.
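The steps on this slide (categorize, nucleate, extend, handle overlaps) can be sketched roughly as follows for helices only. This is a simplified illustration, not the published Chou-Fasman procedure: the window sizes, cutoffs, and the two-letter propensity table are assumptions chosen for the example, and a real run would need the full table of conformational parameters.

```python
def predict_helix(seq, propensity, window=6, min_formers=4,
                  former_cutoff=1.05, extend_cutoff=1.0):
    """Return a string of 'H' (helix) and 'C' (other) for seq.

    propensity maps each residue letter to a helical propensity value.
    All thresholds are illustrative assumptions.
    """
    n = len(seq)
    p = [propensity[aa] for aa in seq]
    in_helix = [False] * n
    for start in range(n - window + 1):
        # Nucleation: enough helix formers inside the window.
        formers = sum(1 for v in p[start:start + window] if v >= former_cutoff)
        if formers < min_formers:
            continue
        left, right = start, start + window  # half-open segment [left, right)
        # Extension: grow each end while the 4-residue window at that edge
        # keeps an average propensity at or above the cutoff.
        while left > 0 and sum(p[left - 1:left + 3]) / 4 >= extend_cutoff:
            left -= 1
        while right < n and sum(p[right - 3:right + 1]) / 4 >= extend_cutoff:
            right += 1
        # Overlaps: segments from different nucleation sites are merged
        # by marking residues in a shared mask.
        for i in range(left, right):
            in_helix[i] = True
    return "".join("H" if h else "C" for h in in_helix)


# Toy propensities for a two-letter alphabet (made-up values, for illustration only).
TOY_PROPENSITY = {"A": 1.4, "G": 0.5}
toy_prediction = predict_helix("GGGGAAAAAAGGGG", TOY_PROPENSITY)
```

The same loop could be repeated with strand propensities, after which conflicting helix and strand assignments would still need a tie-breaking rule.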

4 Chou-Fasman method: conformational parameters
[Table of conformational parameters, from Krane and Raymer's book]
What is the drawback of the method?

5 Introduction to neural network
A perceptron: a threshold unit that classifies a vector of inputs.
An analogy: an apple and orange sorter.

              Shape (x1)   Texture (x2)   Color (x3)
Apple (+1)    Round        Hard           Red
Orange (-1)   Round        Soft           Yellow

Weights: how do we get them?
A self-learning system, using a training data set.

6 Basics in neural network (1 unit only)
The problem with weights: they do not fit the examples exactly, so we minimize an error function.
Modify the threshold unit a little: replace the step function with a continuous threshold function of the activation a.

7 Basics in neural network (1 unit only)
Squared error function E(w).
Minimize the error E(w) using the gradient descent method.
Weight update in each step, scaled by the learning rate.
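A minimal sketch of a single continuous threshold unit trained by gradient descent on a squared error, using the apple and orange sorter as data. The numeric encoding of the features (round = 1, hard = +1, soft = -1, red = +1, yellow = -1), the choice of tanh as the continuous threshold function, the bias weight, and the learning rate are all assumptions made for this example; the slides do not specify them.

```python
import math


def unit_output(w, x):
    """Output of one unit: tanh of the weighted sum (last weight acts as a bias)."""
    xb = list(x) + [1.0]
    activation = sum(wi * xi for wi, xi in zip(w, xb))
    return math.tanh(activation)


def squared_error(w, samples):
    """E(w) = 1/2 * sum over examples of (output - target)^2."""
    return 0.5 * sum((unit_output(w, x) - t) ** 2 for x, t in samples)


def train_unit(samples, eta=0.5, epochs=200):
    """Gradient descent: after each example, move w against the gradient of its error."""
    w = [0.0] * (len(samples[0][0]) + 1)
    for _ in range(epochs):
        for x, t in samples:
            xb = list(x) + [1.0]
            y = unit_output(w, x)
            # d/dw_i of 1/2 (y - t)^2 with y = tanh(a) is (y - t) * (1 - y^2) * x_i
            delta = (y - t) * (1.0 - y * y)
            w = [wi - eta * delta * xi for wi, xi in zip(w, xb)]
    return w


# (shape, texture, color) -> class; the encoding is an assumption for illustration.
APPLE = (1.0, 1.0, 1.0)      # round, hard, red
ORANGE = (1.0, -1.0, -1.0)   # round, soft, yellow
training_set = [(APPLE, 1.0), (ORANGE, -1.0)]
weights = train_unit(training_set)
```

Because both fruits are round, the shape input carries no information here, and the learned weights end up relying on texture and color.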

8 Basic neural network in secondary structure prediction
[Figure from Kneller et al., JMB 1990: inputs x1 to x4, weights w11 to w14, outputs y1 to y3, and errors E1 to E3, with expressions for the activation a1, the output y1, and the error E1]

9 Multi-layer neural network
A complete neural network is a set of continuous threshold units interconnected in a topology: the output of some units is the input of other units.
[Figure: input units (x1 to x4), hidden units (y), output units (z)]
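A rough sketch of how a sequence window could be fed through such a layered network: input units x, hidden units y, output units z. The 13-residue window, the one-hot encoding with an extra "off the chain" slot, the sigmoid units, the layer sizes, and the random untrained weights are assumptions for illustration only.

```python
import math
import random

AMINO = "ACDEFGHIKLMNPQRSTVWY"


def encode_window(seq, center, half=6):
    """One-hot encode the residues in a window around position center."""
    vec = []
    for pos in range(center - half, center + half + 1):
        one_hot = [0.0] * (len(AMINO) + 1)  # last slot marks positions beyond the chain ends
        if 0 <= pos < len(seq):
            one_hot[AMINO.index(seq[pos])] = 1.0
        else:
            one_hot[-1] = 1.0
        vec.extend(one_hot)
    return vec


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def layer(inputs, weights):
    """Each row of weights feeds one continuous threshold unit."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def forward(x, w_hidden, w_out):
    y = layer(x, w_hidden)  # hidden units
    z = layer(y, w_out)     # output units, e.g. one each for H, E, C
    return y, z


def random_weights(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq = "RPLQGLVLDTQLYGFPGAFDDWERFMRE"
x = encode_window(seq, center=0)
w_hidden = random_weights(5, len(x), rng)
w_out = random_weights(3, 5, rng)
hidden, outputs = forward(x, w_hidden, w_out)
```

With untrained weights the three outputs are meaningless; training would adjust both weight matrices by propagating the output error backwards through the layers.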

10 PHD method (Rost B. & Sander C., JMB 1993)
Uses a profile built from a multiple sequence alignment.
Multiple layers.
Accuracy >70%.
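A minimal sketch of what a profile derived from a multiple sequence alignment could look like: per-column amino acid frequencies that replace the single-sequence input. The gap handling and the toy alignment are assumptions; this is not the actual PHD input pipeline.

```python
AMINO = "ACDEFGHIKLMNPQRSTVWY"


def alignment_profile(alignment, alphabet=AMINO):
    """Return one {residue: frequency} dict per alignment column.

    Gaps and unknown characters are ignored when counting (an assumption).
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        column = [row[col] for row in alignment if row[col] in alphabet]
        total = len(column)
        profile.append({aa: (column.count(aa) / total if total else 0.0)
                        for aa in alphabet})
    return profile


# Toy alignment, invented for illustration.
toy_alignment = ["ACD", "ACE", "A-D"]
toy_profile = alignment_profile(toy_alignment)
```

Each column's frequency vector would then take the place of the one-hot residue encoding at the network input, so that conserved and variable positions look different to the predictor.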

11 Protein Folding Problem
What is the protein folding problem?
A protein folds into a unique 3D structure under physiological conditions.
3D structure is a key to understanding functional mechanism.
Rational drug design.
3D structure prediction.

12 Protein Folding Problem
Hard?
Sampling conformational space.
Secondary structures offer simplicity.
Side chains filling the space.
May not be a random search.
Free energy (ΔG) = interaction energy - entropic energy.
Can it be done?

13 Protein Folding Problem
Experimental findings: a protein does not start folding from the end; secondary structures seem to fold early; hydrophobic amino acids sit in the core; hydrophilic amino acids sit on the surface.
Energy function approximation: physics based (bond length, bond angle, pair interactions) or statistics based.

14 Scope of the problem
The majority of newly solved protein structures share a certain level of similarity with a known structure.
Certain families of proteins have few or no solved structures.
Human genes: ~20k.
Structural genomics initiative.

15 Protein structure prediction
Comparative modeling: >30% sequence identity.
Fold recognition (formerly known as threading): twilight zone, <25% sequence identity.
Ab initio: new fold.
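The thresholds on this slide can be turned into a small decision helper. A sketch under stated assumptions: identity is computed over aligned, gap-free positions (one of several possible definitions), and the 25-30% range, which the slide does not assign to either method, is reported as borderline. The function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(1 for a, b in pairs if a == b) / len(pairs)


def suggest_method(identity):
    """Map identity to the best known structure onto the slide's categories."""
    if identity > 30.0:
        return "comparative modeling"
    if identity < 25.0:
        return "fold recognition"  # ab initio if no known fold fits
    return "borderline"


# Toy aligned pair, invented for illustration.
toy_identity = percent_identity("ACDE", "ACDF")
toy_method = suggest_method(toy_identity)
```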

16 CASP
Experimentally solved structure vs. predicted structure: compare and rank.
CASP, e.g.:
Skolnick (2003) Proteins 53: 469-79
Ginalski (2003) Proteins 53: 410-17
Zhang, Y. (2007) "Template-based modeling and free modeling by I-TASSER in CASP7." Proteins 69 (S8): 108-17
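The "compare and rank" step can be illustrated with a root-mean-square deviation between matched atoms. This is a simplified stand-in: CASP assessment relies on more elaborate measures, and the sketch assumes the predicted and experimental coordinates are already superimposed and listed in the same atom order. The model names are made up.

```python
import math


def rmsd(predicted, experimental):
    """RMSD between two equally long lists of (x, y, z) coordinates."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(predicted, experimental))
    return math.sqrt(total / len(predicted))


def rank_models(models, experimental):
    """Return model names sorted from lowest to highest RMSD."""
    return sorted(models, key=lambda name: rmsd(models[name], experimental))


# Toy coordinates, invented for illustration.
solved = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_models = {
    "model_a": [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    "model_b": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
}
toy_ranking = rank_models(toy_models, solved)
```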

17 Comparative Modeling
Search for structures.
Select templates.
Align target sequence with structures.
Build model.
Evaluate model.
http://www.salilab.org/~andras/watanabe/main.html
[Figure: sequence identity vs. structure overlap]

18 Comparative Modeling
Search for structures: pairwise sequence alignment against a database; multiple sequence alignment -> profile; fold assignment / threading, which uses structure information in the comparison.
Select template: sequence similarity, evolutionary relationship, environment, resolution.
Sequence alignment (target and template): standard methods with tuning.

19 Ab Initio Prediction
Challenges: search space and energy function.
Reduction in search space: use a lattice; use simplified amino acids; use building blocks available in nature.
Energy function: physics based, or statistics based (empirical).
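Two of the reductions listed here, a lattice and simplified amino acids, can be combined in a toy example. The sketch below uses the 2D HP lattice model, which is swapped in as an illustration and is not named in the slides: residues are either hydrophobic (H) or polar (P), the chain lives on a square grid, and the energy is -1 for every pair of H residues that touch without being chain neighbors. The exhaustive search is only feasible for very short chains.

```python
from itertools import product

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def walk(moves):
    """Lattice coordinates for a move string, or None if the chain hits itself."""
    pos = (0, 0)
    coords = [pos]
    for m in moves:
        dx, dy = MOVES[m]
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in coords:
            return None
        coords.append(pos)
    return coords


def hp_energy(seq, coords):
    """-1 for each H-H lattice contact between residues that are not chain neighbors."""
    energy = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):
            if seq[i] == "H" and seq[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    energy -= 1
    return energy


def best_fold(seq):
    """Enumerate every self-avoiding walk and keep the lowest-energy one."""
    best = None
    for moves in product("UDLR", repeat=len(seq) - 1):
        coords = walk(moves)
        if coords is None:
            continue
        e = hp_energy(seq, coords)
        if best is None or e < best[0]:
            best = (e, "".join(moves))
    return best


toy_energy, toy_moves = best_fold("HPPH")
```

Even on this coarse grid the number of walks grows exponentially with chain length, which is why the search space is listed as a challenge alongside the energy function.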

20 Ab initio 3D structure prediction
An example: ROSETTA.
Simons KT, Kooperberg C, Huang E, Baker D; J Mol Biol. (1997) 268, 209-225
Schonbrun J, Wedemeyer W, Baker D; Current Opinion in Structural Biology (2002) 12: 348-54
ROSETTA narrows the search by using available local structures, uses a statistics-based energy function, and was one of the top few ab initio methods in CASP4.

21 ROSETTA: segment matching
Observations: analysis of 9-amino-acid segments in the structure database; distribution of the conformations of 9-mers.
Main idea of the method: build a segment conformational library (fragment library for 3-mers and 9-mers), then put the pieces together better (energy function and search space).
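The idea of putting library fragments together under an energy function can be sketched as a Monte Carlo fragment insertion loop. This is a toy illustration and not ROSETTA's implementation: the conformation is a list of (phi, psi) pairs, the fragment library and energy function are supplied by the caller, and the Metropolis acceptance rule, temperature, and step count are assumptions.

```python
import math
import random


def fragment_assembly(start, fragment_library, energy,
                      steps=500, temperature=0.1, seed=0):
    """Randomly insert fragments, accepting moves by a Metropolis criterion.

    fragment_library[pos] is a list of candidate fragments (lists of torsion
    pairs) that may replace the residues starting at position pos.
    """
    rng = random.Random(seed)
    current = list(start)
    e_cur = energy(current)
    best, e_best = list(current), e_cur
    for _ in range(steps):
        pos = rng.randrange(len(fragment_library))
        frag = rng.choice(fragment_library[pos])
        trial = current[:pos] + list(frag) + current[pos + len(frag):]
        e_trial = energy(trial)
        accept = e_trial <= e_cur or \
            rng.random() < math.exp(-(e_trial - e_cur) / temperature)
        if accept:
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best


# Toy setup (all values invented): 6 residues, 3-mer fragments, and an
# energy that simply counts residues that are not in the helical state.
HELIX = (-60.0, -45.0)
STRAND = (-120.0, 130.0)
toy_library = [[[HELIX] * 3, [STRAND] * 3] for _ in range(4)]


def toy_energy_fn(conformation):
    return sum(1 for torsions in conformation if torsions != HELIX)


toy_best, toy_best_energy = fragment_assembly([STRAND] * 6, toy_library, toy_energy_fn)
```

In a realistic setting the library at each position would be drawn from known structures with similar local sequence, and the energy would score the 3D model built from the torsion angles rather than the angles themselves.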

22 Model Building
Assembly of rigid bodies: dissect the structure into core, loops, and side-chains.
Satisfy spatial constraints (Fig.): derive spatial constraints, then find a structure that optimizes all the constraints.
Sources of spatial constraints: the input alignment; general spatial preferences found in known structures; a molecular force field.
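Finding a structure that optimizes a set of spatial constraints can be illustrated with distance restraints and plain gradient descent. This is a minimal sketch under strong assumptions: every restraint is a target distance between two points, the score is the sum of squared violations, and the step size and iteration count are arbitrary. Real model-building programs combine many restraint types and more capable optimizers.

```python
import math


def restraint_score(coords, restraints):
    """Sum of squared differences between actual and target distances."""
    return sum((math.dist(coords[i], coords[j]) - target) ** 2
               for i, j, target in restraints)


def satisfy_restraints(coords, restraints, step=0.05, iterations=2000):
    """Move points down the gradient of restraint_score."""
    pts = [list(p) for p in coords]
    for _ in range(iterations):
        grads = [[0.0] * len(p) for p in pts]
        for i, j, target in restraints:
            dist = math.dist(pts[i], pts[j])
            if dist == 0.0:
                continue  # direction undefined for coincident points
            factor = 2.0 * (dist - target) / dist
            for k in range(len(pts[i])):
                g = factor * (pts[i][k] - pts[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        for p, g in zip(pts, grads):
            for k in range(len(p)):
                p[k] -= step * g[k]
    return [tuple(p) for p in pts]


# Toy problem (invented): three points that should end up 1.0 apart from each other.
toy_start = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
toy_restraints = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0)]
toy_result = satisfy_restraints(toy_start, toy_restraints)
```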

