
Burkhard Rost (Columbia New York) Some gory details of protein secondary structure prediction Burkhard Rost CUBIC Columbia University


1 Burkhard Rost (Columbia New York) Some gory details of protein secondary structure prediction Burkhard Rost CUBIC Columbia University rost@columbia.edu http://www.columbia.edu/~rost http://cubic.bioc.columbia.edu/

2 FoRc HoMo 1D … the art of being humble

3 Goal of secondary structure prediction

4 Secondary structure predictions of 1st and 2nd generation
single residues (1st generation)
– Chou-Fasman, GOR: 1957-70/80, 50-55% accuracy
segments (2nd generation)
– GOR III: 1986-92, 55-60% accuracy
problems
– accuracy < 100% (they said: 65% max)
– strand accuracy < 40% (they said: strand formation is non-local)
– predicted segments too short
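First-generation methods assigned each residue a state from single-residue statistics. The Python sketch below illustrates that idea only: it averages per-residue propensities for helix (H), strand (E) and other (L) over a short window and picks the largest. The propensity values, the window size and the function name are hypothetical illustrations, not the published Chou-Fasman or GOR parameters.

```python
from typing import Dict

# Hypothetical toy propensities for illustration only; these are NOT the
# published Chou-Fasman or GOR parameters.
TOY_PROPENSITY: Dict[str, Dict[str, float]] = {
    "A": {"H": 1.4, "E": 0.8, "L": 0.7},
    "V": {"H": 1.0, "E": 1.7, "L": 0.5},
    "G": {"H": 0.6, "E": 0.8, "L": 1.6},
}

STATES = ("H", "E", "L")


def predict_single_residue(seq: str, table: Dict[str, Dict[str, float]],
                           half_window: int = 2) -> str:
    """Assign each residue the state with the highest propensity,
    averaged over +/- half_window residues around it."""
    prediction = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half_window), min(len(seq), i + half_window + 1)
        window = [table[aa] for aa in seq[lo:hi] if aa in table]
        if not window:
            prediction.append("L")  # no information: default to 'other'
            continue
        scores = {s: sum(p[s] for p in window) / len(window) for s in STATES}
        prediction.append(max(STATES, key=lambda s: scores[s]))
    return "".join(prediction)


print(predict_single_residue("AAAAAGGGVVVVV", TOY_PROPENSITY))
```

Each residue is decided almost independently of its neighbours, which is exactly why such predictions could not capture non-local strand formation.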

5 Helix formation is local: thyroid hormone receptor (2nll)

6 β-sheet formation is NOT local

7 Problems of secondary structure predictions (before 1994)
SEQ KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD
OBS EEEE E E E EEEEEE EEEEEE EEEEEEHHHEEEE
TYP EHHHH EE EEEE EE HHHEE EEEHH
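One symptom of these problems is segment length: predicted strands tend to be too short or fragmented. A minimal Python sketch for measuring this, assuming one character per residue with H, E and L for other (the slides leave "other" blank); the helper names are hypothetical:

```python
from itertools import groupby
from typing import List, Tuple


def segments(ss: str, state: str) -> List[Tuple[int, int]]:
    """Return (start, length) for every run of `state` in a per-residue string."""
    result = []
    pos = 0
    for symbol, run in groupby(ss):
        length = len(list(run))
        if symbol == state:
            result.append((pos, length))
        pos += length
    return result


def mean_segment_length(ss: str, state: str) -> float:
    """Average length of the segments of `state`; 0.0 if there are none."""
    segs = segments(ss, state)
    return sum(n for _, n in segs) / len(segs) if segs else 0.0


print(segments("LLEEEELLHHH", "E"))
```

Comparing `mean_segment_length(observed, "E")` with the same value for a prediction shows whether a method systematically breaks strands into short pieces.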

8 Simple neural network

9 Training a neural network 1

10 Training a neural network 2: $\mathrm{Errare} = (\mathrm{out}_{\mathrm{net}} - \mathrm{out}_{\mathrm{want}})^2$
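Training reduces this error step by step. Assuming a single output unit with a sigmoid transfer function (an assumption; the transcript does not name the function), the gradient-descent update follows from the chain rule. Here $E$ stands for Errare and $\eta$ is a learning rate:

$$
E = (\mathrm{out}_{\mathrm{net}} - \mathrm{out}_{\mathrm{want}})^2,\qquad
\mathrm{out}_{\mathrm{net}} = \sigma(z),\quad z = \sum_i w_i x_i + b,\quad \sigma(z) = \frac{1}{1 + e^{-z}}
$$

$$
\frac{\partial E}{\partial w_i}
= 2(\mathrm{out}_{\mathrm{net}} - \mathrm{out}_{\mathrm{want}})\,\sigma'(z)\,x_i
= 2(\mathrm{out}_{\mathrm{net}} - \mathrm{out}_{\mathrm{want}})\,\mathrm{out}_{\mathrm{net}}(1 - \mathrm{out}_{\mathrm{net}})\,x_i
$$

$$
w_i \leftarrow w_i - \eta\,\frac{\partial E}{\partial w_i},\qquad
b \leftarrow b - \eta\,2(\mathrm{out}_{\mathrm{net}} - \mathrm{out}_{\mathrm{want}})\,\mathrm{out}_{\mathrm{net}}(1 - \mathrm{out}_{\mathrm{net}})
$$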

11 Training a neural network 3

12 Training a neural network 4
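The training loop of slides 9 to 12 can be sketched in Python for one sigmoid unit minimising Errare by online gradient descent. The sigmoid transfer function, the learning rate `eta`, the epoch count and the AND toy problem are assumptions chosen for illustration; PHD itself used larger networks.

```python
import math
from typing import List, Sequence, Tuple

Pattern = Tuple[Sequence[float], float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def output(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """out_net of a single sigmoid unit."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def total_error(w: Sequence[float], b: float, data: Sequence[Pattern]) -> float:
    """Sum over patterns of Errare = (out_net - out_want)^2."""
    return sum((output(w, b, x) - want) ** 2 for x, want in data)


def train_unit(data: Sequence[Pattern], eta: float = 1.0,
               epochs: int = 2000) -> Tuple[List[float], float]:
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, want in data:  # online learning: update after every pattern
            out = output(w, b, x)
            # dErrare/dz = 2 (out - want) * out * (1 - out), chain rule through the sigmoid
            delta = 2.0 * (out - want) * out * (1.0 - out)
            w = [wi - eta * delta * xi for wi, xi in zip(w, x)]
            b -= eta * delta
    return w, b


AND: List[Pattern] = [((0.0, 0.0), 0.0), ((0.0, 1.0), 0.0),
                      ((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]
w, b = train_unit(AND)
print([round(output(w, b, x), 2) for x, _ in AND])
```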

13 Neural networks classify points

14 Simple neural network with hidden layer
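A single threshold unit classifies points by drawing one line (a hyperplane), so it cannot separate XOR. One hidden layer is enough. The weights below are hand-chosen for illustration, not trained, and the function names are hypothetical:

```python
def step(z: float) -> int:
    """Threshold transfer function: fire (1) if the summed input is positive."""
    return 1 if z > 0 else 0


def xor_net(x1: int, x2: int) -> int:
    # Hidden unit 1 fires if at least one input is on (OR): x1 + x2 - 0.5 > 0
    h1 = step(x1 + x2 - 0.5)
    # Hidden unit 2 fires only if both inputs are on (AND): x1 + x2 - 1.5 > 0
    h2 = step(x1 + x2 - 1.5)
    # Output unit: OR but not AND, i.e. XOR
    return step(h1 - h2 - 0.5)


print([xor_net(a, c) for a, c in [(0, 0), (0, 1), (1, 0), (1, 1)]])
```

Each hidden unit draws its own line; the output unit combines the two half-planes into a region no single line could describe.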

15 Neural Network for secondary structure
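For secondary structure, the network typically sees a window of residues around the one being predicted. This Python sketch one-hot encodes such a window. The window of 13 and the 21 units per position (20 amino acids plus one spacer unit for positions beyond the chain ends) are assumptions made for this illustration:

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
UNITS_PER_POSITION = 21               # 20 amino acids + 1 spacer unit


def encode_window(seq: str, center: int, window: int = 13) -> List[float]:
    """One-hot encode `window` residues centred on `center`.
    Positions beyond the chain ends switch on the spacer unit;
    unknown residues (e.g. X) leave all units at zero."""
    half = window // 2
    vec: List[float] = []
    for pos in range(center - half, center + half + 1):
        units = [0.0] * UNITS_PER_POSITION
        if 0 <= pos < len(seq):
            if seq[pos] in AMINO_ACIDS:
                units[AMINO_ACIDS.index(seq[pos])] = 1.0
        else:
            units[20] = 1.0  # spacer: outside the chain
        vec.extend(units)
    return vec


print(len(encode_window("KELVLALYDYQEK", 6)))
```

With multiple alignments, the one-hot values would be replaced by per-position residue frequencies; the vector layout stays the same.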


20 Balanced training [figure: normal training vs. balanced training]
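In normal training, residues are presented in their natural proportions, so the rarer strand class is seen less often; balanced training presents each class equally often. The transcript does not say how PHD implemented this. The Python sketch below shows one possible scheme, oversampling with replacement; `balanced_epoch` and `per_class` are hypothetical names:

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple, TypeVar

X = TypeVar("X")


def balanced_epoch(examples: Sequence[Tuple[X, str]], per_class: int,
                   seed: int = 0) -> List[Tuple[X, str]]:
    """Draw `per_class` examples for every label, then shuffle."""
    rng = random.Random(seed)
    by_class: Dict[str, List[Tuple[X, str]]] = defaultdict(list)
    for example in examples:
        by_class[example[1]].append(example)
    epoch: List[Tuple[X, str]] = []
    for items in by_class.values():
        # draw with replacement so rare classes (e.g. strands) can be oversampled
        epoch.extend(rng.choice(items) for _ in range(per_class))
    rng.shuffle(epoch)
    return epoch


data = [(i, "H") for i in range(6)] + [(i, "E") for i in range(2)] + [(i, "L") for i in range(10)]
print(len(balanced_epoch(data, per_class=5)))
```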


22 PHDsec: structure-to-structure network
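The second (structure-to-structure) network takes the first network's outputs for a window of neighbouring residues as input, so it can learn, for instance, that an isolated helix residue inside a strand is implausible. A Python sketch of building that input; the window of 17 and the four units per position (H, E, L outputs plus a spacer) are assumptions for this illustration:

```python
from typing import List, Sequence


def second_level_input(first_level: Sequence[Sequence[float]], center: int,
                       window: int = 17) -> List[float]:
    """Concatenate first-level (H, E, L) outputs for a window around `center`."""
    half = window // 2
    vec: List[float] = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(first_level):
            vec.extend(first_level[pos])  # first-level outputs for H, E, L
            vec.append(0.0)               # spacer unit off
        else:
            vec.extend([0.0, 0.0, 0.0])
            vec.append(1.0)               # position beyond the chain end
    return vec


first = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]]
print(second_level_input(first, 0, window=3))
```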

23 Better prediction of segment lengths

24 Evolution has it!
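The evolutionary information enters as a profile: for each alignment column, the frequency of each residue across the family. A minimal Python sketch, assuming unweighted counts and ignoring gaps (PHD's actual profile and weighting may differ); the example sequences are hypothetical:

```python
from collections import Counter
from typing import Dict, List, Sequence


def alignment_profile(alignment: Sequence[str]) -> List[Dict[str, float]]:
    """Per-column residue frequencies of a multiple alignment (gaps '-' or '.' ignored)."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile: List[Dict[str, float]] = []
    for col in range(length):
        counts = Counter(s[col] for s in alignment if s[col] not in "-.")
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile


print(alignment_profile(["KELV", "KDLV", "REL-"])[0])
```

These frequencies can replace the one-hot inputs position by position, so the network sees which substitutions evolution tolerates at each site.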


30 Spectrin SH3 (Src homology 3) domain

31 Prediction accuracy varies!

32 Why so bad?

33 Stronger predictions more accurate!
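"Stronger" predictions are those where one output unit clearly dominates. As we recall the PHD papers, the reliability index has the form RI = INTEGER(10 × (out_max − out_2nd)), capped at 9; treat that exact form as an assumption. The Python sketch computes it and reports accuracy on the subset of residues at or above a threshold; the function names are hypothetical:

```python
from typing import Sequence, Tuple

STATES = "HEL"


def reliability(outputs: Sequence[float]) -> int:
    """Assumed PHD-style index: 10 x (largest - second largest output), capped at 9."""
    top, second = sorted(outputs, reverse=True)[:2]
    return min(9, int(10 * (top - second)))


def accuracy_above(outputs_per_residue: Sequence[Sequence[float]], observed: str,
                   min_ri: int) -> Tuple[float, float]:
    """Return (fraction of residues kept, accuracy on the kept residues)."""
    kept = correct = 0
    for outs, obs in zip(outputs_per_residue, observed):
        if reliability(outs) >= min_ri:
            kept += 1
            predicted = STATES[max(range(3), key=lambda k: outs[k])]
            correct += predicted == obs
    if kept == 0:
        return 0.0, 0.0
    return kept / len(observed), correct / kept


outs = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.75, 0.25]]
print(accuracy_above(outs, "HEE", 0), accuracy_above(outs, "HEE", 5))
```

Raising `min_ri` trades coverage for accuracy; this is the curve behind "stronger predictions more accurate".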

34 Correct prediction of correctly predicted residues

35 BAD errors are frequent!

36 False prediction for engineered proteins!

37 PHDsec: the un-g(l)ory details
– average accuracy > 72% (helix, strand, other)
– 72% is an average over a distribution (spread ≈ 10%)
– stronger predictions are more accurate
– WARNING: reliability index almost a factor of 2 too large for single sequences
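The three-state accuracy (Q3) is the percentage of residues whose state (helix, strand, other) is predicted correctly. Assuming the 72% is averaged over proteins, the Python sketch below computes Q3 per protein and then the mean and spread of that distribution; the function names are hypothetical:

```python
import statistics
from typing import Sequence, Tuple


def q3(predicted: str, observed: str) -> float:
    """Per-residue three-state accuracy in percent."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("predicted and observed must be non-empty and equally long")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def per_protein_summary(pairs: Sequence[Tuple[str, str]]) -> Tuple[float, float]:
    """Mean and (population) standard deviation of Q3 over proteins."""
    scores = [q3(p, o) for p, o in pairs]
    return statistics.mean(scores), statistics.pstdev(scores)


print(q3("HHHEEL", "HHHEEE"))
```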

38 Details PHDsec: multiple alignment vs. single sequence => accuracy clearly lower for single sequences
(nali: number of aligned sequences; Q3sec: three-state secondary structure accuracy in %; Q2acc: two-state solvent accessibility accuracy in %)
id     nali  Q3sec  Q2acc
AA                         KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD
OBS                        EEEE E E EEEEEE EEEEEE EEEEEEHHHEEEE
30 N   26    70     77     EEEEEEE EEE EEEEE EEEE EE EEE
self   1     63     72     EEEEEEE EEEE EEEEE EEEEEE HHHHH


41 Secondary structure prediction
– Limit of prediction accuracy reached?
– How does it complement other methods?
– Ultimate rôle in structure prediction (1D-3D)?
– Better to use "pure" secondary structure prediction methods, or to use 3D methods and read the secondary structure off the 3D model?
– Conversely, are 3D predictors making optimal use of secondary structure predictions?
– Will secondary structure and 3D prediction merge completely?

42 Secondary structure prediction 2000
history
– 1st generation: 50-55%
– 2nd generation: 55-62%
– 3rd generation, 1992: 70-72%
– 2000: > 76%
what improves? (gain in percentage points)
– database growth: +3
– PSI-BLAST: +0.5
– new training: +1
– 'clever method': +1
limit?
– max 88% -> 12% to go
– 1/5 of proteins (those with families of more than 100 proteins) -> > 80%
– and from there?

43 Prediction of protein secondary structure
– 1980: 55% (simple)
– 1990: 60% (less simple)
– 1993: 70% (evolution)
– 2000: 76% (more evolution)
what is the limit?
– 88% for proteins of similar structure
– 80% for 1/5th of proteins with families > 100
missing through:
– better definition of secondary structure
– including long-range interactions
– structural switches
– chameleon / folding

44 CAFASP statistics
– 29 proteins not similar to any known PDB structure: T0086, T0087, T0090, T0091, T0092, T0094, T0095, T0096, T0097, T0098, T0101, T0102, T0104, T0105, T0106, T0107, T0108, T0109, T0110, T0114, T0115, T0116, T0117, T0118, T0120, T0124, T0125, T0126, T0127
– 2 proteins with a PSI-BLAST homologue: T0089, T0103
– 9 proteins with a trivial homologue in PDB: T0099, T0100, T0111, T0112, T0113, T0121, T0122, T0123, T0128

45 CAFASP sec unique

46 CAFASP sec homologous

47 CAFASP concept
Targets & non-targets
– comparative modelling: 85% > all current methods
Never compare methods on different proteins
Never rank when too few proteins
(Never show numbers for one protein between different proteins)

48 What is significant

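49 Rank only if significant
– e.g. M1 = 75%, M2 = 73%, tested on, say, 16 proteins
– rule of thumb: a difference is significant only if it exceeds $\sigma / \sqrt{N_{\mathrm{proteins}}}$ ($\sigma$: standard deviation of the per-protein accuracy); with $\sigma \approx 10$ and $N_{\mathrm{proteins}} = 16$: $10/4 = 2.5$
– 75 - 73 = 2 < 2.5 -> M1 and M2 cannot be distinguished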
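The rule of thumb is simple arithmetic. A Python sketch reproducing the slide's numbers (σ = 10, 16 proteins); the function names are hypothetical, and both accuracies must come from the same set of proteins:

```python
import math


def significance_threshold(sigma: float, n_proteins: int) -> float:
    """Rule of thumb: sigma / sqrt(number of proteins)."""
    return sigma / math.sqrt(n_proteins)


def distinguishable(q_a: float, q_b: float, sigma: float, n_proteins: int) -> bool:
    """True if two methods' average accuracies differ by more than the threshold."""
    return abs(q_a - q_b) > significance_threshold(sigma, n_proteins)


print(significance_threshold(10, 16), distinguishable(75, 73, 10, 16))
```

Because the threshold shrinks only with the square root of the number of proteins, halving it requires four times as many test proteins.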

50 EVA: automatic continuous EVAluation of structure prediction

51 EVA: automatic continuous EVAluation of structure prediction
statistics:
– 31 weeks -> 1549 new structures
– 352 new sequence-unique chains (of 2200)
categories:
– secondary structure prediction (7 methods)
– comparative modelling (4)
– fold recognition (7)
– contact prediction (4)

52 EVA: secondary structure
MAJOR lessons from EVA:
– no point comparing apples and oranges
– no point comparing < 20 apples
EVA team:
– CUBIC, Columbia: Volker Eyrich, Dariusz Przybylski, Burkhard Rost
– Rockefeller: Marc Marti-Renom, Andras Fiser, Andrej Sali
– Madrid: Florencio Pazos, Alfonso Valencia
URL: http://cubic.bioc.columbia.edu/eva/ http://pipe.rockefeller.edu/~eva/ http://montblanc.cnb.uam.es/eva/

53 EVA: secondary structure 76%

54 Accuracy varies for proteins!

55 Averaging over many methods not always a good idea!

56 Some proteins predicted better

57 Reliability correlates with accuracy!

58 Conclusion
– big gain through using evolutionary information
– are we going to reach above 80%? How high?
– continuous secondary structure
– better methods
– other features
– use secondary structure: ASP (Young M, Kirshenbaum K, Dill KA, Highsmith S: Predicting conformational switches in proteins. Protein Sci 1999, 8:1752-1764.)

59 Availability of methods
– email: PredictProtein@columbia.edu
  – subject: HELP
  – file:
– WWW: http://cubic.bioc.columbia.edu/predictprotein/
– META: http://cubic.bioc.columbia.edu/predictprotein/submit_meta.html
– EVA: http://cubic.bioc.columbia.edu/eva
– CUBIC: http://cubic.bioc.columbia.edu/
Email address
options
# protein name
SEQWENCE

