Prediction of protein structure


1 Prediction of protein structure

2 Aim
Structure prediction tries to build models of 3D structures of proteins that could be useful for understanding structure-function relationships.

3 Genbank/EMBL Uniprot PDB

4

5 DNA sequence Protein sequence Molecular recognition 3D structure

6 The protein folding problem
The information for the 3D structure is encoded in the protein sequence
Proteins fold into their native structure in seconds
Native structures are both thermodynamically stable and kinetically accessible

7 ab-initio prediction Prediction from sequence using first principles
AVVTW...GTTWVR

8 Ab-initio prediction
“In theory”, we should be able to build native structures from first principles using sequence information and molecular dynamics simulations: “ab-initio prediction of structure”
1 µs simulation of the folding of a model protein (Duan-Kollman: Science, 277, 1793, 1998)
Simulations of reversible peptide folding ( ns) (Daura et al., Angew. Chem., 38, 236, 1999)
Distributed folding simulations of villin (36 residues) (Zagrovic et al., JMB, 323, 927, 2002)

9 ... the bad news ...
It is not possible to extend simulations to the “seconds” range
Simulations are limited to small systems and fast folding/unfolding events in known structures
  steered dynamics
  biased molecular dynamics
Simplified systems

10 Typical shortcuts
Reduce conformational space
  1-2 atoms per residue
  fixed lattices
Statistical force fields obtained from known structures
  Average distances between residues
  Interactions
Use building blocks: 3-9 residues from PDB structures

11 “lattice” folding
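
Note: a minimal sketch of what a "fixed lattice" with one bead per residue can look like. It assumes the textbook 2D HP model (H = hydrophobic, P = polar, energy of -1 per non-bonded H-H contact), which is not necessarily the lattice model shown on this slide. The function names are illustrative.

```python
from itertools import product

# Unit steps on a 2D square lattice.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def walk_to_coords(moves):
    """Convert a move string such as 'RUL' into lattice coordinates."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords

def hp_energy(sequence, moves):
    """Return the HP energy of a conformation, or None if the chain overlaps."""
    coords = walk_to_coords(moves)
    if len(coords) != len(sequence):
        raise ValueError("need exactly len(sequence) - 1 moves")
    if len(set(coords)) != len(coords):
        return None  # the walk is not self-avoiding
    position = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = position.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy

def best_fold(sequence):
    """Exhaustively enumerate all walks; feasible only for very short chains."""
    best_energy, best_moves = 0, None
    for moves in product("UDLR", repeat=len(sequence) - 1):
        e = hp_energy(sequence, "".join(moves))
        if e is not None and (best_moves is None or e < best_energy):
            best_energy, best_moves = e, "".join(moves)
    return best_energy, best_moves
```

The exhaustive search grows as 4^(N-1), which is why even lattice models need sampling methods for realistic chain lengths.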

12 Example PROSA potential
[Figure: PROSA potential. Panels: Total, Hydrophobic, Cβ-Cβ. Labels: Very stable, Low stability]
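
Note: statistical (knowledge-based) potentials of this kind are commonly derived by inverse Boltzmann conversion of frequencies observed in known structures. The sketch below assumes that generic recipe, E = -kT ln(p_obs / p_ref), with toy bin counts and a pseudocount. It is not PROSA's actual parametrisation, and the function name is hypothetical.

```python
import math

KT = 0.593  # approximate k_B * T in kcal/mol at 298 K

def inverse_boltzmann(observed, reference, pseudocount=1.0):
    """Convert per-bin counts into energies E = -kT * ln(p_obs / p_ref).

    observed:  counts of a residue-pair feature (e.g. distance bins) in known structures
    reference: counts of the same bins in a reference state (e.g. all pairs)
    """
    if len(observed) != len(reference):
        raise ValueError("observed and reference must have the same number of bins")
    obs_total = sum(observed) + pseudocount * len(observed)
    ref_total = sum(reference) + pseudocount * len(reference)
    energies = []
    for o, r in zip(observed, reference):
        p_obs = (o + pseudocount) / obs_total
        p_ref = (r + pseudocount) / ref_total
        energies.append(-KT * math.log(p_obs / p_ref))
    return energies
```

Bins that are over-represented in real structures come out with negative (favourable) energy; under-represented bins come out positive.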

13 Results from ab-initio
Average error 5 Å - 10 Å
Function cannot be predicted
Long simulations
A protein from E. coli was predicted at 7.6 Å (CASP3, H. Scheraga)
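
Note: errors quoted in Å are normally root-mean-square deviations (RMSD) between equivalent atoms of the model and the experimental structure. A minimal sketch, assuming the two coordinate sets are already optimally superposed and listed in the same atom order (the superposition step itself is not shown):

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD in the input units (e.g. Å) between two pre-superposed coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```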

14 comparative modelling
The most efficient way to predict protein structure is to compare with known 3D structures

15 Protein folds

16 Basic concept
In a given protein, 3D structure is more conserved than sequence
Some amino acids are “equivalent” to each other
Evolutionary pressure allows only amino acid substitutions that keep the 3D structure largely unaltered
Two proteins with “similar” sequences must have the “same” 3D structure

17 Possible scenarios
1. Homology can be recognized using sequence comparison tools or protein family databases (blast, clustal, pfam,...). Structural and functional predictions are feasible.
2. Homology exists but cannot be recognized easily (psi-blast, threading). Low-resolution fold predictions are possible. No functional information.
3. No homology. 1D predictions. Sequence motifs. Limited functional prediction. Ab-initio prediction.

18 fold prediction

19 3D struc. prediction

20 1D prediction
Prediction is based on averaging amino acid properties
AGGCFHIKLAAGIHLLVILVVKLGFSTRDEEASS
Average over a window
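
Note: a minimal sketch of window averaging, applied to the sequence on this slide. The Kyte-Doolittle hydropathy scale and the window of 7 residues are assumptions chosen for illustration; the slide does not say which property or window length was used.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window_average(sequence, scale, window=7):
    """Return (center_index, mean_value) for every full window along the sequence."""
    if window % 2 == 0 or window > len(sequence):
        raise ValueError("window must be odd and no longer than the sequence")
    half = window // 2
    values = [scale[aa] for aa in sequence]
    profile = []
    for center in range(half, len(values) - half):
        segment = values[center - half:center + half + 1]
        profile.append((center, sum(segment) / window))
    return profile

if __name__ == "__main__":
    seq = "AGGCFHIKLAAGIHLLVILVVKLGFSTRDEEASS"
    for center, value in window_average(seq, KYTE_DOOLITTLE):
        print(center, seq[center], round(value, 2))
```

Positions closer than half a window to either end get no value here; real tools pad or shrink the window instead.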

21

22 1D prediction. Properties
Secondary structure propensities
Hydrophobicity (transmembrane)
Accessibility
...

23 Propensities Chou-Fasman
Biochemistry 17,
[Table of residue propensities: α, β, turn]
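
Note: a propensity in the Chou-Fasman sense is the fraction of a residue type found in a given state divided by the overall fraction of residues in that state, so values above 1 mean the residue favours that state. The sketch below computes this from a sequence and a per-residue state string. The function name and any input passed to it are illustrative, not the published parameter set.

```python
from collections import Counter

def propensities(sequence, states):
    """Return {(residue, state): propensity} from aligned sequence/state strings.

    propensity = (n(residue in state) / n(residue)) / (n(state) / n(total))
    """
    if len(sequence) != len(states) or not sequence:
        raise ValueError("sequence and states must be non-empty and of equal length")
    total = len(sequence)
    aa_counts = Counter(sequence)
    state_counts = Counter(states)
    pair_counts = Counter(zip(sequence, states))
    table = {}
    for (aa, state), n in pair_counts.items():
        frac_aa_in_state = n / aa_counts[aa]
        frac_state = state_counts[state] / total
        table[(aa, state)] = frac_aa_in_state / frac_state
    return table
```

Real tables are computed over a whole database of structures; a single short sequence gives statistically meaningless numbers.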

24 Some programs (www.expasy.org)
BCM PSSP - Baylor College of Medicine
Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction
GOR I (Garnier et al, 1978) [At PBIL or at SBDS]
GOR II (Gibrat et al, 1987)
GOR IV (Garnier et al, 1996)
HNN - Hierarchical Neural Network method (Guermeur, 1997)
Jpred - A consensus method for protein secondary structure prediction at University of Dundee
nnPredict - University of California at San Francisco (UCSF)
PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University
PSA - BioMolecular Engineering Research Center (BMERC) / Boston
PSIpred - Various protein structure prediction methods at Brunel University
SOPM (Geourjon and Deléage, 1994)
SOPMA (Geourjon and Deléage, 1995)
AGADIR - An algorithm to predict the helical content of peptides

25 1D Prediction
Original methods: 1 sequence and uniform parameters (25-30%)
Original improvements: parameters specific to protein classes
Present methods use sequence profiles obtained from multiple alignments and neural networks to extract parameters (70-75%, 98% for transmembrane helix)
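
Note: a sequence profile is, at its simplest, the residue frequency in each column of a multiple alignment. The sketch below builds such a profile and ignores gaps. It shows only the input representation, not the neural network stage, and the function name is hypothetical.

```python
from collections import Counter

def column_frequencies(alignment):
    """Return one {residue: frequency} dict per alignment column, ignoring '-' gaps."""
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        counts = Counter(residues)
        n = len(residues)
        # An all-gap column yields an empty dict.
        profile.append({aa: c / n for aa, c in counts.items()} if n else {})
    return profile
```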

26

27

28 PredictProtein (PHD)
Building of a multiple alignment using Swissprot, prosite, and domain databases
1D prediction from the generated profile using neural networks
Fold recognition
Confidence evaluation

29 PredictProtein Available information
Multiple alignments: MaxHom
PROSITE motifs
Composition bias: SEG
Threading: TOPITS
Secondary structure: PHDsec, PROFsec
Transmembrane helices: PHDhtm, PHDtop
Globularity: GLOBE
Coiled-coil: COILS
Disulfide bridges: CYSPRED

30 PredictProtein Available information
Signal peptides: SignalP
O-glycosylation: NetOglyc
Chloroplast import signal: ChloroP
Consensus secondary structure: JPRED
Transmembrane: TMHMM, TOPPRED
SwissModel

31 Methods for remote homology
Homology can be recognized using PSI-Blast
Fold prediction is possible using threading methods
Accurate 3D prediction is not possible: no structure-function relationship can be inferred from models

32 Threading
The unknown sequence is “folded” onto a number of known structures
Scoring functions evaluate the fit between sequence and structure according to statistical functions and sequence comparison
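
Note: a deliberately crude sketch of the threading idea. The sequence is placed, without gaps, on each template's burial pattern ('b' buried, 'e' exposed), and each position scores +1 when a hydrophobic residue sits in a buried site or a polar residue in an exposed one, -1 otherwise. The residue classes, the scoring and the template names are assumptions for illustration. Real threading programs use statistical potentials and allow gaps.

```python
# Residues treated as hydrophobic in this toy scoring scheme (an assumption).
HYDROPHOBIC = set("AVLIMFWC")

def thread_score(sequence, burial):
    """Gapless compatibility score of a sequence with a template burial string."""
    if len(sequence) != len(burial):
        raise ValueError("sequence and burial pattern must have equal length")
    score = 0
    for aa, env in zip(sequence, burial):
        compatible = (aa in HYDROPHOBIC) == (env == "b")
        score += 1 if compatible else -1
    return score

def best_template(sequence, templates):
    """Score the sequence against every template and return (best_name, all_scores)."""
    scores = {name: thread_score(sequence, burial) for name, burial in templates.items()}
    return max(scores, key=scores.get), scores
```

For example, `best_template("LVKE", {"t1": "bbee", "t2": "eebb"})` selects the hypothetical template `t1`, whose buried positions line up with the hydrophobic residues.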

33 ATTWV....PRKSCT SELECTED HIT 10.5 > 5.2

34
Query
Sequence:             ATTWV....PRKSCT
Pred. sec. struc.:    HHHHH....CCBBBB
Pred. accessibility:  eeebb....eeebeb
Templates
Sequence:   GGTV....ATTW    ATTVL....FFRK
Obs SS:     BBBB....CCHH    HHHB.....CBCB
Obs Acc.:   EEBE.....BBEB   BBEBB....EBBE

35 Threading accuracy

36

37

38 Comparative modelling
Good for homology >30%
Accuracy is very high for homology >60%
Reminder
The model must be USEFUL
Only the “interesting” regions of the protein need to be modelled

39 Expected accuracy
Strongly dependent on the quality of the sequence alignment
Strongly dependent on the identity with “template” structures. Very good structures if identity > 60-70%.
Quality of the model is better in the backbone than in the side chains
Quality of the model is better in conserved regions
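
Note: a minimal sketch of how sequence identity to a template can be computed from a pairwise alignment and mapped onto the rough thresholds quoted in these slides (>30% and >60%). Identity is taken here over aligned, non-gap positions. Other denominators are in use, so treat the definition and the function names as assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned

def expected_quality(identity):
    """Map identity (%) to the qualitative regimes described in the slides."""
    if identity > 60:
        return "very high accuracy expected"
    if identity > 30:
        return "comparative modelling feasible"
    return "remote homology: fold-level methods only"
```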

40

41 Steps
Choose templates: proteins with experimental 3D structure and significant homology (BLAST, PFAM, PDB)
Build a multiple alignment of the templates. Alignment quality is critical for accuracy. Always use structure-based alignment. Reduce redundancies

42 Template alignment

43 Steps
Alignment of template structures
Alignment of unknown sequence against template alignment
Structural alignment may not coincide with evolution-based alignment. Gaps must be chosen to minimize structure distortion

44
PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS (green)
PHE ASN VAL CYS ARG THR PRO GLU ALA ILE CYS (red)
PHE ASN VAL CYS ARG THR PRO GLU ALA ILE CYS (blue)

45

46

47 Steps
Alignment of template structures
Alignment of unknown sequence against template alignment
Build structure of conserved regions (SCR)
Coordinates come from either a single structure or averages. Side chains are adapted to the original or placed in standard conformations

48

49 Steps
Alignment of template structures
Alignment of unknown sequence against template alignment
Build structure of conserved regions (SCR)
Build unconserved regions (usually “loops”)

50 “loops”
Ab initio
PDB

51 “loops”
Chosen manually or energy-based

52 Optimization
Optimize side chain conformation
  Energy minimization restricted to standard conformers and VdW energy
Optimize everything
  Global energy minimization with restraints
  Molecular dynamics

53

54

55 Quality test
No energy differences between a correct and a wrong model
The structure must be “chemically correct” to use it in quantitative predictions

56 Alignment quality
Global test: compare sequence with N residue exchanges (N=1000). Calculate Z-score
If (alignments res):
Z > Ideal
5 < Z <= % core residues core right
Z <= Problems
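
Note: a minimal sketch of a randomisation Z-score. It assumes that "N residue exchanges" means scoring N randomly shuffled versions of the sequence and comparing the real score with that background: Z = (score - mean) / standard deviation. The identity-count score function is a placeholder for whatever alignment score is actually used, and the cut-off values from the slide are not reproduced because they did not survive in the transcript.

```python
import random
import statistics

def identity_score(a, b):
    """Placeholder score: number of identical positions in two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x == y)

def shuffle_z_score(query, target, score_fn=identity_score, n_shuffles=1000, seed=0):
    """Z-score of the real score against scores of shuffled versions of the query."""
    rng = random.Random(seed)  # fixed seed for reproducible results
    real = score_fn(query, target)
    residues = list(query)
    random_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # keeps composition, destroys order
        random_scores.append(score_fn("".join(residues), target))
    mean = statistics.mean(random_scores)
    sd = statistics.pstdev(random_scores)
    if sd == 0:
        return 0.0  # background has no spread, so Z is undefined; report 0
    return (real - mean) / sd
```

Shuffling preserves amino acid composition, so a high Z indicates that the score depends on residue order rather than on composition alone.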

57 Analysis software
PROCHECK
WHATCHECK
Suite Biotech
PROSA

58 Sources of information
300 best structures in PDB
Molecular geometry from CSD database
Theoretical data (Ramachandran, etc.)

59 Procheck
Covalent geometry
Planarity
Dihedral angles
Chirality
Non-bonded interactions
Satisfied/unsatisfied hydrogen bonds
Disulfide bonds
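
Note: dihedral checks rest on computing the torsion angle defined by four consecutive atoms, for example phi (C(i-1), N, CA, C) and psi (N, CA, C, N(i+1)) for a Ramachandran plot. The sketch below is a generic implementation of the standard atan2 formula. It is not PROCHECK's code, and the function name is illustrative.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p1, p2, p3, p4):
    """Torsion angle in degrees (-180..180) about the p2-p3 bond."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal of the plane p1-p2-p3
    n2 = _cross(b2, b3)  # normal of the plane p2-p3-p4
    b2_len = math.sqrt(_dot(b2, b2))
    if b2_len == 0:
        raise ValueError("p2 and p3 must be distinct points")
    y = _dot(b2, _cross(n1, n2)) / b2_len
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

A cis arrangement gives 0°, a trans arrangement gives ±180°, and the sign encodes the handedness of the twist.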

60

61

62

63 Whatcheck

64 Prediction software
SwissModel (automatic)
SwissModel Repository
3D-JIGSAW (M. Sternberg)
Modeller (A. Sali)
MODBASE (A. Sali)

65

66

67 spdbv Result

68 Final test
The model must explain the experimental data (i.e. differences between the unknown sequence and the templates) and be useful for understanding function.

69

70

71

72

73

74

75

76

