1 Protein Structure
Michael Schroeder, BioTechnological Center (Biotec), TU Dresden
Reading: Lesk, chapter 5. Details on SCOP and CATH can be found in Structural Bioinformatics, Bourne/Weissig, chapters 12 and 13

2 By Michael Schroeder, Biotec, 2 Folding
- Proteins are linear polymer mainchains with different amino acid side chains
- Proteins fold spontaneously, reaching a state of minimal energy
- Side and main chains interact with one another and with solvent
- Example movie
Jones, D.T. (1997) Successful ab initio prediction of the tertiary structure of NK-Lysin using multiple sequences and recognized supersecondary structural motifs. PROTEINS. Suppl. 1, 185-191

3 Examining Proteins
- Specialised tools with different views of structure
- Corey, Pauling, Koltun (CPK)
  - Diameter of sphere ~ atomic radius
  - Hydrogen white, carbon grey, nitrogen blue, oxygen red, sulphur yellow
- Cartoon
- Wire
- Balls

4 Examining Proteins

5 Protein Folding
Image taken from www.expasy.org/swissmod/course
- Conformation of a residue
  - Rotation around the N-Cα bond, φ (phi)
  - Rotation around the Cα-C bond, ψ (psi)
  - Rotation around the peptide bond, ω (omega)
- The peptide bond tends to be planar and in one of two states:
  - trans, ω ≈ 180° (usually)
  - cis, ω ≈ 0° (rarely, and mostly proline)
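The angles φ, ψ and ω are all torsion (dihedral) angles defined by four consecutive backbone atoms. As a minimal sketch, assuming atom positions are given as plain (x, y, z) tuples, a signed torsion angle can be computed as below. The function name `dihedral` and the toy coordinates are illustrative and not from the slides.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)          # normal of the plane p0, p1, p2
    n2 = _cross(b1, b2)          # normal of the plane p1, p2, p3
    norm_b1 = math.sqrt(_dot(b1, b1))
    x = _dot(n1, n2)
    y = _dot(b1, _cross(n1, n2)) / norm_b1
    return math.degrees(math.atan2(y, x))

# Toy geometry: central bond along z.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))     # about 0 degrees
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))  # about +/-180 degrees
```

For φ the four atoms would be C(i-1), N(i), Cα(i), C(i); for ψ they would be N(i), Cα(i), C(i), N(i+1).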

6 Sasisekharan-Ramakrishnan-Ramachandran plot
- Solid line = energetically preferred
- Outside dotted line = disallowed
- Most amino acids fall into the αR region (right-handed alpha helix) or the β region (beta strand)
- Glycine has additional conformations (e.g. the left-handed alpha helix, the αL region, and the lower right panel)
Image taken from www.expasy.org/swissmod/course

7 Ramachandran plot
- Plot for a protein with mostly beta-sheets
- Example for conformations
Image taken from www.expasy.org/swissmod/course

8 Helices and Strands
- Consecutive residues in alpha or beta conformation generate alpha-helices and beta-strands, respectively
- Such secondary structure elements are stabilised by weak hydrogen bonds
- They are connected by turns or loops, regions in which the chain alters direction
- Turns are often surface exposed and tend to contain charged or polar residues

9 Alpha Helix
- Residue j is hydrogen-bonded to residue j+4
- 3.6 residues per turn
- 1.5 Å rise per residue
- Repeat (pitch) every 3.6 × 1.5 Å = 5.4 Å
- φ = -60°, ψ = -45°
Image taken from www.expasy.org/swissmod/course

10 Beta strand
Image taken from www.expasy.org/swissmod/course

11 Beta Sheets
Image taken from www.expasy.org/swissmod/course

12 Turn
- Residue j is hydrogen-bonded to residue j+3
- Often contains proline and glycine
Image taken from www.expasy.org/swissmod/course

13 How to Fold a Structure
- All residues must have stereochemically allowed conformations
- Buried polar atoms must be hydrogen-bonded
  - If a few are missed, it might be energetically preferable to bond these to solvent
- Enough hydrophobic surface must be buried and the interior must be sufficiently densely packed
- There is evidence that folding occurs hierarchically: first secondary structure elements, then super-secondary, …
- This justifies a hierarchical approach when simulating folding

14 Structure Alignment
Slides from Hanekamp, University of Wyoming, www.uwyo.edu

15 Structure Alignment

16 Structure Alignment
- In the same way that we align sequences, we wish to align structures
- Let's start simple: how to score an alignment
  - Sequences: e.g. percentage of matching residues
  - Structure: rmsd (root mean square deviation)

17 Root Mean Square Deviation
- What is the distance between two points a with coordinates $(x_a, y_a, z_a)$ and b with coordinates $(x_b, y_b, z_b)$?
- Euclidean distance: $d(a,b) = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2 + (z_a - z_b)^2}$

18 Root Mean Square Deviation
- In a structure alignment the score measures how far the aligned atoms are from each other on average
- Given the distances $d_i$ between n aligned atoms, the root mean square deviation is defined as $\mathrm{rmsd} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^2}$
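The definition above translates directly into a few lines of code. This is a minimal sketch that assumes the two coordinate lists are already paired atom by atom and already superimposed. Finding the optimal superposition is a separate step that is not shown. The function name `rmsd` and the toy coordinates are illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for a, b in zip(coords_a, coords_b):
        # d_i^2 is the squared Euclidean distance between the aligned atoms
        squared += sum((p - q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(coords_a))

# Toy example: every atom is displaced by exactly 1 Å along z.
structure_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
value = rmsd(structure_a, structure_b)
```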

19 Quality of Alignment and Example
- Unit of RMSD: e.g. Ångströms
- Identical structures: RMSD = 0
- Similar structures: RMSD is small (1 to 3 Å)
- Distant structures: RMSD > 3 Å
- Example: structural superposition of gamma-chymotrypsin and Staphylococcus aureus epidermolytic toxin A

20 Pitfalls of RMSD
- All atoms are treated equally (e.g. residues on the surface have a higher degree of freedom than those in the core)
- The best alignment does not always mean minimal RMSD
- The significance of RMSD is size dependent
From www.uwyo.edu/molecbio/LectureNotes/MOLB5650

21 Alternative RMSDs
- aRMSD = best root-mean-square deviation calculated over all aligned alpha-carbon atoms
- bRMSD = the RMSD over the highest scoring residue pairs
- wRMSD = weighted RMSD
Source: W. Taylor (1999), Protein Science, 8: 654-665. http://www.prosci.uci.edu/Articles/Vol8/issue3/8272/8272.html#relat
From www.uwyo.edu/molecbio/LectureNotes/MOLB5650

22 Computing Structural Alignments
- DALI (Distance-matrix ALIgnment) is one of the first tools for structural alignment
- How does it work?
  - Atoms: given two structures' atomic coordinates
  - Compute two distance matrices: for each structure, compute all pairwise inter-atom distances
    - This step is done because the computed distances are independent of the coordinate system
    - The two original atomic coordinate sets cannot be compared directly; the two distance matrices can
  - Align the two distance matrices:
    - Find small (e.g. 6x6) sub-matrices along the diagonal that match
    - Extend these matches to form an overall alignment
- This method is somewhat similar to how BLAST works
- SSAP (double dynamic programming) in term 3
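The key property used here, that an intra-structure distance matrix does not depend on the coordinate system, can be checked with a small sketch. This shows only the distance-matrix step, not DALI itself. The helper names and the toy coordinates are illustrative assumptions.

```python
import math

def distance_matrix(coords):
    """All pairwise Euclidean distances within one structure."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def rotate_z_and_shift(coords, angle_deg, shift):
    """Rigid-body move: rotate about the z axis, then translate."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]

original = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.1, 4.0, 2.0)]
moved = rotate_z_and_shift(original, 73.0, (10.0, -4.0, 2.5))

dm_original = distance_matrix(original)
dm_moved = distance_matrix(moved)

# The raw coordinates differ, but the two distance matrices agree.
largest_difference = max(abs(a - b)
                         for row_a, row_b in zip(dm_original, dm_moved)
                         for a, b in zip(row_a, row_b))
```

Because the matrices agree up to floating-point error, they can be compared between two structures without first superimposing them.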

23 DALI Example
- The regions of common fold, as determined by the program DALI by L. Holm and C. Sander, in the TIM-barrel proteins mouse adenosine deaminase [1fkx] (black) and Pseudomonas diminuta phosphotriesterase [1pta] (red)

24 Protein zinc finger (4znf)
Slides from Hanekamp, University of Wyoming, www.uwyo.edu

25 Superimposed 3znf and 4znf
- 30 CA atoms: RMS = 0.70 Å
- 248 atoms: RMS = 1.42 Å
- Figure label: Lys30
Slides from Hanekamp, University of Wyoming, www.uwyo.edu

26 Superimposed 3znf and 4znf backbones
- 30 CA atoms: RMS = 0.70 Å
Slides from Hanekamp, University of Wyoming, www.uwyo.edu

27 RMSD vs. Sequence Similarity
- Even at low sequence identity, good structural alignments are possible
Picture from www.jenner.ac.uk/YBF/DanielleTalbot.ppt

28 Structure Classification

29 Why classify structures?
- Structure similarity is a good indicator for homology, therefore classify structures
- Classification at different levels
  - Similar general folding patterns (structures not necessarily related)
  - Possibly low sequence similarity, but similar structure and function implies very likely homology
  - High sequence similarity implies similar structures and homology
- Classification can be used to investigate evolutionary relationships and possibly infer function

30 Structure Classification
- SCOP: Structural Classification of Proteins
  - Hand curated (Alexei Murzin, Cambridge) with some automation
- CATH: Class, Architecture, Topology, Homology
  - Automated where possible, with some checks by hand
- FSSP: Fold classification based on Structure-Structure alignment of Proteins
  - Fully automated
- Reasonable correspondence (>80%)

31 Evolutionary Relation
- Strong sequence similarity is assumed to be sufficient to infer homology
- Close structural and functional similarity together are also considered sufficient to infer homology
- Similar structure alone is not sufficient, as proteins may have converged on a structure due to physicochemical necessity
- Similar function alone is not sufficient, as proteins may have developed it due to functional selection
- In general, structure is more conserved than sequence
- Beware: descendants of an ancestor may have different function, structure, and sequence! Difficult to detect

32 What is a domain? Single and Multi-Domain Proteins

33 What is a domain?
- Functional: a domain is an "independent" functional unit, which occurs in more than one protein
- Physicochemical: a domain has a hydrophobic core
- Topological: intra-domain distances of atoms are minimal, inter-domain distances maximal
- Difficult to define a domain exactly
- Difficult to agree on exact domain borders

34 Domains re-occur
- A domain re-occurs in different structures and possibly in the context of different other domains
- P-loop domain in
  - 1goj: Structure Of A Fast Kinesin: Implications For ATPase Mechanism and Interactions With Microtubules Motor Protein (single domain)
  - 1ii6: Crystal Structure Of The Mitotic Kinesin Eg5 In Complex With Mg-ADP Cell Cycle (two domains)

35 Domains re-occur
- 1in5: interaction of P-loop domain (green & orange) and winged helix DNA binding domain
- 1a5t: interaction of P-loop domain (green & orange) and DNA polymerase III domain

36 Domains have hydrophobic core
Hydropathy scale from Kyte J., Doolittle R.F., J. Mol. Biol. 157:105-132 (1982):
  Ala  1.800    Arg -4.500    Asn -3.500    Asp -3.500
  Cys  2.500    Gln -3.500    Glu -3.500    Gly -0.400
  His -3.200    Ile  4.500    Leu  3.800    Lys -3.900
  Met  1.900    Phe  2.800    Pro -1.600    Ser -0.800
  Thr -0.700    Trp -0.900    Tyr -1.300    Val  4.200
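One way to put the scale above to use is to average it over a sliding window along a sequence, so that stretches of hydrophobic residues stand out. The sketch below assumes one-letter amino acid codes. The window averaging, the function name `hydropathy_profile` and the toy sequence are illustrative choices, not something the slide specifies.

```python
# Kyte-Doolittle hydropathy values, keyed by one-letter amino acid code.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=3):
    """Mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Toy sequence: a hydrophobic stretch followed by a charged one.
profile = hydropathy_profile("IIIKKK", window=3)
```

Positive window averages point to hydrophobic stretches, which are candidates for the buried core, and negative ones to polar or charged stretches.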

37 Intra-domain distances minimal
- Distances between atoms within a domain are minimal
- Distances between atoms of two different domains are maximal

38 PDB, Proteins, and Domains
- Ca. 20,000 structures in PDB
- 50% single domain
- 50% multiple domain
- 90% have fewer than 5 domains

  Domains  Freq.
        1   8464
        2   4358
        3    926
        4   1888
        5    148
        6    624
        7     42
        8    491
        9     22
       10     58
        …      …
       30      7
       31      1
       32     16
       36      1
       40      8
       42      1
       48      3
       49      1

39 A structure with 49 domains
- 1AON, Asymmetric Chaperonin Complex Groel/Groes/(ADP)7

40 SCOP: Structural Classification of Proteins
Figure: the SCOP hierarchy, from the top through CLASS, FOLD and SUPERFAMILY down to FAMILY. Labels shown in the figure:
- Classes: All alpha (218), All beta (144), Alpha/beta (136), Alpha+beta (279)
- Trypsin-like serine proteases (1), Immunoglobulin-like (23), Transglutaminase (1), Immunoglobulin (6)
- Families: C1 set domains (antibody constant), V set domains (antibody variable)

41 Class
- All alpha (possibly small beta adornments)
- All beta (possibly small alpha adornments)

42 Class
- Alpha/beta (alpha and beta) = single beta sheet with alpha helices joining the C-terminus of one strand to the N-terminus of the next
  - Subclass: beta sheet forming a barrel surrounded by alpha helices
  - Subclass: central planar beta sheet
- Alpha+beta (alpha plus beta) = alpha and beta units are largely separated
  - Strands joined by hairpins, leading to antiparallel sheets

43 Class
- Multi-domain proteins
  - have domains placed in different classes
  - domains have not been observed elsewhere
  - E.g. 1hle

44 Class
- Membrane (few and mostly unique) and cell surface proteins
  - E.g. Aquaporin 1ih5

45 Class
- Small proteins
  - E.g. Insulin, 1pid

46 Class
- Coiled coil proteins
  - E.g. 1i4d, Arfaptin-Rac binding fragment

47 Class
- Low-resolution structures, peptides, designed proteins
  - E.g. 1cis, a designed protein, hybrid protein between chymotrypsin inhibitor CI-2 and helix E from subtilisin Carlsberg from Barley (Hordeum vulgare), hiproly strain

48 Fold, Superfamily, Family
- Fold
  - Common core structure, i.e. same secondary structure elements in the same arrangement with the same topological structure
- Superfamily
  - Very similar structure and function
- Family
  - Sequence identity (>30%) or extremely similar structure and function

49 Distribution (2007)

  Class                       Fold   Superfamily   Family
  All alpha                    259           459      772
  All beta                     165           331      679
  Alpha/beta                   141           232      736
  Alpha+beta                   334           488      897
  Multidomain                   53     (missing)       74
  Membrane and cell surface     50            92      104
  Small proteins                85           122      202
  Total                       1086          1777     3464

50 Uses of SCOP
- Automatic classification
- Understanding of protein enzymatic function
- Use superfamily and fold to study distantly related proteins
- Study sequence and structure variability
- Derive substitution matrices for sequence comparison
- Extract structural principles for design
- Study decomposition of multi-domain proteins
- Estimate total number of folds
- Derived databases

51 PDB, Proteins, Domains revisited
- 80% of PDB have only one type of SCOP superfamily
- 15% of PDB have two different SCOP superfamilies

  Superfamilies  Freq.
              1  13960
              2   2721
              3    495
              4    178
              5     33
              6     25
              7      1
              9      4
             20      9
             21      1
             22      1
             23      6

52 A structure with 23 different superfamilies
- 1k9m, Co-Crystal Structure Of Tylosin Bound To The 50S Ribosomal Subunit Of Haloarcula Marismortui, Ribosome

53 The 20 Most Frequently Occurring Superfamilies

  Superfamily                                           SCOP ID   #PDB
  Immunoglobulin                                        b.1.1      823
  Lysozyme-like                                         d.2.1      777
  Trypsin-like serine proteases                         b.47.1     649
  P-loop containing nucleotide triphosphate hydrolases  c.37.1     521
  NAD(P)-binding Rossmann-fold domains                  c.2.1      384
  Globin-like                                           a.1.1      384
  (Trans)glycosidases                                   c.1.8      332
  Acid proteases                                        b.50.1     288
  Concanavalin A-like lectins/glucanases                b.29.1     230
  Thioredoxin-like                                      c.47.1     217
  EF-hand                                               a.39.1     212
  alpha/beta-Hydrolases                                 c.69.1     195
  Cupredoxins                                           b.6.1      178
  Ribonuclease H-like                                   c.55.3     178
  PLP-dependent transferases                            c.67.1     176
  Periplasmic binding protein-like II                   c.94.1     171
  Carbonic anhydrase                                    b.74.1     169
  Metalloproteases ("zincins"), catalytic domain        d.92.1     169
  FAD/NAD(P)-binding domain                             c.3.1      162
  Cytochrome c                                          a.3.1      161

54 CATH
- Class: secondary structure composition
- Architecture: orientation in 3D
- Topology: connectivity
- Homology: grouped by evidence for homology (sequence, structure and function)

55 Generating CATH
1. Identify close relatives by pairwise sequence alignment
2. Detect more distant relatives using
   2a. sequence profiles and
   2b. structure alignment
3. Structures still unclassified after 1. and 2. are examined by hand to detect domain boundaries
4. Try 2. and 3. again
5. If still unclassified, assign manually

56 CATH step 1: Sequence-based Identification of Homologous Structures
- > 30% sequence similarity implies similar structure
- Relatives identified using pairwise alignment are clustered using hierarchical clustering with single linkage
- Reminder…

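57 Hierarchical Clustering
Original distance matrix:
           1    2    3    4    5
  1        0    2    6   10    9
  2             0    5    9    8
  3                  0    4    5
  4                       0    3
  5                            0

After merging 1 and 2:
         (1,2)   3    4    5
  (1,2)    0     5    9    8
  3              0    4    5
  4                   0    3
  5                        0

After merging 4 and 5:
         (1,2)   3   (4,5)
  (1,2)    0     5     8
  3              0     4
  (4,5)                0

After merging 3 and (4,5):
               (1,2)   (3,(4,5))
  (1,2)          0         5
  (3,(4,5))                0

Dendrogram: leaves 1 2 3 4 5, distance axis from 0 to 5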
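The merge sequence on this slide can be reproduced with a short agglomerative clustering loop. The sketch below uses single linkage (minimum distance between members) on the slide's 5x5 matrix. The function name `single_linkage` and the naive all-pairs search are illustrative choices, and practical tools use faster algorithms.

```python
# Symmetric distance matrix for items 1..5 (index 0 is item 1).
D = [
    [0, 2, 6, 10, 9],
    [2, 0, 5, 9, 8],
    [6, 5, 0, 4, 5],
    [10, 9, 4, 0, 3],
    [9, 8, 5, 3, 0],
]

def single_linkage(matrix):
    """Return the merges as (cluster_a, cluster_b, distance) tuples."""
    clusters = [(i + 1,) for i in range(len(matrix))]  # 1-based item labels
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: closest pair of members decides the distance.
                d = min(matrix[a - 1][b - 1]
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

merges = single_linkage(D)
```

The merge heights come out as 2, 3, 4 and 5, which are the branch heights of the dendrogram.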

58 Hierarchical Clustering
- How to define the distance between clusters?
- Single linkage: minimum
  - Example: distance of (A,B) to C is 1
- Complete linkage: maximum
  - Example: distance of (A,B) to C is 2
- Average linkage: average
  - Example: distance of (A,B) to C is 1.5
- Are dendrograms always the same, independent of the linkage method?

Distance matrix for the example:
       C   B   A
  C    0
  B    1   0
  A    2   1   0
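The three example values follow directly from the small distance matrix, where d(A,B) = 1, d(B,C) = 1 and d(A,C) = 2. A minimal sketch, with the function name `cluster_distance` chosen for illustration:

```python
# Pairwise distances from the slide's example matrix.
DIST = {
    frozenset("AB"): 1,
    frozenset("BC"): 1,
    frozenset("AC"): 2,
}

def cluster_distance(cluster_1, cluster_2, linkage):
    """Distance between two clusters under the given linkage rule."""
    pairs = [DIST[frozenset((a, b))] for a in cluster_1 for b in cluster_2]
    if linkage == "single":
        return min(pairs)
    if linkage == "complete":
        return max(pairs)
    if linkage == "average":
        return sum(pairs) / len(pairs)
    raise ValueError(f"unknown linkage: {linkage}")

single = cluster_distance("AB", "C", "single")
complete = cluster_distance("AB", "C", "complete")
average = cluster_distance("AB", "C", "average")
```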

59 Hierarchical Clustering: Chaining
- Beware of chaining when using single linkage
- As the nearest neighbour is selected, it appears that all members of the cluster are very similar to each other, when in fact A and Z are very different

       A   B   C   D   …   Z
  A    0   1   2   3   …   25
  B        0   1   2   …   24
  C            0   1   …   23
  D                0   …   22
  …                    …
  Z                        0

Figure: items A, B, C, D, …, Z arranged in a chain

60 CATH and single linkage
- It is argued that
  - structural data is quite sparse,
  - hence it cannot be expected that all cluster members will be very similar (in terms of sequence) to each other,
  - so that the chaining effect is even useful

61 CATH step 2a
- Profile-based methods such as PSI-BLAST are used to detect distant relatives
- Build profiles using all sequence data available (rather than only sequences for which a structure exists)
- This increases the quality of profiles dramatically
  - 51% of distant relatives retrieved using profiles based on sequences with known structure only
  - 82% of distant relatives retrieved using profiles based on all sequences

62 CATH step 2b: Structure-based methods to detect distant relatives
- For ca. 15% of structures, the sequence-based method does not work
- Example: for globins, sequence similarity can fall below 10%, yet structure and function (oxygen-binding) are preserved
- Use SSAP, the Sequential Structure Alignment Program

63 Clustering Result of Structure Alignment
- Relatives identified using pairwise alignment are clustered using hierarchical clustering with single linkage

64 Improving Efficiency: GRATH
- Screening large structures (>300 residues) against the database can take days
- Idea of GRATH (Graphical Representation of CATH):
  - Improve efficiency by filtering at a higher level before doing a detailed comparison
  - Represent a protein as a graph where
    - nodes are secondary structure elements, represented by their midpoint, tilt, and rotation
    - edges are distances between midpoints of secondary structure elements
  - Use an algorithm to determine subgraph isomorphism (i.e. does one graph occur in another one?)
  - If yes, then do a detailed comparison using SSAP

65 Structure Prediction and Modelling

66 Structure Prediction: Four Main Problem Areas
Given a sequence with unknown structure, predict its structure:
- Secondary structure prediction
  - Predict regions of helices and strands
- Homology modelling
  - Predict structure from known structures of one or more related proteins
- Fold recognition
  - Given a library of structures, determine which one (if any) is the fold of the given sequence
- Prediction of novel folds: a priori and knowledge-based methods

67 Structure Prediction of Novel Folds: Two Approaches
- A priori:
  - Most approaches aim to reproduce inter-atomic interactions by defining an energy function and trying to find its global minimum
  - Problems:
    - Inadequacy of the energy function
    - Algorithms get stuck in local minima
- Knowledge-based:
  - Find similarities to known structures or sub-structures

68 Secondary Structure Prediction
- A successful tool for secondary structure prediction is PROF
- PROF uses neural networks to learn secondary structure from known structures
- ¾ of PROF's predictions are correct
- At CASP 2000 it predicted e.g. the following:

Residues 1-50
  Sequence    ALVEDPPLKVSEGGLIREGYDPDLDALRAAHREGVAYFLELEERERERTG
  Prediction  HH------------EEE------HHHHHHHHHH-HHHHHHHHHHHHHHH-
  Experiment  -E-------------E-----HHHHHHHHHHHHHHHHHHHHHHHHHHHH-

Residues 51-100
  Sequence    IPTLKVGYNAVFGYYLEVTRPYYERVPKEYRPVQTLKDRQRYTLPEMKEK
  Prediction  --EEEEEEEEEEEEEEEE-----------EEEEEEEE---EEEE-HHHHHH
  Experiment  ----EEEEE---EEEEEEEHHHHHH-----EEEEE---EEEEE-HHHHHH

Residues 101-129
  Sequence    EREVYRLEALIRRREEEVFLEVRERAKRQ
  Prediction  HHHHHHHHHHHHHHHHHHHHHHHHHHHH-
  Experiment  HHHHHHHHHHHHHHHHHHHHHHHHHHH--
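A statement such as "¾ of the predictions are correct" can be quantified as the fraction of residues whose predicted state (H, E or -) equals the experimental one. This per-residue measure is commonly called Q3. That the slide refers to exactly this measure is an assumption, and the strings below are toy data rather than the CASP example.

```python
def per_residue_accuracy(predicted, observed):
    """Fraction of positions where predicted and observed states agree."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty strings of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)

# Toy strings: H = helix, E = strand, - = other.
toy_prediction = "HHHH--EEEE"
toy_experiment = "HHH---EEE-"
accuracy = per_residue_accuracy(toy_prediction, toy_experiment)
```

In the toy data two of ten positions disagree (the last helix residue and the last strand residue), so the accuracy is 0.8.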

69 PROF's prediction
- The regions predicted by the PROF server of Rost to be helical are shown as wider ribbons. The prediction missed only a short helix, at the top left of the picture

70 Homology modelling
- Define the model of an unknown structure by making minimal changes to a relative with known structure
- Align the amino acid sequences of the target and one or more known structures
  - Insertions and deletions should be in loop regions
- Determine mainchain segments to represent the regions containing insertions and deletions, and stitch these into the known structure
- Replace the sidechains of the residues that have been mutated
- Examine the model (by hand and computationally) to detect collisions between atoms
- Refine the model by limited energy minimisation

71 Accuracy of Homology Modelling
- Works for >40-50% sequence similarity
- Example: SWISS-MODEL prediction of neurotoxin of red scorpion (1DQ7) from neurotoxin of yellow scorpion (1PTX)

72 Fold Recognition: 3D Profiles
- Given a sequence, determine which fold (if any) is most similar
- Can we build profiles to represent structures of similar fold (similar to sequence profiles)?
- 3D profiles: classify the environment of each residue
  - Secondary structure: is it part of a helix, a sheet, or other (determined by mainchain hydrogen bonding interactions)?
  - Surface exposure: 114 Å² accessible surface area
  - Polar or non-polar nature of the environment
- Total of 18 residue classes, one of which each residue is part of
- The sequence of these residue classes is the 3D profile

73 3D Profiles and Alignments
- Structure-structure alignment:
  - 3D profiles of two known structures can be aligned against each other
- Sequence-structure alignment:
  - Based on existing 3D profiles, the probability of a residue occurring in a residue class can be determined
  - Using this probability, we can assign a 3D profile to a sequence
  - And hence align the sequence 3D profile to a structure 3D profile
- For correctly determined protein structures, the structure 3D profile fits the sequence 3D profile well
- However, other proteins may score even better
- If a structure does not match its own 3D profile well, it is likely that there is an error in the structure determination

74 Threading
- Pull the query sequence through a known structure and rate the score
- Necessary:
  - A method to score the models to select the best one
  - A method to calibrate the scores to decide which of the best is correct

  Homology modelling            Threading
  Identify homologues           Try all possible parents
  Determine optimal alignment   Try many alignments
  Optimize one model            Evaluate many rough models

75 Scoring for Threading
- Empirical patterns of residue neighbours derived from known structures
- Observe the distribution of inter-residue distances for all 20 x 20 residue pairs
- Derive a probability distribution as a function of distance in space and on sequence
- The Boltzmann equation relates probability and energy
- Reverse this and derive an energy function from the probability distribution
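Reversing the Boltzmann relation p ∝ exp(-E/kT) gives E = -kT ln(p_observed / p_reference) for each distance bin. The sketch below applies this to histogram counts for one residue pair. The counts, the choice of reference distribution and the value of kT are all assumptions made for illustration, not data from the slides.

```python
import math

KT = 0.593  # kcal/mol at roughly 298 K (assumed value)

def inverse_boltzmann(observed_counts, reference_counts, kt=KT):
    """Pseudo-energy per distance bin from observed vs. reference counts."""
    total_observed = sum(observed_counts)
    total_reference = sum(reference_counts)
    energies = []
    for observed, reference in zip(observed_counts, reference_counts):
        if observed == 0 or reference == 0:
            energies.append(None)  # no statistics in this bin
            continue
        p_observed = observed / total_observed
        p_reference = reference / total_reference
        energies.append(-kt * math.log(p_observed / p_reference))
    return energies

# Hypothetical counts for three distance bins of one residue pair.
observed = [20, 60, 20]    # how often the pair is seen at each distance
reference = [40, 20, 40]   # how often any pair is seen at each distance
energies = inverse_boltzmann(observed, reference)
```

Bins where the pair is seen more often than the reference get a negative (favourable) energy, and bins where it is seen less often get a positive one.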

76 Threading the sequence
Figure labels: template, target
Slides from Hanekamp, University of Wyoming, www.uwyo.edu

77 "Threaded" sequence
- Yellow = adrenergic receptor sequence
- Blue = adrenergic receptor (PDB 1F88)
Slides from Hanekamp, University of Wyoming, www.uwyo.edu

78 Modeled structure
Figure label: Gaps
Slides from Hanekamp, University of Wyoming, www.uwyo.edu

79 Corrected Model
Slides from Hanekamp, University of Wyoming, www.uwyo.edu

80 Ab initio Structure Prediction

81 Molecular dynamics
- Structure prediction = place atoms so that interactions between them create a unique state of maximum stability
- Problems:
  - The model of inter-atomic distances is not complete
  - Computational scale: large number of variables and massive search space
  - Non-linearities: rough energy surface with many local minima

82 Conformational energy calculations
- Bond stretching
- Bond angle bend
- Torsion angle (e.g. φ, ψ, ω)
- Van der Waals interactions
  - Short-range repulsion ~ $R^{-12}$ and long-range attraction ~ $R^{-6}$, where R is the inter-atom distance
- Hydrogen bond
  - Weak chemical/electrostatic interaction, ~ $R^{-12}$ and ~ $R^{-10}$
- Electrostatics
  - Charges on atoms
- Solvent
  - Interactions with water, salt, sugar, etc.
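The van der Waals term with repulsion ~ R^-12 and attraction ~ R^-6 is usually written as a Lennard-Jones 12-6 potential. The sketch below uses the form E(R) = ε[(R_min/R)^12 - 2(R_min/R)^6], whose minimum is -ε at R = R_min. The parameter values are hypothetical and not taken from any particular force field.

```python
def lennard_jones(r, epsilon=0.2, r_min=3.8):
    """12-6 van der Waals energy: minimum of -epsilon at r = r_min.

    epsilon (kcal/mol) and r_min (Å) are illustrative values only.
    """
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)

at_minimum = lennard_jones(3.8)   # equals -epsilon
too_close = lennard_jones(1.9)    # steep repulsion, large positive energy
far_apart = lennard_jones(11.4)   # weak attraction, slightly below zero
```

The R^-12 term dominates at short distances and makes atomic overlap very expensive, while the R^-6 term gives the weak attraction at longer range.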

83 Rosetta
- Predicts structure by first generating structures of fragments (3-9 residues) using known structures
- Combines fragments by Monte Carlo simulation using an energy function with terms for
  - Paired beta-sheets
  - Burial of hydrophobic residues
- Carries out 1000 simulations
- Results are clustered and the centre of the largest cluster is presented as the prediction
- Demo

84 ROSETTA
- The program ROSETTA, by D. Baker and colleagues, can predict the structures of proteins for which no complete domain of similar folding pattern appears in the database
- Prediction by ROSETTA of H. influenzae, hypothetical protein. Black lines, experimental structure; red lines, prediction

85 Rosetta
- Prediction by ROSETTA of the N-terminal half of domain 1 of human DNA repair protein Xrcc4. This figure shows a selected substructure of Xrcc4 containing the N-terminal 55 out of 116 residues. Black lines, experimental structure; red lines, prediction

86 LINUS
- Another program with a similar idea
- Prediction by LINUS (program by G.D. Rose and R. Srinivasan) of the C-terminal domain of rat endoplasmic reticulum protein ERp29. Black lines, experimental structure; red lines, prediction

87 Monte Carlo Simulation
- Objective: find the conformation with minimal energy
- Problem: avoid local minima
- Algorithm:
  1. Generate a random initial conformation x
  2. Perturb conformation x to generate a neighbouring conformation x'
  3. Calculate the energies E(x) and E(x') for conformations x and x'
  4. If E(x) > E(x') (i.e. x' is an improvement, we go downhill from x to x'), then accept x' as the new conformation and go to 2.
  5. If E(x) < E(x') (i.e. x' is no improvement, we go uphill from x to x'), then accept x' as the new conformation with probability p
  6. The probability p to accept uphill moves is reduced with every step
  7. Go to step 2.
- Steps 1 to 4 make sure that we "walk" downhill towards a minimum
- Steps 5 to 7 make sure that if we are in a local minimum there is a chance to get out of it by accepting an uphill move. It is important that this probability decreases, so that walking uphill becomes more and more unlikely
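The steps above can be sketched on a one-dimensional toy energy function. The slide does not say how the acceptance probability p is chosen or reduced, so this sketch uses the Metropolis rule p = exp(-ΔE/T) with a temperature T that shrinks at every step (simulated annealing). The double-well energy function, all parameter values and the function names are illustrative assumptions.

```python
import math
import random

def toy_energy(x):
    """Hypothetical 1-D energy with a local and a global minimum."""
    return x ** 4 - 3 * x ** 2 + x

def monte_carlo(energy, x0, steps=5000, step_size=0.5,
                t_start=2.0, t_end=0.01, seed=0):
    rng = random.Random(seed)                # fixed seed for repeatability
    x, e = x0, energy(x0)                    # step 1: initial conformation
    best_x, best_e = x, e
    for k in range(steps):
        # Step 6: temperature (and with it p) decreases with every step.
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        x_new = x + rng.uniform(-step_size, step_size)   # step 2: perturb
        e_new = energy(x_new)                            # step 3: energies
        downhill = e_new <= e                            # step 4
        # Step 5: accept an uphill move with probability p.
        if downhill or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

start = 1.5
best_x, best_e = monte_carlo(toy_energy, start)
```

The function returns the lowest-energy conformation visited, so the result is never worse than the starting point. Whether the global minimum is reached depends on the cooling schedule and the step size.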

88 Summary
You should know now
- What helices, strands, and sheets are
- What a Ramachandran plot is
- How to score a structural alignment (rmsd)
- How to compute a structural alignment
- How a domain can be characterised
- Why structure classification is useful
- What the main structure classes are
- How classifications can be generated automatically
- What the problems are
- What secondary structure prediction, homology modelling, threading, and ab initio and knowledge-based structure prediction of novel folds are
Visit the PDB, SCOP and CATH websites and read chapter 5

