Michael Schroeder BioTechnological Center TU Dresden Biotec Protein Structure Lesk, chapter 5 Details on SCOP and CATH can be found in Structural Bioinformatics,

Presentation transcript:

Michael Schroeder BioTechnological Center TU Dresden Biotec Protein Structure Lesk, chapter 5 Details on SCOP and CATH can be found in Structural Bioinformatics, Bourne/Weissig, chapter 12 and 13

By Michael Schroeder, Biotec, 2 Folding
- Proteins are linear polymer mainchains with different amino acid side chains
- Proteins fold spontaneously, reaching a state of minimal energy
- Side chains and main chains interact with one another and with solvent
- Example movie
Jones, D.T. (1997) Successful ab initio prediction of the tertiary structure of NK-Lysin using multiple sequences and recognized supersecondary structural motifs. PROTEINS. Suppl. 1,

By Michael Schroeder, Biotec, 3 Examining Proteins
- Specialised tools with different views of structure
- Corey, Pauling, Koltun (CPK)
  - Diameter of sphere ~ atomic radius
  - Hydrogen white, carbon grey, nitrogen blue, oxygen red, sulphur yellow
- Cartoon
- Wire
- Balls

By Michael Schroeder, Biotec, 4 Examining Proteins

By Michael Schroeder, Biotec, 5 Protein Folding
[Image of a residue, source not preserved]
- Conformation of a residue
  - Rotation around the N-Cα bond, φ (phi)
  - Rotation around the Cα-C bond, ψ (psi)
  - Rotation around the peptide bond, ω (omega)
- The peptide bond tends to be
  - planar and
  - in one of two states:
    - trans, ω ≈ 180° (usually), and
    - cis, ω ≈ 0° (rarely, and mostly proline)

By Michael Schroeder, Biotec, 6 Sasisekharan-Ramakrishnan-Ramachandran plot
- Solid line = energetically preferred
- Outside dotted line = disallowed
- Most amino acids fall into the αR region (right-handed alpha helix) or the β region (beta-strand)
- Glycine has additional conformations (e.g. left-handed alpha helix = αL region) and conformations in the lower right panel
[Image, source not preserved]

By Michael Schroeder, Biotec, 7 Ramachandran plot
- Plot for a protein with mostly beta-sheets
- Example for conformations
[Image, source not preserved]

By Michael Schroeder, Biotec, 8 Helices and Strands
- Consecutive residues in alpha or beta conformation generate alpha-helices and beta-strands, respectively
- Such secondary structure elements are stabilised by weak hydrogen bonds
- They are connected by turns or loops, regions in which the chain alters direction
- Turns are often surface exposed and tend to contain charged or polar residues

By Michael Schroeder, Biotec, 9 Alpha Helix
- Residue j is hydrogen-bonded to residue j+4
- 3.6 residues per turn
- 1.5 Å rise per residue
- Repeat (pitch) every 3.6 × 1.5 Å = 5.4 Å
- φ = -60°, ψ = -45°
[Image, source not preserved]

By Michael Schroeder, Biotec, 10 Beta strand
[Image, source not preserved]

By Michael Schroeder, Biotec, 11 Beta Sheets
[Image, source not preserved]

By Michael Schroeder, Biotec, 12 Turn
- Residue j is bonded to residue j+3
- Often proline and glycine
[Image, source not preserved]

By Michael Schroeder, Biotec, 13 How to Fold a Structure
- All residues must have stereochemically allowed conformations
- Buried polar atoms must be hydrogen-bonded
  - If a few are missed, it might be energetically preferable to bond these to solvent
- Enough hydrophobic surface must be buried, and the interior must be sufficiently densely packed
- There is evidence that folding occurs hierarchically: first secondary structure elements, then supersecondary, ...
- This justifies a hierarchic approach when simulating folding

By Michael Schroeder, Biotec, 14 Structure Alignment + Slides from Hanekamp, University of Wyoming,

By Michael Schroeder, Biotec, 15 Structure Alignment +

By Michael Schroeder, Biotec, 16 Structure Alignment
- In the same way that we align sequences, we wish to align structures
- Let’s start simple: how to score an alignment
  - Sequences: e.g. percentage of matching residues
  - Structure: rmsd (root mean square deviation)

By Michael Schroeder, Biotec, 17 Root Mean Square Deviation
- What is the distance between two points a, with coordinates $(x_a, y_a, z_a)$, and b, with coordinates $(x_b, y_b, z_b)$?
- Euclidean distance: $d(a,b) = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2 + (z_a - z_b)^2}$

By Michael Schroeder, Biotec, 18 Root Mean Square Deviation
- In a structure alignment the score measures how far the aligned atoms are from each other on average
- Given the distances $d_i$ between $n$ aligned atoms, the root mean square deviation is defined as $\mathrm{rmsd} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^2}$
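
The two formulas above (Euclidean distance and rmsd) can be written out in a few lines. This is only a minimal sketch: it assumes the two structures are already superimposed and that atom i of one structure is paired with atom i of the other. The function names and the toy coordinates are invented for illustration and do not come from the slides.

```python
import math

def euclidean_distance(a, b):
    """Distance between two 3D points given as (x, y, z) tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def rmsd(coords_a, coords_b):
    """Root mean square deviation over paired, already superimposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [euclidean_distance(a, b) ** 2 for a, b in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))

# Toy coordinates in Angstrom (hypothetical): structure_b is structure_a shifted by 1 A in z
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(structure_a, structure_b))  # every pair is exactly 1 A apart
```

A real structural alignment would first have to find the atom pairing and the optimal superposition; the sketch only covers the scoring step.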

By Michael Schroeder, Biotec, 19 Quality of Alignment and Example
- Unit of RMSD: e.g. Ångströms
- Identical structures: RMSD = 0
- Similar structures: RMSD is small (1-3 Å)
- Distant structures: RMSD > 3 Å
- Example: structural superposition of gamma-chymotrypsin and Staphylococcus aureus epidermolytic toxin A

By Michael Schroeder, Biotec, 20 Pitfalls of RMSD
- All atoms are treated equally (e.g. residues on the surface have a higher degree of freedom than those in the core)
- The best alignment does not always mean minimal RMSD
- The significance of RMSD is size dependent
From MOLB5650

By Michael Schroeder, Biotec, 21 Alternative RMSDs
- aRMSD = best root-mean-square deviation calculated over all aligned alpha-carbon atoms
- bRMSD = the RMSD over the highest scoring residue pairs
- wRMSD = weighted RMSD
Source: W. Taylor (1999), Protein Science, 8:
From MOLB5650

By Michael Schroeder, Biotec, 22 Computing Structural Alignments
- DALI (Distance-matrix ALIgnment) is one of the first tools for structural alignment
- How does it work?
  - Atoms: given the atomic coordinates of two structures
  - Compute two distance matrices:
    - For each structure, compute all pairwise inter-atom distances
    - This step is done because the computed distances are independent of the coordinate system
    - The two original atomic coordinate sets cannot be compared directly, but the two distance matrices can
  - Align the two distance matrices:
    - Find small (e.g. 6x6) sub-matrices along the diagonal that match
    - Extend these matches to form the overall alignment
- This method is somewhat similar to how BLAST works
- SSAP (double dynamic programming) in term 3
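
The point that distance matrices do not depend on the coordinate system can be checked with a small sketch. It is not DALI itself, only an illustration of the first step: the same toy structure is rotated and translated, and its distance matrix stays the same. All function names and coordinates below are hypothetical.

```python
import math

def distance_matrix(coords):
    """All pairwise inter-atom distances for one structure."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def rotate_and_shift(coords, shift=(5.0, -2.0, 3.0)):
    """Rotate 90 degrees about the z axis, then translate (a rigid-body move)."""
    return [(-y + shift[0], x + shift[1], z + shift[2]) for x, y, z in coords]

def max_abs_difference(m1, m2):
    """Largest element-wise difference between two equally sized matrices."""
    return max(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))

# Toy atom positions (hypothetical)
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 2.0, 0.0), (0.0, 2.0, 1.0)]
moved = rotate_and_shift(atoms)

# The coordinates differ, but the distance matrices agree up to rounding
print(max_abs_difference(distance_matrix(atoms), distance_matrix(moved)))
```

Because the matrices agree regardless of how each structure is placed in space, they can be compared without first superimposing the structures.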

By Michael Schroeder, Biotec, 23 DALI Example
- The regions of common fold, as determined by the program DALI by L. Holm and C. Sander, in the TIM-barrel proteins mouse adenosine deaminase [1fkx] (black) and Pseudomonas diminuta phosphotriesterase [1pta] (red)

By Michael Schroeder, Biotec, 24 Protein zinc finger (4znf) Slides from Hanekamp, University of Wyoming,

By Michael Schroeder, Biotec, 25 Superimposed 3znf and 4znf
- 30 CA atoms: RMS = 0.70 Å
- 248 atoms: RMS = 1.42 Å
- Figure label: Lys30
Slides from Hanekamp, University of Wyoming,

By Michael Schroeder, Biotec, 26 Superimposed 3znf and 4znf backbones 30 CA atoms RMS = 0.70Å Slides from Hanekamp, University of Wyoming,

By Michael Schroeder, Biotec, 27 RMSD vs. Sequence Similarity
- Even at low sequence identity, good structural alignments are possible
[Picture, source not preserved]

By Michael Schroeder, Biotec, 28 Structure Classification

By Michael Schroeder, Biotec, 29 Why classify structures?
- Structure similarity is a good indicator for homology, therefore classify structures
- Classification at different levels
  - Similar general folding patterns (structures not necessarily related)
  - Possibly low sequence similarity, but similar structure and function implies very likely homology
  - High sequence similarity implies similar structures and homology
- Classification can be used to investigate evolutionary relationships and possibly infer function

By Michael Schroeder, Biotec, 30 Structure Classification
- SCOP: Structural Classification of Proteins
  - Hand curated (Alexei Murzin, Cambridge) with some automation
- CATH: Class, Architecture, Topology, Homology
  - Automated where possible, some checks by hand
- FSSP: Fold classification based on Structure-Structure alignment of Proteins
  - Fully automated
- Reasonable correspondence (>80%)

By Michael Schroeder, Biotec, 31 Evolutionary Relation
- Strong sequence similarity is assumed to be sufficient to infer homology
- Close structural and functional similarity together are also considered sufficient to infer homology
- Similar structure alone is not sufficient, as proteins may have converged on a structure due to physicochemical necessity
- Similar function alone is not sufficient, as proteins may have developed it due to functional selection
- In general, structure is more conserved than sequence
- Beware: descendants of an ancestor may have different function, structure, and sequence! This is difficult to detect

By Michael Schroeder, Biotec, 32 What is a domain? Single and Multi-Domain Proteins

By Michael Schroeder, Biotec, 33 What is a domain?
- Functional: a domain is an “independent” functional unit, which occurs in more than one protein
- Physicochemical: a domain has a hydrophobic core
- Topological: intra-domain distances of atoms are minimal, inter-domain distances maximal
- It is difficult to define a domain exactly
- It is difficult to agree on exact domain borders

By Michael Schroeder, Biotec, 34 Domains re-occur
- A domain re-occurs in different structures and possibly in the context of different other domains
- P-loop domain in
  - 1goj: Structure Of A Fast Kinesin: Implications For ATPase Mechanism and Interactions With Microtubules Motor Protein (single domain)
  - 1ii6: Crystal Structure Of The Mitotic Kinesin Eg5 In Complex With Mg-ADP Cell Cycle (two domains)

By Michael Schroeder, Biotec, 35 Domains re-occur
- 1in5: interaction of P-loop domain (green & orange) and winged helix DNA binding domain
- 1a5t: interaction of P-loop domain (green & orange) and DNA polymerase III domain

By Michael Schroeder, Biotec, 36 Domains have hydrophobic core
- Kyte J., Doolittle R.F, J. Mol. Biol. 157: (1982).
- Hydrophobicity values per amino acid: Ala: Arg: Asn: Asp: Cys: Gln: Glu: Gly: His: Ile: Leu: Lys: Met: Phe: Pro: Ser: Thr: Trp: Tyr: Val: 4.200

By Michael Schroeder, Biotec, 37 Intra-domain distances minimal
- Distances between atoms within a domain are minimal
- Distances between atoms of two different domains are maximal

By Michael Schroeder, Biotec, 38 PDB, Proteins, and Domains
- Ca structures in PDB
- 50% single domain
- 50% multiple domain
- 90% have fewer than 5 domains
[Table: number of domains (Dom#) vs. frequency (Freq); values not preserved]

By Michael Schroeder, Biotec, 39 A structure with 49 domains
- 1AON, Asymmetric Chaperonin Complex Groel/Groes/(ADP)7

By Michael Schroeder, Biotec, 40 SCOP: Structural Classification of Proteins
[Figure: excerpt of the SCOP hierarchy, from the root ("top") downwards]
- CLASS: All alpha (218), All beta (144), Alpha/Beta (136), Alpha+Beta (279)
- FOLD: Trypsin-like serine proteases (1), Immunoglobulin-like (23), Transglutaminase (1)
- SUPERFAMILY: Immunoglobulin (6)
- FAMILY: C1 set domains (antibody constant), V set domains (antibody variable)

By Michael Schroeder, Biotec, 41 Class
- All alpha
  - (possibly small beta adornments)
- All beta
  - (possibly small alpha adornments)

By Michael Schroeder, Biotec, 42 Class
- Alpha/beta (alpha and beta) = single beta sheet with alpha helices joining the C-terminus of one strand to the N-terminus of the next
  - subclass: beta sheet forming a barrel surrounded by alpha helices
  - subclass: central planar beta sheet
- Alpha+beta (alpha plus beta) = alpha and beta units are largely separated
  - Strands joined by hairpins, leading to antiparallel sheets

By Michael Schroeder, Biotec, 43 Class
- Multi-domain proteins
  - have domains placed in different classes
  - domains have not been observed elsewhere
  - E.g. 1hle

By Michael Schroeder, Biotec, 44 Class
- Membrane (few and most unique) and cell surface proteins
  - E.g. Aquaporin 1ih5

By Michael Schroeder, Biotec, 45 Class
- Small proteins
  - E.g. Insulin, 1pid

By Michael Schroeder, Biotec, 46 Class
- Coiled coil proteins
  - E.g. 1i4d, Arfaptin-Rac binding fragment

By Michael Schroeder, Biotec, 47 Class
- Low-resolution structures, peptides, designed proteins
  - E.g. 1cis, a designed protein: hybrid between chymotrypsin inhibitor CI-2 and helix E from subtilisin Carlsberg, from barley (Hordeum vulgare), hiproly strain

By Michael Schroeder, Biotec, 48 Fold, Superfamily, Family
- Fold
  - Common core structure
  - i.e. the same secondary structure elements in the same arrangement with the same topological structure
- Superfamily
  - Very similar structure and function
- Family
  - Sequence identity (>30%) or extremely similar structure and function

By Michael Schroeder, Biotec, 49 Distribution (2007)
[Table with columns Class, Fold, Superfamily, Family; most values not preserved]
- All alpha
- All beta
- Alpha/beta
- Alpha+beta
- Multidomain: 53, 74
- Membrane and cell surface
- Small proteins
- Total

By Michael Schroeder, Biotec, 50 Uses of SCOP
- Automatic classification
- Understanding of protein enzymatic function
- Use superfamily and fold to study distantly related proteins
- Study sequence and structure variability
- Derive substitution matrices for sequence comparison
- Extract structural principles for design
- Study decomposition of multi-domain proteins
- Estimate total number of folds
- Derived databases

By Michael Schroeder, Biotec, 51 PDB, Proteins, Domains revisited
- 80% of PDB entries have only one type of SCOP superfamily
- 15% of PDB entries have two different SCOP superfamilies
[Table: number of superfamilies (sfNo) vs. frequency (Freq); values not preserved]

By Michael Schroeder, Biotec, 52 A structure with 23 different superfamilies
- 1k9m: Co-Crystal Structure Of Tylosin Bound To The 50S Ribosomal Subunit Of Haloarcula Marismortui (Ribosome)

By Michael Schroeder, Biotec, 53 The 20 Most Frequently Occurring Superfamilies
[Table with columns Superfamily, SCOP ID, #PDB; only the class letter of each SCOP ID is preserved, the #PDB counts are not]
- Immunoglobulin (b)
- Lysozyme-like (d)
- Trypsin-like serine proteases (b)
- P-loop containing nucleotide triphosphate hydrolases (c)
- NAD(P)-binding Rossmann-fold domains (c)
- Globin-like (a)
- (Trans)glycosidases (c)
- Acid proteases (b)
- Concanavalin A-like lectins/glucanases (b)
- Thioredoxin-like (c)
- EF-hand (a)
- alpha/beta-Hydrolases (c)
- Cupredoxins (b)
- Ribonuclease H-like (c)
- PLP-dependent transferases (c)
- Periplasmic binding protein-like II (c)
- Carbonic anhydrase (b)
- Metalloproteases ("zincins"), catalytic domain (d)
- FAD/NAD(P)-binding domain (c)
- Cytochrome c (a)

By Michael Schroeder, Biotec, 54 CATH
- Class
  - secondary structure composition
- Architecture
  - orientation in 3D
- Topology
  - connectivity
- Homology
  - grouped by evidence for homology (sequence, structure and function)

By Michael Schroeder, Biotec, 55 Generating CATH
1. Identify close relatives by pairwise sequence alignment
2. Detect more distant relatives using
   2a. sequence profiles and
   2b. structure alignment
3. Structures still unclassified after 1. and 2. are examined by hand to detect domain boundaries
4. Try 2. and 3. again
5. If still unclassified, assign manually

By Michael Schroeder, Biotec, 56 CATH step 1: Sequence-based Identification of Homologous Structures
- > 30% sequence similarity implies similar structure
- Relatives identified using pairwise alignment are clustered using hierarchical clustering with single linkage
- Reminder…

By Michael Schroeder, Biotec, 57 Hierarchical Clustering
[Figure: successive distance matrices as items 1 to 5 are merged into the clusters (1,2), (4,5) and (3,(4,5)); most values not preserved]

By Michael Schroeder, Biotec, 58 Hierarchical Clustering:
- How to define the distance between clusters?
- Single linkage:
  - Minimum
  - Example: Distance (A,B) to C is 1
- Complete linkage:
  - Maximum
  - Example: Distance (A,B) to C is 2
- Average linkage:
  - Average
  - Example: Distance (A,B) to C is 1.5
- Are dendrograms always the same, independent of the linkage method?

Distance matrix used in the examples:

     A   B   C
A    0
B    1   0
C    2   1   0
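
The three linkage definitions can be checked against the slide's example distances (A-B = 1, B-C = 1, A-C = 2). The sketch below is illustrative only; the function and variable names are not from the slides.

```python
# Pairwise distances from the slide's example matrix
pair_distance = {
    frozenset(("A", "B")): 1.0,
    frozenset(("B", "C")): 1.0,
    frozenset(("A", "C")): 2.0,
}

def cluster_distance(cluster_1, cluster_2, linkage):
    """Distance between two clusters under single, complete or average linkage."""
    distances = [pair_distance[frozenset((x, y))] for x in cluster_1 for y in cluster_2]
    if linkage == "single":
        return min(distances)      # closest pair of members
    if linkage == "complete":
        return max(distances)      # farthest pair of members
    if linkage == "average":
        return sum(distances) / len(distances)
    raise ValueError(f"unknown linkage: {linkage}")

for linkage in ("single", "complete", "average"):
    print(linkage, cluster_distance(["A", "B"], ["C"], linkage))
# single 1.0, complete 2.0, average 1.5
```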

By Michael Schroeder, Biotec, 59 Hierarchical Clustering: Chaining
- Beware of chaining when using single linkage
- As the nearest neighbour is selected, it appears that all members of the cluster are very similar to each other, when in fact A and Z are very different

     A   B   C   D   …   Z
A    0   1   2   3   …   25
B        0   1   2   …   24
C            0   1   …   23
D                0   …   22
…                    …   …
Z                        0

[Dendrogram over A, B, C, D, …, Z]
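
The chaining effect can be reproduced with the slide's example, where the distance between two letters is the difference of their positions in the alphabet. The sketch below uses a deliberately naive single-linkage merge with a fixed cutoff; the cutoff value and all names are assumptions made for illustration.

```python
import string

labels = list(string.ascii_uppercase)  # items A..Z

def distance(a, b):
    """Distance as in the slide's matrix: difference in alphabet position."""
    return abs(labels.index(a) - labels.index(b))

def single_linkage_clusters(items, cutoff):
    """Merge clusters while the closest pair of members is within the cutoff."""
    clusters = [[item] for item in items]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                closest = min(distance(a, b) for a in clusters[i] for b in clusters[j])
                if closest <= cutoff:
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break
    return clusters

clusters = single_linkage_clusters(labels, cutoff=1)
largest = max(clusters, key=len)
widest = max(distance(a, b) for a in largest for b in largest)
print(len(clusters), len(largest), widest)  # 1 26 25
```

Every merge joins neighbours at distance 1, yet the single resulting cluster contains A and Z, which are 25 apart.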

By Michael Schroeder, Biotec, 60 CATH and single linkage
- It is argued that
  - structural data is quite sparse,
  - hence it cannot be expected that all cluster members will be very similar (in terms of sequence) to each other,
  - so that the chaining effect is even useful

By Michael Schroeder, Biotec, 61 CATH step 2a:
- Profile-based methods such as PSI-BLAST are used to detect distant relatives
- Build profiles using all sequence data available (rather than only sequences for which a structure exists)
- This increases the quality of profiles dramatically
  - 51% of distant relatives retrieved using profiles based on sequences with known structure only
  - 82% of distant relatives retrieved using profiles based on all sequences

By Michael Schroeder, Biotec, 62 CATH step 2b: Structure-based methods to detect distant relatives
- For ca. 15% of structures, the sequence-based method does not work
- Example: for globins, sequence similarity can fall below 10%, yet structure and function (oxygen-binding) are preserved
- Use SSAP, the Sequential Structure Alignment Program

By Michael Schroeder, Biotec, 63 Clustering Result of Structure Alignment
- Relatives identified using pairwise alignment are clustered using hierarchical clustering with single linkage

By Michael Schroeder, Biotec, 64 Improving Efficiency: GRATH
- Screening large structures (>300 residues) against the database can take days
- Idea of GRATH (Graphical Representation of CATH):
  - Improve efficiency by filtering at a higher level before doing a detailed comparison
  - Represent a protein as a graph where
    - nodes are secondary structure elements, represented by their midpoint, tilt, and rotation
    - edges are distances between midpoints of secondary structure elements
  - Use an algorithm to determine subgraph isomorphism (i.e. does one graph occur in another one?)
  - If yes, then do a detailed comparison using SSAP

By Michael Schroeder, Biotec, 65 Structure Prediction and Modelling

By Michael Schroeder, Biotec, 66 Structure Prediction: Four Main Problem Areas
- Given a sequence with unknown structure, predict its structure
- Secondary structure prediction
  - Predict regions of helices and strands
- Homology modelling
  - Predict structure from known structures of one or more related proteins
- Fold recognition
  - Given a library of structures, determine which one (if any) is the fold of the given sequence
- Prediction of novel folds: a priori and knowledge-based methods

By Michael Schroeder, Biotec, 67 Structure Prediction of Novel Folds: Two Approaches
- A priori:
  - Most approaches aim to reproduce inter-atomic interactions by
    - defining an energy function and
    - trying to find its global minimum
  - Problem:
    - Inadequacy of the energy function
    - Algorithms get stuck in local minima
- Knowledge-based:
  - Find similarities to known structures or sub-structures

By Michael Schroeder, Biotec, 68 Secondary Structure Prediction
- A successful tool for secondary structure prediction is PROF
- PROF uses neural networks to learn secondary structure from known structures
- ¾ of PROF’s predictions are correct
- At CASP 2000 it predicted e.g. the following

Residues 1-50
Sequence   ALVEDPPLKVSEGGLIREGYDPDLDALRAAHREGVAYFLELEERERERTG
Prediction HH EEE------HHHHHHHHHH-HHHHHHHHHHHHHHH-
Experiment -E E-----HHHHHHHHHHHHHHHHHHHHHHHHHHHH-

Residues 51-100
Sequence   IPTLKVGYNAVFGYYLEVTRPYYERVPKEYRPVQTLKDRQRYTLPEMKEK
Prediction --EEEEEEEEEEEEEEEE EEEEEEEE---EEEE-HHHHHH
Experiment ----EEEEE---EEEEEEEHHHHHH-----EEEEE---EEEEE-HHHHHH

Residues 101-129
Sequence   EREVYRLEALIRRREEEVFLEVRERAKRQ
Prediction HHHHHHHHHHHHHHHHHHHHHHHHHHHH-
Experiment HHHHHHHHHHHHHHHHHHHHHHHHHHH--
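
A statement such as "¾ of the predictions are correct" is a per-residue agreement between predicted and experimentally observed states. A minimal sketch of that measure follows; the function name and the two short state strings are made up for illustration and are not the CASP data shown above.

```python
def fraction_correct(predicted, observed):
    """Fraction of positions where predicted and observed states agree."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty state strings of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)

# Toy example: H = helix, E = strand, - = other
predicted = "HHHH-EEE"
observed = "HHH--EEE"
print(fraction_correct(predicted, observed))  # 7 of 8 positions agree: 0.875
```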

By Michael Schroeder, Biotec, 69 PROF’s prediction
- The regions predicted by the PROF server of Rost to be helical are shown as wider ribbons. The prediction missed only a short helix, at the top left of the picture

By Michael Schroeder, Biotec, 70 Homology modelling
- Define the model of an unknown structure by making minimal changes to a relative with known structure
- Align the amino acid sequences of the target and one or more known structures
  - Insertions and deletions should be in loop regions
- Determine mainchain segments to represent the regions containing insertions and deletions, and stitch these into the known structure
- Replace the sidechains of the residues that have been mutated
- Examine the model (by hand and computationally) to detect collisions between atoms
- Refine the model by limited energy minimisation

By Michael Schroeder, Biotec, 71 Accuracy of Homology Modelling
- Works for >40-50% sequence similarity
- Example: SWISS-MODEL prediction of neurotoxin of red scorpion (1DQ7) from neurotoxin of yellow scorpion (1PTX)

By Michael Schroeder, Biotec, 72 Fold Recognition: 3D Profiles
- Given a sequence, determine which fold (if any) is most similar
- Can we build profiles to represent structures of similar fold (similar to sequence profiles)?
- 3D profiles:
  - Classify the environment of each residue
    - Secondary structure: is it part of a helix, a sheet or other (determined by mainchain hydrogen-bonding interactions)?
    - Surface exposure: 114 Å² accessible surface area
    - Polar or non-polar nature of the environment
  - Total of 18 residue classes; each residue belongs to one of them
  - The sequence of these residue classes is the 3D profile

By Michael Schroeder, Biotec, 73 3D Profiles and Alignments
- Structure-structure alignment:
  - 3D profiles of two known structures can be aligned against each other
- Sequence-structure alignment:
  - Based on existing 3D profiles, the probability of a residue occurring in a residue class can be determined
  - Using this probability, we can assign a 3D profile to a sequence
  - And hence align the sequence 3D profile to a structure 3D profile
- For correctly determined protein structures, the structure 3D profile fits the sequence 3D profile well
- However, other proteins may score even better
- If a structure does not match its own 3D profile well, it is likely that there is an error in the structure determination

By Michael Schroeder, Biotec, 74 Threading
- Pull the query sequence through a known structure and rate the score
- Necessary:
  - Method to score the models to select the best one
  - Method to calibrate the scores to decide which of the best is correct

Homology modelling            | Threading
Identify homologues           | Try all possible parents
Determine optimal alignment   | Try many alignments
Optimize one model            | Evaluate many rough models

By Michael Schroeder, Biotec, 75 Scoring for Threading
- Empirical patterns of residue neighbours derived from known structures
- Observe the distribution of inter-residue distances for all 20 x 20 residue pairs
- Derive a probability distribution as a function of distance in space and on sequence
- The Boltzmann equation relates probability and energy
- Reverse this and derive an energy function from the probability distribution
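
The last two bullets can be sketched as follows. The Boltzmann relation gives p ∝ exp(-E/kT), so reversing it yields E = -kT ln p. The sketch below makes two assumptions that are not stated on the slide: energies are reported in units of kT, and observed frequencies are divided by a reference distribution so that "no preference" maps to zero energy. The function name and all counts are invented.

```python
import math

def inverse_boltzmann(observed_counts, reference_counts):
    """Energy per distance bin, in units of kT, from E = -ln(p_obs / p_ref)."""
    if len(observed_counts) != len(reference_counts):
        raise ValueError("both histograms need the same number of bins")
    if min(observed_counts) <= 0 or min(reference_counts) <= 0:
        raise ValueError("all bins need positive counts (add pseudocounts first)")
    total_obs = sum(observed_counts)
    total_ref = sum(reference_counts)
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = obs / total_obs
        p_ref = ref / total_ref
        energies.append(-math.log(p_obs / p_ref))
    return energies

# Hypothetical histogram for one residue pair over three distance bins
observed = [10, 60, 30]    # counts seen for this residue pair
reference = [30, 40, 30]   # counts expected without any preference
print(inverse_boltzmann(observed, reference))
# Bins seen more often than expected get negative (favourable) energies
```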

By Michael Schroeder, Biotec, 76 Threading the sequence template Target Slides from Hanekamp, University of Wyoming,

By Michael Schroeder, Biotec, 77 “Threaded” sequence Yellow = adrenergic receptor sequence Blue = adrenergic receptor (PDB 1F88 ) Slides from Hanekamp, University of Wyoming,

By Michael Schroeder, Biotec, 78 Modeled structure Gaps Slides from Hanekamp, University of Wyoming,

By Michael Schroeder, Biotec, 79 Corrected Model Slides from Hanekamp, University of Wyoming,

By Michael Schroeder, Biotec, 80 Ab initio Structure Prediction

By Michael Schroeder, Biotec, 81 Molecular dynamics
- Structure prediction = place atoms so that the interactions between them create a unique state of maximum stability
- Problem:
  - The model of inter-atomic distances is not complete
  - Computational scale:
    - Large number of variables and massive search space
    - Non-linearities
    - Rough energy surface with many local minima

By Michael Schroeder, Biotec, 82 Conformational energy calculations
- Bond stretching
- Bond angle bend
- Torsion angle (e.g. φ, ψ, ω)
- Van der Waals interactions
  - Short-range repulsion ~$R^{-12}$ and long-range attraction ~$R^{-6}$, where R is the inter-atom distance
- Hydrogen bond
  - Weak chemical/electrostatic interaction, ~$R^{-12}$ and ~$R^{-10}$
- Electrostatics
  - Charges on atoms
- Solvent
  - Interactions with water, salt, sugar, etc.
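
The van der Waals term, with short-range repulsion ~R^-12 and long-range attraction ~R^-6, is commonly written in the Lennard-Jones 12-6 form. The sketch below uses that form as an assumption; the slide does not give a specific formula, and the parameters `epsilon` (well depth) and `sigma` (distance at which the energy is zero) are illustrative.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy: 4*epsilon*((sigma/r)**12 - (sigma/r)**6)."""
    if r <= 0:
        raise ValueError("inter-atom distance must be positive")
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 ** 2 - ratio6)

# The energy is zero at r = sigma and minimal (-epsilon) at r = 2**(1/6) * sigma
r_min = 2 ** (1 / 6)
for r in (0.9, 1.0, r_min, 2.0):
    print(round(r, 3), lennard_jones(r))
```

Below `sigma` the repulsive R^-12 term dominates and the energy rises steeply; beyond the minimum the attractive R^-6 term decays towards zero.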

By Michael Schroeder, Biotec, 83 Rosetta
- Predicts structure by first generating structures of fragments (3-9 residues) using known structures
- Combines fragments by Monte Carlo simulation, using an energy function with terms for
  - Paired beta-sheets
  - Burial of hydrophobic residues
- Carries out 1000 simulations
- Results are clustered and the centre of the largest cluster is presented as the prediction
- Demo

By Michael Schroeder, Biotec, 84 ROSETTA
- The program ROSETTA, by D. Baker and colleagues, can predict the structures of proteins for which no complete domain of similar folding pattern appears in the database
- Prediction by ROSETTA of H. influenzae hypothetical protein. Black lines, experimental structure; red lines, prediction

By Michael Schroeder, Biotec, 85 Rosetta
- Prediction by ROSETTA of the N-terminal half of domain 1 of human DNA repair protein Xrcc4. This figure shows a selected substructure of Xrcc4 containing the N-terminal 55 out of 116 residues. Black lines, experimental structure; red lines, prediction

By Michael Schroeder, Biotec, 86 LINUS
- Another program with a similar idea
- Prediction by LINUS (program by G.D. Rose and R. Srinivasan) of the C-terminal domain of rat endoplasmic reticulum protein ERp29. Black lines, experimental structure; red lines, prediction

By Michael Schroeder, Biotec, 87 Monte Carlo Simulation
- Objective: find the conformation with minimal energy
- Problem: avoid local minima
- Algorithm:
  1. Generate a random initial conformation x
  2. Perturb conformation x to generate a neighbouring conformation x’
  3. Calculate the energies E(x) and E(x’) for conformations x and x’
  4. If E(x) > E(x’) (i.e. x’ is an improvement, we go downhill from x to x’), accept x’ as the new conformation and go to step 2
  5. If E(x) ≤ E(x’) (i.e. x’ is no improvement, we go uphill from x to x’), accept x’ as the new conformation with probability p
  6. The probability p of accepting uphill moves is reduced with every step
  7. Go to step 2
- Step 4 makes sure that we “walk” downhill towards a minimum
- Step 5 makes sure that, if we are in a local minimum, there is a chance to get out of it by accepting an uphill move. It is important that this probability decreases (step 6), so that we become less and less likely to walk uphill
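
The algorithm above can be sketched on a toy problem. The following is not a protein simulation: the "conformation" is a single number and the energy is an invented one-dimensional function with one local and one global minimum. The slide does not say how p is computed, so the Metropolis-style acceptance probability exp(-ΔE/T) with a slowly falling temperature T is an assumption; all names and parameter values are illustrative.

```python
import math
import random

def energy(x):
    """Toy energy with a local minimum near x = 1.1 and a global one near x = -1.3."""
    return x ** 4 - 3 * x ** 2 + x

def monte_carlo_minimise(energy, x0=None, steps=5000, step_size=0.5,
                         t_start=2.0, cooling=0.999, seed=0):
    rng = random.Random(seed)
    x = rng.uniform(-3.0, 3.0) if x0 is None else x0     # step 1
    e = energy(x)
    best_x, best_e = x, e
    temperature = t_start
    for _ in range(steps):                               # step 7: repeat
        x_new = x + rng.uniform(-step_size, step_size)   # step 2: perturb
        e_new = energy(x_new)                            # step 3: energies
        downhill = e_new < e                             # step 4
        # step 5: uphill moves are accepted with probability p = exp(-dE / T)
        if downhill or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        temperature *= cooling                           # step 6: p shrinks over time
    return best_x, best_e

best_x, best_e = monte_carlo_minimise(energy, x0=2.0)
print(best_x, best_e)
```

Keeping track of the best conformation seen so far is a convenience added here; the slide's algorithm only describes the walk itself.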

By Michael Schroeder, Biotec, 88 Summary
- You should know now
  - What helices, strands, sheets are
  - What a Ramachandran plot is
  - How to score a structural alignment (rmsd)
  - How to compute a structural alignment
  - How a domain can be characterised
  - Why structure classification is useful
  - What the main structure classes are
  - How classifications can be generated automatically
  - What the problems are
  - What secondary structure prediction, homology modelling, threading, ab initio and knowledge-based structure prediction of novel folds are
- Visit the PDB, SCOP and CATH websites and
- Read chapter 5