Protein Structure
Michael Schroeder, BioTechnological Center (Biotec), TU Dresden
Reading: Lesk, chapter 5. Details on SCOP and CATH can be found in Structural Bioinformatics (Bourne/Weissig), chapters 12 and 13.
Folding
- Proteins are linear polymers: a common mainchain carrying different amino acid side chains
- Proteins fold spontaneously, reaching a state of minimal energy
- Side chains and main chain interact with one another and with the solvent
- Example movie
- Reference: Jones, D.T. (1997) Successful ab initio prediction of the tertiary structure of NK-Lysin using multiple sequences and recognized supersecondary structural motifs. PROTEINS, Suppl. 1.

Examining Proteins
- Specialised tools offer different views of a structure
- Corey, Pauling, Koltun (CPK) model
  - Size of each sphere reflects the atomic radius
  - Hydrogen white, carbon grey, nitrogen blue, oxygen red, sulphur yellow
- Cartoon
- Wire
- Balls
Protein Folding: Conformation of a Residue
- The conformation of a residue is described by three torsion angles:
  - Rotation around the N-Cα bond: φ (phi)
  - Rotation around the Cα-C bond: ψ (psi)
  - Rotation around the peptide bond: ω (omega)
- The peptide bond tends to be planar and in one of two states:
  - trans, ω = 180° (usually)
  - cis, ω = 0° (rarely, and mostly at proline)
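To make the torsion angles concrete, the following is a minimal sketch of how a signed torsion angle can be computed from four atom positions. The function name `torsion_angle` and the toy coordinates are illustrative assumptions, not part of the lecture material. With backbone atoms, φ would use C(i-1), N(i), Cα(i), C(i); ψ would use N(i), Cα(i), C(i), N(i+1); ω would use Cα(i), C(i), N(i+1), Cα(i+1).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_angle(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond (range -180..180)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)          # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (not real atoms): outer points on the same side = cis,
# on opposite sides = trans.
cis = torsion_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0))
trans = torsion_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 1.0, 0.0))
```

Here `cis` evaluates to 0° and `trans` to 180° in magnitude, matching the two peptide-bond states listed above.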
Sasisekharan-Ramakrishnan-Ramachandran plot
- Solid line = energetically preferred
- Outside dotted line = disallowed
- Most amino acids fall into the αR region (right-handed alpha helix) or the β region (beta strand)
- Glycine has additional conformations (e.g. left-handed alpha helix = αL region) and conformations in the lower right panel

Ramachandran plot
- Plot for a protein with mostly beta-sheets
- Examples of conformations

Helices and Strands
- Consecutive residues in alpha or beta conformation generate alpha-helices and beta-strands, respectively
- Such secondary structure elements are stabilised by weak hydrogen bonds
- They are connected by turns or loops, regions in which the chain alters direction
- Turns are often surface exposed and tend to contain charged or polar residues
Alpha Helix
- Residue j is hydrogen-bonded to residue j+4
- 3.6 residues per turn
- 1.5 Å rise per residue
- Repeat (pitch) every 3.6 × 1.5 Å = 5.4 Å
- φ = -60°, ψ = -45°
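The helix parameters can be checked with a few lines of arithmetic. The sketch below places idealised Cα positions on a right-handed helix. The radius of 2.3 Å is an assumed typical value for the Cα trace and is not taken from the slides; the function name is illustrative.

```python
import math

RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE = 1.5   # in Å
CA_RADIUS = 2.3          # in Å; assumed typical value, not from the slides

# Pitch = axial advance per full turn.
pitch = RESIDUES_PER_TURN * RISE_PER_RESIDUE

def helix_ca_trace(n_residues):
    """Idealised C-alpha coordinates (x, y, z) of a right-handed helix along z."""
    step = 2.0 * math.pi / RESIDUES_PER_TURN   # rotation per residue (100 degrees)
    return [(CA_RADIUS * math.cos(i * step),
             CA_RADIUS * math.sin(i * step),
             i * RISE_PER_RESIDUE)
            for i in range(n_residues)]

trace = helix_ca_trace(19)   # residues 0..18 span exactly five turns
```

After 18 residues (five turns) the trace returns to its starting angle and has advanced 27 Å along the axis, i.e. five times the 5.4 Å pitch.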
Beta Strand (figure)

Beta Sheets (figure)

Turn
- Residue j is bonded to residue j+3
- Often contains proline and glycine

How to Fold a Structure
- All residues must have stereochemically allowed conformations
- Buried polar atoms must be hydrogen-bonded
  - If a few are missed, it might be energetically preferable to bond these to solvent
- Enough hydrophobic surface must be buried, and the interior must be sufficiently densely packed
- There is evidence that folding occurs hierarchically: first secondary structure elements, then super-secondary structure, ...
- This justifies a hierarchical approach when simulating folding

Structure Alignment (figure)
Slides from Hanekamp, University of Wyoming

Structure Alignment
- In the same way that we align sequences, we wish to align structures
- Let's start simple: how to score an alignment
  - Sequences: e.g. percentage of matching residues
  - Structures: rmsd (root mean square deviation)
Root Mean Square Deviation
- What is the distance between two points a with coordinates (x_a, y_a, z_a) and b with coordinates (x_b, y_b, z_b)?
- Euclidean distance: $d(a,b) = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2 + (z_a - z_b)^2}$
Root Mean Square Deviation
- In a structure alignment the score measures how far the aligned atoms are from each other on average
- Given the distances d_i between n aligned atoms, the root mean square deviation is defined as
  $\mathrm{rmsd} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^2}$
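As a small illustration of the definition, the sketch below computes the rmsd for two lists of already paired and already superposed 3D coordinates. It does not perform the superposition itself; the function name and the toy coordinates are illustrative assumptions.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation over paired (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # d_i^2 is the squared Euclidean distance of the i-th aligned pair
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

# Toy example: pair distances are 1 and 3, so rmsd = sqrt((1 + 9) / 2).
example = rmsd([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
               [(1.0, 0.0, 0.0), (0.0, 3.0, 0.0)])
```

Note that squaring makes large deviations dominate: the pair at distance 3 contributes nine times as much as the pair at distance 1.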
Quality of Alignment and Example
- Unit of RMSD: e.g. Ångströms
- Identical structures: RMSD = 0
- Similar structures: RMSD is small (1-3 Å)
- Distant structures: RMSD > 3 Å
- Example: structural superposition of gamma-chymotrypsin and Staphylococcus aureus epidermolytic toxin A

Pitfalls of RMSD
- All atoms are treated equally (although, e.g., residues on the surface have a higher degree of freedom than those in the core)
- The best alignment does not always mean minimal RMSD
- The significance of an RMSD value depends on the size of the structures
From MOLB5650

Alternative RMSDs
- aRMSD = best root-mean-square deviation calculated over all aligned alpha-carbon atoms
- bRMSD = the RMSD over the highest scoring residue pairs
- wRMSD = weighted RMSD
Source: W. Taylor (1999), Protein Science, 8
Computing Structural Alignments
- DALI (Distance-matrix ALIgnment) is one of the first tools for structural alignment
- How does it work?
  - Input: the atomic coordinates of two structures
  - Compute two distance matrices:
    - For each structure, compute all pairwise inter-atom distances
    - This step is done because the computed distances are independent of the coordinate system
    - The two original atomic coordinate sets cannot be compared directly; the two distance matrices can
  - Align the two distance matrices:
    - Find small (e.g. 6x6) sub-matrices along the diagonal that match
    - Extend these matches to form an overall alignment
- This method is somewhat similar to how BLAST works
- SSAP (double dynamic programming) follows in term 3
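The key property used here, that a distance matrix does not depend on the coordinate system, can be illustrated with a short sketch. It covers only the first step (building distance matrices), not DALI's matching of sub-matrices. The coordinates are made up and the helper names are illustrative assumptions.

```python
import math

def distance_matrix(coords):
    """All pairwise Euclidean distances between points (math.dist needs Python 3.8+)."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def rotate_about_z_then_shift(coords, angle_deg, shift):
    """Apply a rigid-body motion: rotation about the z axis, then a translation."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]

def max_abs_difference(m1, m2):
    """Largest element-wise difference between two equally sized matrices."""
    return max(abs(u - v) for r1, r2 in zip(m1, m2) for u, v in zip(r1, r2))

# Made-up coordinates standing in for four atoms of one structure.
original = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.1, 4.0, 2.2)]
moved = rotate_about_z_then_shift(original, 40.0, (10.0, -5.0, 2.5))

difference = max_abs_difference(distance_matrix(original), distance_matrix(moved))
```

The raw coordinates of `original` and `moved` differ, yet `difference` is zero up to rounding error, which is why the matrices can be compared without first superposing the structures.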
DALI Example
- The regions of common fold, as determined by the program DALI by L. Holm and C. Sander, in the TIM-barrel proteins mouse adenosine deaminase [1fkx] (black) and Pseudomonas diminuta phosphotriesterase [1pta] (red)

Protein zinc finger (4znf)

Superimposed 3znf and 4znf
- 30 CA atoms: RMS = 0.70 Å
- 248 atoms: RMS = 1.42 Å
- Labelled residue: Lys30

Superimposed 3znf and 4znf backbones
- 30 CA atoms: RMS = 0.70 Å

RMSD vs. Sequence Similarity
- Even at low sequence identity, good structural alignments are possible
Structure Classification

Why classify structures?
- Structural similarity is a good indicator of homology, therefore classify structures
- Classification at different levels:
  - Similar general folding patterns (structures not necessarily related)
  - Possibly low sequence similarity, but similar structure and function imply very likely homology
  - High sequence similarity implies similar structures and homology
- Classification can be used to investigate evolutionary relationships and possibly infer function

Structure Classification
- SCOP: Structural Classification of Proteins
  - Hand-curated (Alexei Murzin, Cambridge) with some automation
- CATH: Class, Architecture, Topology, Homology
  - Automated where possible, with some checks by hand
- FSSP: Fold classification based on Structure-Structure alignment of Proteins
  - Fully automated
- Reasonable correspondence between the classifications (>80%)

Evolutionary Relation
- Strong sequence similarity is assumed to be sufficient to infer homology
- Close structural and functional similarity together are also considered sufficient to infer homology
- Similar structure alone is not sufficient, as proteins may have converged on a structure due to physicochemical necessity
- Similar function alone is not sufficient, as proteins may have developed it due to functional selection
- In general, structure is more conserved than sequence
- Beware: descendants of an ancestor may have different function, structure, and sequence! Such relationships are difficult to detect
What is a domain?
- Single- and multi-domain proteins

What is a domain?
- Functional: a domain is an "independent" functional unit, which occurs in more than one protein
- Physicochemical: a domain has a hydrophobic core
- Topological: intra-domain distances between atoms are minimal, inter-domain distances maximal
- It is difficult to define a domain exactly
- It is difficult to agree on exact domain borders

Domains re-occur
- A domain re-occurs in different structures, possibly in the context of different other domains
- Example: the P-loop domain in
  - 1goj: Structure of a Fast Kinesin: Implications for ATPase Mechanism and Interactions with Microtubules; motor protein (single domain)
  - 1ii6: Crystal Structure of the Mitotic Kinesin Eg5 in Complex with Mg-ADP; cell cycle (two domains)

Domains re-occur
- 1in5: interaction of P-loop domain (green and orange) and winged helix DNA-binding domain
- 1a5t: interaction of P-loop domain (green and orange) and DNA polymerase III domain
Domains have hydrophobic core
- Hydropathy scale: Kyte J., Doolittle R.F., J. Mol. Biol. 157 (1982)
- Residues on the scale: Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr, Val (Val: 4.200)
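A hydropathy scale is typically used by averaging values over a sliding window along the sequence; stretches with a high average are candidates for buried, hydrophobic regions. The sketch below assumes the commonly quoted Kyte-Doolittle indices. Apart from the value for Val, these numbers do not come from this slide, so they should be verified against the 1982 paper before being relied on. The window length of 5 and the toy sequence are arbitrary illustrative choices.

```python
# Commonly quoted Kyte-Doolittle hydropathy indices (one-letter codes).
# Assumption: values reproduced from the published scale, not from the slide.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=5):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

# Toy sequence: a hydrophobic stretch followed by a charged one.
profile = hydropathy_profile("VVVVVDDDDD", window=5)
```

The profile starts at 4.2 (all Val) and falls to -3.5 (all Asp) as the window slides from the hydrophobic into the charged stretch.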
Intra-domain distances minimal
- Distances between atoms within a domain are minimal
- Distances between atoms of two different domains are maximal

PDB, Proteins, and Domains
- Structures in PDB:
  - 50% single domain
  - 50% multiple domains
  - 90% have fewer than 5 domains
(Table: number of domains vs. frequency)

A structure with 49 domains
- 1AON: asymmetric chaperonin complex GroEL/GroES/(ADP)7
SCOP: Structural Classification of Proteins
Hierarchy from the top: Class, Fold, Superfamily, Family
- Class: All alpha (218), All beta (144), Alpha/Beta (136), Alpha+Beta (279)
- Fold: Trypsin-like serine proteases (1), Immunoglobulin-like (23)
- Superfamily: Transglutaminase (1), Immunoglobulin (6)
- Family: C1 set domains (antibody constant), V set domains (antibody variable)
Class
- All alpha (possibly small beta adornments)
- All beta (possibly small alpha adornments)

Class
- Alpha/beta (alpha and beta) = single beta sheet with alpha helices joining the C-terminus of one strand to the N-terminus of the next
  - Subclass: beta sheet forming a barrel surrounded by alpha helices
  - Subclass: central planar beta sheet
- Alpha+beta (alpha plus beta) = alpha and beta units are largely separated
  - Strands joined by hairpins, leading to antiparallel sheets

Class
- Multi-domain proteins
  - have domains placed in different classes
  - domains have not been observed elsewhere
  - E.g. 1hle

Class
- Membrane (few and mostly unique) and cell surface proteins
  - E.g. Aquaporin 1ih5

Class
- Small proteins
  - E.g. Insulin, 1pid

Class
- Coiled coil proteins
  - E.g. 1i4d, Arfaptin-Rac binding fragment

Class
- Low-resolution structures, peptides, designed proteins
  - E.g. 1cis, a designed protein: hybrid between chymotrypsin inhibitor CI-2 and helix E from subtilisin Carlsberg, from barley (Hordeum vulgare), hiproly strain

Fold, Superfamily, Family
- Fold
  - Common core structure, i.e. the same secondary structure elements in the same arrangement with the same topological connections
- Superfamily
  - Very similar structure and function
- Family
  - Sequence identity (>30%) or extremely similar structure and function
Distribution (2007)
(Table: number of folds, superfamilies and families per class. Columns: Class, Fold, Superfamily, Family. Rows: All alpha, All beta, Alpha/beta, Alpha+beta, Multidomain (53, 74), Membrane and cell surface, Small proteins, Total)
Uses of SCOP
- Automatic classification
- Understanding of protein enzymatic function
- Use superfamily and fold to study distantly related proteins
- Study sequence and structure variability
- Derive substitution matrices for sequence comparison
- Extract structural principles for design
- Study decomposition of multi-domain proteins
- Estimate total number of folds
- Derived databases

PDB, Proteins, Domains revisited
- 80% of PDB structures contain only one type of SCOP superfamily
- 15% of PDB structures contain two different SCOP superfamilies
(Table: number of superfamilies vs. frequency)

A structure with 23 different superfamilies
- 1k9m: co-crystal structure of tylosin bound to the 50S ribosomal subunit of Haloarcula marismortui (ribosome)
The 20 Most Frequently Occurring Superfamilies
Columns: Superfamily, SCOP ID (class letter), #PDB
- Immunoglobulin (b)
- Lysozyme-like (d)
- Trypsin-like serine proteases (b)
- P-loop containing nucleotide triphosphate hydrolases (c)
- NAD(P)-binding Rossmann-fold domains (c)
- Globin-like (a)
- (Trans)glycosidases (c)
- Acid proteases (b)
- Concanavalin A-like lectins/glucanases (b)
- Thioredoxin-like (c)
- EF-hand (a)
- alpha/beta-Hydrolases (c)
- Cupredoxins (b)
- Ribonuclease H-like (c)
- PLP-dependent transferases (c)
- Periplasmic binding protein-like II (c)
- Carbonic anhydrase (b)
- Metalloproteases ("zincins"), catalytic domain (d)
- FAD/NAD(P)-binding domain (c)
- Cytochrome c (a)
CATH
- Class: secondary structure composition
- Architecture: orientation in 3D
- Topology: connectivity
- Homology: grouped by evidence for homology (sequence, structure and function)

Generating CATH
1. Identify close relatives by pairwise sequence alignment
2. Detect more distant relatives using
   2a. sequence profiles and
   2b. structure alignment
3. Structures still unclassified after 1. and 2. are examined by hand to detect domain boundaries
4. Try 2. and 3. again
5. If still unclassified, assign manually

CATH step 1: Sequence-based Identification of Homologous Structures
- >30% sequence similarity implies similar structure
- Relatives identified using pairwise alignment are clustered using hierarchical clustering with single linkage
- Reminder...

Hierarchical Clustering
(Figure: successive distance matrices as items 1 to 5 are merged into the clusters (1,2), (4,5) and (3,(4,5)))
Hierarchical Clustering: Linkage
- How to define the distance between clusters?
- Single linkage: minimum
  - Example: distance from (A,B) to C is 1
- Complete linkage: maximum
  - Example: distance from (A,B) to C is 2
- Average linkage: average
  - Example: distance from (A,B) to C is 1.5
- Are dendrograms always the same, independent of the linkage method?
Distance matrix for the example:
      C  B  A
  C   0
  B   1  0
  A   2  1  0
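The three example values can be reproduced directly from the distance matrix. The sketch below is a minimal illustration; the names `item_distance` and `linkage_distance` are assumptions made for this example.

```python
from itertools import product

# Pairwise distances from the example matrix: d(A,B) = 1, d(A,C) = 2, d(B,C) = 1.
DISTANCES = {("A", "B"): 1, ("A", "C"): 2, ("B", "C"): 1}

def item_distance(x, y):
    """Symmetric lookup of the distance between two items."""
    if x == y:
        return 0
    return DISTANCES[tuple(sorted((x, y)))]

def linkage_distance(cluster_1, cluster_2, method):
    """Distance between two clusters under the chosen linkage method."""
    pairwise = [item_distance(x, y) for x, y in product(cluster_1, cluster_2)]
    if method == "single":
        return min(pairwise)
    if method == "complete":
        return max(pairwise)
    if method == "average":
        return sum(pairwise) / len(pairwise)
    raise ValueError("unknown linkage method: " + method)

single = linkage_distance(("A", "B"), ("C",), "single")
complete = linkage_distance(("A", "B"), ("C",), "complete")
average = linkage_distance(("A", "B"), ("C",), "average")
```

The three results are 1, 2 and 1.5. Because the methods assign different distances to the same pair of clusters, later merges can happen in a different order, so the dendrograms need not be the same.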
Hierarchical Clustering: Chaining
- Beware of chaining when using single linkage
- Because the nearest neighbour is selected at each step, it appears that all members of the cluster are very similar to each other, when in fact A and Z are very different
Distance matrix for the example:
        A   B   C   D  ...   Z
  A     0   1   2   3  ...  25
  B         0   1   2  ...  24
  C             0   1  ...  23
  D                 0  ...  22
  ...
  Z                          0
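The chaining effect can be reproduced with a naive agglomerative clustering of the items A to Z, where the distance between two letters is their separation in the alphabet, as in the matrix above. This is an illustrative sketch, not the procedure used by CATH; the function names are assumptions, and the cubic-time loop is only suitable for small examples.

```python
import string

def agglomerate(items, dist, linkage):
    """Naive agglomerative clustering; returns the height of every merge.

    `linkage` reduces the list of pairwise distances between two clusters
    to one number, e.g. min (single linkage) or max (complete linkage).
    """
    clusters = [(item,) for item in items]
    heights = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairwise = [dist(a, b) for a in clusters[i] for b in clusters[j]]
                height = linkage(pairwise)
                if best is None or height < best[0]:
                    best = (height, i, j)
        height, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        heights.append(height)
    return heights

LETTERS = string.ascii_uppercase

def chain_distance(a, b):
    """Distance between two letters = separation in the alphabet."""
    return abs(LETTERS.index(a) - LETTERS.index(b))

single_heights = agglomerate(LETTERS, chain_distance, min)
complete_heights = agglomerate(LETTERS, chain_distance, max)
```

With single linkage every one of the 25 merges happens at height 1, which hides the fact that A and Z are 25 apart. With complete linkage the final merge happens at height 25, so the spread of the cluster stays visible.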
CATH and single linkage
- It is argued that
  - structural data is quite sparse,
  - hence it cannot be expected that all cluster members will be very similar (in terms of sequence) to each other,
  - so that the chaining effect is even useful

CATH step 2a: Profile-based methods to detect distant relatives
- Profile-based methods such as PSI-BLAST are used to detect distant relatives
- Build profiles using all sequence data available (rather than only sequences for which a structure exists)
- This increases the quality of the profiles dramatically:
  - 51% of distant relatives retrieved using profiles based only on sequences with known structure
  - 82% of distant relatives retrieved using profiles based on all sequences

CATH step 2b: Structure-based methods to detect distant relatives
- For ca. 15% of structures, the sequence-based method does not work
- Example: for globins, sequence similarity can fall below 10%, yet structure and function (oxygen binding) are preserved
- Use SSAP, the Sequential Structure Alignment Program

Clustering Result of Structure Alignment
- Relatives identified using pairwise alignment are clustered using hierarchical clustering with single linkage

Improving Efficiency: GRATH
- Screening large structures (>300 residues) against the database can take days
- Idea of GRATH (Graphical Representation of CATH):
  - Improve efficiency by filtering at a higher level before doing the detailed comparison
  - Represent a protein as a graph where
    - nodes are secondary structure elements, represented by their midpoint, tilt, and rotation
    - edges are the distances between midpoints of secondary structure elements
  - Use an algorithm to determine subgraph isomorphism (i.e. does one graph occur in another one?)
  - If yes, then do a detailed comparison using SSAP

Structure Prediction and Modelling

Structure Prediction: Four Main Problem Areas
- Given a sequence with unknown structure, predict its structure
- Secondary structure prediction
  - Predict regions of helices and strands
- Homology modelling
  - Predict structure from known structures of one or more related proteins
- Fold recognition
  - Given a library of structures, determine which one (if any) is the fold of the given sequence
- Prediction of novel folds: a priori and knowledge-based methods

Structure Prediction of Novel Folds: Two Approaches
- A priori:
  - Most approaches aim to reproduce inter-atomic interactions by defining an energy function and trying to find its global minimum
  - Problems:
    - Inadequacy of the energy function
    - Algorithms get stuck in local minima
- Knowledge-based:
  - Find similarities to known structures or sub-structures
Secondary Structure Prediction
- A successful tool for secondary structure prediction is PROF
- PROF uses neural networks to learn secondary structure from known structures
- ¾ of PROF's predictions are correct
- At CASP 2000 it predicted, e.g., the following (H = helix, E = strand):
Residues 1-50
  Sequence    ALVEDPPLKVSEGGLIREGYDPDLDALRAAHREGVAYFLELEERERERTG
  Prediction  HH EEE------HHHHHHHHHH-HHHHHHHHHHHHHHH-
  Experiment  -E E-----HHHHHHHHHHHHHHHHHHHHHHHHHHHH-
Residues 51-100
  Sequence    IPTLKVGYNAVFGYYLEVTRPYYERVPKEYRPVQTLKDRQRYTLPEMKEK
  Prediction  --EEEEEEEEEEEEEEEE EEEEEEEE---EEEE-HHHHHH
  Experiment  ----EEEEE---EEEEEEEHHHHHH-----EEEEE---EEEEE-HHHHHH
Residues 101-129
  Sequence    EREVYRLEALIRRREEEVFLEVRERAKRQ
  Prediction  HHHHHHHHHHHHHHHHHHHHHHHHHHHH-
  Experiment  HHHHHHHHHHHHHHHHHHHHHHHHHHH--
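A statement such as "¾ of the predictions are correct" is usually measured per residue, as the fraction of positions where the predicted and the experimentally assigned state agree. The sketch below shows this calculation on short toy strings; it does not use the CASP strings above, and the function name is an illustrative assumption.

```python
def per_residue_accuracy(predicted, observed):
    """Fraction of positions where predicted and observed states agree."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return matches / len(observed)

# Toy strings (H = helix, E = strand, - = other); they differ at one position.
toy_predicted = "HHHHEEEE--"
toy_observed = "HHHEEEEE--"
accuracy = per_residue_accuracy(toy_predicted, toy_observed)
```

For the toy strings nine of ten positions agree, giving an accuracy of 0.9.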
PROF's prediction
- The regions predicted by the PROF server of Rost to be helical are shown as wider ribbons. The prediction missed only a short helix, at the top left of the picture

Homology modelling
- Define the model of an unknown structure by making minimal changes to a relative with known structure
- Align the amino acid sequences of the target and one or more known structures
  - Insertions and deletions should be in loop regions
- Determine mainchain segments to represent the regions containing insertions and deletions, and stitch these into the known structure
- Replace the sidechains of the residues that have been mutated
- Examine the model (by hand and computationally) to detect collisions between atoms
- Refine the model by limited energy minimisation

Accuracy of Homology Modelling
- Works for >40-50% sequence similarity
- Example: SWISS-MODEL prediction of neurotoxin of red scorpion (1DQ7) from neurotoxin of yellow scorpion (1PTX)

Fold Recognition: 3D Profiles
- Given a sequence, determine which (if any) fold is most similar
- Can we build profiles to represent structures of similar fold (similar to sequence profiles)?
- 3D profiles:
  - Classify the environment of each residue by
    - Secondary structure: is it part of a helix, a sheet or other (determined by mainchain hydrogen-bonding interactions)?
    - Surface exposure: accessible surface area (threshold 114 Å²)
    - Polar or non-polar nature of the environment
  - This gives a total of 18 residue classes; each residue belongs to exactly one
  - The sequence of these residue classes is the 3D profile

3D Profiles and Alignments
- Structure-structure alignment:
  - 3D profiles of two known structures can be aligned against each other
- Sequence-structure alignment:
  - Based on existing 3D profiles, the probability of a residue occurring in a residue class can be determined
  - Using this probability, we can assign a 3D profile to a sequence
  - And hence align the sequence 3D profile to a structure 3D profile
- For correctly determined protein structures, the structure 3D profile fits the sequence 3D profile well
  - However, other proteins may score even better
- If a structure does not match its own 3D profile well, it is likely that there is an error in the structure determination

Threading
- Pull the query sequence through a known structure and rate the score
- Necessary:
  - A method to score the models to select the best one
  - A method to calibrate the scores to decide whether the best-scoring model is correct
Comparison:
  Homology modelling           | Threading
  Identify homologues          | Try all possible parents
  Determine optimal alignment  | Try many alignments
  Optimize one model           | Evaluate many rough models
Scoring for Threading
- Empirical patterns of residue neighbours are derived from known structures
- Observe the distribution of inter-residue distances for all 20 x 20 residue pairs
- Derive a probability distribution as a function of distance in space and along the sequence
- The Boltzmann equation relates probability and energy
- Reverse this and derive an energy function from the probability distribution
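Reversing the Boltzmann relation means turning observed frequencies into pseudo-energies via E = -kT ln(p_observed / p_reference). The sketch below applies this to one hypothetical residue pair with made-up counts per distance bin. The uniform reference distribution, the choice kT = 1 (energies in units of kT) and the function name are assumptions for illustration, and every bin is assumed to have a non-zero count.

```python
import math

def inverse_boltzmann(observed_counts, reference_counts, kT=1.0):
    """Pseudo-energy per distance bin: E = -kT * ln(p_observed / p_reference)."""
    total_observed = sum(observed_counts)
    total_reference = sum(reference_counts)
    energies = []
    for observed, reference in zip(observed_counts, reference_counts):
        p_observed = observed / total_observed
        p_reference = reference / total_reference
        energies.append(-kT * math.log(p_observed / p_reference))
    return energies

# Made-up counts for one residue pair in four distance bins.
observed = [5, 40, 30, 25]
reference = [25, 25, 25, 25]   # assumed uniform reference state
energies = inverse_boltzmann(observed, reference)
```

Bins that are seen more often than expected (the second and third) get a negative, favourable energy; the rarely seen first bin gets a positive energy; the last bin, seen exactly as often as expected, gets zero.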
Threading the sequence
- Labels: template, target

"Threaded" sequence
- Yellow = adrenergic receptor sequence
- Blue = adrenergic receptor (PDB 1F88)

Modeled structure
- Gaps are marked

Corrected Model

Ab initio Structure Prediction

Molecular dynamics
- Structure prediction = place atoms so that the interactions between them create a unique state of maximum stability
- Problems:
  - The model of inter-atomic interactions is not complete
  - Computational scale:
    - Large number of variables and massive search space
    - Non-linearities
    - Rough energy surface with many local minima
Conformational energy calculations
- Bond stretching
- Bond angle bending
- Torsion angles (e.g. φ, ψ, ω)
- Van der Waals interactions
  - Short-range repulsion ~R^-12 and long-range attraction ~R^-6, where R is the inter-atom distance
- Hydrogen bonds
  - Weak chemical/electrostatic interaction, ~R^-12 and ~R^-10
- Electrostatics
  - Charges on atoms
- Solvent
  - Interactions with water, salt, sugar, etc.
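The van der Waals term with R^-12 repulsion and R^-6 attraction is commonly written as the Lennard-Jones 12-6 potential. The sketch below uses that form as an assumption about how the two terms are combined; the parameter values for `epsilon` (well depth) and `sigma` (distance of zero energy) are placeholders, not values from the lecture.

```python
def lennard_jones(r, epsilon=0.2, sigma=3.4):
    """12-6 potential: 4 * epsilon * ((sigma / r)**12 - (sigma / r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

SIGMA = 3.4
EPSILON = 0.2

# The minimum lies at r = 2**(1/6) * sigma, where the energy equals -epsilon.
r_min = 2.0 ** (1.0 / 6.0) * SIGMA

energy_too_close = lennard_jones(0.9 * SIGMA)   # repulsion dominates
energy_at_minimum = lennard_jones(r_min)        # most favourable distance
energy_far_apart = lennard_jones(3.0 * SIGMA)   # weak attraction
```

The steep R^-12 term makes overlapping atoms very unfavourable, while the R^-6 term gives a shallow attraction that fades with distance.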
Rosetta
- Predicts structure by first generating structures of fragments (3-9 residues) using known structures
- Combines fragments by Monte Carlo simulation, using an energy function with terms for
  - Paired beta-sheets
  - Burial of hydrophobic residues
- Carries out 1000 simulations
- Results are clustered, and the centre of the largest cluster is presented as the prediction
- Demo

ROSETTA
- The program ROSETTA, by D. Baker and colleagues, can predict the structures of proteins for which no complete domain of similar folding pattern appears in the database
- Figure: prediction by ROSETTA of an H. influenzae hypothetical protein. Black lines, experimental structure; red lines, prediction

Rosetta
- Figure: prediction by ROSETTA of the N-terminal half of domain 1 of human DNA repair protein Xrcc4. This figure shows a selected substructure of Xrcc4 containing the N-terminal 55 out of 116 residues. Black lines, experimental structure; red lines, prediction

LINUS
- Another program with a similar idea
- Figure: prediction by LINUS (program by G.D. Rose and R. Srinivasan) of the C-terminal domain of rat endoplasmic reticulum protein ERp29. Black lines, experimental structure; red lines, prediction
Monte Carlo Simulation
- Objective: find the conformation with minimal energy
- Problem: avoid local minima
- Algorithm:
  1. Generate a random initial conformation x
  2. Perturb conformation x to generate a neighbouring conformation x'
  3. Calculate the energies E(x) and E(x') for conformations x and x'
  4. If E(x) > E(x') (i.e. x' is an improvement; we go downhill from x to x'), accept x' as the new conformation and go to 2.
  5. If E(x) < E(x') (i.e. x' is no improvement; we go uphill from x to x'), accept x' as the new conformation with probability p
  6. The probability p of accepting uphill moves is reduced with every step
  7. Go to step 2.
- Step 4 makes sure that we "walk" downhill towards a minimum
- Steps 5 and 6 make sure that, if we are in a local minimum, there is a chance to get out of it by accepting an uphill move. It is important that this probability decreases, so that it becomes less and less likely that we walk uphill
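The steps above can be sketched on a toy problem in which the "conformation" is a single number and the energy is a rough one-dimensional function with several local minima. Everything specific here is an assumption for illustration: the energy function, the step size, the starting probability and its decay factor, the fixed number of steps, and the fact that the start is passed in explicitly rather than drawn at random. The sketch also remembers the best conformation seen, which the steps above do not mention.

```python
import math
import random

def energy(x):
    """Toy rough energy surface: a shallow parabola with superimposed ripples."""
    return 0.1 * x * x + math.sin(3.0 * x)

def monte_carlo_minimise(energy_fn, start, steps=5000, step_size=0.5,
                         p_start=0.5, decay=0.999, seed=0):
    """Random walk that always accepts downhill moves and accepts uphill
    moves with a probability p that shrinks at every step."""
    rng = random.Random(seed)           # fixed seed for reproducibility
    x, e = start, energy_fn(start)      # step 1 (start supplied by the caller)
    best_x, best_e = x, e
    p = p_start
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)   # step 2: perturb
        e_new = energy_fn(x_new)                         # step 3: energies
        # Step 4: accept downhill moves; step 5: accept uphill with probability p.
        if e_new < e or rng.random() < p:
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        p *= decay                                       # step 6: reduce p
    return best_x, best_e

start = 8.0
best_x, best_e = monte_carlo_minimise(energy, start)
```

Early on, the high acceptance probability lets the walk cross barriers between ripples; as p decays, the walk settles into whichever basin it has reached. There is no guarantee that this is the global minimum.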
Summary
- You should now know
  - What helices, strands and sheets are
  - What a Ramachandran plot is
  - How to score a structural alignment (rmsd)
  - How to compute a structural alignment
  - How a domain can be characterised
  - Why structure classification is useful
  - What the main structure classes are
  - How classifications can be generated automatically
  - What the problems are
  - What secondary structure prediction, homology modelling, threading, and ab initio and knowledge-based structure prediction of novel folds are
- Visit the PDB, SCOP and CATH websites
- Read chapter 5