Presentation transcript: "Molecular Replacement in CCP4"
1 Molecular Replacement in CCP4
Martyn Winn, CCP4 group, Daresbury Laboratory
2 Data analysis before MR
- Matthews coefficient (number of copies in the a.s.u.)
- Native Patterson (translational NCS)
- B factor analysis
- Self RF (rotational NCS)
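As a worked illustration of the Matthews coefficient check, the sketch below computes Vm = V / (Z x n x MW) and the usual solvent estimate 1 - 1.23/Vm for a range of copy numbers. The function names, the example crystal and the 25-75% solvent window are assumptions made for illustration; they are not CCP4 defaults.

```python
def matthews(cell_volume, n_symops, n_copies, mol_weight):
    """Return (Vm in A^3/Da, estimated solvent fraction).

    cell_volume: unit cell volume in cubic Angstroms
    n_symops:    number of symmetry operators in the space group (Z)
    n_copies:    number of molecules assumed per asymmetric unit
    mol_weight:  molecular weight of one molecule in Daltons
    """
    vm = cell_volume / (n_symops * n_copies * mol_weight)
    solvent = 1.0 - 1.23 / vm  # standard approximation for proteins
    return vm, solvent


def plausible_copies(cell_volume, n_symops, mol_weight,
                     max_copies=12, solvent_window=(0.25, 0.75)):
    """List copy numbers whose solvent content falls inside solvent_window.

    The window is an illustrative assumption, not a program default.
    """
    low, high = solvent_window
    result = []
    for n in range(1, max_copies + 1):
        _, solvent = matthews(cell_volume, n_symops, n, mol_weight)
        if low <= solvent <= high:
            result.append(n)
    return result


if __name__ == "__main__":
    # Hypothetical crystal: 50 x 60 x 70 A orthorhombic cell, Z = 4, 25 kDa protein
    volume = 50.0 * 60.0 * 70.0
    for n in (1, 2):
        vm, solvent = matthews(volume, 4, n, 25000.0)
        print(f"{n} copies: Vm = {vm:.2f} A^3/Da, solvent = {solvent:.0%}")
    print("plausible copy numbers:", plausible_copies(volume, 4, 25000.0))
```

A negative solvent fraction simply means that number of copies cannot fit in the cell.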
3 Data analysis before MR
- Interface to Sfcheck (currently in the Validation & Deposition module)
- Reports completeness, anisotropy, Wilson B, twinning check, pseudo-translation check
4 Finding search models
- Need a PDB file for a structurally similar protein. This usually means a homologous protein.
- Either you have one already, or you search the Protein Data Bank.
- The search is based on sequence alignment between the target protein and proteins in the PDB.
- Several bioinformatics tools can help here:
  - OCA, MSDlite, MSDtarget: all use FASTA
  - psiBLAST: iterative searching
  - FFAS: profile-profile alignment (ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl)
5 Editing search models
- Don’t use a raw PDB file for Molecular Replacement unless it is very similar (e.g. same protein, different conditions, ligand, etc.)
- Edit it to:
  - remove residues that don’t occur in the target
  - remove side chain atoms that don’t occur in the target (both of these assume a known alignment from model to target)
  - remove uncertain regions of the model (check B factors, occupancies)
  - remove flexible loops
- Note that we don’t add anything! Homology modelling?
- Consider use of individual domains and multimers (see MrBUMP below)
7 MR model preparation: Chainsaw
Molecular replacement model preparation utility that edits a PDB search model according to a sequence alignment.
Features:
- Removes unaligned residues from the model
- Prunes non-conserved residues back to the gamma atom
- Preserves more atoms than a polyalanine model
Figure: unmodified template, Chainsaw template and polyalanine template. Example of 1mr6 used as a template for 1tgx (38% sequence identity).
8 Running Chainsaw
Inputs: complete PDB file; model-to-target alignment.
Alignment from:
- original search tool (FASTA, psiBLAST, etc.)
- multiple alignment (set of search models, protein family, etc.)
- hand-created
10 Molrep: overview of functionality
- Performs complete MR in a single step: expt. data (MTZ) + search model (PDB) -> Molrep -> positioned search model
- Individual steps for more difficult cases: CRF, TF, rigid-body
- Multi-copy search: locked CRF, dyad search
- Self RF
- Phased TF, spherically-averaged phased TF
- Improve search model
- Other search models: electron density map, NMR models
- Fit model in electron density map / EM map
11 MR for straightforward case via GUI: title, mode, MTZ file, MTZ labels, search model, RUN IT!
12 Other parameters
DEFAULTS ARE GOOD
- Low resolution cut-off: Molrep uses a soft cut-off, Boff (keywords BOFF, COMPL, RESMIN). Default is estimated.
- High resolution cut-off: Molrep uses a soft cut-off, Badd (keywords BADD, SIM). Default is estimated.
- Both soft cut-offs are applied together: |F|new = |F|input * exp(-Badd*s^2) * (1 - exp(-Boff*s^2))
- High resolution limit: absolute cut-off (keyword RESMAX). Default is estimated.
- Radius of Patterson sphere for CRF: default is twice the radius of gyration of the search model (keyword RAD).
- These are found under Infrequently Used Parameters in the GUI.
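To make the soft cut-off formula concrete, here is a minimal sketch that applies both terms to a list of amplitudes. It assumes s = 1/d (d being the resolution of the reflection in Angstroms); Molrep's own convention for s may differ by a constant factor, and the function names are hypothetical.

```python
import math


def soft_cutoff_scale(d_spacing, b_add, b_off):
    """Scale factor exp(-Badd*s^2) * (1 - exp(-Boff*s^2)).

    Assumption: s = 1/d, with d_spacing in Angstroms.
    The Badd term damps high-resolution data (large s);
    the Boff term damps low-resolution data (small s).
    """
    s2 = 1.0 / (d_spacing * d_spacing)
    return math.exp(-b_add * s2) * (1.0 - math.exp(-b_off * s2))


def apply_soft_cutoff(reflections, b_add, b_off):
    """Scale (d_spacing, amplitude) pairs; returns a new list of pairs."""
    return [(d, f * soft_cutoff_scale(d, b_add, b_off)) for d, f in reflections]


if __name__ == "__main__":
    # Hypothetical amplitudes at 50, 5 and 2 A resolution
    data = [(50.0, 100.0), (5.0, 100.0), (2.0, 100.0)]
    for d, f in apply_soft_cutoff(data, b_add=10.0, b_off=100.0):
        print(f"d = {d:5.1f} A  scaled |F| = {f:.2f}")
```

Unlike a hard resolution limit, no reflection is discarded; very low and very high resolution terms are only down-weighted.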
13 Cross Rotation Function. Output labels: Euler angles (CCP4), polar angles, R factor, list of top RF peaks, more details here.
14 Translation Function. Output labels: fractional translation, R factor, Score, polar angles. List of solutions: top TF for each RF solution; contrast of solution.
15 Identification of solutions
- SCORE = product of the Correlation Coefficient and the maximal value of the Packing Function
- The Packing Function is integrated into the TF search and removes solutions with overlapping molecules
- CONTRAST = ratio of top score to mean score:
  - > 2.5: definitely solution
  - < 2.5 and > 1.8: solution
  - < 1.8 and > 1.5: maybe solution
  - < 1.5 and > 1.3: maybe not solution, but program accepts it
  - < 1.3: probably not solution
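The contrast bands can be expressed as a small lookup. This is only a sketch: the lower bound of each intermediate band is assumed to be the upper bound of the band below it (1.8, 1.5, 1.3), and the function name is hypothetical rather than part of Molrep.

```python
def interpret_contrast(contrast):
    """Map a Molrep contrast value to the verdict given on the slide.

    Assumption: the bands are contiguous, so each band's lower bound is the
    next band's upper bound. Values exactly on a boundary fall into the
    lower band here; the slide does not say which way they should go.
    """
    if contrast > 2.5:
        return "definitely solution"
    if contrast > 1.8:
        return "solution"
    if contrast > 1.5:
        return "maybe solution"
    if contrast > 1.3:
        return "maybe not solution, but program accepts it"
    return "probably not solution"


if __name__ == "__main__":
    for value in (3.1, 2.0, 1.6, 1.4, 1.1):
        print(f"contrast {value:.1f}: {interpret_contrast(value)}")
```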
16 Finding more than one copy in the asu
- By default, Molrep will estimate the number of copies to find. Override with the NMON keyword.
- Program flow: CRF -> TF for first copy -> fix first copy -> TF for second copy -> fix second copy -> TF for third copy.
17 Solving complexes
- Choose first component (largest, highest similarity)
- Solve for first component (probably need to specify NMON explicitly)
- New Molrep job: "Model in" = second component; "Fixed in" = positioned first component
- Repeat for all other components
- Possibility to use spherically-averaged phased TF using phases from first component
18 Phaser
Randy Read, Airlie McCoy, Cambridge. Phaser website:
19 Performs complete MR in a single step: expt. data (MTZ) + search model (PDB) -> Phaser -> positioned search model
- Use “MODE MR_AUTO” or “automated search” in the GUI:
  - anisotropy correction
  - fast rotation function
  - fast translation function
  - packing
  - refinement and phasing
  - loop over models
20 More functionality ...
- All steps can be run separately
- Search over spacegroups:
  - MTZ spacegroup and enantiomorph
  - All spacegroups in MTZ point-group
  - Selected spacegroups
- Ensemble models (see later)
- Brute RF and TF: slow and accurate
- Normal mode analysis: generates perturbed models
21 MR for straightforward case via GUI: mode, MTZ file, target details, search model, specify search, RUN IT!
22 FRF. Output labels: Euler angles (CCP4); top LLG and Z-scores for FRF.
23 FTF. Output labels: fractional translation; FRF solution number; top LLG and Z-scores for FTF.
24 Packing
- Phaser does a packing check after FTF
- Clashes = Cα atoms closer than 2 Å
- Default number of clashes = 0
- Think about increasing to 2 or 5
25 Solution files
- .sol file produced at end of job
- Contains summary of all solutions
- Each solution contains rotations and usually translations: 3DIM vs 6DIM
- One line per model located
- .sol file can be read back into Phaser in later jobs
Have I solved it? Judge by Z-score (RFZ = RF Z-score, TFZ = TF Z-score):
- less than 5: no
- intermediate values, in increasing order: unlikely, possibly, probably
- more than 8: definitely
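The Z-score table can be turned into a quick helper for scanning results. The slide only preserves the end points (less than 5, more than 8), so the intermediate boundaries at 6 and 7 below are an assumption (evenly spaced bands), and the function name is hypothetical.

```python
from bisect import bisect_right

# Band boundaries: 5 and 8 come from the slide; 6 and 7 are assumed.
_BOUNDS = [5.0, 6.0, 7.0, 8.0]
_VERDICTS = ["no", "unlikely", "possibly", "probably", "definitely"]


def have_i_solved_it(z_score):
    """Return the verdict for a translation-function Z-score (TFZ).

    A value exactly on a boundary is placed in the higher band, which is
    a choice made here; the slide does not specify it.
    """
    return _VERDICTS[bisect_right(_BOUNDS, z_score)]


if __name__ == "__main__":
    for tfz in (4.2, 5.5, 6.5, 7.5, 12.3):
        print(f"TFZ = {tfz:4.1f}: {have_i_solved_it(tfz)}")
```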
26 Ensemble models
- Phaser refers to search models as “ensembles”
- Often, an ensemble contains a single model, as in traditional MR
- But Phaser can use an ensemble of more than one model, which may work better than any single model
- Models in an ensemble must be superposed prior to use in Phaser, e.g. with Superpose in CCP4
- N.B. Phaser will complain if:
  - MWs of models in the ensemble are too different
  - RMS between models is too large
- (In Molrep, construct the ensemble as a pseudo-NMR PDB file)
27 Finding more than one copy in the asu
- Specify > 1 in "Composition of the asymmetric unit" (keyword COMPOSITION ... NUMBER)
- Specify > 1 in "Number of copies to search for" (keyword SEARCH ... NUMBER)
- Phaser will issue warnings if these numbers are wrong.
- Program flow: CRF -> TF for first copy -> fix first copy (possibly multiple sets) -> CRF for second copy -> TF for second copy -> fix second copy (possibly multiple sets).
28 Complexes
As before, but:
- Define > 1 type of component: Composition of the asymmetric unit -> Define another component
- Define > 1 ensemble: Define ensembles -> Add ensemble
- Specify all searches: Search details -> Add another search
E.g. beta-blip example in Phaser tutorial:
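For command-line use, the same three definitions (composition, ensembles, searches) appear as keyword lines. The sketch below generates such a keyword script for a two-component complex. The keyword spellings follow the Phaser tutorial as best recalled and should be checked against the documentation of the installed version; the file names, column labels and the helper itself are illustrative assumptions.

```python
from typing import List, NamedTuple


class Component(NamedTuple):
    name: str          # ensemble name used inside Phaser
    pdb_file: str      # search model coordinates
    seq_file: str      # sequence file used for the composition
    identity: float    # percent sequence identity of model to target
    copies: int        # number of copies in the asu and to search for


def phaser_mr_auto_script(hklin: str, f_label: str, sigf_label: str,
                          components: List[Component], root: str) -> str:
    """Build keyword input for an automated MR run over several components."""
    lines = [
        "MODE MR_AUTO",
        f"HKLIN {hklin}",
        f"LABIN F={f_label} SIGF={sigf_label}",
    ]
    # One ensemble, one composition entry and one search per component
    for c in components:
        lines.append(f"ENSEMBLE {c.name} PDB {c.pdb_file} IDENTITY {c.identity:g}")
    for c in components:
        lines.append(f"COMPOSITION PROTEIN SEQUENCE {c.seq_file} NUM {c.copies}")
    for c in components:
        lines.append(f"SEARCH ENSEMBLE {c.name} NUM {c.copies}")
    lines.append(f"ROOT {root}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder file names for a beta-lactamase / inhibitor style complex
    parts = [
        Component("beta", "beta.pdb", "beta.seq", 100, 1),
        Component("blip", "blip.pdb", "blip.seq", 100, 1),
    ]
    print(phaser_mr_auto_script("beta_blip.mtz", "Fobs", "Sigma", parts, "beta_blip_auto"))
```

The generated text would be passed to Phaser on standard input; searching for the larger or more similar component first mirrors the advice given for Molrep.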
30 The aim of MrBUMP
- An automation framework for Molecular Replacement.
- Particular emphasis on generating a variety of search models.
- Can be used to generate models only.
- Wraps Phaser and/or Molrep.
- Also uses a variety of helper applications (e.g. Chainsaw) and bioinformatics tools (e.g. Fasta, Mafft).
- Uses on-line databases (e.g. PDB, Scop).
- In favourable cases, gives a “one-button” solution.
- In unfavourable cases, will suggest likely search models for manual investigation (lead generation).
31 The Pipeline
Target MTZ & sequence -> Target details -> Template search -> Model preparation -> Molecular Replacement & refinement -> check scores and exit, or select the next model
32 Search for homologous proteins
FASTA search of PDB:
- Sequence-based search using the sequence of the target structure.
- Can be run locally if the user has the fasta34 program installed, or remotely using the OCA web-based service hosted by the EBI.
- All of the resulting PDB id codes are added to a list.
- These structures are called model templates.
33 Search for additional similar structures
Additional structure-based search (optional):
- Top hit from the FASTA search is used as the template structure for a secondary-structure-based search.
- Uses the SSM web service provided by the EBI (a.k.a. MSDfold).
- Any new structures found are added to the list.
- Provides structural variation, not based on direct sequence similarity to the target.
Manual addition:
- Can add additional PDB id codes to the list, e.g. from FFAS or psiBLAST searches.
- Can add local PDB files.
34 Multiple Alignment
After the set of PDB ids has been collected in the FASTA and SSM searches, their coordinate-based sequences are collected and put through a multiple alignment with the target sequence.
Aims:
- Score template structures in a consistent manner, in order to prioritise them for subsequent steps.
- Extract the pairwise alignment between template and target for use in the Chainsaw step. The multiple alignment should give a better set of alignments than the original pairwise FASTA alignments.
35 Multiple Alignment
Figure (Jalview; Barton group, Dundee): target, model templates and pairwise alignment. Currently support ClustalW or MAFFT for multiple alignment.
36 Template Model Scoring
Alignment scoring: score = sequence identity x alignment quality
- Sequence identity: ungapped sequence identity, i.e. sequence identity of aligned target residues.
- Alignment quality: dependent on the alignment length, the number of gaps created in the template alignment and the extent of each of these gaps. The penalties for the number and size of gaps are biased so that alignments that preserve domains of the structure score higher than those that spread the aligned residues out.
The top scoring models are then used for further processing.
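The ungapped sequence identity term is simple enough to sketch. The function below counts identities only over columns where both target and template have a residue. The alignment-quality term is passed in as a number because its exact formula is not given here; the function names and example sequences are hypothetical, not MrBUMP's own code.

```python
def ungapped_identity(target_aln, template_aln, gap="-"):
    """Fraction of identical residues over columns with no gap in either row."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for t, m in zip(target_aln, template_aln):
        if t == gap or m == gap:
            continue  # skip gapped columns entirely
        aligned += 1
        if t.upper() == m.upper():
            identical += 1
    return identical / aligned if aligned else 0.0


def template_score(target_aln, template_aln, alignment_quality):
    """score = sequence identity x alignment quality.

    alignment_quality is supplied by the caller (assumed to be in 0..1).
    """
    return ungapped_identity(target_aln, template_aln) * alignment_quality


if __name__ == "__main__":
    target = "ACDEFG-H"
    template = "ACNEF-KH"
    print(f"identity = {ungapped_identity(target, template):.3f}")
    print(f"score    = {template_score(target, template, 0.9):.3f}")
```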
37 Domains
- Suitable templates for target domains may exist in isolation in the PDB, or in combination with dissimilar domains.
- In case of relative domain motion, may want to solve domains separately.
38 Domains
Domains search:
- Top scoring templates from the multiple alignment are tested to see if they contain any domains.
- Uses the SCOP database. This only lists domains that appear more than once in the PDB.
- The database is scanned to see if domains exist for each of the PDBs in the list of templates.
- Domains are then extracted from the parent PDB structure file and added to the list of template models as additional search models for MR.
39 Multimers
Multimer search:
- Search for quaternary structures that may be used as search models.
- Better signal-to-noise ratio than monomer, if the assembly is correct for the target.
- Multimeric structures based on top templates are retrieved using the PQS service at the EBI, and added to the list of search models.
- PQS will soon be replaced by the use of the PISA service at the EBI (Eugene Krissinel).
Example:
1n5a SPLIT-ASU into 4 Oligomeric files of type TRIMERIC
1n5b SPLIT-ASU into 2 Oligomeric files of type DIMERIC
1n5c SYMMETRY-COMPLEX Oligomeric file of type DIMERIC
1n5d SYMMETRY-COMPLEX Oligomeric file of type DIMERIC
40 Search Model Preparation
Search models are prepared in four ways:
- PDBclip: original PDB with waters removed, hydrogens removed, most probable conformations for side chains selected and chain IDs added if missing.
- Molrep: contains a model preparation function which will align the template sequence with the target sequence and prune the non-conserved side chains accordingly.
- Chainsaw: can be given any alignment between the target and template sequences. Non-conserved residues are pruned back to the gamma atom.
- Polyalanine: created by excluding all of the side chain atoms beyond the CB atom using the Pdbset program.
Also create an ensemble model for Phaser based on the top 5 models.
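To show what the polyalanine preparation amounts to, the sketch below filters PDB records directly, keeping only main-chain atoms and CB. It is a stand-in for illustration and not how Pdbset is implemented; it relies on the fixed-column PDB format (atom name in columns 13-16), leaves residue names unchanged and passes non-ATOM records through untouched.

```python
# Atoms retained in a polyalanine model: main chain plus CB
POLYALA_ATOMS = {"N", "CA", "C", "O", "CB"}


def to_polyalanine(pdb_lines):
    """Return PDB lines with side chain atoms beyond CB removed.

    Only ATOM records are filtered; all other records (CRYST1, HETATM,
    TER, END, ...) are kept as they are. Glycine has no CB, so it simply
    keeps its four main-chain atoms.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            atom_name = line[12:16].strip()  # columns 13-16 of the record
            if atom_name not in POLYALA_ATOMS:
                continue
        kept.append(line)
    return kept


if __name__ == "__main__":
    example = [
        "ATOM      1  N   LEU A   1      11.104  13.207  10.000  1.00 20.00           N",
        "ATOM      2  CA  LEU A   1      12.560  13.207  10.000  1.00 20.00           C",
        "ATOM      3  CB  LEU A   1      13.100  14.630  10.000  1.00 20.00           C",
        "ATOM      4  CG  LEU A   1      14.620  14.700  10.000  1.00 20.00           C",
    ]
    for record in to_polyalanine(example):
        print(record)
```

A Chainsaw-style model differs in that conserved residues keep their full side chains and only non-conserved ones are trimmed, using the alignment to decide.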
41 Molecular Replacement and Refinement
- The search models can be processed with Molrep or Phaser or both.
- The resulting models from molecular replacement are passed to Refmac for restrained refinement.
- The change in the Rfree value during refinement is used as a rough estimate of how good the resulting model is:
  - “success”: final Rfree < 0.35, or final Rfree < 0.5 and dropped by 20%
  - “marginal”: final Rfree < 0.48, or final Rfree < 0.52 and dropped by 5%
  - “failure”: otherwise
- MR scores and un-refined models are available for later inspection.
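The classification rule can be written down directly. In this sketch "dropped by 20%" is read as a relative decrease of at least 20% from the starting Rfree; that reading, and the function name, are assumptions rather than a statement of how MrBUMP implements the test.

```python
def classify_refinement(initial_rfree, final_rfree):
    """Classify an MR solution from the Rfree before and after refinement.

    Assumption: the drop is measured relative to the initial Rfree,
    i.e. (initial - final) / initial.
    """
    drop = (initial_rfree - final_rfree) / initial_rfree
    if final_rfree < 0.35 or (final_rfree < 0.5 and drop >= 0.20):
        return "success"
    if final_rfree < 0.48 or (final_rfree < 0.52 and drop >= 0.05):
        return "marginal"
    return "failure"


if __name__ == "__main__":
    # Hypothetical (initial, final) Rfree pairs
    for start, end in [(0.55, 0.33), (0.60, 0.45), (0.55, 0.51), (0.53, 0.52)]:
        print(f"Rfree {start:.2f} -> {end:.2f}: {classify_refinement(start, end)}")
```

Because the thresholds are deliberately rough, a "failure" close to a boundary may still deserve a look.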
42 MrBUMP on compute clusters
- MrBUMP can take advantage of a compute cluster to farm out the Molecular Replacement jobs.
- Currently Sun Grid Engine enabled clusters are supported; support will be added for LSF, Condor and other queuing systems if there is enough demand.
- All nodes terminate when one finds a solution.
43 Pre-release version of MrBUMP
- Pre-release made available in Jan 06
- Simple installation
- Currently runs on Linux and OSX. Windows version almost ready.
- Comes with CCP4 GUI.
- Can also be run from the command line with keyword input
- First citation in Obiero et al., Acta Cryst. (2006). F62,
- Regular updates (currently version 0.3.2)
44 A few observations ...
- In difficult cases, success in MrBUMP may depend on the particular template, chain and model preparation method.
- Nevertheless, may get several putative solutions.
- Ease of subsequent model re-building and model completion may depend on choice of solution.
- First solution or check everything?
- The expectation was that a quick solution is required; in fact, most users seem happy to let MrBUMP run for a long time (hours, days).
- Worth checking “failed” solutions!