Molecular Replacement in CCP4


Molecular Replacement in CCP4 Martyn Winn CCP4 group, Daresbury Laboratory

Data analysis before MR
Matthews coefficient: number of copies in the a.s.u.
Native Patterson: translational NCS
B factor analysis
Self RF: rotational NCS
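
The Matthews coefficient check listed above can be sketched in a few lines. This is a minimal illustration, not CCP4 code: the function names are hypothetical, the 1.23/V_M solvent estimate is the usual approximation for protein, and the accepted V_M range is an assumed rule of thumb that you should adjust for your own case.

```python
def matthews_coefficient(cell_volume_A3, n_symops, n_copies, mw_da):
    """V_M in A^3/Da: unit-cell volume divided by total protein mass in the cell."""
    return cell_volume_A3 / (n_symops * n_copies * mw_da)


def solvent_fraction(vm):
    """Approximate solvent fraction from V_M (common 1.23/V_M estimate for protein)."""
    return 1.0 - 1.23 / vm


def plausible_copy_numbers(cell_volume_A3, n_symops, mw_da,
                           max_copies=12, vm_range=(1.7, 3.5)):
    """Return (n_copies, V_M) pairs whose V_M falls inside vm_range.

    vm_range is an assumed 'typical protein crystal' window, not a hard limit.
    """
    low, high = vm_range
    hits = []
    for n in range(1, max_copies + 1):
        vm = matthews_coefficient(cell_volume_A3, n_symops, n, mw_da)
        if low <= vm <= high:
            hits.append((n, vm))
    return hits


if __name__ == "__main__":
    # Hypothetical crystal: 400,000 A^3 cell, 4 symmetry operators, 25 kDa chain.
    for n, vm in plausible_copy_numbers(400000.0, 4, 25000.0):
        print(n, round(vm, 2), round(solvent_fraction(vm), 3))
```

The number of copies suggested this way is only a starting estimate for the MR search; unusual solvent contents do occur.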

Data analysis before MR
Interface to Sfcheck (currently in the Validation & Deposition module): completeness, anisotropy, Wilson B, twinning check, pseudo-translation check

Finding search models
Need a PDB file for a structurally similar protein. This usually means a homologous protein.
Either you have one already, or you search the Protein Data Bank.
Search is based on sequence alignment between the target protein and proteins in the PDB.
Several bioinformatics tools can help here:
OCA, MSDlite, MSDtarget - all use FASTA (www.ebi.ac.uk/msd)
psiBLAST - iterative searching (www.ncbi.nlm.nih.gov/BLAST)
FFAS - profile-profile alignment (ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl)

Editing search models
Don’t use a raw PDB file for Molecular Replacement unless it is very similar (e.g. same protein, different conditions, ligand, etc.)
Edit it to:
remove residues that don’t occur in the target
remove side chain atoms that don’t occur in the target
(these two assume a known alignment from model to target)
remove uncertain regions of the model (check B factors, occupancies)
remove flexible loops
Note that we don’t add anything! Homology modelling?
Consider use of individual domains and multimers (see MrBUMP below)

Chainsaw Norman Stein, Daresbury Lab.

MR model preparation: chainsaw
Molecular replacement model preparation utility that edits a PDB search model according to a sequence alignment.
Features:
Removes unaligned residues from the model
Prunes non-conserved residues back to the gamma atom
Preserves more atoms than a polyalanine model
[Figure: unmodified template, Chainsaw template and polyalanine template. Example of 1mr6 used as a template for 1tgx (38% sequence identity).]
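
The pruning rules on this slide can be illustrated with a small sketch. This is not Chainsaw's actual code: the data layout (a list of one-letter code plus atom names per model residue, and a string giving the aligned target residue or "-" for each model residue) and the function names are assumptions made for the example, as is the exact set of atom names treated as "gamma" atoms.

```python
BACKBONE = {"N", "CA", "C", "O"}
# Assumed set of gamma-position atom names in standard PDB nomenclature.
GAMMA_ATOMS = {"CG", "CG1", "CG2", "OG", "OG1", "SG"}


def prune_to_gamma(atom_names):
    """Keep backbone, CB and gamma atoms only."""
    keep = BACKBONE | {"CB"} | GAMMA_ATOMS
    return [a for a in atom_names if a in keep]


def prune_to_cb(atom_names):
    """Polyalanine-style pruning: keep backbone and CB only."""
    keep = BACKBONE | {"CB"}
    return [a for a in atom_names if a in keep]


def chainsaw_like(model, target_aligned):
    """Edit a model according to an alignment.

    model: list of (one_letter_code, [atom names]) in sequence order.
    target_aligned: string of the same length; the target residue aligned
    to each model residue, or '-' if the model residue is unaligned.
    """
    if len(model) != len(target_aligned):
        raise ValueError("model and alignment must have the same length")
    edited = []
    for (code, atoms), target_code in zip(model, target_aligned):
        if target_code == "-":
            continue  # unaligned residue: removed from the model
        if code == target_code:
            edited.append((code, list(atoms)))  # conserved: left unchanged
        else:
            edited.append((code, prune_to_gamma(atoms)))  # non-conserved: pruned
    return edited


if __name__ == "__main__":
    model = [
        ("A", ["N", "CA", "C", "O", "CB"]),
        ("L", ["N", "CA", "C", "O", "CB", "CG", "CD1", "CD2"]),
        ("K", ["N", "CA", "C", "O", "CB", "CG", "CD", "CE", "NZ"]),
    ]
    print(chainsaw_like(model, "A-R"))
```

Comparing `prune_to_gamma` with `prune_to_cb` shows why the Chainsaw-style model preserves more atoms than a polyalanine model.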

Running Chainsaw
Inputs: complete PDB file; model to target alignment
Alignment from:
original search tool (FASTA, psiBLAST, etc.)
multiple alignment (set of search models, protein family, etc.)
hand-created

Molrep Alexei Vagin, York http://www.ysbl.york.ac.uk/~alexei/molrep.html

Molrep: overview of functionality
Performs complete MR in a single step: Expt. data (MTZ) + search model (PDB) -> Molrep -> positioned search model
Individual steps for more difficult cases: CRF, TF, rigid-body
Multi-copy search: locked CRF, dyad search
Self RF
Phased TF, spherically-averaged phased TF
Improve search model
Other search models: electron density map, NMR models
Fit model in electron density map / EM map

MR for straightforward case via GUI: set title, mode, MTZ file, MTZ labels and search model, then RUN IT!

Infrequently Used Parameters in GUI
Other parameters: DEFAULTS ARE GOOD
Low resolution cut-off: Molrep uses a soft cut-off, Boff (keywords BOFF, COMPL, RESMIN)
High resolution cut-off: Molrep uses a soft cut-off, Badd (keywords BADD, SIM)
$|F|_{new} = |F|_{input} \exp(-B_{add} s^2) \, (1 - \exp(-B_{off} s^2))$
Defaults estimated
High resolution limit: absolute cut-off (keyword RESMAX). Default estimated
Radius of Patterson sphere for CRF: default is twice the radius of gyration of the search model (keyword RAD)
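
As a rough illustration of the soft cut-off formula above, the sketch below applies both terms to a single amplitude. The function names are hypothetical, and `s2` is passed in directly because the slide does not define the convention used for s; check the Molrep documentation before matching numbers against a real run.

```python
import math


def soft_cutoff_weight(s2, b_add, b_off):
    """Weight applied to |F|: exp(-Badd*s^2) * (1 - exp(-Boff*s^2)).

    The Badd term damps high-resolution terms (large s^2);
    the Boff term damps low-resolution terms (small s^2).
    """
    return math.exp(-b_add * s2) * (1.0 - math.exp(-b_off * s2))


def apply_soft_cutoffs(f_input, s2, b_add, b_off):
    """Return |F|new for one reflection with amplitude f_input."""
    return f_input * soft_cutoff_weight(s2, b_add, b_off)


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the shape of the weighting.
    for s2 in (0.001, 0.01, 0.05, 0.1, 0.25):
        print(s2, round(soft_cutoff_weight(s2, b_add=10.0, b_off=200.0), 4))
```

Unlike a sharp resolution limit, no reflection is discarded outright; each is down-weighted smoothly.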

Cross Rotation Function
[Annotated log output: list of top RF peaks, with Euler angles (CCP4), polar angles and R factor.]

Translation Function
[Annotated log output: list of solutions (top TF for each RF solution), with polar angles, fractional translation, R factor, Score and contrast of solution.]

Identification of solutions
SCORE = product of Correlation Coefficient and maximal value of Packing Function
Packing Function is integrated into the TF search -> removes solutions with overlapping molecules
CONTRAST = ratio of top score to mean score:
> 2.5: definitely solution
1.8 - 2.5: solution
1.5 - 1.8: maybe solution
1.3 - 1.5: maybe not solution, but program accepts it
< 1.3: probably not solution
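
The contrast bands above translate directly into a small helper. This is an illustrative sketch with hypothetical function names, not Molrep output parsing; the slide leaves the exact boundary values (2.5, 1.8, 1.5, 1.3) unassigned, so here they are placed in the lower band as an assumption.

```python
def contrast(scores):
    """Ratio of the top score to the mean score over all candidate solutions."""
    if not scores:
        raise ValueError("need at least one score")
    return max(scores) / (sum(scores) / len(scores))


def interpret_contrast(value):
    """Map a contrast value onto the verdicts listed on the slide."""
    if value > 2.5:
        return "definitely solution"
    if value > 1.8:
        return "solution"
    if value > 1.5:
        return "maybe solution"
    if value > 1.3:
        return "maybe not solution, but program accepts it"
    return "probably not solution"


if __name__ == "__main__":
    # Hypothetical scores (correlation coefficient x packing function).
    scores = [0.6, 0.2, 0.2, 0.2]
    c = contrast(scores)
    print(round(c, 2), interpret_contrast(c))
```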

Finding more than one copy in the asu
By default, Molrep will estimate the number of copies to find. Override with the NMON keyword.
Program flow: CRF -> TF for first copy -> fix first copy -> TF for second copy -> fix second copy -> TF for third copy ...

Solving complexes
Choose first component (largest, highest similarity)
Solve for first component (probably need to specify NMON explicitly)
New Molrep job: Model in = second component; Fixed in = positioned first component
Repeat for all other components
Possibility to use spherically-averaged phased TF using phases from first component

Phaser Randy Read, Airlie McCoy, Cambridge Phaser website: http://www-structmed.cimr.cam.ac.uk/phaser/

Performs complete MR in a single step: Expt. data (MTZ) + search model (PDB) -> Phaser -> positioned search model
Use “MODE MR_AUTO” or “automated search” in the GUI:
anisotropy correction
fast rotation function
fast translation function
packing
refinement and phasing
loop over models

More functionality ...
All steps can be run separately
Search over spacegroups: MTZ spacegroup and enantiomorph; all spacegroups in MTZ point-group; selected spacegroups
Ensemble models (see later)
Brute RF and TF - slow and accurate
Normal mode analysis: generates perturbed models

MR for straightforward case via GUI: set mode, MTZ file, target details and search model, specify search, then RUN IT!

FRF
[Annotated log output: Euler angles (CCP4); top LLG and Z-scores for FRF.]

FTF
[Annotated log output: FRF solution number and fractional translation; top LLG and Z-scores for FTF.]

Packing
Phaser does a packing check after FTF
Clashes = Cα atoms closer than 2 Å
Default number of clashes = 0
Think about increasing to 2 or 5

Solution files
.sol file produced at end of job
Contains summary of all solutions
Each solution contains rotations and usually translations - 3DIM vs 6DIM
One line per model located
.sol file can be read back into Phaser in later jobs
Z-score         Have I solved it?
less than 5     no
5 - 6           unlikely
6 - 7           possibly
7 - 8           probably
more than 8     definitely
RFZ = RF Z-score, TFZ = TF Z-score
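
The Z-score table above can be encoded as a lookup. This is a sketch with a hypothetical function name; the table does not say which band the exact boundary values (5, 6, 7, 8) belong to, so the code assigns them to the higher band as an assumption.

```python
import bisect

# Band edges and verdicts taken from the table on the slide.
Z_EDGES = [5.0, 6.0, 7.0, 8.0]
Z_VERDICTS = ["no", "unlikely", "possibly", "probably", "definitely"]


def have_i_solved_it(z_score):
    """Return the slide's verdict for a Z-score (e.g. a TFZ value)."""
    return Z_VERDICTS[bisect.bisect_right(Z_EDGES, z_score)]


if __name__ == "__main__":
    for z in (4.2, 5.5, 6.5, 7.3, 9.1):
        print(z, have_i_solved_it(z))
```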

Ensemble models
Phaser refers to search models as “ensembles”
Often, an ensemble contains a single model, as in traditional MR
But Phaser can use an ensemble of > 1 models, which may work better than any single model
Models in an ensemble must be superposed prior to use in Phaser - use e.g. Superpose in CCP4
N.B. Phaser will complain if: MW of models in the ensemble are too different; RMS between models is too large
(In Molrep, construct the ensemble as a pseudo-NMR PDB file)

Finding more than one copy in the asu
Specify > 1 in Composition of the asymmetric unit (keyword COMPOSITION ... NUMBER)
Specify > 1 in Number of copies to search for (keyword SEARCH ... NUMBER)
Phaser will issue warnings if these numbers are wrong.
Program flow: CRF -> TF for first copy -> fix first copy (possibly multiple sets) -> CRF for second copy -> TF for second copy -> fix second copy (possibly multiple sets) ...
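
To show where the two copy numbers go in keyword input, the sketch below assembles a Phaser-style keyword script as text. It only builds a string and does not run Phaser. The keyword spellings (HKLIN, LABIN, ENSEMBLE ... PDBFILE ... IDENTITY, ROOT, and NUM as the short form of NUMBER) follow my reading of Phaser's keyword interface and should be checked against the Phaser documentation for your version; the file names, column labels and the ensemble name `model` are hypothetical.

```python
def phaser_auto_script(mtz, f_label, sigf_label, pdb, identity_percent,
                       seq_file, n_in_asu, n_to_search, root):
    """Build keyword input for an automated MR search (text only)."""
    lines = [
        "MODE MR_AUTO",
        f"HKLIN {mtz}",
        f"LABIN F={f_label} SIGF={sigf_label}",
        # One ensemble holding a single model, as in traditional MR.
        f"ENSEMBLE model PDBFILE {pdb} IDENTITY {identity_percent}",
        # Composition of the asymmetric unit: number of copies present.
        f"COMPOSITION PROTEIN SEQUENCE {seq_file} NUM {n_in_asu}",
        # Number of copies of the ensemble to search for.
        f"SEARCH ENSEMBLE model NUM {n_to_search}",
        f"ROOT {root}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(phaser_auto_script("target.mtz", "F", "SIGF", "template.pdb", 38,
                             "target.seq", 2, 2, "mr_two_copies"))
```

The resulting text would normally be fed to Phaser on standard input or entered through the GUI fields named on the slide.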

Complexes
As before, but:
Define > 1 type of component: Composition of the asymmetric unit -> Define another component
Define > 1 ensemble: Define ensembles -> Add ensemble
Specify all searches: Search details -> Add another search
E.g. beta-blip example in Phaser tutorial: http://www-structmed.cimr.cam.ac.uk/phaser/tutorial/Phaser_MR_tute.html

MrBUMP Ronan Keegan, Martyn Winn, Daresbury Lab.

The aim of MrBUMP
An automation framework for Molecular Replacement.
Particular emphasis on generating a variety of search models.
Can be used to generate models only.
Wraps Phaser and/or Molrep.
Also uses a variety of helper applications (e.g. Chainsaw) and bioinformatics tools (e.g. Fasta, Mafft).
Uses on-line databases (e.g. PDB, Scop).
In favourable cases, gives a “one-button” solution.
In unfavourable cases, will suggest likely search models for manual investigation (lead generation).

The Pipeline
[Flow diagram: Target MTZ & Sequence -> Target Details -> Template Search -> Model Preparation -> Molecular Replacement & Refinement; then check scores and exit, or select the next model.]

Search for homologous proteins
FASTA search of PDB: sequence-based search using the sequence of the target structure.
Can be run locally if the user has the fasta34 program installed, or remotely using the OCA web-based service hosted by the EBI.
All of the resulting PDB id codes are added to a list.
These structures are called model templates.

Search for additional similar structures
Additional structure-based search (optional):
The top hit from the FASTA search is used as the template structure for a secondary-structure-based search.
Uses the SSM web service provided by the EBI (a.k.a. MSDfold).
Any new structures found are added to the list.
Provides structural variation, not based on direct sequence similarity to the target.
Manual addition:
Can add additional PDB id codes to the list, e.g. from FFAS or psiBLAST searches.
Can add local PDB files.

Multiple Alignment
After the set of PDB ids has been collected in the FASTA and SSM searches, their coordinate-based sequences are collected and put through a multiple alignment with the target sequence.
Aims:
Score template structures in a consistent manner, in order to prioritise them for subsequent steps.
Extract the pairwise alignment between template and target for use in the Chainsaw step.
Multiple alignment should give a better set of alignments than the original pairwise FASTA alignments.

Multiple Alignment
[Figure: multiple alignment of the target and model templates, with a pairwise alignment marked, shown in Jalview 2.08.1 (Barton group, Dundee).]
Currently support ClustalW or MAFFT for multiple alignment.

Template Model Scoring
Alignment scoring: score = sequence identity x alignment quality
Sequence identity: ungapped sequence identity, i.e. sequence identity of aligned target residues.
Alignment quality: dependent on the alignment length, the number of gaps created in the template alignment and the extent of each of these gaps. The penalties given for gaps and the size of the gaps are biased so that alignments that preserve domains of the structure, rather than spreading the aligned residues out, score higher.
The top-scoring models are then used for further processing.
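
The identity half of this score is easy to make concrete. The sketch below computes ungapped sequence identity from a pairwise alignment and multiplies it by an alignment-quality factor that is simply passed in, because the slide does not give MrBUMP's formula for alignment quality. Function names and example sequences are hypothetical.

```python
def ungapped_identity(target_aln, template_aln, gap="-"):
    """Fraction of identical residues over positions where both sequences are aligned."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(t, m) for t, m in zip(target_aln, template_aln)
             if t != gap and m != gap]
    if not pairs:
        return 0.0
    return sum(1 for t, m in pairs if t == m) / len(pairs)


def template_score(target_aln, template_aln, alignment_quality):
    """score = sequence identity x alignment quality (quality supplied by the caller)."""
    return ungapped_identity(target_aln, template_aln) * alignment_quality


if __name__ == "__main__":
    target = "ACD-EF"
    template = "AC-GEY"
    print(ungapped_identity(target, template))
    print(template_score(target, template, alignment_quality=0.8))
```

Ranking templates then amounts to sorting them by `template_score` in descending order.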

Domains
Suitable templates for target domains may exist in isolation in the PDB, or in combination with dissimilar domains.
In case of relative domain motion, may want to solve domains separately.

Domains
Domain search: top-scoring templates from the multiple alignment are tested to see if they contain any domains.
Uses the SCOP database. This only lists domains that appear more than once in the PDB.
The database is scanned to see if domains exist for each of the PDBs in the list of templates.
Domains are then extracted from the parent PDB structure file and added to the list of template models as additional search models for MR.

Multimers
Multimer search: search for quaternary structures that may be used as search models.
Better signal-to-noise ratio than a monomer, if the assembly is correct for the target.
Multimeric structures based on the top templates are retrieved using the PQS service at the EBI, and added to the list of search models.
PQS will soon be replaced by the PISA service at the EBI (Eugene Krissinel).
Examples:
1n5a SPLIT-ASU into 4 Oligomeric files of type TRIMERIC
1n5b SPLIT-ASU into 2 Oligomeric files of type DIMERIC
1n5c SYMMETRY-COMPLEX Oligomeric file of type DIMERIC
1n5d SYMMETRY-COMPLEX Oligomeric file of type DIMERIC

Search Model Preparation
Search models are prepared in four ways:
PDBclip: original PDB with waters removed, hydrogens removed, most probable conformations for side chains selected and chain IDs added if missing.
Molrep: Molrep contains a model preparation function which will align the template sequence with the target sequence and prune the non-conserved side chains accordingly.
Chainsaw: can be given any alignment between the target and template sequences. Non-conserved residues are pruned back to the gamma atom.
Polyalanine: created by excluding all of the side chain atoms beyond the CB atom using the Pdbset program.
Also create an ensemble model for Phaser based on the top 5 models.

Molecular Replacement and Refinement
The search models can be processed with Molrep or Phaser or both.
The resulting models from molecular replacement are passed to Refmac for restrained refinement.
The change in the Rfree value during refinement is used as a rough estimate of how good the resulting model is:
final Rfree < 0.35, or final Rfree < 0.5 and dropped by 20% -> “success”
final Rfree < 0.48, or final Rfree < 0.52 and dropped by 5% -> “marginal”
otherwise -> “failure”
MR scores and un-refined models are available for later inspection.
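
The Rfree criteria above can be written as a short decision function. This is a sketch, not MrBUMP's code: the function name is hypothetical, and "dropped by 20%" is interpreted here as a relative drop from the initial Rfree, which is an assumption the slide does not settle.

```python
def classify_refinement(initial_rfree, final_rfree):
    """Classify an MR solution from Rfree before and after restrained refinement."""
    # Assumed interpretation: drop measured relative to the initial Rfree.
    relative_drop = (initial_rfree - final_rfree) / initial_rfree
    if final_rfree < 0.35 or (final_rfree < 0.5 and relative_drop >= 0.20):
        return "success"
    if final_rfree < 0.48 or (final_rfree < 0.52 and relative_drop >= 0.05):
        return "marginal"
    return "failure"


if __name__ == "__main__":
    # Hypothetical (initial, final) Rfree pairs.
    for start, end in [(0.55, 0.33), (0.55, 0.42), (0.50, 0.47), (0.53, 0.52)]:
        print(start, end, classify_refinement(start, end))
```

Because the thresholds are only a rough guide, solutions labelled "failure" may still be worth inspecting by hand.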

MrBUMP on compute clusters
MrBUMP can take advantage of a compute cluster to farm out the Molecular Replacement jobs.
Currently Sun Grid Engine enabled clusters are supported, but support will be added for LSF, Condor and other queuing systems if there is enough demand.
All nodes terminate when one finds a solution.

Pre-release version of MrBUMP
Pre-release made available in Jan 06
Simple installation
Currently runs on Linux and OSX. Windows version almost ready.
Comes with CCP4 GUI. Can also be run from the command line with keyword input.
First citation in Obiero et al., Acta Cryst. (2006). F62, 757-760
Regular updates (currently version 0.3.2)
http://www.ccp4.ac.uk/MrBUMP

A few observations ...
In difficult cases, success in MrBUMP may depend on the particular template, chain and model preparation method.
Nevertheless, may get several putative solutions.
Ease of subsequent model re-building and model completion may depend on the choice of solution.
First solution or check everything?
The expectation was that a quick solution is required - in fact, most users seem happy to let MrBUMP run for a long time (hours, days).
Worth checking “failed” solutions!