A Molecular Replacement Pipeline Garib Murshudov Chemistry Department, University of York 

Slides:



Advertisements
Similar presentations
© 2008 EBSCO Information Services SUSHI, COUNTER and ERM Systems An Update on Usage Standards Ressources électroniques dans les bibliothèques électroniques.
Advertisements

Molecular Replacement in CCP4
Molecular Replacement
Search in electron density using Molrep
Twinning etc Andrey Lebedev YSBL. Data prcessing Twinning test: 1) There is twinning 2) The true spacegroup is one of … 3) Find the true spacegroup at.
A 3-D reference frame can be uniquely defined by the ordered vertices of a non- degenerate triangle p1p1 p2p2 p3p3.
Recent developments 1) Tests (outlier analysis) and Bug fixing ( with Paul) 2) Regeneration of Values of Bonds and Bond-angles existing all structures.
System Design and Analysis
Macromolecular structure refinement Garib N Murshudov York Structural Biology Laboratory Chemistry Department University of York.
Testing an individual module
High Throughput Processing of the Structural Information of the Protein Data Bank Zoltán Szabadka, Vince Grolmusz Department of Computer Science Eötvös.
Refinement with REFMAC
Introduction to Systems Analysis and Design Trisha Cummings.
Protein Interfaces, Surfaces and Assemblies
Homology Modeling David Shiuan Department of Life Science and Institute of Biotechnology National Dong Hwa University.
Protein Tertiary Structure Prediction
MOLECULAR REPLACEMENT Basic approach Thoughtful approach Many many thanks to Airlie McCoy.
MODELLER hands-on Ben Webb, Sali Lab, UC San Francisco Maya Topf, Birkbeck College, London.
Practical session 2b Introduction to 3D Modelling and threading 9:30am-10:00am 3D modeling and threading 10:00am-10:30am Analysis of mutations in MYH6.
Peter J. Briggs, Liz Potterton *, Pryank Patel, Alun Ashton, Charles Ballard, Martyn Winn CLRC Daresbury Laboratory, Warrington, Cheshire WA4 4AD, UK *
Chapter 17 Domain Name System
28 th March 2007 MrBUMP – Automated Molecular Replacement Ronan Keegan, Martyn Winn CCP4, Daresbury Laboratory.
28 Mar 06Automation1 Overview of developments within CCP4 Generation 1 ccp4i tasks Generation 2 isolated scripts / web service Generation 3 integrated.
Molecular Replacement
A Molecular Replacement Pipeline Garib Murshudov Chemistry Department, University of York 
A Molecular Replacement Pipeline Garib Murshudov Chemistry Department, University of York 
BALBES (Current working name) A. Vagin, F. Long, J. Foadi, A. Lebedev G. Murshudov Chemistry Department, University of York.
1 PyMOL Evolutionary Trace Viewer 1.1 Lichtarge Lab Sept. 13, 2010.
WSMX Execution Semantics Executable Software Specification Eyal Oren DERI
EBI is an Outstation of the European Molecular Biology Laboratory. Annotation Procedures for Structural Data Deposited in the PDBe at EBI.
Using CCP4 for PX Martin Noble, Oxford University and CCP4.
EMBL-EBI MSDpisa a web service for studying Protein Interfaces, Surfaces and Assemblies Eugene Krissinel
MolIDE2: Homology Modeling Of Protein Oligomers And Complexes Qiang Wang, Qifang Xu, Guoli Wang, and Roland L. Dunbrack, Jr. Fox Chase Cancer Center Philadelphia,
Overview of MR in CCP4 II. Roadmap
Bulk Model Construction and Molecular Replacement in CCP4 Automation Ronan Keegan, Norman Stein, Martyn Winn.
R. Keegan 1, J. Bibby 3, C. Ballard 1, E. Krissinel 1, D. Waterman 1, A. Lebedev 1, M. Winn 2, D. Rigden 3 1 Research Complex at Harwell, STFC Rutherford.
MrBUMP – Molecular Replacement with Bulk Model Preparation Automated search model discovery and preparation for structure solution by molecular replacement.
Multiple Mapping Method with Multiple Templates (M4T): optimizing sequence-to-structure alignments and combining unique information from multiple templates.
Data Harvesting: automatic extraction of information necessary for the deposition of structures from protein crystallography Martyn Winn CCP4, Daresbury.
1 MrBUMP – Molecular Replacement with Bulk Model Preparation Ronan Keegan, Martyn Winn CCP4 group, Daresbury Laboratory Como May 23rd 2006.
Macromolecular Structure Database Project EMSD Infra-structure Services for Europe To develop an autonomous structural database capability in Europe
Configuring and Troubleshooting Identity and Access Solutions with Windows Server® 2008 Active Directory®
Ligand Building with ARP/wARP. Automated Model Building Given the native X-ray diffraction data and a phase-set To rapidly deliver a complete, accurate.
Direct Use of Phase Information in Refmac Abingdon, University of Leiden P. Skubák.
Worldwide Protein Data Bank Common D&A Project Sequence Processing Modular Demo May 6, 2010 Project Deliverable.
Fitting EM maps into X-ray Data Alexei Vagin York Structural Biology Laboratory University of York.
Feb 24-27, 2004ICDL 2004, New Dehli Improving Federated Service for Non-cooperating Digital Libraries R. Shi, K. Maly, M. Zubair Department of Computer.
Software automation – What STAB sees as key aims? 1.Brief review of activities and recommendations (so far) 2.Reality checks 3. Things to do…
17 th October 2005CCP4 Database Meeting (York) CCP4i Database Overview Peter Briggs.
CCP4 Molecular Replacement Model Generation Create a CCP4i task for generating Molecular Replacement models. - Selecting suitable PDB entries, based on.
RAPPER Nick Furnham Blundell Group – Department of Biochemistry Cambridge University UK
SFCHECK Alexei Vagin YSBL, Chemistry Department, University of York.
Molecular Replacement
What does the future hold? SAPHIRE CCP4 libraries Program Developments More automation 3D viewer Project CCP4 Study Weekend 2003 BAR!
Sequence: PFAM Used example: Database of protein domain families. It is based on manually curated alignments.
Stony Brook Integrative Structural Biology Organization
CCP4 6.1 and beyond: Tools for Macromolecular Crystallography
Database Requirements for CCP4 17th October 2005
Complete automation in CCP4 What do we need and how to achieve it?
Progress Report in REFMAC
Version 5.3 From SMILE string to dictionary (LIBCHECK): Now coot uses it Segment id is now used Automatic adjustment for weights Improved bond order extraction.
Automated Molecular Replacement
MrBUMP: progress and plans
Garib Murshudov YSBL, Chemistry Department, University of York
The temporary site to download BALBES:
Zheng Liu, Fei Guo, Feng Wang, Tian-Cheng Li, Wen Jiang  Structure 
The site to download BALBES:
Auto molecular replacement
Database for MR.
Presentation transcript:

A Molecular Replacement Pipeline Garib Murshudov Chemistry Department, University of York 

Introduction Diagram showing the percentage of structures in the PDB solved by different techniques 67.5% of structures are solved by Molecular Replacement (MR) 21% of structures are solved by experimental phasing

Organisation of BALBES programsprograms datadata manager BALBES consists of three essential components

Manager It is written using PYTHON and relies on files of XML format for information exchange: 1.Data Resolution for molecular replacement Data completeness and other properties Twinning Pseudo translation 2.Sequence Finds template structures with their domain and multimer organisations Estimates number of molecules in the asymmetric unit “Corrects” template molecules using sequence alignment 3.Protocols Runs various protocols with molecular replacement and refinement and makes decisions accordingly manager

Manager makes decisions on different protocols according to search models available in the internal database. manager Flow chart

Database datadata Chains. The internal database has around unique entries selected from more than 45,000 present in the PDB. All entries in the PDB are analysed according to their identity. Only non- redundant sets of structures are stored. Domains. About 8000 of them have two or more domains. The DB contains domain definitions Loops and other flexible parts are removed from the domain definitions. Multimers of structures (using PISA) Hierarchy is organized according to sequence identity and 3D similarity (rmsd over Ca atoms).

Programs programsprograms MOLREP - molecular replacement Simple molecular replacement, phased rotation function (PRF), phased translation function (PTF), spherically averaged phased translation function (SAPTF), multi-copy search, search with fixed partial model REFMAC Maximum likelihood refinement, phased refinement, twin refinement, rigid body refinement, handling ligand dictionary, map coefficients SFCHECK Optical resolution, optimal resolution for molecular replacement, analysis of coordinates against electron density, twinning tests, pseudo translation Other programs: Alignment, search in DB, analysis of sequence and data to suggest number of expected monomers, semiautomatic domain definition

Molrep Molrep is highly automated molecular replacement program. With given data, pdb and sequence (if available) it tries to solve molecular replacement problems. It automatically decides amount of molecule you expect in the asymmetric unit It uses packing function to remove false solutions It does anisotropic correction when it is necessary before starting molecular rotation and translation search It can handle pseudo-translations It can search small fragment in the electron density It can do multi-copy search for cases when there are several domains or several subunits. It can do fit of Xray model to EM and EM to Xray using full point group constraints

Search models Input sequence Full chain models dom1chain1dom2chain1 dom1chain2dom2chain2 score Domains from DB dom1chain1 dom2chain1 dom1chain2 dom2chain2 dom1chain3 Best multi-domain model dom2chain1dom1chain3

Search models Input sequence Full chain models dom1chain1dom2chain1 dom1chain2dom2chain2 score Domains from DB dom1chain1 dom2chain1 dom1chain2 dom2chain2 dom1chain3 Best multi-domain model dom2chain1dom1chain3

Model preparation All models are corrected by sequence alignment and by atomic accessible surface area Multi-domain model Chain 1 multimer NO domains Chain 2 multimer domains

Heterogeneous Search Models I II III I+II+III I+II II+III Assembly of I+II+II Assembly of I+II Assembly Of II+III If a user provide several sequences, BALBES will search the database for complexes of models containing all or most of the sequences. I+III Assembly of I+III User’s sequences DB Search models

Search solution Assembly of I+II+II protocolsprotocols manager

Search solution

Example 1: 2dwr Homologues 2aen: monomer and one domain definition associated with it. Identity = 82% 1kqr: monomer, no domain definitions Identity = 45% 1z0m: dimer, no domain definitions Identity = 25% Derived search models (and their priority) (6) (3) (5) (1)(2) (4)

Example 2: 2chp Homologues 1ji5: trimer Identity = 50% 1jig: trimer Identity = 49% 1n1q: trimer Identity = 48% Derived search models (and their priority) (3)(1)(2)(4) (7)(5)(6)(8) (11)(9)(10)(12)

Example 3: 2gi7 Derived search models (and their priority) “Multi-domain” models: placing domains one by one and attempting to maintain proper composition of the asymmetric unit xxxx: contains domain 1 Identity = 42% yyyy: contains domain 2 Identity = 56% (8) 1ufu: monomer formed by two domains. Identity = 45% (5) (4) 1p7q: homo-dimer; each monomers is formed by two domains. Identity = 45% (3)(2) (1) Homologues dimericmonomeric“multi-domain” (7) (6) 2d3v: monomer formed by two domains. Identity = 46%

Example 4: assembly (two sequences are submitted) Derived search models (and their priority): 2b3t: hetero-dimer; monomers are formed by two and three domains. assemblymonomeric “multi-domain” Other homologues (1t43, 1nv8, 1zbt, 1rq0) are matching only one of two sequences. Priority rules applied to them are as in previous examples. Homologues structure: (1)(3)(2)(5)(4) Assembly models In case when two or more sequences are submitted attempt will be made to find hetero-oligomer matching all or some of these sequences. If found, such hetero-oligomers will be first models to try.

Example of search: Multi-domain protein PDB entry 1z45 has three major domains. One of the domains has also two subdomains. Domain 1 is similar to 1ek6 (seq id 55%). Domain 2 similar to 1yga (seq id 51%) and domain 3 is similar to 1udc (seq id 49%) 1z45 - isomerase 1ek6 - two domains of isomerase 1yga - another domain of isomerase 1udc - two domains of isomerase All these proteins are although isomerases they have slightly different activities This structure can be solved with multi-domain model.

Updating and Calibrating the System Only non- redundant structures are stored All structures newly deposited to the PDB are tested against the old internal database by using BALBES. Only after that the DB is updated. Updating and test are carried out every half a month. automatically generated domains are checked manually to make sure that automatic domain-definition transfer does not introduce errors.

The success rate of the latest tests (Jan - Feb 2008) Blue: the number of structures originally solved by a given method Magenta: the number of structures BALBES was able to solve Note: the fraction of structures solved by MR = 67% The success rate of our latest tests was more than 80% Note that some of the structures solved by experimental phasing could be actually solved by MR! Method All Methods MRSIR/MIRSAD/MADNot Specified 80.1% 91.3% 44.8% 85.5% N structures = 950

IdentityNumber of solved Number of unsolved Total found > < Among solved (3687) Twin possible1214 Pseudo translation281 Resolution < >3.523 Some new stats

Last weeks tests (based on the data released between 1 Sep-15 Sep) The number of pdb:224 The number of potential twins77 The number of twin with twin fraction more than 10%17 The number of pseudo translation20 The number solved with potential twin59(77) The number solved with pseudo translation11(20) Stats on results based on actual twins is not ready

Space group uncertainty When we added a new option: To check all potential space groups with full automatic molecular replacement. It was used more than we expected. We had to turn this option off temporarily. We have already updated the system and it is now available.

How to run BALBES: As an automated pipeline, BALBES tries to minimise users’ intervention. The only thing a user needs to do is to provide two input files (a structure factor and a sequence file) Running BALBES from the command line: balbes –f structure_factors_file -s sequence_file –o output_directory -f required -s required -o optional

BALBES CCP4i interface

BALBES Interface in Our Web Server (running using our Linux cluster) designed by P.Young

BALBES Interface in Our Web Server (running using our Linux cluster) designed by P.Young

BALBES Interface in Our Web Server (running using our Linux cluster) designed by P.Young

Preparation for next release. Generation of ensembles for domains 1jpt: Fab. 100 members2rgb: Oncoprotein. 20 members 3ckl: Sulfotranferase. 8 members 2qop: Transcription. 3 members

Conclusions 1.Internal database is an essential ingredient of efficient automation 2.With relatively simple protocols, BALBES is able to solve around 80% of structures automatically 3.Interplay of different protocols is very promising 4.Huge number of tests help to prioritise developments and generate ideas

Team (YSBL, York) Alexei Vagin Fei Long Paul Young Andrey Lebedev Garib Murshudov Acknowledgements E.Krissinel for PISA MSD/EBI, Cambridge All CCP4 and YSBL people for support ARP/wARP development team Wellcome Trust, BBSRC, EU BIOXHIT, NIH for support

The End The site to download BALBES: Webserver: This and other talks: