BALBES (current working name)
A. Vagin, F. Long, J. Foadi, A. Lebedev, G. Murshudov
Chemistry Department, University of York

Outline
- Introduction
- Database
- System manager
- Scientific programs
- Calibrating the system
- An example
- Release and development plan

Introduction
The number of entries in the Protein Data Bank (PDB) increases every year. This has many implications for macromolecular crystallography. One challenge is how to use these entries efficiently when developing structure solution software. Analysis of the PDB shows that this year around 67% of all deposited structures were reported as solved by molecular replacement. With better algorithms and better organisation of the data bank, this proportion is expected to rise substantially.
Our system contains three main components: (1) a reorganised database, (2) a manager written in Python that makes the decisions, and (3) scientific programs such as MOLREP and REFMAC.

Database: Reorganisation of PDB
All entries in the PDB have been analysed according to their homology, and only a non-redundant set of structures was stored.
- The hierarchical database is organised according to sequence identities
- If domains are present, information about them is stored
- Multimers of a structure
- Fragments of various lengths (under way)
- Intensity curves for various types of macromolecules (later)
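
The reduction to a non-redundant set can be illustrated with a minimal sketch. It assumes a greedy, single-level clustering and uses Python's `difflib` similarity ratio as a crude stand-in for sequence identity; the function names and the 0.9 threshold are hypothetical, and this is not necessarily how the BALBES database was built.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Crude stand-in for sequence identity: difflib similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def non_redundant(entries: Dict[str, str], threshold: float = 0.9) -> Dict[str, List[str]]:
    """Greedy clustering of sequences.

    Each entry joins the first representative it matches at or above the
    threshold; otherwise it becomes a new representative. The keys of the
    result form the non-redundant set.
    """
    clusters: Dict[str, List[str]] = {}
    for name, seq in entries.items():
        for rep in clusters:
            if identity(seq, entries[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No existing representative is similar enough.
            clusters[name] = [name]
    return clusters
```

Running the same function again on the representatives with a lower threshold would give a second, coarser level, which is one way to picture a hierarchy organised by sequence identity.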

Database (continued)
A database of portable size has been created, which:
- enables a fast search for similar structures (less than 10 seconds on a typical Mac G5 processor for most test cases so far)
- lets all actions be performed locally (no internet connection is needed)
- provides the required information about the similar structures (domains, tertiary structures)

System Manager
It is written in Python and relies on XML files for information exchange:
1. Data
   - Twinning
   - Pseudotranslation
   - Resolution for molecular replacement
   - Completeness and other properties
2. Sequence
   - Finds template structures with their domain and multimeric organisations
   - Finds the number of molecules in the asymmetric unit
   - "Corrects" template molecules using sequence alignment
3. Protocols
   - Runs various protocols with molecular replacement and refinement and makes decisions accordingly
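
As an illustration of XML-based information exchange between the manager and the programs it runs, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The tag names (`data_analysis`, `twinning`, `pseudotranslation`, `mr_resolution`, `completeness`) and both function names are hypothetical; the actual BALBES file layout is not described in these slides.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union


def write_data_report(twinning: bool, pseudotranslation: bool,
                      mr_resolution: float, completeness: float) -> str:
    """Serialise the data-analysis results to an XML string (hypothetical schema)."""
    root = ET.Element("data_analysis")
    ET.SubElement(root, "twinning").text = str(twinning).lower()
    ET.SubElement(root, "pseudotranslation").text = str(pseudotranslation).lower()
    ET.SubElement(root, "mr_resolution").text = repr(mr_resolution)
    ET.SubElement(root, "completeness").text = repr(completeness)
    return ET.tostring(root, encoding="unicode")


def read_data_report(xml_text: str) -> Dict[str, Union[bool, float]]:
    """Parse the XML string back into a plain dictionary for decision making."""
    root = ET.fromstring(xml_text)
    return {
        "twinning": root.findtext("twinning") == "true",
        "pseudotranslation": root.findtext("pseudotranslation") == "true",
        "mr_resolution": float(root.findtext("mr_resolution")),
        "completeness": float(root.findtext("completeness")),
    }
```

A manager could, for example, call `read_data_report` on the file produced by the data-analysis step and choose a protocol depending on whether `twinning` is set.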

Scientific programs
- MOLREP (molecular replacement): simple molecular replacement, phased rotation and translation functions, spherically averaged phased translation function, dyad search, search with one model fixed, etc.
- REFMAC: maximum likelihood refinement, phased refinement, rigid body refinement, extensive dictionary, map coefficients, etc.
- SFCHECK: twinning tests, pseudotranslation, optical resolution, optimal resolution for molecular replacement, analysis of coordinates against electron density, etc.
- Auxiliary programs: alignment, search in the DB, analysis of sequence and data to suggest the number of expected monomers, removal of bits of structure from coordinates according to their fit to the electron density, semiautomatic domain definition, etc.

Calibrating the System
Step 1: Making the database. More than 30,000 structures had been deposited in the PDB by the end of 2004, but only ~10,000 were non-redundant. These 10,000 were used to construct our database of known structures.
Step 2: Testing the system. ~1000 structures were deposited between January and May. We tried to solve all of these with our automated approach. The success rate was ~75% with our current version. This is actually higher than the proportion reported as solved using MR!

Overall test results
[Table: Test Case Statistics. Columns: Method (as reported in PDB), Case Number, Success Cases, Rate (%). Rows: ALL, MR, SAD, MAD, SIR, MIR, OTHER.]
Note that not all structures that were used as search models are present in our DB.

Schematic view of the success rate of our system
- All structures: 100%
- Reported to be solved by MR: 67%
- Solved automatically by our system: 75%

Progress to date
We are analysing all failed cases and have already significantly enhanced the system as a result. We have developed several new techniques by carefully analysing these results.
Success is great for funding! Failure is great for future developments!

Example: Addition of domains
1. Search with the whole molecule. Is it a solution? Yes: refine and exit. No: go to step 2.
2. Are there domains? No: try other protocols. Yes: go to step 3.
3. Run MR for each domain and find the best.
4. Refine and produce a map.
5. Mask out the found domain(s).
6. Use SPTF, PRF and PTF to find the missing domains. Is it a solution? No: try other protocols. Yes: go to step 7.
7. Is the solution complete? Yes: refine and exit. No: continue the search for the remaining domains.
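
The decision flow of this example can be sketched in Python. This is one reading of the flattened flowchart, not the actual manager code: the function name, the callback names (`run_mr`, `is_solution`, `find_in_masked_map`) and the returned outcome strings are all hypothetical, and the refinement, map calculation and masking steps are assumed to happen inside `find_in_masked_map`.

```python
from typing import Callable, List, Sequence, Tuple


def domain_protocol(
    whole: str,
    domains: Sequence[str],
    run_mr: Callable[[str], float],                        # hypothetical: MR score for a model
    is_solution: Callable[[float], bool],                  # hypothetical: accept or reject a score
    find_in_masked_map: Callable[[str, List[str]], bool],  # hypothetical: SPTF/PRF/PTF search
) -> Tuple[str, List[str]]:
    """Return the outcome and the list of models placed so far."""
    # Search with the whole molecule first.
    if is_solution(run_mr(whole)):
        return "refine and exit", [whole]
    # Without domain definitions there is nothing to split.
    if not domains:
        return "other protocols", []
    # MR for each domain; keep the best-scoring one.
    scores = {d: run_mr(d) for d in domains}
    best = max(scores, key=scores.get)
    found = [best]
    # Look for each missing domain in the map with found domains masked out.
    for domain in domains:
        if domain == best:
            continue
        if not find_in_masked_map(domain, found):
            return "other protocols", found
        found.append(domain)
    # All domains placed: the solution is complete.
    return "refine and exit", found
```

With stub callbacks, a case where the whole molecule fails but the large domain is found first and the second domain is then located in the masked map ends in "refine and exit" with both domains placed.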

Example: Domain motions - 1tj3
Finding the whole molecule was problematic. Finding the large domain, refining it, and then locating the rest with SPTF/PRF/PTF in the masked map was straightforward.

Conclusions
1. A database is an essential ingredient of efficient automation
2. With relatively simple protocols it will be possible to solve more than 80% of structures automatically
3. The interplay of different protocols is very promising
4. A huge number of tests helps to prioritise developments and generate ideas

Development Plans
Development currently under way and in the immediate future:
- Update the database by adding entries based on PDB files deposited in 2005 (thanks to Eugene for PISA, which we use for multimer analysis)
- Add multichain domain definitions
- Test the system against PDB files deposited in 2006
- Target release date: May-June 2006
- Combine with some protocols from experimental phasing and automatic model building (Foadi, Cowtan)
Future:
- Combine with automatic model building
- Make decisions during refinement about twinning and other properties
- Pass information about search templates to refinement
- Combine with experimental phasing
- Regular updates

Acknowledgements
All CCP4 and YSBL people
Wellcome Trust, BBSRC, EU BIOXHIT, NIH for support