Automated protein structure solution for weak SAD data Pavol Skubak and Navraj Pannu Automated protein structure solution for weak SAD data Pavol Skubak.

Slides:



Advertisements
Similar presentations
Applications of one-class classification
Advertisements

Automated phase improvement and model building with Parrot and Buccaneer Kevin Cowtan
Martyn Winn, STFC Daresbury Laboratory. 1. CCP4 as a suite 2. Overview of CCP4 functionality 3. Future directions.
M.I.R.(A.S.) S.M. Prince U.M.I.S.T.. The only generally applicable way of solving macromolecular crystal structure No reliance on homologous structure.
M.I.R.(A.S.) S.M. Prince U.M.I.S.T.. The only generally applicable way of solving macromolecular crystal structure No reliance on homologous structure.
TRIM Workshop Arco van Strien Wildlife statistics Statistics Netherlands (CBS)
PROBABILISTIC ASSESSMENT OF THE QSAR APPLICATION DOMAIN Nina Jeliazkova 1, Joanna Jaworska 2 (1) IPP, Bulgarian Academy of Sciences, Sofia, Bulgaria (2)
Small Molecule Example – YLID Unit Cell Contents and Z Value
LHC Collimation Working Group – 19 December 2011 Modeling and Simulation of Beam Losses during Collimator Alignment (Preliminary Work) G. Valentino With.
Experimental Phasing stuff. Centric reflections |F P | |F PH | FHFH Isomorphous replacement F P + F H = F PH FPFP F PH FHFH.
Twinning in protein crystals NCI, Macromolecular Crystallography Laboratory, Synchrotron Radiation Research ANL Title Zbigniew Dauter.
Macromolecular structure refinement Garib N Murshudov York Structural Biology Laboratory Chemistry Department University of York.
A gentle introduction to Gaussian distribution. Review Random variable Coin flip experiment X = 0X = 1 X: Random variable.
Don't fffear the buccaneer Kevin Cowtan, York. ● Map simulation ⇨ A tool for building robust statistical methods ● 'Pirate' ⇨ A new statistical phase improvement.
Active Appearance Models Suppose we have a statistical appearance model –Trained from sets of examples How do we use it to interpret new images? Use an.
In honor of Professor B.C. Wang receiving the 2008 Patterson Award In honor of Professor B.C. Wang receiving the 2008 Patterson Award Direct Methods and.
TEXTAL Progress Basic modeling of side-chain and backbone coordinates seems to be working well. –even for experimental MAD maps, 2.5-3A –using pattern-recognition.
Phasing based on anomalous diffraction Zbigniew Dauter.
Refinement with REFMAC
The P HENIX project Crystallographic software for automated structure determination Computational Crystallography Initiative (LBNL) -Paul Adams, Ralf Grosse-Kunstleve,
28 Mar 06Automation1 Overview of developments within CCP4 Generation 1 ccp4i tasks Generation 2 isolated scripts / web service Generation 3 integrated.
Progress report on Crank: Experimental phasing Biophysical Structural Chemistry Leiden University, The Netherlands.
Kevin Cowtan, DevMeet CCP4 Wiki ccp4wiki.org Maintainer: YOU.
First Aid & Pathology Data quality assessment in PHENIX
BALBES (Current working name) A. Vagin, F. Long, J. Foadi, A. Lebedev G. Murshudov Chemistry Department, University of York.
Data quality and model parameterisation Martyn Winn CCP4, Daresbury Laboratory, U.K. Prague, April 2009.
OASIS-2006 Institute of Physics Chinese Academy of Sciences Beijing , P.R. China Institute of Physics Chinese Academy of Sciences Beijing ,
Jeroen Pannekoek - Statistics Netherlands Work Session on Statistical Data Editing Oslo, Norway, 24 September 2012 Topic (I) Selective and macro editing.
A stochastic approach to Molecular Replacement. Nicholas M. Glykos & Michael Kokkinidis IMBB, FORTH, Heraklion, Crete, GREECE.
SAD phasing by OASIS-2004 SAD phasing by OASIS-2004.
Using CCP4 for PX Martin Noble, Oxford University and CCP4.
International Conference on Structural Genomics 2006 Hands-on Workshop III International Conference on Structural Genomics 2006 Hands-on Workshop III Automated.
Overview of MR in CCP4 II. Roadmap
Crank and Databases Steven Ness Leiden University The Netherlands.
Phasing Today’s goal is to calculate phases (  p ) for proteinase K using PCMBS and EuCl 3 (MIRAS method). What experimental data do we need? 1) from.
1. Diffraction intensity 2. Patterson map Lecture
Zhang, T., He, Y., Wang, J.W., Wu, L.J., Zheng, C.D., Hao, Q., Gu, Y.X. and Fan, H.F. (2012) Institute of Physics, Chinese Academy of Sciences Beijing,
POINTLESS & SCALA Phil Evans. POINTLESS What does it do? 1. Determination of Laue group & space group from unmerged data i. Finds highest symmetry lattice.
OASIS What is it? How it works? What next? What is it? How it works? What next?
Methods in Chemistry III – Part 1 Modul M.Che.1101 WS 2010/11 – 8 Modern Methods of Inorganic Chemistry Mi 10:15-12:00, Hörsaal II George Sheldrick
3. Spot Finding 7(i). 2D Integration 2. Image Handling 7(ii). 3D Integration 4. Indexing 8. Results 1. Introduction5. Refinement Background mask and plane.
Computational Crystallography InitiativePhysical Biosciences Division First Aid & Pathology Data quality assessment in PHENIX Peter Zwart.
Guest lecture: Feature Selection Alan Qi Dec 2, 2004.
Direct Use of Phase Information in Refmac Abingdon, University of Leiden P. Skubák.
Progress report on Crank: Model building Biophysical Structural Chemistry Leiden University, The Netherlands.
Evaluation of gene-expression clustering via mutual information distance measure Ido Priness, Oded Maimon and Irad Ben-Gal BMC Bioinformatics, 2007.
Bethesda, March 4 th 2009 Semi-automatic structure solution with HKL-3000 Structural Biology.
Phasing in Macromolecular Crystallography
Today: compute the experimental electron density map of proteinase K Fourier synthesis  (xyz)=  |F hkl | cos2  (hx+ky+lz -  hkl ) hkl.
Computational Intelligence: Methods and Applications Lecture 26 Density estimation, Expectation Maximization. Włodzisław Duch Dept. of Informatics, UMK.
Data Mining: Concepts and Techniques1 Prediction Prediction vs. classification Classification predicts categorical class label Prediction predicts continuous-valued.
Prediction of Interconnect Net-Degree Distribution Based on Rent’s Rule Tao Wan and Malgorzata Chrzanowska- Jeske Department of Electrical and Computer.
Kevin Cowtan, Automated phase improvement and model building Kevin Cowtan
Score maps improve clarity of density maps
OASIS-2004 A direct-method program for
Solving Crystal Structures
Advanced Wireless Networks
CCP4 6.1 and beyond: Tools for Macromolecular Crystallography
Complete automation in CCP4 What do we need and how to achieve it?
Efficient Estimation of Residual Trajectory Deviations from SAR data
Phasing Today’s goal is to calculate phases (ap) for proteinase K using MIRAS method (PCMBS and GdCl3). What experimental data do we need? 1) from native.
CCP4 from a user perspective
Centar ( Global Signal Processing Expo
Experimental phasing in Crank2 Pavol Skubak and Navraj Pannu Biophysical Structural Chemistry, Leiden University, The Netherlands
Volume 24, Issue 8, Pages (August 2016)
Provide quick feedback to data collection experiments.
Automated Molecular Replacement
Presentation transcript:

Automated protein structure solution for weak SAD data Pavol Skubak and Navraj Pannu Automated protein structure solution for weak SAD data Pavol Skubak and Navraj Pannu Biophysical Structural Chemistry, Leiden University, The Netherlands

Low resolution and/or weak anomalous signal SAD data sets With a sufficient anomalous signal and resolution better than 3 Angstroms, your structure is likely to be automatically built. What can be done if your SAD data is weak?

Simultaneously combining experimental phasing steps to improve structure solution Traditionally structure solution is divided into distinct steps: Substructure detection Obtain initial phases (Density) modify the initial experimental map Build and refine the model By combining these steps, we can improve the process.

Phases, phase probabilities Experimental data, anomalous substructure Model Experimental phasing Density modification, phase combination Density modification, phase combination Model building and refinement Traditional approach Phases, phase probabilities Traditional structure solution Step-wise Information is propagated via ‘phase probabilities’

Phase probabilities The phase distribution can be approximated via 4 “Hendrickson-Lattmann” coefficients, A, B, C, D. We rely on programs to estimate these coefficients. ‘Even better than the real thing?’

Phase probabilities How do we choose the phase to use for our electron density map? The “best phase” corresponds to mean/average/expected value (Blow and Crick). What are ways to determine how good and accurate your phases are? FOM: figure of merit: mean cosine of the phase error (a value of 1 means phases are perfect, a value of 0 means phases is the same as random).

Density modification ● Density modification is a problem of combining information:

Kevin Cowtan, Density modification 2. Phase weighting: |F|, φ ρ mod (x) ρ(x) |F mod |, φ mod φ=f(φ exp,φ mod ) FFT FFT -1 Modify ρ Real spaceReciprocal space

HL propagation and the independence assumption Use the experimental data and anomalous substructure directly! Do not need to assume independence or rely on HL coefficients. Need multivariate distributions at each step that take into account correlations between the model and data.

Phases Experimental data, anomalous substructure Model Experimental phasing Density modification, phase combination Density modification, phase combination Model building and refinement Traditional approach Step-wise multivariate structure solution Still step-wise Information is propagated via the data and model(s). Phases

Experimental data, anomalous substructure Density modification Model building Electron density Model Combined experimental phasing, phase combination and model refinement Combined structure solution Simultaneously use information from experimental phasing, density modification and model refinement

Tests of > 140 real SAD data sets Resolution range of data sets is 0.94 to 3.88 Angstroms Types of anomalous scatterers: selenium, sulfur, chloride, iodide, bromide, calcium, zinc (and others). We compare with the step wise multivariate approach (current CRANK) versus the combined approach.

Model building results on over 140 real SAD data sets (using parrot and buccaneer)

Summary of large scale test The average fraction of the model built increased from 60% to 74% with the new approach. If we exclude data sets built to 85% by the current approach or where the substructure was not found, 45 data sets remain and the average fraction of the model built increased from 28 to 77%.

3.88 Angstrom RNA polymerase II 3.88 Angstrom SAD data with signal from zinc. Authors could not solve the structure with SAD data alone, but with a partial model, multi-crystal MAD and manual building. > 80% can be built with SAD data alone with the new algorithm automatically to an R-free of 37.6%

MR-SAD tests 4.5 Angstrom ATPase SecA-SecY complex 4.5 Angstrom data set with weak anomalous signal from Se-Met SecY. Authors could not solve the structure with SAD data alone, but used a partial MR structure, (2-fold NCS averaging), cross-crystal averaging, and manual model building > 60% can be built automatically starting just from selenium positions (obtained from partial MR solution) R-free obtained was under 40%. If we start with the partial MR structure, results are worse!

Related RNA polymerases complex from Cramer et al. 3.3 Angstrom data with signal from zinc. Could not solve the structure with anomalous data alone. With the new method, a majority can be built automatically in minutes.

4.6 Angstrom SKI2-3-8 complex 4.6 Angstrom data set Se-Met data set. > 50% can be built automatically starting just from selenium positions. R-free obtained was under 40%. If we start with the partial MR structure, results are again worse! (MR models are higher resolution and fit ‘fairly’ well to the final model).

Future work on MR-SAD Are we throwing away useful information or is it biased? At the moment, if a (partial) MR solution is available, it is best to run two crank2 jobs: Input the whole MR solution Input just the heavy atoms

Crank1 vs Crank2 Crank is suitable for S/MAD and S/MIRAS experiments and implements a stepwise multivariate function. Crank2 is its replacement that implements a combined multivariate function for SAD only. Both are available in CCP at the moment.

Programs in Crank

The number of cycles run. The number of atoms to search for. –Should be within 10-20% of actual number –A first guess uses a probabilistic Matthew’s coefficient The resolution cut-off: –For MAD, look at signed anomalous difference correlation. –For SAD, a first guess is high resolution limit. Important parameters in substructure detection

Is my map good enough? Statistics from substructure phasing: –Look at FOM from BP3. –For SAD, look at Luzzati parameters. –Refined occupancies. Statistics from density modification: –Compare the “contrast” from hand and enantiomorph (output of solomon or shelxe). Does it look like a protein? (model visualization) For Crank2, look to see if R-comb < 40%.

References Crank Ness et al (2004) Structure 12, Pannu et al (2011) Acta Cryst D67, Combined approach and Crank2 Skubak and Pannu (2013) Nature Communications 4: Using data directly in refinement Skubak et al (2004) Acta Cryst D60, Skubak et al (2009) Acta Cryst D65, Multivariate phase combination Waterreus et al (2010) Acta Cryst D66,

Acknowledgements All dataset contributors (JCSG, Z. Dauter, M.Weiss, C.Mueller-Dieckmann) Garib Murshudov, Kevin Cowtan, George Sheldrick, Victor Lamzin Cyttron