Macromolecular structure refinement
Garib N Murshudov, York Structural Biology Laboratory, Chemistry Department, University of York

Contents
- Purpose of and considerations for refinement
- Prior information: dictionary of ligands
- Prior information: B values and how to deal with them
- Conclusions and future developments

Purpose
- Optimal fit of the model to the experimental data while retaining its chemical integrity
- Estimation of errors for the refined parameters
- Improvement of phases to facilitate model building (automatic, e.g. ARP/wARP, or manual)
- Reporting deviations from chemistry and from the experiment to aid analysis of the model

Considerations
- Function to optimise
  - should use the experimental data
  - should be able to handle chemical information
- Parameters
  - depend on the stage of analysis
  - depend on the amount and quality of the experimental data
- Methods to optimise
  - depend on the stage of analysis: simulated annealing, tunnelling, conjugate gradient, second-order methods (normal matrix, information matrix, second derivatives)
  - some methods give error estimates as a by-product; second-order methods do.
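To illustrate why second-order methods give error estimates as a by-product, here is a minimal sketch on a toy problem (a weighted straight-line fit, not a structure refinement): the normal matrix N = J^T W J is built to solve the problem, and its inverse is the covariance matrix of the fitted parameters. The function name `fit_line_with_errors` is hypothetical.

```python
import math

def fit_line_with_errors(xs, ys, sigmas):
    """Weighted least-squares fit of y = a + b*x via the normal matrix.

    The inverse of N = J^T W J (rows of J are (1, x), W = 1/sigma^2) is the
    covariance matrix of (a, b), so standard errors come for free.
    """
    n00 = n01 = n11 = r0 = r1 = 0.0
    for x, y, s in zip(xs, ys, sigmas):
        w = 1.0 / (s * s)
        n00 += w            # sum of w
        n01 += w * x        # sum of w*x
        n11 += w * x * x    # sum of w*x^2
        r0 += w * y         # right-hand side J^T W y
        r1 += w * x * y
    det = n00 * n11 - n01 * n01
    # Inverse of the symmetric 2x2 normal matrix = parameter covariance.
    c00, c01, c11 = n11 / det, -n01 / det, n00 / det
    a = c00 * r0 + c01 * r1
    b = c01 * r0 + c11 * r1
    return (a, b), (math.sqrt(c00), math.sqrt(c11))

(a, b), (sa, sb) = fit_line_with_errors([0, 1, 2, 3, 4], [1, 3, 5, 7, 9], [1.0] * 5)
```

In real refinement the matrix is huge and sparse, so it is usually approximated or only partly inverted; the principle that curvature information yields error estimates is the same.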

Function
Probabilistic view:
- chemical information: prior knowledge
- fit to the experiment: likelihood
- total function: posterior
View from physics:
- internal energy
- external energy
- total energy = internal + external
Gibbs distribution: the probability of a state x of the system is
$$P(x) \propto \exp\left(-\frac{E(x)}{kT}\right)$$
Bayes's theorem: the probability of the system (x) given the experiment (x_0) is
$$P(x \mid x_0) = \frac{P(x_0 \mid x)\,P(x)}{P(x_0)} \propto P(x_0 \mid x)\,P(x)$$

[Figure: the system and the treatment of the experiment; internal energy corresponds to the prior probability, external energy to the likelihood.]

Function: likelihood and prior
Likelihood describes the fit of the model parameters to the experiment. A few papers describe various aspects of it, e.g. Murshudov, Vagin, Dodson (1997) Acta Cryst. D53; Pannu, Murshudov, Dodson, Read (1998) Acta Cryst. D5.
Prior: should include our knowledge about the chemistry, biology and physics of the system: bond lengths, angles, B values, overall organisation.
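The correspondence "total energy = internal + external" and "posterior = prior x likelihood" can be seen in a one-parameter toy: with a Gaussian prior (e.g. a dictionary bond length) and a Gaussian likelihood (what the data alone suggest), minimising -log prior - log likelihood gives a precision-weighted mean. This is a minimal sketch only; the numbers are hypothetical and real refinement uses structure-factor likelihoods, not a Gaussian on one distance.

```python
def combine_gaussian(prior_mean, prior_sigma, obs_value, obs_sigma):
    """Posterior mean and sigma for one parameter.

    total(x) = (x - prior_mean)^2 / (2 prior_sigma^2)    # "internal energy" / -log prior
             + (x - obs_value)^2 / (2 obs_sigma^2)       # "external energy" / -log likelihood
    Its minimum is the precision-weighted mean; its curvature gives the sigma.
    """
    w_prior = 1.0 / prior_sigma ** 2
    w_obs = 1.0 / obs_sigma ** 2
    mean = (w_prior * prior_mean + w_obs * obs_value) / (w_prior + w_obs)
    sigma = (w_prior + w_obs) ** -0.5
    return mean, sigma

# Hypothetical numbers: dictionary says 1.52 +/- 0.02 Å, data alone say 1.60 +/- 0.02 Å.
mean, sigma = combine_gaussian(1.52, 0.02, 1.60, 0.02)
```

With equal sigmas the result lies halfway between the two values; a tighter prior pulls it toward chemistry, tighter data toward the experiment.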

Chemical information: two atoms, ideal case
[Figure: density of two atoms 1.3 Å apart with B values 20 and 50; thin lines show the single atoms, the bold line their sum. Axes: P against X.]
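The two-atom picture can be reproduced approximately with a simplified model, assuming each atom is a point smeared by isotropic displacement, so that its 1-D profile is a Gaussian with variance B/(8π²) (the atomic form factor is ignored). The function names are hypothetical.

```python
import math

def atom_profile(x, centre, b_value):
    """1-D density of a point atom smeared by isotropic displacement.

    B = 8*pi^2*<u^2>, so the variance along x is B / (8*pi^2).
    The atom's own form factor is ignored (simplifying assumption).
    """
    var = b_value / (8.0 * math.pi ** 2)
    return math.exp(-(x - centre) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def two_atom_profile(xs, separation=1.3, b1=20.0, b2=50.0):
    """Return the two single-atom profiles and their sum on the grid xs."""
    p1 = [atom_profile(x, 0.0, b1) for x in xs]
    p2 = [atom_profile(x, separation, b2) for x in xs]
    total = [u + v for u, v in zip(p1, p2)]
    return p1, p2, total

def count_local_maxima(values):
    """Number of strict interior local maxima in a sampled profile."""
    return sum(1 for i in range(1, len(values) - 1)
               if values[i - 1] < values[i] > values[i + 1])

step = 0.01
xs = [-4.0 + step * i for i in range(1001)]   # grid from -4 to 6 Å
p1, p2, total = two_atom_profile(xs)
```

In this simplified model the summed profile has a single maximum: the B = 50 atom does not give a separate peak, which is why prior chemical knowledge is needed to place it.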

Chemical information: Phe at two different resolutions
[Figure: phenylalanine density at 2 Å and at 0.88 Å resolution; one panel is labelled "High mobility".]

Monomer library
[Figure: chain of monomers ALA, CYS, PHE, SER, CYS, THR.]
Macromolecules are polymers. They consist of chemical units (monomers), which link with each other to form polymers. When monomers form a link they undergo a chemical reaction, so the links between monomers must also carry chemical modifications.

Monomers and links
[Figure: monomers ALA and SER joined by the link ALA-SER.]
Each monomer contains: all atoms, atom types, charges, bonds, angles, planes, torsions, chiral volumes.
Modifications of monomers: change, add or delete atoms, atom types, angles, planes, torsions, chiral volumes.
A link contains: bonds, angles, torsions, planes, chiral volumes.

Schematic view of library organisation
[Figure: monomers, modifications, links and link modifications.]
Monomers are independent units. Modifications can act on them. Links join two monomers, and links may also carry modifications.
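The organisation above (independent monomers, modifications acting on them, links joining two possibly modified monomers) can be sketched as a small data structure. This is a hypothetical, heavily simplified model: only bond restraints are kept, the class names are invented for illustration, and it is not the actual monomer library file format. Bond lengths are rounded illustrative values.

```python
from dataclasses import dataclass, field

@dataclass
class Monomer:
    """An independent chemical unit: atoms (name -> type) and bond restraints."""
    name: str
    atoms: dict
    bonds: list = field(default_factory=list)  # (atom1, atom2, ideal length in Å)

@dataclass
class Modification:
    """Changes to a monomer: delete atoms (and their bonds), add atoms."""
    delete_atoms: list = field(default_factory=list)
    add_atoms: dict = field(default_factory=dict)

    def apply(self, m):
        atoms = {n: t for n, t in m.atoms.items() if n not in self.delete_atoms}
        atoms.update(self.add_atoms)
        bonds = [b for b in m.bonds if b[0] in atoms and b[1] in atoms]
        return Monomer(m.name, atoms, bonds)

@dataclass
class Link:
    """Joins two monomers with a new bond; either side may be modified first."""
    name: str
    bond: tuple  # (atom in first monomer, atom in second monomer, ideal length)
    mod1: Modification = field(default_factory=Modification)
    mod2: Modification = field(default_factory=Modification)

    def join(self, m1, m2):
        """Bond restraints of the linked pair, atom names prefixed 1: and 2:."""
        a, b = self.mod1.apply(m1), self.mod2.apply(m2)
        out = [("1:" + i, "1:" + j, d) for i, j, d in a.bonds]
        out += [("2:" + i, "2:" + j, d) for i, j, d in b.bonds]
        out.append(("1:" + self.bond[0], "2:" + self.bond[1], self.bond[2]))
        return out

# Illustrative heavy-atom monomers with rounded ideal lengths.
backbone = [("N", "CA", 1.46), ("CA", "C", 1.52), ("C", "O", 1.23),
            ("C", "OXT", 1.25), ("CA", "CB", 1.53)]
ala = Monomer("ALA", {"N": "N", "CA": "C", "C": "C", "O": "O", "OXT": "O", "CB": "C"},
              backbone)
ser = Monomer("SER", {"N": "N", "CA": "C", "C": "C", "O": "O", "OXT": "O", "CB": "C",
                      "OG": "O"}, backbone + [("CB", "OG", 1.42)])
# Peptide link: the first residue loses its terminal oxygen OXT.
peptide = Link("ALA-SER", ("C", "N", 1.33), mod1=Modification(delete_atoms=["OXT"]))
restraints = peptide.join(ala, ser)
```

Keeping modifications separate from monomers means one monomer entry can serve every position in the chain; the link decides which atoms disappear when the chemical bond forms.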

Dictionary: plans
- Finish mutual testing of Fei's program and the dictionary
- Improve values using the CSD and quantum chemical calculations
- Input formats: SMILES, MDL MOLFILE
- More automation of links and modifications
- More chemical assumptions
- Better links to other web resources (e.g. sweet, disaccharide database, corina, prodrg, msd/ebi)
- More monomers and links?
- More knowledge, such as frequently occurring fragments, most probable rotamers, etc.

B values
- B values are an important component of atomic models.
- They model molecular mobility as well as errors in atomic positions.
- The distribution of B values is important for proper maximum likelihood estimation.
- If estimated accurately, their analysis can give some insight into the biology of the molecule.
Note: the Protein Data Bank is a very rich source of prior information, but one must be careful in extracting it.

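Modelling of B values: TLS
The TLS model of atomic B values (as implemented in REFMAC) assumes that they depend on the positions of the atoms:
$$U(\mathbf{r}) = U_{\mathrm{ind}} + T + A(\mathbf{r})\,L\,A(\mathbf{r})^{T} + A(\mathbf{r})\,S + S^{T}A(\mathbf{r})^{T}$$
where A(r) is the antisymmetric matrix formed from the atomic position r relative to the TLS origin.
Effect of this on the electron density: [equation not reproduced]. These linear equations must be solved to calculate the electron density without TLS.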
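A minimal pure-Python sketch of evaluating the TLS expression for one atom follows. It assumes the Schomaker-Trueblood form of A(r) shown in the comment (the sign convention of the S terms depends on how A is defined), U and T in Å², L in rad²; the function names are hypothetical and this is not REFMAC code.

```python
import math

def antisym(r):
    """A(r) for position r = (x, y, z) relative to the TLS origin (assumed convention)."""
    x, y, z = r
    return [[0.0, z, -y],
            [-z, 0.0, x],
            [y, -x, 0.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def tls_u(r, t, l, s, u_ind=None):
    """Anisotropic U of an atom at r: U_ind + T + A L A^T + A S + S^T A^T."""
    a = antisym(r)
    at = transpose(a)
    ala = matmul(matmul(a, l), at)       # libration contribution
    a_s = matmul(a, s)                   # screw-correlation terms;
    s_a = matmul(transpose(s), at)       # S^T A^T = (A S)^T keeps U symmetric
    if u_ind is None:
        u_ind = [[0.0] * 3 for _ in range(3)]
    return [[u_ind[i][j] + t[i][j] + ala[i][j] + a_s[i][j] + s_a[i][j]
             for j in range(3)] for i in range(3)]

def b_equivalent(u):
    """Isotropic-equivalent B = 8*pi^2 * trace(U) / 3."""
    return 8.0 * math.pi ** 2 * (u[0][0] + u[1][1] + u[2][2]) / 3.0

diag = lambda a, b, c: [[a, 0.0, 0.0], [0.0, b, 0.0], [0.0, 0.0, c]]
T = diag(0.2, 0.2, 0.2)
L = diag(0.0, 0.0, 0.01)                 # libration about z only
S = [[0.0, 0.001, 0.0], [0.002, 0.0, 0.0], [0.0, 0.0, 0.0]]
u_far = tls_u((2.0, 0.0, 0.0), T, L, diag(0.0, 0.0, 0.0))
u_s = tls_u((1.0, -0.5, 2.0), T, L, S)
```

For libration about z, an atom 2 Å along x moves along y, so its U_yy grows by L_zz times 4 Å²; atoms near the origin keep only T.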

B values: intuition and Bayesian view
- B values are proportional to the variances of Gaussian atomic displacements, so B values cannot be negative.
- The larger the mean B, the larger the variation of B.
- The inverse gamma distribution is the natural prior for variances (it is used in microarray data analysis and can be used in X-ray data processing).
Assumption: the B values of macromolecules follow an inverse gamma distribution.

B distribution: inverse gamma
We can assume that, to some degree, one parameter of the inverse gamma distribution is constant for all proteins.
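For reference, in the standard parameterization with shape α and scale β (our notation, which may not match the symbols used on the original slides), the inverse gamma density and its first two moments are given below. The ratio of the squared mean to the variance depends only on the shape, so if the shape were common to all proteins, the spread of B would grow in proportion to its mean, in line with "the larger the mean B, the larger the variation of B".

$$p(B \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, B^{-\alpha-1} \exp\!\left(-\frac{\beta}{B}\right), \qquad B > 0$$

$$\mathbb{E}[B] = \frac{\beta}{\alpha - 1}\ (\alpha > 1), \qquad \operatorname{Var}[B] = \frac{\beta^{2}}{(\alpha-1)^{2}(\alpha-2)}\ (\alpha > 2), \qquad \frac{\mathbb{E}[B]^{2}}{\operatorname{Var}[B]} = \alpha - 2$$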

B distribution: mean vs variance
[Figure: square root of the inverse gamma parameter assumed constant across proteins, plotted against structure index for 5000 proteins sorted by resolution.]
The average value of this parameter is around 7.

B distribution: 500 structures at resolution higher than 1.5 Å
[Figure: square root of the same inverse gamma parameter plotted against structure index for 400 structures.]

B distribution: theoretical and from the PDB
The B values of four proteins, after normalisation by their standard deviation, are pooled together. The remaining parameter of the inverse gamma distribution is estimated by maximum likelihood.
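A generic maximum likelihood fit of an inverse gamma distribution to a set of positive B values can be sketched as below. Note that this swaps in a slightly different procedure from the slides: it estimates both shape α and scale β, rather than normalising first and fitting one remaining parameter. For fixed α the ML scale is β = nα / Σ(1/B), so only a one-dimensional profile likelihood in α has to be maximised, here by golden-section search. Function names and search bounds are assumptions.

```python
import math
import random

def fit_inverse_gamma(bs, lo=1.01, hi=200.0, iters=100):
    """Maximum-likelihood shape alpha and scale beta for positive values bs.

    log L = n*a*log(b) - n*lgamma(a) - (a+1)*sum(log B) - b*sum(1/B).
    For fixed a the optimum scale is b = n*a/sum(1/B), which makes the last
    term equal to n*a; the profile in a is maximised by golden-section search.
    """
    n = len(bs)
    inv_sum = sum(1.0 / b for b in bs)
    log_sum = sum(math.log(b) for b in bs)

    def profile(a):
        beta = n * a / inv_sum
        return n * a * math.log(beta) - n * math.lgamma(a) - (a + 1.0) * log_sum - n * a

    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if profile(c) > profile(d):      # maximum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                            # maximum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    alpha = (a + b) / 2.0
    return alpha, n * alpha / inv_sum

# Synthetic check: if X ~ Gamma(shape 7, rate 180) then 1/X ~ InvGamma(7, 180).
random.seed(0)
sample = [1.0 / random.gammavariate(7.0, 1.0 / 180.0) for _ in range(20000)]
alpha_hat, beta_hat = fit_inverse_gamma(sample)
```

Because inverse gamma B values are reciprocals of gamma variates, the profile likelihood in α has a single maximum, which is what makes the simple bracketing search adequate here.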

One PDB entry: not a very good example
[Figure: histogram of B values for one protein. Red: histogram of B values; blue: distribution with parameters fitted to these B values; black: distribution with the parameter fixed at 6.7 (the average for all high-resolution proteins).]

Use of B distributions
- Restraints on individual B values; these should make B value refinement reliable at medium and low resolution
- Better restraints on differences between the B values of nearby atoms
- Detection of outliers (low B value: potential metal; high B value: potentially wrong)
- Normalisation of structure factors
- Improved maximum likelihood estimation
- Map improvement
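One way an inverse gamma prior could act as a restraint on an individual B value is through its negative log density, sketched below with hypothetical function names and parameters α (shape) and β (scale). This is an illustration of the idea, not the restraint implemented in any program.

```python
import math

def ig_restraint(b, alpha, beta):
    """Negative log inverse gamma prior for one B value (constant dropped)
    and its derivative with respect to B."""
    if b <= 0.0:
        raise ValueError("B values must be positive")
    value = (alpha + 1.0) * math.log(b) + beta / b
    grad = (alpha + 1.0) / b - beta / (b * b)
    return value, grad

def ig_mode(alpha, beta):
    """B value at which the restraint is smallest (the mode of the prior)."""
    return beta / (alpha + 1.0)

mode = ig_mode(7.0, 180.0)
_, grad_at_mode = ig_restraint(mode, 7.0, 180.0)
h = 1e-6
v_plus, _ = ig_restraint(40.0 + h, 7.0, 180.0)
v_minus, _ = ig_restraint(40.0 - h, 7.0, 180.0)
_, grad_40 = ig_restraint(40.0, 7.0, 180.0)
```

Unlike a quadratic restraint, the β/B term grows without bound as B approaches zero, so this restraint keeps B values positive, while the slowly growing log term only weakly penalises large B.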

Conclusion and future perspectives
- A dictionary of monomers and links has been developed and implemented.
- B value distributions look like inverse gamma distributions. Analysis of the B value distribution for solvent is needed.
Future:
- "Proper" B value restraints
- Global and local improvement of the dictionary
- Restraints to external information (small fragments)
- Twin, pseudo-translational (etc.) refinement
- Inversion of sparse and full (Fisher information) matrices to estimate the reliability of the parameters

Acknowledgements: Alexey Vagin, Andrey Lebedev, Roberto Steiner, Fei Long, Dan Zhou, Najida Begum, Mark Dunning, Gleb Bourinkov, Alexander Popov; YSBL research environment; users; CCP4; Wellcome Trust, BBSRC, EU BIOXHIT project

And of course!