Macromolecular structure refinement Garib N Murshudov York Structural Biology Laboratory Chemistry Department University of York
Contents Purpose of and considerations for refinement Prior information: Dictionary of ligands Prior information: B value – How to deal with them Conclusions and future developments
Purpose Optimal fit of the model to the experimental data while retaining its chemical integrity Estimation of errors for the refined parameters Improvement of phases to facilitate model building (automatic e.g. ARP/wARP or manual) Give deviation from chemistry and experiment to aid analysis of the model
Considerations Function to optimise –Should use experimental data –Should be able to handle chemical information Parameters –Depends on the stage of analysis –Depends on amount and quality of the experimental data Methods to optimise –Depends on stage of analysis: simulated annealing, tunneling, conjugate gradient, second order (normal matrix, information matrix, second derivatives) –Some methods can give error estimate as a by-product. Second order methods give error estimate.
Function Probabilistic view Chemical information – prior knowledge Fit to experiment - likelihood Total function - posterior View from physics Internal energy External energy Total energy = internal + external Gibbs distribution: Probability of the state of the system is: Bayes’s theorem: Probability of the system (x) given experiment(x 0 )
System describing treatment of the experiment Internal energy or Prior probability External energy or likelihood
Function: likelihood and prior Likelihood describes fit of model parameters into experiment. There are few papers describing various aspects. E.g. Murshudov, Vagin, (1997 ) Acta Cryst. D53, Pannu, Murshudov,, Read (1998 ) Acta Cryst D5, Prior: Should include our knowledge about chemistry, biology and physics of the system: Bond lengths, angles, B values, overall organisations Dodson
Chemical information: Two atoms ideal case Distance between atoms 1.3Å. B values 20 and 50 Thin lines – single atoms Bold line - sum of the two atoms P X
Chemical information: Phe at two different resolutions 2 Å and High mobility 0.88 Å
Monomer library ALA CYS PHE SER CYS THR Macromolecules are polymers. They consist of chemical units (monomers). Monomers link with each other and form polymers. When they make link they undergo some chemical reaction. Links between monomers must contain chemical modification also
Monomers and links ALASER ALA-SER All atoms Atom types Charges Bonds Angles Planes Torsions Chiral volumes All atoms Atom types Charges Bonds Angles Planes Torsions Chiral volumes Modifications of monomers: Change, add, delete atoms, atom types, angles, planes, torsions, chiral volumes Bond Angles Torsions Planes Chiral volumes
Schematic view of library organisation Monomers Modifications Links Modif. Monomers are independent units. Modification can act on them. Links can join two monomers. Links may have modification also
Dictionary: Plans Finish mutual test of Fei’s program and dictionary Improve values using CSD and quantum chemical calculations Input formats: SMILE, MDL MOLFILE More automation of links and modifications More chemical assumptions Better links to other web resources (e.g. sweet, disacharide data base, corina, prodrg, msd/ebi) More monomers and links??? Adding more knowledge like frequently occurring fragment, most probable rotamers etc
B values B values are important component of atomic models They model molecular mobility as well as errors in atoms Distribution of B values is important for proper maximum likelihood estimation If estimated accurately their analysis can give some insight into biology of the molecule Note: Protein data bank is very rich source of prior information. But one must be careful in extracting them
Modeling of B values: TLS TLS model of atomic B values assumes that they depend on position of atoms (as implemented in REFMAC): U = U ind + T + r x L x r T + r T x S – S T x r T = A(r) Effect of this on electron density: This linear equations must be solved to calculate electron density without TLS
B values: Intuition and Bayesian B values are variances of Gaussians B values cannot be negative!!!!! Larger mean B larger variation of B Inverse gamma is natural prior of variances (It is used in microarray data analysis and can be used in X-ray data processing) Assumption: B values of macromolecules have inverse Gamma distribution.
B distribution: Inverse gamma Inverse gamma distribution: We can assume that to some degree is constant for all proteins.
B distribution: Mean vs variance Values of sqrt( ) vs indices 5000 of proteins are included. Proteins are sorted according to resolution. average value of is around 7
B distribution: 500 higher than 1.5A resolution structures sqrt( ) vs indices for 400 structures.
B distribution: Theoretical and from PDB B values of four proteins after normalisation by standard deviation are pooled together. Remaining parameter of the IG is estimated using Maximum likelihood
One PDB: Not very good example Histogram of B values for one protein. Red – histogram of B values Blue – parameters fitted using these B values Black = 6.7 (average for all high resolution proteins)
Use of B distributions Restraints on individual B values. It will allow refinement of B values reliable at medium and low resolutions Better restraints on differences between B values of close atoms. Detection of outliers (low B value – potential metal, high B value – potentially wrong) For normalisation of structure factor For improved Maximum likelihood estimation For map improvement
Conclusion and future perspectives Dictionary of monomers and links have been developed and implemented B value distributions look like IG. Analysis of B value distribution for solvent is needed Future “Proper” B value restraints Global and local improvement of dictionary Restraints to external information (small fragments) Twin, psuedotranslational (etc) refinement Inversion of sparse and full (Fisher information) matrix to estimate reliability of the parmaters
Acknowledgements Alexey Vagin Andrey Lebedev Roberto Steiner Fei Long Dan Zhou Najida Begum Mark Dunning Gleb Bourinkov Alexander Popov YSBL research environment Users CCP4 Wellcome Trust, BBSRC, EU BIOXHIT project
And of course!!!!