Download presentation
Presentation is loading. Please wait.
1
Refinement against cryo-EM maps
Garib Murshudov MRC-LMB, Cambridge, UK
2
Contents Importance of phases
Fit into EM maps: likelihood and restraints Overfitting Effect of oversharpening Form factors Brown et al, (2015) Tools for macromolecular model building and refinement into electron cryo-microscopy reconstructions. Acta Cryst. D71, Murshudov (2016) Refinement of Atomic Structures Against cryo-EM Maps, Methods in Enzymology. Ed. Crowter, V579, Nicholls et al (2018) Current approaches for the fitting and refinement of atomic models into cryo-EM maps using CCP-EM . Acta Cryst. D74,
3
Useful websites (related to us and related to atomic model building and refinement) ccpem: ccpem website. In future all our developments about EM will be distributed on this website. They also develop GUI for EM tools. coot: Coot website by Paul Emsley. All coot related tools are available from this website refmac: You can find some useful info on this website. Some of the tools are still available from this website. Ccp4 is the better site for refmac related info.
4
Importance of phases
5
Importance of phases It is well known that phases are very important. It is one of the examples demonstrating it. It may be not very good example but nevertheless … Can we show in terms of numbers the meaning of this statement?
6
Importance of phases Correlation between two densities: Correlation can be calculated in real or Fourier space. Advantage of Fourier space is that we can calculate it in Fourier shells or resolution bins. <F> = 0 in all resolution bins apart from that containing the origin. We almost never need the origin. Working in Fourier space means that we can analyse the quality of maps depending on resolution (or frequency). We also will use this fact: var(F) = <|F|2> So correlation in resolution bin for Fourier coefficients (complex coefficients) is
7
Importance of phases Using definition of E values:
we can write an expression for Fourier Shell Correlation (which is same as map vs map correlation if all Fourier components are used) Now let us consider several cases: Phases are perfect, amplitudes are perfect then cor=1.0 Phases are random, amplitudes are perfect then cor=0
8
Importance of phases Phases are perfect, amplitudes are random (i.e. not related to the structure we are solving) Now if we use the properties of the distribution of |F| then we get: i.e. correlation becomes: If phases are perfect then correlation between true and observed electron densities with random amplitudes can be as high as If we do know that amplitudes are random then we can replace them with constant value equal to the the expected value of the amplitudes on this resolution bin. E value of a constant equal to 1. So in this case correlation will be: One obvious conclusion is that if we have phase information then unobserved E values can be replaced with the expected value. It is how “free-lunch” algorithm works
9
Importance of phases: amplitudes also matter
Red – cos(Δφ) Cyan – corr(|F1|,|F2| If we take phases and replace amplitudes with random numbers then map correlations is reduced. This reduction is difference between black diagonal line and red line cor(F1,F2): complex
10
REFMAC
11
About REFMAC Refmac is a program for refinement of atomic models into experimental data It was originally designed for Macromolecular Crystallography It is based on some elements of Bayesian statistics: it tries to fit chemically and structurally consistent atomic models into the data. It also can fit atomic models into cryo-EM maps. It can do some manipulation of maps, e.g. sharpening/blurring It is available from CCP4 and CCPEM
12
Current approach to fitting into cryo-EM map
Mask the map around molecule to be refined If there are more than one maps then use them to generated “composite” map Calculate structure factors from masked map Generate reference structure restraints Generate RNA/DNA related restraints Use local symmetry (helical, icosahedral or others) Refine coordinates using masked map Monitor refinement behaviour using average FSC Put the molecule into the original map coordinate frame This procedure works but it is not entirely satisfactory. At the moment we do not use error model of “observed” data, we do not calculate difference maps, we do not calculate “improved” maps. At the moment sharpened map should be used for masking When there are multiple reconstruction then TLS type of refinement might be useful Local symmetry can be used as constraints or as non-bonding interactions
13
+ = Data Atomic model Fit and refine
We want to fit currently available model into the data and calculate differences between them. To do this fit properly we must use as much as possible information about model and data.
14
Likelihood function for cryo-EM maps
Probability distribution of “observed” structure factors given coordinates of a molecule: Fo “observed” structure factors calculated from EM map Fc calculated structure factors – from coordinates variance of signal variance of noise N normalisation Major difference from crystallography: 1) variance of noise is large at high resolution; 2) complex structure factors are available It is derived using joint probability distribution of “observed” and “true” structure factors and then integrating over “true” structure factors. It has not been implemented fully yet Ideally coordinate model should be refined against direct observations. However current approach seems to be satisfactory Correlation between map with random amplitudes and random phases.
15
Restraints: prior knowledge
There are several restraints types: Standard bond lengths, angles etc. Restraints against high resolution homologues Jelly body restraints Secondary structure restraints Covalent links Base pairs and parallel planes B value restraints Rob will give more details about this
16
Overfitting
17
Overfitting What leads to overfitting?
Insufficient data (low resolution, partial occupancy) Ignoring data (cutting by resolution) Sub-optimal parameterisation Bad weights Excess of imagination Mention Rfree – cite Frank paper 2003
18
Validation of overfitting
FSCs should overlap Smooth transition beyond resolution cutoff applied during refinement
19
Reference map sharpening
Cross validation requires maps to be on the same sharpening level. Refmac can sharpen given map to a reference map. Usually reference map is taken as full reconstruction map. Fourier transformation (structure factor) from reference map is calculated and then average values of modules of structure factors in resolution shells are calculated. For given map average values of modules of Fourier coefficients in resolution bins scaled to reference maps coefficients FSCs should overlap Smooth transition beyond resolution cutoff applied during refinement
20
Validation of overfitting
Not overfitted Overfitted FSCs should overlap Smooth transition beyond resolution cutoff applied during refinement FSCwork = model refined against half map 1, compared to half map 1 FSCfree = model refined against half map 1, compared to half map 2
21
Monitoring fit to density: FSCaverage
Measure of fit to density (analogous to crystallographic Rfactor) Used to follow the progress of refinement FSCaverage avoids a dependence on weight FSC is calculated over resolution shells If shells are sufficiently narrow the weights are roughly the same within each shell Agreement between the observed and predicted structure factors
22
FSC between model and map
As a basic rule one can use relationship between half data fsc and map correlation between observed and calculated maps to see how far fitting into the map can go, or if there is an overfitting. Fo – Fourier coefficients from observed maps Fc – Fourier coefficients from calculated maps FT – Fourier coefficients from “true” maps fsc – half data Fourier shell correlation Note that these formulas are valid under certain conditions: noise is random, independent of signal, two halves are independent,
23
FSC between model and map
If the model is perfect then cor(FT, Fc) = 1 and then relationship becomes even simpler When fsc = then cor(Fo,Fc) = 0.5 Therefore it is recommended to use fsc = as an indicator of resolution. In this case signal and noise variances are equal Rangana will talk more about correlations Correlation between observed and calculated map should be below the red curve. If it is higher than red curve then model has been over-fitted
24
Meaning of average FSC Average fsc could be interpreted as: 4𝜋 0 𝑠 𝑚𝑎𝑥 𝑓𝑠𝑐 𝑠 𝑠 2 𝑑𝑠 /4𝜋 0 𝑠 𝑚𝑎𝑥 𝑠 2 𝑑𝑠= 3 𝑠 𝑚𝑎𝑥 3 0 𝑠 𝑚𝑎𝑥 𝑓𝑠𝑐 𝑠 𝑠 2 𝑑𝑠 =<𝑓𝑠𝑐> If average fsc would be 1 then the right side would be equal to 1. Essentially average fsc means that how much the effective resolution is reduced. We can calculate the effective resolution as: 𝑠 𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 = 3 <𝑓𝑠𝑐> 𝑠 𝑚𝑎𝑥 I.e. average fsc indicates how much the resolution of the data is reduced. Average fsc between map and model would mean what is the extend of common signal between map and model.
25
Effect of oversharpening
26
Map details, resolution and sharpening
The paper by Wlodawer, Li, Dauter (2017) High-Resolution Cryo-EM maps and models: A Crystallographer’s Perspective” Structure, 25, 1-9 hightlights the problem of lack of clarity in definition of resolution. The same problem exist in crystallography: if we go to very low signal to noise ratio then how much do we gain? Do we add anything useful to the resolvability of the peaks. Rob will
27
Map details, resolution and sharpening
Map details depend on the nominal resolution (i.e. sphere in Fourier space within which Fourier coefficients have been observed), mobility of the molecules (signal level) and noise in the map. In cryoEM reconstructions we can expect that noise level is more or less constant within the molecule. Nominal resolution is also constant for whole map. However signal level changes over the molecule (different mobility, occupancy). One side effect is that when the same sharpening is used over whole map noise may mask out signal where signal is relatively small.
28
<|F|> vs resolution
Map from EMDB Blurred with B value 40 and 60 Map is converted to mtz file. Various plots may help to see sharpening levels. At the end |F| is rising, clear indication of oversharpening Sharp fall of |F| at around 2.0. Clear indication of series termination Bartesaghi et al, Science, 2015
29
Model refined against blurred map
Map from PDB Blurred: B = 60 Model refined against blurred map Model refined using 2.6A data. 40 cycles simple restrained refinement.
30
Form factors for electron and X-ray scattering
31
Form factors for electrons
Electrons diffract from electrostatic potential. Charges and electrostatic potential are related with each other via Poisson equation in real space and Mott-Bethe formula in reciprocal space: Atoms in molecule have certain B values. They need to be accounted for. Using Moth-Bethe formula means that we can calculate contribution from nuclear and electron separately (two centre hydrogens can be accounted for) There are two problems that need to be solved: 1) Form factors for electrons; 2) Solvent Rho+ is density of nuclei and rho- is the electron density As s goes to 0 this formula may have some problems. If formfactors designed so that derivative of f_X(s) at 0 is equal to the inner potential of atoms then neutral atoms should be fine. Problem is even worse when atoms are charged. Form factors when s=0 is infinity and it also means that influence of charges is very long. In reality it should not be like this. There must be some screening by solvent.
32
Electron scattering factor
In principle using electron diffraction should allow refinement of charges, but data must be very accurate (we can use local neutrality restraints) Where o – overall occupancy, c – charge occupancy (charge occupancy refinement has not been implemented yet) Screening of charges might need to be accounted for. : For total Fourier coefficient (with total screening coefficient, I have implemented it but not tested): We need only one set of form factors – X-ray form factors. Electron form factors are derived from them and total Fourier coeffients can be calculated using this fact. Individual screening is hard to accout for. However total screening can be accounted for. Solvent modelling would be a bit different. Filtered maps will obscure form factors. In general refinement should be done against unfiltered maps and filters should be parameters of refinement and/or they should be used for visualisations only
33
Form factors Form factors have been tabulated for X-ray scattering. For tabulations isolated atoms are used. They do not account for bonding electrons effect. Peng et al converted X-ray form factors to electron form factors using Moth-Bethe formula – as sum of Gaussians. We need to update form factors. We need to calculate form factors for each atom (chemical) type. For example sp1, sp2, sp3 carbons should have different form factors. This result is very pleniminary
34
Some additional features of refmac
Calculate multiple sharpened and blurred maps Divide and conquer algorithm for large molecules Vector difference map (for ligand detection) Refinement against composite These might be usefud
35
Conclusions Cross-validation is a problem: refinement against half data maps might be useful Phases are important, however amplitudes do matter also Oversharpening can obscure features of the map. Multiple maps with different sharpening levels should be used for model building. Form factor for electron scattering must be improved There are many things that need to be implemented: accounting for noise variance, accurate form factors, hydrogen form factors, solvent contribution, map update. There will be more features. I hope ccp-em will change the way cryo-EM calculations are done.
36
Acknowledgements MRC-LMB CCP-EM Fei Long Tom Burnley
Rob Nicholls Martyn Winn Oleg Kovalevskiy Paul Emsley Michal Tykach Rangana Warshamanage Alan Brown Richard Henderson Venki Ramakrishnan Jan Löwe Many, many EM users and people of LMB People of CCP4 Users of our programs
Similar presentations
© 2025 SlidePlayer.com Inc.
All rights reserved.