Electron Density: structure factor amplitude defined as $F_{\text{unit cell}}(\mathbf{S}) = \int_{\mathbf{r}} \rho(\mathbf{r}) \exp(2\pi i\,\mathbf{r}\cdot\mathbf{S})\,d\mathbf{r}$; using the inverse Fourier Transform $\rho(\mathbf{r}) =$ …


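1 Electron Density The structure factor is defined as: $F_{\text{unit cell}}(\mathbf{S}) = \int_{\mathbf{r}} \rho(\mathbf{r}) \exp(2\pi i\,\mathbf{r}\cdot\mathbf{S})\,d\mathbf{r}$ Using the inverse Fourier Transform: $\rho(\mathbf{r}) = \int_{\mathbf{S}} F(\mathbf{S}) \exp(-2\pi i\,\mathbf{r}\cdot\mathbf{S})\,d\mathbf{S}$ In practice you make a discrete inverse Fourier Transform: $\rho(\mathbf{r}) = \sum_{hkl} F_{hkl} \exp(-2\pi i\,\mathbf{r}\cdot\mathbf{S}_{hkl})$, where $F_{hkl} = |F_{hkl}| \exp(i\alpha_{hkl})$ has amplitude $|F_{hkl}|$ and phase $\alpha_{hkl}$. But you measure only the X-ray diffraction intensities $I_{hkl} \propto |F_{hkl}|^2$. The fact that you cannot measure $\alpha_{hkl}$ directly is called THE PHASE PROBLEM.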
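
The role of the phases can be illustrated with a small numerical sketch. The one-dimensional "unit cell", the grid size and the two Gaussian "atoms" below are assumptions made purely for illustration; they are not part of the slides. The sketch computes structure factors with the sign convention of slide 1, then rebuilds the density once with the true phases and once with every phase set to zero.

```python
import numpy as np

# Hypothetical 1D "unit cell" sampled on 64 grid points with two Gaussian atoms.
n = 64
x = np.arange(n)
rho = np.exp(-0.5 * ((x - 20) / 1.5) ** 2) + 0.6 * np.exp(-0.5 * ((x - 41) / 1.5) ** 2)

# Structure factors: F(S) = sum_r rho(r) * exp(+2*pi*i*r*S/n).
# numpy's ifft uses the exp(+2*pi*i...) kernel with a 1/n factor, so undo that factor.
F = np.fft.ifft(rho) * n

amplitudes = np.abs(F)   # obtainable from the measured intensities
phases = np.angle(F)     # not obtainable from the intensities alone

# Inverse transform with the true phases: rho(r) = (1/n) sum_S F(S) exp(-2*pi*i*r*S/n).
rho_true = np.fft.fft(amplitudes * np.exp(1j * phases)).real / n

# Same amplitudes, but all phases set to zero.
rho_zero = np.fft.fft(amplitudes).real / n

print("true phases recover the density:", np.allclose(rho_true, rho))
print("zero phases recover the density:", np.allclose(rho_zero, rho))
```

With the correct phases the density is recovered exactly; with zero phases the same amplitudes pile up at the origin instead of at the atomic positions.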

2 Solutions to the Phase Problem Direct methods: - Based upon systematic relations between certain reflections. - Need high resolution data & relatively small systems. - Overwhelmingly the most popular method for small molecule structures. Molecular replacement: - Find a molecule of known structure which is close enough to your protein of interest to provide a good first guess. - Becoming more popular as the spectrum of possible structures is filled up. Heavy atom methods: - Soak an atom which is a strong scatterer (e.g. Hg, Fe, Pb, I, Se) into your crystal. - Replace the methionines in your protein with selenomethionine derivatives. - Use Multiple Isomorphous Replacement. - Use Multiple or Single Anomalous Diffraction. - An old & powerful method for finding phases.

3 Isomorphous Replacement Each atom in the unit cell contributes to every diffraction spot. The greatest contributions are to reflections whose indices correspond to lattice planes that intersect that atom. Specific atoms contribute to some reflections strongly, to some weakly, and not at all to other reflections. If a very small number of atoms are added to identical sites in all unit cells of the crystal, there will be a change in the diffraction pattern. In order for these changes to be large enough to measure, the added atom must be a strong diffractor (i.e. have many electrons). The slight perturbations in the diffraction pattern caused by these added atoms can be used to obtain initial estimates of phases.

4 Heavy atom methods You must search for a heavy atom which binds within your crystal & doesn't destroy the crystal lattice. - Can be extremely frustrating! - Many soaking, freezing & diffraction experiments. - The heavy atom must bind in an ordered way to the protein. Suppose you have a protein with structure factor $F_P$ (P stands for protein). For every X-ray intensity measured, the addition of the heavy atom adds a term $F_H$ to the scattering (H for heavy atom): $F_{PH} = F_P + F_H$ This is for a single reflection hkl. [Figure: vector diagram of $F_P$, $F_H$ and $F_{PH}$.]

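5 Heavy atom methods Assuming that $F_H$ has a phase angle $\alpha_H$, the structure factor amplitude will be perturbed by $\Delta F_{PH} = |F_{PH}| - |F_P| \approx |F_H| \cos(\alpha_P - \alpha_H)$, an approximation that holds when $|F_H|$ is small compared with $|F_P|$. We can compute $F_H$ if we can locate the position of the heavy atom. Although this example shows the relationship for a specific hkl, the same is true for all reflections. [Figure: vector diagram of $F_P$, $F_H$ and $F_{PH}$ with the angles $\alpha_{PH} - \alpha_H$ and $\alpha_P - \alpha_H$ marked.]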

6 Heavy atom methods If you can find the location of your heavy atom $\mathbf{r}_H$, then you can calculate the heavy atom structure factor $F_H = f_H \exp(2\pi i\,\mathbf{r}_H\cdot\mathbf{S}_{hkl})$ just like any other atom within the protein. If you have already measured $|F_P|$ and $|F_{PH}|$ you can recover a constraint on the phases for the protein: - You can determine $|F_{PH}|$ and $|F_P|$ directly from the diffraction data. - You can determine $F_H$, including its phase angle, by locating the heavy atoms. [Figure: vector diagram of $F_P$, $F_H$ and $F_{PH}$.]

7 Additional constraints A single heavy-atom derivative gives you two possible phases for every reflection (h,k,l), only one of which is correct. For the approach to work, native and derivative crystals must be isomorphous (i.e. same a, b, c, α, β, γ & space group). In order to determine which phase is correct you need to find additional derivatives. This is called Multiple Isomorphous Replacement. [Figure: vector diagram of $F_P$, $F_H$ and $F_{PH}$.]
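
The two-fold ambiguity follows from the cosine rule applied to $F_{PH} = F_P + F_H$: $|F_{PH}|^2 = |F_P|^2 + |F_H|^2 + 2|F_P||F_H|\cos(\alpha_P - \alpha_H)$, so $\alpha_P = \alpha_H \pm \arccos(\dots)$. The sketch below works this through for one reflection. The function name `sir_phase_candidates` and all numerical values are hypothetical, chosen only to illustrate the geometry.

```python
import cmath
import math

def sir_phase_candidates(fp_amp, fph_amp, f_h):
    """Return the two protein phases consistent with |F_P|, |F_PH| and a known complex F_H."""
    amp_h = abs(f_h)
    alpha_h = cmath.phase(f_h)
    # Cosine rule on the triangle F_PH = F_P + F_H.
    cos_term = (fph_amp**2 - fp_amp**2 - amp_h**2) / (2.0 * fp_amp * amp_h)
    cos_term = max(-1.0, min(1.0, cos_term))  # guard against rounding or measurement error
    delta = math.acos(cos_term)
    return alpha_h + delta, alpha_h - delta

# Hypothetical reflection: the true protein phase is 1.0 rad (unknown to the experimenter).
f_p_true = cmath.rect(10.0, 1.0)
f_h = cmath.rect(3.0, 0.4)            # computed from the located heavy atom
fph_amp = abs(f_p_true + f_h)         # measured derivative amplitude

candidate_a, candidate_b = sir_phase_candidates(abs(f_p_true), fph_amp, f_h)
print(candidate_a, candidate_b)
```

One candidate reproduces the true phase and the other is its mirror image about $\alpha_H$; a second derivative bound at a different site is needed to tell them apart.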

8 Additional constraints If you have a second constraint: - It is better to draw the diagram with $F_P$ centred at the origin rather than $F_H$ & $F_{PH}$. - This then represents $F_P = F_{PH} - F_H$. You should recover one place where three circles intersect. This is your solution for the phases. [Figure: three intersecting circles; one pair of intersections is the possible solution for the light green & light blue measurements, the other for the dark green & dark blue measurements.]

9 Multiple Isomorphous Replacement In order to resolve the phase ambiguity from the first heavy atom derivative, the second heavy atom must bind at a different site. If the two atoms bind at the same site, the phases will be the same for both of them. This is because the phase of an atomic structure factor depends only on the location of the atom and not its identity. In reality, you sometimes need three or more heavy atom derivatives (all of which must be isomorphous to the native) to produce a good initial phase estimate. To compute the atomic structure, you must ultimately know the phases for all reflections. Computational algorithms can be used to solve these vector problems. For many phases, the precision of the first phase estimate may be very low. For example, if the circles do not intersect sharply, the phase is uncertain.

10 Combining phase information In practice there are errors with respect to each heavy atom experiment. - Therefore you recover a probability distribution for a phase, rather than an absolute phase. - Best to take a weighted sum of the probabilities to determine the experimental phases. m is the length of the probability weighted experimental phase & is called the figure of merit. Ideally m = 1, but in practice m < 1. [Figure: phase probability $P(\alpha_{hkl})$ plotted against $\alpha_{hkl}$, with the weighted phase vector of length m.]
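
A minimal sketch of the figure of merit, assuming the phase probability is sampled on an evenly spaced grid of angles. The function name `best_phase_and_fom` and the two test distributions are hypothetical; the calculation is the probability-weighted average of the unit phase vectors $\exp(i\alpha)$, whose length is m.

```python
import numpy as np

def best_phase_and_fom(alpha, prob):
    """Probability-weighted mean of exp(i*alpha): returns (best phase, figure of merit m)."""
    weights = prob / prob.sum()
    centroid = np.sum(weights * np.exp(1j * alpha))
    return np.angle(centroid), np.abs(centroid)

alpha = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)

# Sharply peaked (well determined) phase centred on 1.0 rad.
prob_sharp = np.exp(50.0 * np.cos(alpha - 1.0))
best_sharp, m_sharp = best_phase_and_fom(alpha, prob_sharp)

# Completely flat (undetermined) phase distribution.
prob_flat = np.ones_like(alpha)
_, m_flat = best_phase_and_fom(alpha, prob_flat)

print(f"sharp: phase = {best_sharp:.3f} rad, m = {m_sharp:.3f}")
print(f"flat:  m = {m_flat:.3e}")
```

A sharp distribution gives m close to 1, while a flat one gives m close to 0, which is why m is used to down-weight poorly phased reflections in the first map.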

11 Locating heavy atoms in the unit cell Before we can obtain phase estimates, we must determine the position of the heavy atoms in the unit cell. This can be done by “extracting” the simple diffraction signature of the heavy atom from the complicated diffraction pattern of the protein. This is done by measuring the diffraction of the protein by itself and then measuring the diffraction from the heavy atom derivative. Subtracting the two sets of diffraction data will give us the simple diffraction pattern of just the heavy atoms (which bind in only a few places in the unit cell). The most powerful way to determine heavy atom coordinates is to use the Patterson function. The Patterson function can be thought of as a Fourier series without any terms for the phases.

12 The Patterson Function The previous treatment relied on finding the heavy atom positions. In practice this is usually solvable using the Patterson Function: $P(u,v,w) = \sum_{hkl} |F_{hkl}|^2 \exp(2\pi i\,\mathbf{u}\cdot\mathbf{S})$ You can calculate it without any phase information, directly from the measured intensities. Note that $|F_{hkl}|^2 = F_{hkl} \times F_{hkl}^*$, hence $P(\mathbf{u}) = \int \rho(\mathbf{r})\,\rho(\mathbf{r}+\mathbf{u})\,d\mathbf{r}$, where we use the convolution theorem for Fourier Transforms. This is the convolution of the electron density with its own inverse through the origin, $\rho(-\mathbf{r})$. [Figure: example Patterson function for two heavy atoms in a unit cell, separated by the vector u.]
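
The two-atom case can be sketched numerically. The one-dimensional cell of 100 grid points and the atom positions below are assumptions for illustration only. The Patterson function is computed from the intensities alone, with no phases, and its peaks are then listed.

```python
import numpy as np

# Hypothetical 1D unit cell with two point-like heavy atoms.
n = 100
rho = np.zeros(n)
rho[10] = 80.0   # heavy atom A
rho[35] = 80.0   # heavy atom B, 25 grid points away from A

# "Measured" data: intensities only, the phases are thrown away.
F = np.fft.fft(rho)
intensities = np.abs(F) ** 2

# Patterson function: Fourier sum over |F|^2 with all phases zero.
patterson = np.fft.ifft(intensities).real

# Peaks well above the numerical noise floor.
peaks = np.flatnonzero(patterson > 0.25 * patterson.max())
print(peaks.tolist())   # origin peak plus the interatomic vectors +25 and -25 (= 75 mod 100)
```

The map shows an origin peak and peaks at plus and minus the interatomic vector, but not the absolute positions 10 and 35: the Patterson function is insensitive to where the pair sits in the cell.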

13 Patterson Function Because the Patterson function contains no phases, it can be computed from any raw set of crystallographic data. A contour map of the Patterson function displays peaks corresponding to vectors between atoms. Since the number of vectors between atoms is greater than the number of atoms, a Patterson map for a protein will look very complicated. However, if you are concerned with a very simple structure (for example, the positions of the heavy atoms in a unit cell), the Patterson map will be very simple. Most importantly, the simplicity of the Patterson map will allow us to determine the positions of the few heavy atoms in the unit cell. The reason for calculating a Patterson function for just the heavy atoms is to avoid dealing with the large number of vectors between all the atoms in the protein.

14 Patterson Function Since $P(u,v,w) = \int \rho(\mathbf{r})\,\rho(\mathbf{r}+\mathbf{u})\,d\mathbf{r}$, the Patterson Map gives you: - A central peak for u = 0, since the density sits upon itself for all atoms (including the protein). - An additional peak whenever a shift by u places one heavy atom upon another. - It rapidly becomes very complicated but can be solved (this used to be done by inspection) if you have a small number of heavy atoms. [Figure: two copies of the density displaced by the vector u.]

15 Difference Patterson Function Since you are looking for the scattering from the heavy atoms & the protein adds only background, it's more convenient to calculate the difference Patterson: $P(u,v,w) = \sum_{hkl} (|F_{PH}|^2 - |F_P|^2) \exp(2\pi i\,\mathbf{u}\cdot\mathbf{S})$ This difference Patterson map gives you the vectors between the heavy atoms directly. It is somewhat easier to interpret than the Patterson for $F_{PH}$ itself since it has less background from protein-protein distances. It is easily demonstrated that for a unit cell containing n atoms, the number of non-origin peaks in the Patterson unit cell will be n(n-1). The (n-1) term arises because the vector from each atom to itself falls on the origin peak and is not counted.

16 Solving the Difference Patterson If you succeed in obtaining a derivative & calculate a Patterson function then you must solve it to proceed. - The Patterson function gives you a set of constraints in 3D which are the vectors between heavy atoms. - There exist algorithms for finding unique solutions which work for a reasonable number of heavy atoms, usually ten or fewer. - The number of non-origin peaks is N(N-1) for N heavy atoms. The simple case of 3 atoms can be done by trial and error. More complicated cases are done computationally. [Figure: a Patterson map inverted to give the heavy atom positions.]

17 Magnitude of intensity changes with heavy atoms You may ask how one (or a few) heavy atoms within a protein of e.g. 50 kDa could be seen? - e.g. Hg has 80 e⁻. - The protein is a sea of e.g. 4,000 non-hydrogen atoms with $f_{\text{average}} \approx 7$ e⁻ (i.e. 28,000 e⁻ in total). - How can you see 80 e⁻ / 28,000 e⁻ = 0.3 %? On average: $|F_{\text{protein}}| \approx f_{\text{average}} \times N^{1/2}$ (N is the number of non-hydrogen atoms; this is a random walk result), hence $I_{\text{protein}} \propto \sum |F_{\text{atom}}|^2 \approx (f_{\text{average}})^2 \times N$. But for the heavy atom $I_H \propto F_H^2$. Now $|F_{\text{protein}}| + |F_H| \approx f_{\text{average}} \times N^{1/2} + |F_H|$. Hence $I_{PH} \approx (f_{\text{average}} \times N^{1/2} + F_H)^2$

18 $\approx (f_{\text{average}} N^{1/2} + F_H) \times (f_{\text{average}} N^{1/2} + F_H) \approx (f_{\text{average}} N^{1/2})^2 + 2 f_{\text{average}} N^{1/2} F_H + F_H^2$ Therefore $\Delta I_{PH} \approx (f_{\text{average}} N^{1/2})^2 + 2 f_{\text{average}} N^{1/2} F_H + F_H^2 - (f_{\text{average}} N^{1/2})^2 = 2 f_{\text{average}} N^{1/2} F_H + F_H^2$ The first term can be evaluated for e.g. 4000 non-hydrogen atoms: $2 f_{\text{average}} N^{1/2} F_H = 2 \times 7 \times (4000)^{1/2} \times 80 = 70{,}835$ The second term is $F_H^2 = 80^2 = 6{,}400$. And $I_{\text{protein}} \propto \sum |F_{\text{atom}}|^2 \approx (f_{\text{average}})^2 \times N = 196{,}000$. Hence in this case $\Delta I_{PH} / I_{\text{protein}} \approx 71/196 = 36\,\%$ (keeping only the dominant first term).
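
The arithmetic of this estimate can be checked in a few lines. The variable names are hypothetical; the numbers (7 electrons per average atom, 4000 non-hydrogen atoms, 80 electrons for Hg) are those used on the slide.

```python
import math

f_average = 7.0    # average scattering factor per non-hydrogen atom (electrons)
n_atoms = 4000     # number of non-hydrogen atoms in the protein
f_heavy = 80.0     # scattering factor of one fully occupied Hg site (electrons)

f_protein = f_average * math.sqrt(n_atoms)   # random-walk estimate of |F_protein|
i_protein = f_protein ** 2                   # approximately f_average^2 * N

cross_term = 2.0 * f_protein * f_heavy       # first term of the intensity change
heavy_term = f_heavy ** 2                    # second term of the intensity change

ratio_first_term = cross_term / i_protein
ratio_both_terms = (cross_term + heavy_term) / i_protein

print(f"I_protein            = {i_protein:,.0f}")
print(f"cross term           = {cross_term:,.0f}")
print(f"heavy-atom term      = {heavy_term:,.0f}")
print(f"ratio (first term)   = {ratio_first_term:.1%}")
print(f"ratio (both terms)   = {ratio_both_terms:.1%}")
```

The first-term ratio reproduces the 36 % quoted on the slide; including the smaller $F_H^2$ term raises the estimate by a few percentage points.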

19 Average intensity change if one heavy atom is bound to a protein
Protein weight   100 % occupancy   50 % occupancy
14,000           51 %              25 %
28,000           36 %              18 %
56,000           25 %              12 %
112,000          18 %              9 %
224,000          13 %              6 %
448,000          9 %               4 %
In practice a change of 14 % gives a good phasing measurement.

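20 Lack of Closure In practice when you have found the phases experimentally there is some mismatch. The mismatch is called the lack of closure & is given the symbol $\varepsilon$: $\varepsilon = |F_H(\mathbf{h}) + F_P(\mathbf{h})| - |F_{PH}(\mathbf{h})|$ [Figure: vector triangles of $F_P$, $F_H$ and $F_{PH}$; in the ideal case the triangle closes, in reality a gap $\varepsilon$ remains.]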

21 Phasing power From the mismatch you can estimate the phasing power: Phasing Power = $\sqrt{\sum |F_H|^2 / \sum \varepsilon^2}$ A phasing power of 4 is excellent & is rare. A value between 1 & 2 is acceptable & means that the scattering of the heavy atom is larger than the lack of closure. [Figure: vector triangle of $F_P$, $F_H$ and $F_{PH}$ with the lack of closure $\varepsilon$.]
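
A minimal sketch of both quantities, assuming complex arrays of phased $F_P$ and calculated $F_H$ plus measured $|F_{PH}|$ for a handful of reflections. The function names and the four synthetic reflections are hypothetical, chosen so that the result is easy to check by hand.

```python
import numpy as np

def lack_of_closure(f_p, f_h, fph_amp):
    """epsilon = |F_H + F_P| - |F_PH| for each reflection."""
    return np.abs(f_p + f_h) - fph_amp

def phasing_power(f_h, eps):
    """sqrt( sum |F_H|^2 / sum epsilon^2 )."""
    return np.sqrt(np.sum(np.abs(f_h) ** 2) / np.sum(eps ** 2))

# Four synthetic reflections: |F_H| = 2 everywhere, measurement error of +/-1 on |F_PH|.
f_p = np.array([10.0 * np.exp(1j * 0.3), 8.0 * np.exp(1j * 2.0),
                12.0 * np.exp(-1j * 1.1), 6.0 * np.exp(1j * 0.9)])
f_h = 2.0 * np.exp(1j * np.array([0.5, 1.5, -2.0, 3.0]))
errors = np.array([1.0, -1.0, 1.0, -1.0])
fph_amp = np.abs(f_p + f_h) + errors

eps = lack_of_closure(f_p, f_h, fph_amp)
power = phasing_power(f_h, eps)
print(f"phasing power = {power:.2f}")
```

Here the heavy-atom amplitude is twice the typical lack of closure, giving a phasing power of 2, at the upper end of the acceptable range quoted on the slide.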

22 First Map. Once you have recovered the experimental phases you can make a Fourier transform. At that point, if the electron density is good enough, you can build a structural model into the density. If this goes well you can then make iterative rounds of structural refinement, phase improvement, and more model building until you recover a satisfactory model.

23 Anomalous scattering A second means of obtaining phases from heavy atom derivatives. It takes advantage of the capacity of the heavy atom to absorb X-rays of specific wavelengths. This is called anomalous scattering. Elements can absorb X-rays as well as emit them, and this absorption falls off sharply at wavelengths just below their characteristic emission wavelength. This sudden change in absorption as a function of wavelength is called an absorption edge. The consequence of this absorption is that Friedel’s law does not hold and reflections hkl and -h-k-l are no longer equal in intensity. Absorption edges for atoms like C, N, O are not near the wavelength of X-rays used in crystallography. Absorption edges of heavy atoms are in this range. If the wavelength can be changed (tuned), it is possible to collect data in the range that maximizes anomalous scattering.

24 Anomalous scattering An assumption to date was that the X-ray scattering factor of an atom, $f_j$, is a real number: - In physics this is equivalent to assuming that all electrons can be treated as free electrons & therefore contribute a phase change of $\pi$ for the scattered X-ray. When you go close to the atomic energy levels of certain atoms you can no longer assume that this holds. X-rays scatter with an anomalous scattering term near an absorption edge. [Figure: two panels, a transition between K and L shell electrons and a photoelectron ejected from the K shell.]

25 Atomic absorption coefficient For example copper has a K-absorption edge at 1.38 Å due to the photoelectric effect. There are also transitions from K to L at 1.43 Å. If you have Cu in your protein & you record near 1.43 Å you will have anomalous scattering from this atom. [Figure: X-ray absorption versus X-ray wavelength, 1.0 to 2.0 Å.]

26 Anomalous scattering To describe the anomalous scattering (which is wavelength dependent) we modify the atomic scattering factor for the particular heavy atom: $f_{\text{anom}} = f + \Delta f + i f'' = f + f' + i f''$ Two different symbols ($\Delta f$ and $f'$) are used for the second term depending on what book you look at. [Figure: Argand diagram showing $f$, $f'$, $f''$ and the resultant $f_{\text{anom}}$.]

27 Consequence for Friedel pairs Earlier we showed that the reflection (h,k,l) & its opposite (-h,-k,-l) have the same intensity since $F_{-h-k-l} = \sum_j f_j(S_{hkl}) \exp(2\pi i\,\mathbf{r}_j\cdot\mathbf{S}_{-h-k-l}) = (F_{hkl})^*$ & therefore $I_{hkl} = I_{-h-k-l}$. - These are called Friedel pairs & have the same intensity. When you have anomalous scattering this no longer holds. [Figure: two vector diagrams of $F_{hkl}$, $F_{-h-k-l}$, $F_H$, $F_H^*$, $F_{PH}(hkl)$ and $F_{PH}(-h-k-l)$, one with no anomalous scattering and one with anomalous scattering.]
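
The breakdown of Friedel's law can be demonstrated with a three-atom toy structure. The fractional coordinates, scattering factors and the anomalous corrections $f' = -8$, $f'' = 4.5$ below are hypothetical values chosen for illustration. The same structure factor sum is evaluated for hkl and -h-k-l, first with purely real scattering factors and then with one anomalous scatterer.

```python
import numpy as np

def structure_factor(hkl, positions, factors):
    """F_hkl = sum_j f_j * exp(2*pi*i * r_j . S_hkl), with r_j in fractional coordinates."""
    phase_angles = 2.0 * np.pi * (positions @ np.asarray(hkl, dtype=float))
    return np.sum(factors * np.exp(1j * phase_angles))

# Hypothetical three-atom structure (fractional coordinates).
positions = np.array([[0.10, 0.20, 0.30],
                      [0.40, 0.15, 0.70],
                      [0.25, 0.60, 0.05]])

f_normal = np.array([6.0, 7.0, 34.0], dtype=complex)   # real scattering factors only
f_anom = f_normal.copy()
f_anom[2] += -8.0 + 4.5j                               # add f' and i*f'' to the third atom

hkl, minus_hkl = (1, 2, 3), (-1, -2, -3)

amp_plus = abs(structure_factor(hkl, positions, f_normal))
amp_minus = abs(structure_factor(minus_hkl, positions, f_normal))

anom_plus = abs(structure_factor(hkl, positions, f_anom))
anom_minus = abs(structure_factor(minus_hkl, positions, f_anom))

print(f"no anomalous scattering:   |F(hkl)| = {amp_plus:.4f}, |F(-h-k-l)| = {amp_minus:.4f}")
print(f"with anomalous scattering: |F(hkl)| = {anom_plus:.4f}, |F(-h-k-l)| = {anom_minus:.4f}")
```

With real scattering factors the two amplitudes agree; once the imaginary term $i f''$ is present they differ, and that difference is the anomalous signal.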

28 When you have anomalous scattering this no longer holds: The picture is frequently drawn with $F_{PH}(-h-k-l)$ reflected in the real axis. When scaling data it is possible to measure the differences $|F_{PH}(hkl)| - |F_{PH}(-h-k-l)|$ due to the anomalous signal. [Figure: vector diagram of $F_{hkl}$, $F_H$, $F_{PH}(hkl)$ and the reflected $F_{PH}(-h-k-l)$, with the heavy atom contributions $f'$, $f''$ and $-f''$; inset Argand diagram of $f$, $f'$, $f''$ and $f_{\text{anom}}$.]

29 |F| ano Patterson map If we make a measurement and are careful to measure all the Friedel pairs we can define: $\Delta|F_{\text{ano}}| = \{|F_{PH}(hkl)| - |F_{PH}(-h-k-l)|\} \times f'/2f''$ where the scaling factor $f'/2f''$ is put in for technical normalisation reasons. If we then calculate the Patterson map using $P_{\text{ano}}(\mathbf{u}) = \sum_{hkl} \Delta|F_{\text{ano}}|^2 \exp(2\pi i\,\mathbf{u}\cdot\mathbf{S})$ this gives us the interatomic distance constraints for the anomalous scatterers. - This is powerful since you only need one set of observations & not two, so the noise level is low even if $\Delta|F_{\text{ano}}|$ is small.

30 Multiple Anomalous Diffraction It is possible to accurately tune synchrotron radiation. - It is therefore possible to collect diffraction data from the same crystal at different wavelengths. Near an X-ray absorption edge the real & imaginary components f’ & f’’ change rapidly. [Figure: two plots of f’’ and f’ (in electrons) versus X-ray energy, 12.4 to 12.8 keV.]

31 Multiple Anomalous Diffraction As such it is possible to record three diffraction data sets from a single crystal: - At the peak where f’’ has its maximum. - At the inflection point where f’ has its minimum (i.e. most negative). - Far removed from the above two wavelengths. This is called Multiple Anomalous Diffraction (MAD) since you use three wavelengths to get your data. - You can solve the structure from a single crystal (i.e. you don’t need additional derivatives or another native data set). - No problems with crystals being non-isomorphous. A detailed description of the algebra of solving structures using MAD can be found in most textbooks (for example Principles of Protein X-ray Crystallography (2nd edition) by Jan Drenth), but for this course you can assume it's (more or less) the same as having one native plus two derivative data sets.

32 Multiple Anomalous Diffraction The application of this technique for crystallographic phase determination has been aided by 3 developments: 1. Variable wavelength synchrotron X-ray sources. 2. Cryo-crystallography. 3. Production of proteins containing selenomethionine. For proteins that naturally contain a heavy atom (for example, iron in hemoglobin), the native heavy atom provides the source of anomalous scattering. Isomorphism is not a problem because the same crystal serves as both the “native” and the “derivative” source. The real and the imaginary anomalous scattering factors vary greatly with the wavelength of the X-ray. Thus, data collected at the heavy atom absorption peak, at the edge, and at a wavelength far away from the edge have distinct values for the real and imaginary contributions to anomalous scattering.

33 Multiple Anomalous Diffraction Each measurement of a Friedel pair at a specific wavelength provides the components of a distinct set of phasing equations. In addition to the wavelength-dependent anomalous differences, individual reflection intensities can also vary with wavelength. These differences are called dispersive differences and also contain phase information. The phase information for dispersive differences can be extracted by solving equations similar to those for isomorphous replacement. To maximize the accuracy of the measured differences between Friedel pairs, corresponding pairs should be measured at the same time so that crystal decay does not affect the measurements. This technical demand has been met by the advent of cryocrystallography. Flash cooling the crystals at liquid nitrogen (sometimes liquid helium) temperatures preserves the crystal throughout the course of the extensive data collection.

34 Multiple Anomalous Diffraction The first successful applications of this method were for the determination of structures of small proteins containing functional heavy atoms. But larger proteins contain many methionines and thus many heavy atoms once these are replaced by selenomethionine. Typically, heavy atom positions can be located using Patterson methods. The Patterson method is a trial and error method. Thus, Patterson maps of proteins with many heavy atoms may require so many trial solutions that they may be impossible to solve by hand. More recently, protein crystallographers have begun to use algorithms that were designed for determining the structures of small molecules by Direct methods as a tool for finding the positions of heavy atoms in cases where trial and error methods have failed. The set of positions of all of the heavy atoms in the unit cell is called the heavy atom substructure.

35 Direct methods For small molecules (up to 200 atoms or so), phases can be determined by what are commonly called direct methods. One form of direct methods relies on the existence of mathematical relationships among certain combinations of phases. From these relationships, a sufficient number of initial phase estimates can be obtained to begin converging toward a complete set of phases. Another form, used in the program Shake-and-Bake, tries out random arrangements of atoms, simulates the diffraction patterns they would produce, and compares the simulated pattern with the diffraction pattern from the crystal. Direct methods are best for small molecules while MIR/MAD, etc. are best for large macromolecules whose structures will not be disturbed by the introduction of a heavy atom. More recently, direct phasing has been used to determine substructures which can then be employed in MIR/MAD, etc. phasing methods. [Figure: example of a 250,000 dalton protein with 65 methionines.]

36 Direct methods If you have a small number of atoms & very good resolution you may recover a structure from a native data set. The concept is that there are phase relations between different Bragg reflections: $\alpha(\mathbf{h}_1) + \alpha(\mathbf{h}_2) + \alpha(-\mathbf{h}_1 - \mathbf{h}_2) = 0$ Geometrical arguments can be used to show that this holds when atoms sit on the lattice planes but not in-between lattice planes (chapter 11 of the textbook). - In practice this assumption holds approximately for very strong reflections, which are strong since most of the atoms are scattering in phase & therefore lie on the same lattice plane. - This assumption itself breaks down for large proteins.

37 Shake & Bake Direct methods are very successful for small molecule crystallography. - In practice you pick a few phases & derive the rest from the triplet relations for a limited number of strong reflections. - For molecules with > 150 non-hydrogen atoms the unit cell is so evenly filled with atoms that the phase triplet relation doesn’t work. An algorithm has been written to extend this up to about 1000 non-hydrogen atoms but it requires about 1.2 Å data. - The principle is that the phase triplet is no longer set to zero but obeys a probability distribution. - You then shake the phase angles in reciprocal space & bake out the low density regions in real space.

38 SAD & SIR In addition to MIR & MAD there is Single Isomorphous Replacement (SIR) or Single Anomalous Diffraction (SAD). - The same as MIR or MAD but with one data set rather than several. In SIR, if your data are good, you take the phase from one heavy atom derivative to be the sum of the two possible phases, each with 50 % probability. - Then go ahead & calculate a map. - The wrong choice of phases adds noise. - Relies on the data being good enough to overcome the additional noise. SAD is like SIR but with the variation that you also use the anomalous signal to give additional information. [Figure: vector diagram of $F_P$, $F_H$ and $F_{PH}$.]

39 Molecular Replacement If you believe that your protein may have a structure similar to that of another protein of known structure you can use that information. - Solving the structure by making a good guess! The problem is to place the candidate structure correctly within the unit cell; you then calculate a first electron density map using phases calculated from the known structure. Also useful if you have subdomains of known structure. [Figure: a known structure to be placed in the unit cell.]

40 The Rotation Function Once you choose a molecular replacement model you first need to determine its orientation. - Done by comparing the experimental Patterson Map with one calculated from your candidate structure. - The Patterson map is sensitive to the orientation of the molecule but not its position within the unit cell. [Figure: rotation of the model.]

41 Cross Rotation Function An overlap function R of the experimental Patterson $P(\mathbf{u})$ with the rotated version of the candidate model Patterson $P'_r(\mathbf{u}_r)$ is defined as: $R(\Omega) = \int P(\mathbf{u}) \times P'_r(\mathbf{u}_r)\,d\mathbf{u}$ where $\Omega$ denotes the rotation, normally expressed as three Euler angles. $R(\Omega)$ is maximal when the rotational angle is correct & is not affected by where in the unit cell the candidate structural model is placed. [Figure: two panels; poor correlation because the overlap is not excellent, and after rotation perfect overlap, i.e. $R(\Omega)$ is a maximum.]
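
The idea can be sketched with a two-dimensional toy problem. Everything below is an illustrative assumption rather than a real rotation-function program: the four-atom model, the Gaussian overlap score standing in for the integral, and the single rotation angle in place of three Euler angles. The "crystal" is the model rotated by 40° and shifted; only the sets of interatomic (Patterson) vectors are compared.

```python
import numpy as np

def patterson_vectors(coords):
    """All interatomic vectors r_i - r_j with i != j (the non-origin Patterson peaks)."""
    diff = coords[:, None, :] - coords[None, :, :]
    mask = ~np.eye(len(coords), dtype=bool)
    return diff[mask]

def rotate(coords, angle_deg):
    """Rotate 2D coordinates counter-clockwise by angle_deg."""
    t = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(t), -np.sin(t)],
                    [np.sin(t),  np.cos(t)]])
    return coords @ rot.T

def overlap(vec_a, vec_b, sigma=0.3):
    """Gaussian overlap between two vector sets; a stand-in for the Patterson product integral."""
    d2 = np.sum((vec_a[:, None, :] - vec_b[None, :, :]) ** 2, axis=-1)
    return np.sum(np.exp(-d2 / (2.0 * sigma ** 2)))

# Hypothetical search model and "unknown" crystal: model rotated by 40 degrees and translated.
model = np.array([[0.0, 0.0], [3.0, 0.5], [1.0, 4.0], [5.0, 2.5]])
target = rotate(model, 40.0) + np.array([7.0, -2.0])
target_vectors = patterson_vectors(target)

# A Patterson map is centrosymmetric, so in 2D only 0..179 degrees needs scanning.
angles = np.arange(0, 180)
scores = [overlap(target_vectors, patterson_vectors(rotate(model, a))) for a in angles]
best_angle = int(angles[int(np.argmax(scores))])
print("best rotation angle:", best_angle)
```

The translation of (7, -2) has no effect on the score because interatomic vectors do not depend on where the molecule sits, which is the property that lets the orientation be found before the position.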

42 Translation function Once you have found the optimal rotational orientation you next need to find the correct translation. - In this case the Patterson function is useless since it's insensitive to translation. What one does is move the molecule around in the unit cell, calculate the theoretical $F_{hkl}^{\text{calc}}$ values & compare these with the experimental $F_{hkl}^{\text{obs}}$. [Figure: the oriented model at an unknown position in the unit cell.]

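43 R-factor & Correlation Coefficient Two numbers are optimised: The R-factor $R = \sum_{hkl} \big||F_{\text{obs}}| - k|F_{\text{calc}}|\big| \,/\, \sum_{hkl} |F_{\text{obs}}|$ which should be minimised - i.e. calculated structure factors are as close as possible to the experimental structure factors. The Standard Linear Correlation Coefficient $C = \sum_{hkl} (|F_{\text{obs}}|^2 - \langle|F_{\text{obs}}|^2\rangle) \times (|F_{\text{calc}}|^2 - \langle|F_{\text{calc}}|^2\rangle) \times \{\sum_{hkl} (|F_{\text{obs}}|^2 - \langle|F_{\text{obs}}|^2\rangle)^2 \times \sum_{hkl} (|F_{\text{calc}}|^2 - \langle|F_{\text{calc}}|^2\rangle)^2\}^{-1/2}$ which should be maximised, where $\langle\cdot\rangle$ denotes the average over all reflections. - i.e. when $F_{\text{obs}}$ is much greater than the average, $F_{\text{calc}}$ should also be greater than the average etc. Useful values are C > 30 % & R < 55 %.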
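
Both scores are straightforward to compute from arrays of amplitudes. In the sketch below the function names and the test amplitudes are hypothetical, and the scale factor k is taken as $\sum|F_{\text{obs}}| / \sum|F_{\text{calc}}|$, which is one common choice and an assumption here, since the slide does not say how k is determined.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """R = sum | |F_obs| - k |F_calc| | / sum |F_obs|, with k an assumed overall scale."""
    k = np.sum(f_obs) / np.sum(f_calc)   # assumption: scale by the ratio of summed amplitudes
    return np.sum(np.abs(f_obs - k * f_calc)) / np.sum(f_obs)

def correlation_coefficient(f_obs, f_calc):
    """Linear correlation coefficient between |F_obs|^2 and |F_calc|^2."""
    d_obs = f_obs ** 2 - np.mean(f_obs ** 2)
    d_calc = f_calc ** 2 - np.mean(f_calc ** 2)
    return np.sum(d_obs * d_calc) / np.sqrt(np.sum(d_obs ** 2) * np.sum(d_calc ** 2))

f_obs = np.array([10.0, 20.0, 30.0, 40.0])

# A perfect model on a different scale: R = 0 and C = 1.
f_perfect = 2.0 * f_obs
r_perfect = r_factor(f_obs, f_perfect)
c_perfect = correlation_coefficient(f_obs, f_perfect)

# A poor model with the strong and weak reflections swapped.
f_poor = np.array([40.0, 10.0, 20.0, 30.0])
r_poor = r_factor(f_obs, f_poor)
c_poor = correlation_coefficient(f_obs, f_poor)

print(f"perfect model: R = {r_perfect:.2f}, C = {c_perfect:.2f}")
print(f"poor model:    R = {r_poor:.2f}, C = {c_poor:.2f}")
```

In a translation search these two numbers would be evaluated at each trial position of the oriented model, keeping the position with the lowest R and highest C.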

44 Protein Data Bank Entries This month the number of entries in the protein data bank will surpass 30,000. Current deposit rate is approaching 6,000 per year or 20 per day. - Not all are unique! X-ray crystallography is responsible for approx 75 % - Phasing methods have been very successful!

