X-ray crystallography – an overview (based on Bernie Brown’s talk, Dept. of Chemistry, WFU). The protein is crystallized (a low-gravity environment, e.g. NASA experiments, sometimes helps). X-rays are scattered by the electrons in the molecule. Diffraction produces a pattern of spots on a film that must be mathematically deconstructed. The result is an electron density (contour map); the protein sequence must be known and matched to the density. Hydrogen atoms are not typically visible (except at very high resolution).
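X-ray Crystallography – in a nutshell. REFLECTIONS (h k l, I, σ(I)) → phase problem → phases from MIR (multiple isomorphous replacement), MAD (multi-wavelength anomalous dispersion) or MR (molecular replacement) → electron density: $\rho(x,y,z) = \frac{1}{V}\sum_{h,k,l} |F(hkl)|\,\exp[-2\pi i(hx + ky + lz) + i\alpha(hkl)]$, where the amplitudes |F(hkl)| come from the measured intensities (I ∝ |F|²) and the phases α(hkl) must be supplied separately. Key relations: Bragg’s law (spot positions) and the Fourier transform (reflections to density).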
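The electron-density equation can be tried out in one dimension. The sketch below is a toy model, assuming a hypothetical 1D "crystal" with two point atoms (fractional position, scattering power); it computes F(h) with the exp(+2πihx) convention and then sums |F(h)| exp[−2πihx + iα(h)] back into a density, matching the sign convention of the equation above. `structure_factors`, `fourier_synthesis` and `ATOMS` are illustrative names, not part of any crystallographic package.

```python
import cmath
import math

# Hypothetical 1D "crystal": (fractional position x_j, scattering power f_j) per atom.
ATOMS = [(0.20, 8.0), (0.55, 6.0)]


def structure_factors(atoms, h_max):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for h = -h_max..h_max."""
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
        for h in range(-h_max, h_max + 1)
    }


def fourier_synthesis(amplitudes, phases, x, volume=1.0):
    """rho(x) = (1/V) * sum_h |F(h)| * exp(-2*pi*i*h*x + i*alpha(h)).

    With both members of each Friedel pair (h, -h) present the imaginary
    parts cancel, so only the real part is returned.
    """
    total = sum(
        amp * cmath.exp(-2j * math.pi * h * x + 1j * phases[h])
        for h, amp in amplitudes.items()
    )
    return (total / volume).real


if __name__ == "__main__":
    F = structure_factors(ATOMS, h_max=10)
    amplitudes = {h: abs(Fh) for h, Fh in F.items()}      # obtainable from the spots
    phases = {h: cmath.phase(Fh) for h, Fh in F.items()}  # NOT obtainable from the spots
    grid = [i / 100 for i in range(100)]
    rho = [fourier_synthesis(amplitudes, phases, x) for x in grid]
    print(grid[max(range(len(grid)), key=rho.__getitem__)])  # 0.2, the heavier atom
```

Truncating the sum at h_max plays the role of limited resolution: the peaks are broadened ripples rather than sharp points.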
Crystal formation. Start with a concentrated protein solution and slowly bring it to supersaturation: slowly eliminate water from the protein solution, or add molecules that compete with the protein for water (3 types: salts, organic solvents, PEGs (polyethylene glycols)). Trial and error. Most crystals are ~50% solvent. Crystals may be very fragile.
Visible light vs. X-rays. Why don’t we just use a microscope to look at proteins? The size of objects that can be imaged is limited by the wavelength: resolution ~ λ/2. Visible light – roughly 4000–7000 Å (400–700 nm); X-rays – on the order of 1 Å (0.1 nm; e.g. 1.54 Å for Cu Kα). It is very difficult to focus X-rays (Fresnel lenses). Getting around the problem: a defined beam and a regular structure of the object (a crystal). Result – a diffraction pattern, not a focused image.
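The λ/2 rule makes the comparison concrete. This is a minimal arithmetic sketch, assuming illustrative wavelengths of 5000 Å (green light) and 1.54 Å (Cu Kα), and a typical C–C single-bond length of 1.54 Å as the scale of atomic detail; `resolution_limit` is a hypothetical helper.

```python
# Resolution limit d ~ lambda / 2, as on the slide.
# Assumed illustrative wavelengths: 5000 A (500 nm, green light)
# and 1.54 A (Cu K-alpha, a common laboratory X-ray source).
C_C_BOND_A = 1.54  # typical C-C single-bond length, angstroms


def resolution_limit(wavelength_a):
    """Smallest resolvable spacing (angstroms) for a given wavelength (angstroms)."""
    return wavelength_a / 2.0


for label, lam in [("visible light", 5000.0), ("X-rays (Cu K-alpha)", 1.54)]:
    d = resolution_limit(lam)
    print(f"{label}: lambda = {lam} A = {lam / 10} nm, d ~ {d} A, "
          f"bond-scale detail: {d < C_C_BOND_A}")
```

Visible light bottoms out around 2500 Å, more than a thousand times a bond length, while Cu Kα X-rays reach about 0.77 Å.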
Diffraction pattern – lots of spots. X-ray beam → crystal → film/image plate/CCD camera. With ~10^15 regularly arranged molecules per crystal, the diffraction pattern is amplified. Bragg’s law: $2d\sin\theta = n\lambda$.
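Bragg’s law can be solved for the angle θ at which a set of planes with spacing d diffracts. The sketch below, assuming d and λ in the same units (angstroms), lists every order n for which a reflection exists; it stops once nλ/(2d) exceeds 1, because sin θ cannot. `bragg_angles` is a hypothetical helper written for this illustration.

```python
import math


def bragg_angles(d, wavelength, max_order=10):
    """Angles theta (degrees) satisfying 2 d sin(theta) = n lambda for n = 1, 2, ...

    d and wavelength must use the same units (e.g. angstroms).
    Stops at the first order with n*lambda/(2d) > 1, since sin(theta) <= 1.
    """
    angles = {}
    for n in range(1, max_order + 1):
        s = n * wavelength / (2 * d)
        if s > 1:
            break
        angles[n] = math.degrees(math.asin(s))
    return angles


# Hypothetical example: planes 5.0 A apart, Cu K-alpha radiation (1.54 A).
for n, theta in bragg_angles(5.0, 1.54).items():
    print(f"n = {n}: theta = {theta:.2f} deg")
```

Smaller spacings d push reflections to larger angles, which is why spots far from the beam carry the fine detail.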
End result – really! Fourier transform of the diffraction spots → electron density → fit the amino acid sequence. (Figure labels: protein, DNA pieces; dimer of dimers.)
Interference of waves. In crystallography we measure intensity information only, not phase information. The phases must be recovered some other way: THE PHASE PROBLEM.
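The loss of phase information can be demonstrated with a discrete Fourier transform. In this sketch, assuming a hypothetical 10-point 1D density, a shifted copy and a mirrored copy of the density produce exactly the same intensities |F(h)|², even though their structure factors (and so their phases) differ. Shifts and inversions are only the simplest ambiguities; the intensities alone cannot distinguish among many possible densities.

```python
import cmath
import math


def dft(rho):
    """Discrete structure factors F(h) = sum_x rho(x) * exp(2*pi*i*h*x/N)."""
    n = len(rho)
    return [sum(rho[x] * cmath.exp(2j * math.pi * h * x / n) for x in range(n))
            for h in range(n)]


def intensities(rho):
    """What the detector records: I(h) = |F(h)|^2, with the phase discarded."""
    return [abs(F) ** 2 for F in dft(rho)]


rho_a = [0, 0, 5, 0, 0, 0, 3, 0, 1, 0]                 # hypothetical 1D density
rho_shifted = rho_a[3:] + rho_a[:3]                     # same motif, origin moved 3 points
rho_mirror = [rho_a[-x % len(rho_a)] for x in range(len(rho_a))]  # inverted copy

I_a = intensities(rho_a)
for other in (rho_shifted, rho_mirror):
    same_I = all(abs(p - q) < 1e-9 for p, q in zip(I_a, intensities(other)))
    print(other, "identical intensities:", same_I)
```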
How to get from spots to structure? Fourier synthesis. Getting around the phase problem: trial and error; previous structures (molecular replacement); heavy-atom replacement – make a landmark (e.g. selenomethionine); plenty of computer algorithms now.
Electron density with incorrect phases (figure: red is the true structure).
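How badly wrong phases distort the map can be seen in a toy case. This sketch, assuming a hypothetical 1D density with three "atoms" on a 32-point grid, keeps the true amplitudes |F(h)| but synthesizes the density twice: once with the true phases and once with every phase set to zero. With correct phases the original density is recovered exactly; with zero phases every cosine term peaks at the origin, so the largest peak lands where no atom exists.

```python
import cmath
import math


def dft(rho):
    """F(h) = sum_x rho(x) * exp(2*pi*i*h*x/N)."""
    n = len(rho)
    return [sum(rho[x] * cmath.exp(2j * math.pi * h * x / n) for x in range(n))
            for h in range(n)]


def synthesis(amplitudes, phases):
    """rho(x) = (1/N) * sum_h |F(h)| exp(i*alpha(h)) exp(-2*pi*i*h*x/N), real part."""
    n = len(amplitudes)
    return [sum(a * cmath.exp(1j * p - 2j * math.pi * h * x / n)
                for h, (a, p) in enumerate(zip(amplitudes, phases))).real / n
            for x in range(n)]


true_rho = [0.0] * 32
true_rho[3], true_rho[11], true_rho[20] = 3.0, 2.0, 1.0  # hypothetical "atoms"

F = dft(true_rho)
amps = [abs(f) for f in F]
right = synthesis(amps, [cmath.phase(f) for f in F])
wrong = synthesis(amps, [0.0] * len(F))  # same amplitudes, every phase set to zero

print(max(range(32), key=right.__getitem__))  # 3: the heaviest atom
print(max(range(32), key=wrong.__getitem__))  # 0: a spurious peak where no atom exists
```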
The effect of resolution. A more extensive diffraction pattern gives more structural information, i.e. higher resolution: coarser than 3.0 Å – secondary structure elements; 3.0 Å – trace the polypeptide chain; 2.0 Å – identify side chains and bound water; 1.8 Å – alternate side-chain orientations; 1.2 Å – hydrogen atoms.
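"More extensive" means spots recorded out to a larger Bragg angle. The sketch below, assuming first-order Bragg diffraction (n = 1), converts the outermost angle θ_max into the best spacing d_min = λ/(2 sin θ_max) and then looks up which features from the resolution table above should be interpretable. The thresholds are the slide’s; treating them as cumulative and labelling anything coarser than 3.0 Å as "secondary structure elements" are assumptions of this sketch, and `d_min`/`visible_features` are hypothetical helpers.

```python
import math

# Thresholds (angstroms) from the resolution table; smaller d = higher resolution.
FEATURES = [
    (1.2, "hydrogen atoms"),
    (1.8, "alternate side-chain orientations"),
    (2.0, "side chains, bound water"),
    (3.0, "polypeptide chain trace"),
]


def d_min(theta_max_deg, wavelength):
    """Bragg's law with n = 1: d = lambda / (2 sin(theta_max))."""
    return wavelength / (2 * math.sin(math.radians(theta_max_deg)))


def visible_features(d):
    """Features whose threshold is at or above d; coarser maps show secondary structure."""
    found = [name for limit, name in FEATURES if d <= limit]
    return found or ["secondary structure elements"]


# Hypothetical data set: spots out to theta = 25 degrees with Cu K-alpha (1.54 A).
d = d_min(25.0, 1.54)
print(f"d_min = {d:.2f} A:", visible_features(d))
```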
With computational tools, spots become density. Flexible regions give smeared density: often 2-3 conformations are visible, but more than that become invisible.
Density becomes structure. The protein sequence must be known to trace the backbone.
Co-crystal structures. Because of the relatively high solvent content, substrate can often be “soaked in”, and the structure of the protein with substrate bound can then be solved. If the crystal cracks, that is a good sign that substrate binding or enzyme catalysis causes a conformational change in the protein: it no longer has the same crystal arrangement.
NMR vs. crystallography. Useful for different samples; generally good agreement. E. coli thioredoxin: X-ray vs. NMR structures (figure; note the missing region).
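One common way to put a number on "good agreement" between two models is the root-mean-square deviation (RMSD) over matched atoms. This minimal sketch assumes the two models are already superimposed and list the same atoms in the same order; residues missing from one model (such as a disordered region) are simply left out of both lists. The coordinates are hypothetical, and `rmsd` does not perform the superposition step itself.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched atoms of two superimposed models.

    Assumes both lists hold the same atoms (e.g. C-alpha atoms of residues
    present in both models) in the same order; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


# Hypothetical C-alpha coordinates (angstroms) from an X-ray and an NMR model.
xray = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
nmr = [(0.0, 0.0, 1.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
print(f"RMSD = {rmsd(xray, nmr):.2f} A")
```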
Known protein structures. ~17,000 protein structures since 1958. Common depository of x, y, z coordinates: the Protein Data Bank (PDB). Coordinates can be extracted and viewed. Comparison of structures allows identification of structural motifs. Proteins with similar functions and sequences = homologs.
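Extracting coordinates from a legacy PDB-format file is a matter of reading fixed columns: on ATOM/HETATM records, x, y and z occupy columns 31-38, 39-46 and 47-54. The sketch below is a minimal reader under that assumption; `Atom` and `parse_atoms` are hypothetical names, the two records are illustrative sample data, and real work would more likely use an established parser (e.g. Biopython’s Bio.PDB) and the mmCIF format now used as the PDB’s standard archive format.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines):
    """Read ATOM/HETATM records using the fixed-column PDB layout."""
    atoms = []
    for line in lines:
        if line[:6] not in ("ATOM  ", "HETATM"):
            continue  # skip HEADER, REMARK, END, ...
        atoms.append(Atom(
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms


SAMPLE = """\
ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N
ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C
END
""".splitlines()

atoms = parse_atoms(SAMPLE)
n_atom, ca = atoms[0], atoms[1]
print(ca.name, ca.res_name, ca.chain, ca.res_seq)  # CA MET A 1
# N-CA bond length in angstroms, computed from the extracted coordinates
print(round(math.dist((n_atom.x, n_atom.y, n_atom.z), (ca.x, ca.y, ca.z)), 3))
```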
Growth in structure determination
Function from structure. Might identify a pocket lined with negatively charged residues, or a positively charged surface (possibly for binding a negatively charged nucleic acid). Rossmann fold – binds nucleotides. Zinc finger – may bind DNA.
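A crude first pass at "a pocket lined with negatively charged residues" is to sum formal side-chain charges around a site. This sketch assumes one representative side-chain atom per residue, formal charges near neutral pH (Asp/Glu −1, Lys/Arg +1, His treated as neutral), a hypothetical site centre and an arbitrary 8 Å cutoff; it is a heuristic illustration, not an electrostatics calculation.

```python
import math

# Assumed formal side-chain charges near neutral pH; His treated as neutral.
CHARGE = {"ASP": -1, "GLU": -1, "LYS": +1, "ARG": +1}


def site_charge(residues, center, cutoff=8.0):
    """Net formal charge of residues within cutoff (angstroms) of center.

    residues: iterable of (res_name, (x, y, z)), one representative
    side-chain atom per residue.
    """
    return sum(CHARGE.get(name, 0) for name, xyz in residues
               if math.dist(xyz, center) <= cutoff)


# Hypothetical residues around a candidate pocket centred at the origin.
residues = [
    ("ASP", (1.0, 0.0, 0.0)),
    ("GLU", (0.0, 2.0, 0.0)),
    ("LYS", (0.0, 0.0, 3.0)),
    ("ARG", (20.0, 0.0, 0.0)),  # far from the site
    ("SER", (1.0, 1.0, 1.0)),
]
print(site_charge(residues, (0.0, 0.0, 0.0)))  # -1: net negative pocket
```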
Domain organization. Large proteins have polypeptide regions (domains) that fold in isolation. These may have distinct functional roles, e.g. glyceraldehyde-3-phosphate dehydrogenase.
Protein families. Similar function and overall structure, but the amino acid sequence may or may not be highly conserved. There is a limited number of protein domains. Homologs versus structural motifs.
SCOP Classification Statistics (Structural Classification of Proteins; PDB entries and their domains as of 1 March 2002, excluding nucleic acids and theoretical models)
Class | Folds | Superfamilies | Families
All α proteins | | |
All β proteins | | |
α and β proteins (α/β) | | |
α and β proteins (α+β) | | |
Multi-domain proteins | 39 | 50 |
Membrane and cell-surface proteins | | |
Small proteins | | |
Total | | |
Have all folds been found? (Figure: red = old folds, blue = new folds.)