The PHENIX project: Crystallographic software for automated structure determination

Presentation transcript:

The PHENIX project
Crystallographic software for automated structure determination

Computational Crystallography Initiative (LBNL)
- Paul Adams, Ralf Grosse-Kunstleve, Peter Zwart, Nigel Moriarty, Nicholas Sauter, Pavel Afonine
Los Alamos National Lab (LANL)
- Tom Terwilliger, Li-Wei Hung, Thiru Radhakannan
Cambridge University
- Randy Read, Airlie McCoy, Laurent Storoni, Hamsapriye
Texas A&M University
- Tom Ioerger, Jim Sacchettini, Kreshna Gopal, Lalji Kanbi, Erik McKee, Tod Romo, Reetal Pai, Kevin Childs, Vinod Reddy

Structure Determination by MAD/SAD/MIR in PHENIX

Input:
- Data files (H K L Fobs Sigma)
- Crystal information (Space group, Cell)
- Scattering factors (for MAD)

Output:
- Heavy-atom coordinates
- Non-crystallographic symmetry
- Electron-density map
- Atomic model

PROVIDED THAT:
- Data are strong, accurate, < 3 Å
- Strong anomalous signal
- Little decay
- Space group is correct
- Scattering factors close (for MAD data)
- You are willing to wait a little while (10 minutes to hours, depending on size)

The resulting model:
- Is 50-95% complete (depending on resolution)
- Is (mostly) compatible with the data, but is not completely correct
- Requires manual rebuilding
- Requires validation and error analysis

Major needs in automated structure solution

MAD/SAD/MIR
- Robust structure determination procedures
- Best possible electron density maps to build most complete model
- Decision-making about best path for structure solution

Molecular Replacement
- Use of distant models
- Preventing model bias

All structures
- Model completion/Ligand fitting
- Error analysis
- Decision-making for what data to use and what path to follow
- How to incorporate vast experience of crystallographic community

Automation of MAD/SAD Structure Determination in PHENIX

- Robust structure determination procedures: PHENIX AutoSol/AutoBuild/AutoMR/AutoLig structure solution wizards
- Decision-making about best path for structure solution: measures of the quality of electron-density maps
- Best possible electron density maps to build most complete model: phase information for density modification from local patterns of density, ID of fragments, iterative model-building and refinement, FULL-OMIT density modification and model-building

Why we need good measures of the quality of an electron-density map:
- Which solution is best?
- Are we on the right track?
- If map is good: it is easy

Evaluating electron density maps: methods examining the map itself

Basis | Good map | Random map
Skew of density (Podjarny, 1977) | Highly skewed (very positive at positions of atoms, zero elsewhere) | Gaussian histogram
SD of local rms densities (Terwilliger, 1999) | Solvent and protein regions have very different rms densities | Map is uniformly noisy
Connectivity of regions of high density (Baker, Krukowski, & Agard, 1993) | A few connected regions can trace entire molecule | Many very short connected regions
Presence of tubes of density or helices/strands or local patterns in map (Colovos, Toth & Yeates, 2001; Terwilliger, 2004) | CC of map with a filtered version is high | CC with filtered version is low
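As an illustration of the "SD of local rms densities" criterion, here is a minimal Python sketch on a one-dimensional toy map. The box size, the noise levels, and the function name `sd_of_local_rms` are assumptions chosen for the example and are not taken from the cited method.

```python
import math
import random

def sd_of_local_rms(density, box=50):
    """Standard deviation of the rms density computed in consecutive boxes."""
    rms_values = []
    for start in range(0, len(density) - box + 1, box):
        window = density[start:start + box]
        rms_values.append(math.sqrt(sum(v * v for v in window) / box))
    mean = sum(rms_values) / len(rms_values)
    return math.sqrt(sum((r - mean) ** 2 for r in rms_values) / len(rms_values))

rng = random.Random(1)

# Toy "good" map: a quiet solvent half followed by a feature-rich protein half.
good_map = ([rng.gauss(0.0, 0.1) for _ in range(1000)]
            + [rng.gauss(0.0, 1.0) for _ in range(1000)])

# Toy "random" map: the same noise level everywhere.
random_map = [rng.gauss(0.0, 0.6) for _ in range(2000)]

sd_good = sd_of_local_rms(good_map)
sd_random = sd_of_local_rms(random_map)
print(f"SD of local rms: good={sd_good:.3f} random={sd_random:.3f}")
```

In the toy "good" map the local rms takes two very different values, so its standard deviation is large; in the uniformly noisy map the local rms is nearly constant.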

Evaluating electron density maps: methods based on density modification and R-factors

Basis | Good map | Random map
R-factor in 1st cycle of density modification* (Cowtan, 1996) | Low R-factor | High R-factor
Correlation of map made with map-probability phases with original map (Terwilliger, 2001) (map-probability from solvent flattening or from truncation at high density level) | High correlation | Lower correlation
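Both criteria reduce to simple statistics. The sketch below uses the standard textbook definitions of a crystallographic R-factor on amplitudes and of a Pearson correlation coefficient between two maps sampled on the same grid; the cited methods may differ in scaling and weighting details, and the numbers are made up for illustration.

```python
import math

def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), amplitudes given as positive numbers."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)

def correlation(map_a, map_b):
    """Pearson correlation coefficient between two maps on the same grid."""
    n = len(map_a)
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    return cov / math.sqrt(var_a * var_b)

# Illustrative amplitudes: a close match gives a low R-factor.
f_obs = [10.0, 20.0, 30.0]
r_close = r_factor(f_obs, [12.0, 18.0, 33.0])
r_far = r_factor(f_obs, [25.0, 5.0, 10.0])

# Illustrative maps: proportional maps correlate perfectly.
cc_same = correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
cc_opposite = correlation([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])
```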

Skew of electron density in maps of varying quality
(High electron density at positions of atoms; near zero everywhere else => high skew for good map)

Figure: IF5A (P. aerophilum, 60% solvent; randomized maps)
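The skew criterion can be tried out on synthetic numbers. The following is a minimal Python sketch, assuming the usual definition of skew as the third central moment divided by the variance to the power 3/2; the toy "maps" are simply lists of density values invented for the example, not real crystallographic data.

```python
import random

def skew(values):
    """Third central moment divided by variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(0)

# Toy "random" map: Gaussian noise, so the histogram is symmetric.
noise_map = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# Toy "good" map: near zero almost everywhere, strongly positive at a few
# grid points standing in for atom positions (5% of points, assumed).
good_map = [rng.gauss(0.0, 0.1) + (5.0 if rng.random() < 0.05 else 0.0)
            for _ in range(20000)]

skew_noise = skew(noise_map)
skew_good = skew(good_map)
print(f"skew: noise={skew_noise:.2f} good={skew_good:.2f}")
```

The noise map gives a skew near zero, while the map with a few high peaks on a flat background gives a clearly positive skew.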

PHENIX wizards: automation of steps in structure determination

- Read "Facts" ("what is state of the world now?")
- Get user inputs (if needed)
- List and score all options
- Carry out best option
- No more options: write Facts and quit
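The wizard cycle described on this slide can be sketched as a small loop. This is a hypothetical illustration only: `run_wizard`, `toy_options`, the option names, and the scores are invented here and are not the PHENIX wizard API.

```python
def run_wizard(facts, list_options, max_steps=100):
    """Repeat: list and score options, carry out the best one, update the facts.

    list_options(facts) returns (score, name, action) tuples, where
    action(facts) returns the updated facts. Stops when no options remain.
    """
    history = []
    for _ in range(max_steps):
        options = list_options(facts)
        if not options:
            break  # no more options: the caller would write the facts and quit
        score, name, action = max(options, key=lambda option: option[0])
        facts = action(facts)
        history.append(name)
    return facts, history

def toy_options(facts):
    """Hypothetical option list for a three-step toy structure solution."""
    options = []
    if not facts.get("scaled"):
        options.append((1.0, "scale_data", lambda f: {**f, "scaled": True}))
    elif not facts.get("sites"):
        options.append((1.0, "find_heavy_atoms", lambda f: {**f, "sites": True}))
    elif not facts.get("phased"):
        # Two competing options; the higher-scoring one is carried out.
        options.append((0.4, "phase_only", lambda f: {**f, "phased": "plain"}))
        options.append((0.9, "phase_and_modify", lambda f: {**f, "phased": "dm"}))
    return options

final_facts, history = run_wizard({}, toy_options)
print(history)
```

Keeping the state in a single facts dictionary means the loop can be stopped and resumed from whatever was last written, which matches the "read Facts ... write Facts" framing of the slide.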

PHENIX AutoSol wizard standard sequence
(but you can step through and make any choices you like!)

- Scale and analyze X-ray data (SAD/MAD/MIR/Multiple datasets)
- HYSS heavy-atom search on each dataset (following methods of Weeks, Sheldrick and others)
- Phasing and density modification
- Improve sites with difference Fourier
- Build preliminary model
- Score and rank solutions (Which hand? Which dataset gave best solution?)

Statistical density modification
(A framework that separates map information from experimental information and builds on density modification procedures developed by Wang, Bricogne and others)

Principle: phase probability information comes from the probability of the map and from experiment:

P(φ) = P_map probability(φ) × P_experiment(φ)

"Phases that lead to a believable map are more probable than those that do not"

A believable map is a map that has:
- A relatively flat solvent region
- NCS (if appropriate)
- A distribution of densities like those of model proteins

Calculating P_map probability(φ):
- Calculate how map probability varies with electron density ρ
- Use chain rule to deduce how map probability varies with phase (equations of Bricogne, 1992)

Map probability phasing: getting a new probability distribution for a single phase given estimates of all others

1. Identify expected features of map (flat far from center)
2. Calculate map with current estimates of all structure factors except one (k)
3. Test all possible phases φ for structure factor k (for each phase, calculate new map including k)
4. Probability of phase φ estimated from agreement of maps with expectations

Figure panels:
- A function that is (relatively) flat far from the origin
- Function calculated from estimates of all structure factors but one (k)
- Test each possible phase of structure factor k. P(φ) is high for phase that leads to flat region
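The four steps can be tried on a one-dimensional toy problem. This Python sketch makes several assumptions that are not in the source: a 64-point cell with three Gaussian "atoms" in the first half and an empty second half, exact values for every structure factor except k, and a simple sum of squared density in the flat region as the measure of disagreement (a stand-in for a proper probability).

```python
import cmath
import math

N = 64                      # grid points in the toy 1-D cell
FLAT_REGION = range(32, 64)  # region expected to be flat (assumed known)

def toy_density():
    atoms = [8, 14, 22]      # hypothetical atom positions
    return [sum(math.exp(-((x - a) ** 2) / (2 * 1.5 ** 2)) for a in atoms)
            for x in range(N)]

def dft(rho):
    return [sum(rho[x] * cmath.exp(-2j * math.pi * h * x / N) for x in range(N))
            for h in range(N)]

def inverse_dft(F):
    return [sum(F[h] * cmath.exp(2j * math.pi * h * x / N) for h in range(N)).real / N
            for x in range(N)]

def best_phase_for(k, F, step_deg=10):
    """Test every phase for structure factor k; keep the one giving the flattest region."""
    amplitude = abs(F[k])
    partial = list(F)
    partial[k] = 0           # step 2: map from all structure factors except k
    partial[N - k] = 0       # ...and its Friedel mate
    base_map = inverse_dft(partial)
    scores = {}
    for deg in range(0, 360, step_deg):   # step 3: try each phase
        phi = math.radians(deg)
        score = 0.0
        for x in FLAT_REGION:
            rho = base_map[x] + 2 * amplitude / N * math.cos(2 * math.pi * k * x / N + phi)
            score += rho * rho            # departure from a flat (zero) region
        scores[deg] = score
    return min(scores, key=scores.get)    # step 4: best agreement with expectation

F = dft(toy_density())
k = 3
true_phase_deg = math.degrees(cmath.phase(F[k])) % 360
best_deg = best_phase_for(k, F)
print(f"true phase {true_phase_deg:.1f}, best tested phase {best_deg}")
```

Because all other structure factors are exact in this toy case, the tested phase nearest the true one gives the flattest region; with real data the other phases are only estimates, so the result is a probability distribution rather than a single answer.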

A map-probability function

- A map with a flat (blank) solvent region is a likely map
- Log-probability of the map is the sum over all points in the map of the local log-probability
- Local log-probability is the believability of the value of electron density ρ(x) found at this point
- If the point is in the PROTEIN region, most values of electron density ρ(x) are believable
- If the point is in the SOLVENT region, only values of electron density near zero are believable
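A minimal Python sketch of such a function follows. It assumes Gaussian local distributions centered on zero, narrow in the solvent and broad in the protein region; the widths and the example density values are illustrative assumptions, whereas the talk refers to distributions like those of model proteins.

```python
import math

def log_gaussian(value, mean, sd):
    return -0.5 * ((value - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def map_log_probability(density, is_solvent, solvent_sd=0.1, protein_sd=1.0):
    """Sum of local log-probabilities over all grid points.

    Solvent points: only values near zero are believable (narrow Gaussian).
    Protein points: most values are believable (broad Gaussian).
    """
    total = 0.0
    for rho, solvent in zip(density, is_solvent):
        sd = solvent_sd if solvent else protein_sd
        total += log_gaussian(rho, 0.0, sd)
    return total

mask = [False, False, False, True, True, True]   # last three points are solvent

flat_solvent_map = [0.8, -0.5, 1.2, 0.00, 0.02, -0.01]
noisy_solvent_map = [0.8, -0.5, 1.2, 0.50, -0.40, 0.60]

lp_flat = map_log_probability(flat_solvent_map, mask)
lp_noisy = map_log_probability(noisy_solvent_map, mask)
print(f"log-probability: flat solvent={lp_flat:.1f} noisy solvent={lp_noisy:.1f}")
```

The two maps are identical in the protein region, so the difference in log-probability comes entirely from how flat the solvent is.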

Statistical density modification: features and applications

Features:
- Can make use of any expectations about the map
- A separate probability distribution for electron density can be calculated for every point in the map

Applications:
- Solvent flattening
- Non-crystallographic symmetry averaging
- Template matching
- Partial model phasing
- Prime-and-switch phasing
- General phase recovery
- Iterative model-building

Reference: Terwilliger, T. C. (2000). Maximum-likelihood density modification. Acta Crystallographica, D56.

Figure panels: SOLVE map (52 Se); RESOLVE map (data courtesy of Ward Smith & Cheryl Janson)

PHENIX AutoBuild wizard standard sequence
(Following ideas from Lamzin & Perrakis)

- Input: Fp, phases, HL coefficients
- Density modify (with NCS, density histograms, solvent flattening, fragment ID, local pattern ID)
- Build and score models
- Refine with phenix.refine
- Density modify including model information
- Evaluate final model

SAD data at 2.6 Å, gene 5 protein: SOLVE SAD map

SAD data at 2.6 Å, gene 5 protein: density-modified SAD map

SAD data at 2.6 Å, gene 5 protein: cycle 50 of iterative model-building, density modification and refinement

SAD data at 2.6 Å, gene 5 protein: cycle 50 of iterative model-building, density modification and refinement (with model built from this map)

Why iterative model building, density modification, and refinement can improve a map (following ideas of Perrakis & Lamzin):

1. New information is introduced: flat solvent, density distributions, stereochemically reasonable geometry and atomic shapes
2. Model rebuilding removes correlations of errors in atomic positions introduced by refinement
3. Improvement of density in one part of map improves density everywhere

Figure panels: Iterative model-building map; Density-modified map

Iterative model-building and refinement is very powerful but isn't perfect

Model-based information is introduced in exactly the same place that we will want to look for details of electron density
=> How can we be sure that the density is not biased due to our model information?

- Will density be higher just because we put an atom there?
- Will solvent region be flatter than it really is because we flattened it?
- Will we underestimate errors in electron density from a density-modified map?
- Are we losing some types of information by requiring the map to match partially incorrect prior knowledge?

Figure panels: Iterative model-building map; Density-modified map

A FULL-OMIT iterative-model-building map: everywhere improved, everywhere unbiased
=> Use prior knowledge about one part of a map to improve density in another

Related methods: "Omit map", "SA-composite omit map", density-modification OMIT methods, "Ping-pong refinement"

Principal new feature: The benefits of iterative model-building are obtained yet the entire map is unbiased

Requires: Statistical density modification so that separate probability distributions can be specified for omit regions (allow anything) and modified regions (apply prior knowledge)

- Outside OMIT region: full density modification
- OMIT region: no density modification (no solvent flattening, no model, no histograms, no NCS)

Including all regions in density modification: comparison with FULL-OMIT

Figure panels: columns FULL-OMIT and All included; rows Density-modified and Iterative model-building

FULL-OMIT iterative-model-building maps
Molecular Replacement (Mtb superoxide dismutase, 1IDS, Cooper et al, 1994)

FULL-OMIT iterative-model-building maps

Uses:
- Unbiased high-quality electron density from experimental phases
- High-quality molecular replacement maps with no model bias
- Model evaluation

Computation required: ~24 x the computation for standard iterative model-building

FULL-OMIT iterative-model-building maps

Requirement for preventing bias: Density information must have no long-range correlated errors (the position of one atom must not have been adjusted to compensate for errors in another)
=> Starting model (if MR) must be unrefined in this cell

Automated fitting of flexible ligands

A difference electron-density map and a flexible ligand to be fitted

Figure labels: FKBP12; 2-Aryl-2,2-Difluoroacetamide; View 2

Automated fitting of flexible ligands: finding location of a core fragment
Choose largest rigid fragments of ligand and identify their best locations

Automated fitting of flexible ligands: extending the core fragment
Take each location and extend the model into the density

Automated fitting of flexible ligands: comparison with refined FKBP12 model in PDB entry 1J4R
Yellow: auto-built ligand. Green: 1J4R

Examples of fitting difference density using data from PDB

Fitting a ligand takes 2-3 minutes
Yellow: auto-built ligand. Green: from PDB

- FMN (1AG9)
- NADPH (1C3V)
- β-TAM (1AER)
- Pyridoxal phosphate (1A8I)

The PHENIX project: automated structure determination at moderate resolution

- Scaling
- Heavy-atom substructure determination
- Molecular Replacement
- Model-building
- Ligand-fitting
- Refinement
- NCS identification
- Statistical density modification
- Iterative model-building

Major needs in automated structure solution

MAD/SAD/MIR
- Robust structure determination procedures
- Best possible electron density maps to build most complete model

Molecular Replacement
- Use of distant models
- Preventing model bias

All structures
- Model completion/Ligand fitting
- Error analysis
- Decision-making for what data to use and what path to follow
- How to incorporate vast experience of crystallographic community

Acknowledgements

PHENIX:
- Computational Crystallography Initiative (LBNL): Paul Adams, Ralf Grosse-Kunstleve, Nigel Moriarty, Nick Sauter, Pavel Afonine, Peter Zwart
- Cambridge: Randy Read, Airlie McCoy, Laurent Storoni, Hamsapriye
- Texas A&M: Tom Ioerger, Jim Sacchettini, Kreshna Gopal, Lalji Kanbi, Erik McKee, Tod Romo, Reetal Pai, Kevin Childs, Vinod Reddy
- Los Alamos: Li-Wei Hung, Thiru Radhakannan

Generous support for PHENIX from the NIGMS Protein Structure Initiative

PHENIX web site: SOLVE/RESOLVE web site: SOLVE/RESOLVE user’s group: