Automatic Structure Determination --- given a data set, solve the structure quickly and better, by using a parallel workflow engine to automatically and.


1 Automatic Structure Determination --- given a data set, solve the structure quickly and better, by using a parallel workflow engine to automatically and systematically search algorithm/program and parameter space. Zheng-Qing (Albert) Fu; SER-CAT, APS, Argonne National Laboratory; Biochem. & Mol. Biology, Univ. of Georgia, Athens, Georgia. 2007 ACA Summer School

2 What we learnt from Structural Genomics. Overall Success Rate (from Clone to Structure): 2.45%. Chart labels: All Targets, Cloned (7%), Crystals (33%), Structures.

3 Key to Success. From gene to final structure, crystallographic analysis of protein structures is a complicated multi-step, multi-discipline, costly, and systematic engineering project. Diagram labels: Gene, Protein Prep, Crystallization, Data Collection, Data Processing, Phasing, Map Tracing, Refinement, Structure. Bottleneck (tedious and time consuming): data collection, data processing, and the structure-solving process (intensive computing). Fu (2002): Diffraction Methods in Structural Biology, Gordon Research Conferences. New London, CT, USA.

4 Why Automation? Reason #1 Automation may optimize the steps of the whole process, and thus improve the success rate and accuracy of the final structure.

5 Why Automation? Reason #2: Structural biology in the post-genomics era challenges X-ray crystallography to provide better hardware, better software, and better full services. >> Every structural biologist was also an excellent crystallographer. >> Most new-generation structural biologists know only some basic concepts of crystallography, if any at all. They depend on other people's recipes, and at most learn how to run a bunch of computer programs. Do they want to, or have the ability to, solve new problems related to crystallography?

6 Why Automation? Reason #3: Even experienced crystallographers may make careless mistakes, too. Blood Coagulation Inhibitor: a small protein containing 12 Cys. Source: venom of habu (rattlesnake). A good target for S phasing. Native data were collected at both the home source and the SER-CAT synchrotron beam line. Panel labels: Synchrotron Source (1.74 Å); Home Cr Source (2.29 Å). Automation may help avoid such unrecoverable mistakes, which may happen at any step of the complicated process.

7 Automation of Part of the Whole Process, from Data Collection to Structure-Solving: Feasibility, Current Implementation. Data Acquisition & Processing; Structure-Solving Process.

8

9 During data collection, any problem with the diffraction system, such as with the X-ray source, shutter, goniometer and stage, detector, or crystal mounting, or other mechanical, optical, or electronic defects, can ruin the data quality, leading to failure of the whole process. 1). How to detect and avoid these problems before it is too late?

10 In addition to the unexpected problems, there are many other issues during data collection: 2). Is the diffraction quality acceptable? 3). Is the data quality still improving? 4). Is the data collected enough to solve the structure? 5). Should we continue collecting more frames, or is it better to mount another fresh crystal? All these questions can be answered if and only if we know how to monitor the Signal/Noise ratio during data collection.

11 A New Statistical Index, Ras, to More Objectively and Accurately Evaluate Signal/Noise Ratio 1). Signal/Noise ratio: $R_{as} = \langle \Delta I / \sigma_I \rangle_a \,/\, \langle \Delta I / \sigma_I \rangle_c$. Here $\langle \Delta I / \sigma_I \rangle_a$ is the ratio of the Bijvoet difference to the standard error in intensity, calculated using acentric reflections. $\langle \Delta I / \sigma_I \rangle_c$ is statistically evaluated in the same way, but using centric reflections. Theoretically, it should be zero. It is the counterpart of the acentric ratio, and thus can serve as the indicator of noise level. Ras, thus defined, can serve as a signal/noise ratio in terms of anomalous scattering. The higher the better. Tests show that it is more objective and reliable than other indices currently used for measuring anomalous signal. 1). Fu et al. (2004). Acta Cryst D60:499-506.
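
As a rough illustration of the Ras definition on slide 11, the ratio could be computed along the following lines. This is only a sketch under stated assumptions, not the published procedure: the function names are hypothetical, each reflection is assumed to come as a (Bijvoet difference, standard error) pair, and the "statistical evaluation" is assumed to be a plain mean of |ΔI|/σ_I. The cited paper should be consulted for the exact estimator.

```python
from statistics import mean


def mean_diff_over_sigma(reflections):
    """Mean of |delta_I| / sigma_I over (delta_I, sigma_I) pairs.

    Assumption: a plain arithmetic mean of absolute values; the original
    work may use a different statistical estimator.
    """
    return mean(abs(delta_i) / sigma_i for delta_i, sigma_i in reflections)


def ras(acentric, centric):
    """Signal/noise index: acentric ratio divided by centric ratio.

    acentric, centric: lists of (Bijvoet difference, standard error) pairs.
    The centric ratio should ideally be zero, so it measures the noise level.
    """
    signal = mean_diff_over_sigma(acentric)
    noise = mean_diff_over_sigma(centric)
    return signal / noise


# Toy numbers, made up purely for illustration.
acentric_refl = [(4.0, 2.0), (-6.0, 2.0)]   # ratios 2.0 and 3.0
centric_refl = [(1.0, 2.0), (-1.0, 2.0)]    # ratios 0.5 and 0.5
example_ras = ras(acentric_refl, centric_refl)
```

With these toy values the acentric mean is 2.5 and the centric mean is 0.5, so the index comes out as 5.0.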

12 Signal-based Data Collection. With Ras as a reliable indicator, diffraction data can be acquired more appropriately for a given crystal, by monitoring the Signal/Noise ratio throughout the data collection.
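
One way to picture signal-based data collection is a simple decision rule applied to the Ras values computed after each batch of frames. The sketch below is hypothetical: the function name, the target value, the minimum gain, and the window size are all illustrative assumptions, not thresholds taken from the talk or from the cited paper.

```python
def collection_advice(ras_history, target=1.5, min_gain=0.02, window=3):
    """Suggest the next action from Ras values recorded after each batch.

    All thresholds are illustrative assumptions:
      target   - Ras level treated as sufficient signal
      min_gain - smallest Ras increase over `window` batches still
                 counted as "improving"
    """
    latest = ras_history[-1]
    if latest >= target:
        return "enough"          # signal looks sufficient; stop collecting
    if len(ras_history) < window + 1:
        return "continue"        # too few points to judge the trend
    gain = latest - ras_history[-1 - window]
    if gain < min_gain:
        return "remount"         # signal has plateaued below the target
    return "continue"            # signal is still improving


advice_improving = collection_advice([1.0, 1.1, 1.2, 1.3])
advice_plateau = collection_advice([1.10, 1.11, 1.11, 1.11])
advice_done = collection_advice([1.0, 1.2, 1.6])
```

The three return values correspond to the questions on slide 10: whether the data are already enough, whether quality is still improving, and whether a fresh crystal would be the better choice.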

13 Structure-Solving Process

14 After the data are processed, we face a different set of issues in the structure-solving process. 1). There are numerous programs (or algorithms) to choose from. A program may outperform others in some cases and vice versa. Which programs to use? 2). Each program has multiple parameters. Which parameters to adjust? What combination of the parameters can give the best result? 3). If phasing produced a traceable map, is it the best map to work on for fitting and refining to complete the structure?

15 For a given data set, combinations of different programs or parameter settings can produce totally different results. Some may succeed in giving a solution, but many others will fail 1). Figure: Test result on solving the structure of a hydrolase protein (864 AAs, 30 Se). The 2.8 Å data was provided by Dr. Turner. Green dots are the percentages of residues automatically traced from maps generated by phasing with different programs (SHELXD, ISAS, SOLVE, RESOLVE) and parameter settings. Pink represents the resolution cutoff for heavy-atom site searching: solid squares indicate SHELXD, open ones SOLVE. Blue represents the resolution cutoff for phasing and density modification: solid diamonds indicate SOLVE/RESOLVE, open ones ISAS. The current common Trial & Error practice in solving a structure is time-consuming and tedious. It may not give the best solution, and may even fail to find any solution at all for data with marginal quality. 1). Fu, Rose, Wang (2005): Acta Cryst D61:951-959.
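
To give a feel for the size of the search space described on slide 15, the combinations of programs and resolution cutoffs can be enumerated as below. This is a minimal sketch: the program names come from the figure caption, while the function name, the dictionary keys, and the cutoff values are hypothetical and chosen only for illustration.

```python
from itertools import product

# Program names as listed in the figure caption on slide 15.
SITE_PROGRAMS = ["SHELXD", "SOLVE"]
PHASING_PROGRAMS = ["SOLVE/RESOLVE", "ISAS"]


def build_trials(site_cutoffs, phasing_cutoffs):
    """Enumerate every program / resolution-cutoff combination to try."""
    return [
        {
            "site_program": site_prog,
            "site_cutoff": site_cut,        # cutoff for heavy-atom site search (Å)
            "phasing_program": phase_prog,
            "phasing_cutoff": phase_cut,    # cutoff for phasing and density modification (Å)
        }
        for site_prog, site_cut, phase_prog, phase_cut in product(
            SITE_PROGRAMS, site_cutoffs, PHASING_PROGRAMS, phasing_cutoffs
        )
    ]


# Hypothetical cutoff values, for illustration only.
trials = build_trials(site_cutoffs=[3.0, 3.5, 4.0], phasing_cutoffs=[2.8, 3.0])
```

Even this small grid yields 2 x 3 x 2 x 2 = 24 trials, which suggests why trying the combinations by hand is tedious.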

16 Parallel Workflow Engine to systematically search program and parameter spaces to find the best solution for given data. Figure 1. The dark blocks represent parallel tasks dynamically generated from various crystallographic computing programs with different parameter settings. The tasks are distributed by the workflow engine to the computing facility and run in parallel. Upon completion, the workflow engine harvests and analyzes the results, and dynamically creates and starts another group of tasks for the next step, and so on, until the whole process finishes. Fu (2003). Proceedings of the 5th Int. Conference on Mol. Struct. Biology. Vienna, Austria, Sept. 3-7. Fu et al. (2005). Acta Cryst D61:951-959.
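
The staged pattern in Figure 1 (generate tasks, run them in parallel, harvest, then generate the next group) can be sketched with Python's standard thread pool. This is not the engine described in the talk: the function names are hypothetical, the toy scoring functions stand in for real crystallographic programs, and carrying only the single best result into the next stage is a simplifying assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(stages, seed=None, max_workers=4):
    """Run each stage's tasks in parallel and pass the best result onward.

    stages: list of (make_tasks, run_task) pairs. make_tasks(best_so_far)
    returns a list of task dicts; run_task(task) returns a dict with a
    "score" key. Keeping only the top-scoring result is an assumption.
    """
    best = seed
    for make_tasks, run_task in stages:
        tasks = make_tasks(best)                       # dynamically generated
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results = list(pool.map(run_task, tasks))  # run in parallel, harvest
        best = max(results, key=lambda r: r["score"])  # analyze the results
    return best


# Toy stages standing in for site search and phasing (scores are made up).
def make_site_tasks(_previous):
    return [{"site_cutoff": c} for c in (3.0, 3.5, 4.0)]


def run_site_task(task):
    return {**task, "score": -abs(task["site_cutoff"] - 3.5)}


def make_phasing_tasks(best_sites):
    return [{"site_cutoff": best_sites["site_cutoff"], "phase_cutoff": c}
            for c in (2.8, 3.0)]


def run_phasing_task(task):
    return {**task, "score": -task["phase_cutoff"]}


best_result = run_workflow([(make_site_tasks, run_site_task),
                            (make_phasing_tasks, run_phasing_task)])
```

In a real setting each run_task would launch an external program and parse its output, and a process pool or cluster scheduler would likely replace the thread pool.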

17 Algorithm and Design

18 Where are we?

19 Acknowledgment. George Wu and many Ph.D. students including Dongsheng Che, Jizhen Zhao, Feng Sun, Haijin Yan, Dept. of Computer Sciences, UGA. B.C. Wang, John Rose, SER-CAT, SECSG, UGA. John Chrzas, Zhongmin Jin, Jim Fait, SER-CAT, APS. Andy Howard, Illinois Institute of Technology. Robert Sparks, Bruker (formerly Siemens) AXS Inc. Xuong Nguyen-Huu, UC San Diego. George Sheldrick, University of Göttingen, Germany. Randy Read, Cambridge University, England. Tom Terwilliger, Los Alamos National Lab. Peter Briggs (CCP4, England) and authors of all the programs plugged into SGXPro. Work is supported in part with funds from the National Institutes of Health (GM62407) and SER-CAT, APS.

