Two cases of chemometrics application in protein crystallography European Molecular Biology Laboratory (EMBL), Hamburg, Germany Andrey Bogomolov.


1 Two cases of chemometrics application in protein crystallography
Andrey Bogomolov
European Molecular Biology Laboratory (EMBL), Hamburg, Germany

2 Outline
- Protein crystallography: a brief introduction
- Case I: determination of protein secondary structure from the raw diffraction data using PLS-R
- Case II: modeling of crystal radiation damage
- Potential applications of chemometric techniques to crystallography (of biological macromolecules)

3 Protein crystallography: introduction
Protein (macromolecular) crystallography is a scientific discipline that studies…
- biological objects: proteins, DNA, RNA etc.
- … by physical means: X-ray diffraction, synchrotron radiation
- … on the chemical level: 3D structure, complexes, interactions
- … with the extensive use of mathematics: data analysis, modeling
The main objectives:
- solve the 3D structure of a molecule
- explain its biological function at the atomic level
Today's hot topic:
- drug design
- part of the global "-omics" project (genomics/proteomics)

4 Protein crystallography workflow
Stages: expression & purification, crystallization, data collection, phasing, structure solution
Shown: protein (DNA, RNA) solution

5 Protein crystallography workflow
Shown: protein crystal

6 Protein crystallography workflow
Shown: diffraction pattern

7 Protein crystallography workflow
Shown: electron density map

8 Protein crystallography workflow
Shown: 3D structure

9 Protein Data Bank (PDB)
Global data collection (>30000 records), www.pdb.org
- 3D structures
- experimental data
- biological and chemical information

10 Crystallographic data collection: Wilson plot
[Figure: X-ray beam; experimental and theoretical Wilson plots; control and optimization]

11 Case I: Determination of protein secondary structure
Problem: determine the contents (fractions of the polypeptide chain) of secondary structure elements in a protein molecule from the raw diffraction data (Wilson plot)
- the approach is well established for CD and IR spectra of protein solutions
- PLS regression is one of the best methods
- Wilson plot: only qualitative data on the existing correlation, for "theoretical" data
[Figure labels: α-helix, β-sheet]
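The PLS regression named in the problem statement can be illustrated with a minimal single-response (PLS1) sketch using NIPALS-style deflation. This is not the implementation used in the talk: the function names `pls1_fit` and `pls1_predict` and the toy numbers are assumptions for illustration. In the setting above, each row of `X` would be one preprocessed Wilson plot and `y` the fraction of one secondary structure element.

```python
def pls1_fit(X, y, n_components):
    """Fit a PLS1 model (one response) by successive deflation."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    # Work on mean-centered copies of X and y.
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    f = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X-space of maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt  # y loading
        # Deflate: remove the part of X and y explained by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one new sample x."""
    x_mean, y_mean, components = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * p[j] for j in range(len(e))]
    return y_hat


# Toy data (made up): y depends linearly on two predictors.
X_toy = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y_toy = [2 * a - b + 1 for a, b in X_toy]
model = pls1_fit(X_toy, y_toy, n_components=2)
```

In practice the number of components would be chosen by validation rather than fixed in advance; with as many components as the rank of `X`, PLS1 coincides with ordinary least squares.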

12 Secondary structure determination: data
Data preprocessing:
- averaging with an optimal bin size*
- special scaling (correction for anisotropic B-factor)*
- taking the natural logarithm
- conversion into the matrix (Wilson plots in rows)*
- auto-scaling
- outlier detection and removal*
*) experimental data only
[Figure panels: theoretical, experimental]
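Three of the preprocessing steps listed above (bin averaging, natural logarithm, auto-scaling) can be sketched as follows. The function names and sample values are hypothetical, a fixed bin size stands in for the "optimal" one, and the anisotropic B-factor correction and outlier removal are omitted because the slides do not describe them.

```python
import math


def bin_average(values, bin_size):
    """Average consecutive points in bins of bin_size (last bin may be shorter)."""
    bins = [values[i:i + bin_size] for i in range(0, len(values), bin_size)]
    return [sum(b) / len(b) for b in bins]


def autoscale(matrix):
    """Center each column to zero mean and scale it to unit standard deviation."""
    n = len(matrix)
    columns = list(zip(*matrix))
    means = [sum(c) / n for c in columns]
    stds = [(sum((v - m) ** 2 for v in c) / (n - 1)) ** 0.5
            for c, m in zip(columns, means)]
    # Constant columns carry no information; map them to 0.0 (a convention chosen here).
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
            for row in matrix]


def preprocess(profiles, bin_size):
    """One intensity profile per input row; returns the auto-scaled log matrix."""
    rows = [[math.log(v) for v in bin_average(p, bin_size)] for p in profiles]
    return autoscale(rows)


# Two made-up intensity profiles of four points each.
profiles = [[1.0, 1.0, math.e, math.e],
            [math.e, math.e, math.e ** 3, math.e ** 3]]
Z = preprocess(profiles, bin_size=2)
```

Each row of `Z` then plays the role of one Wilson plot in the data matrix passed to the regression.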

13 Secondary structure determination: data (2)
[Figure panels: theoretical, experimental; examples: 1d5t (α+β), 1at0 (β), 1hq3 (α)]

14 Secondary structure determination: calibration results
RMSEP and correlation coefficients (in parentheses) for different methods

Element              α-helix        β-sheet
Theoretical          0.062 (0.96)   0.060 (0.92)
Experimental*        0.112 (0.84)   0.081 (0.84)
IR/PLS [1]           0.078 (0.93)   0.075 (0.93)
CD/PLS [2]           0.077 (0.94)   0.092 (0.89)
μ: α=0.31, β=0.24    0.21 (0.00)    0.22 (0.00)

*) Resolution (1/d) = 0.52 Å⁻¹ (~1.9 Å)
[Figure label: α-helix (theoretical)]

1. S. Navea, R. Tauler, A. de Juan, Elucidation of protein secondary structure, Anal. Biochem. 336 (2005) 231–242
2. K.A. Oberg, J.-M. Ruysschaert, E. Goormaghtigh, The optimization of protein secondary structure determination with infrared and circular dichroism spectra, Eur. J. Biochem. 271 (2004) 2937–2948
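The two figures of merit reported in the table, RMSEP and the correlation coefficient, can be computed as in the sketch below. The function names and sample values are illustrative only. Returning 0.0 for a constant prediction is a convention assumed here; it would be consistent with the (0.00) entries in the μ row if that row is a predict-the-mean baseline, which the slide does not state explicitly.

```python
def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    n = len(y_true)
    return (sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n) ** 0.5


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    a_mean, b_mean = sum(a) / n, sum(b) / n
    sab = sum((x - a_mean) * (y - b_mean) for x, y in zip(a, b))
    saa = sum((x - a_mean) ** 2 for x in a)
    sbb = sum((y - b_mean) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0  # assumed convention: a constant vector has no correlation
    return sab / (saa * sbb) ** 0.5


# Made-up reference and predicted α-helix fractions.
reference = [0.1, 0.3, 0.5]
predicted = [0.2, 0.3, 0.4]
error = rmsep(reference, predicted)
r = pearson_r(reference, predicted)
```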

15 Case II: Modeling radiation damage
A biological crystal exposed to X-rays undergoes radiation damage.
Modeling of radiation damage is important for:
- understanding the effect on the protein
- optimization of data collection
Problem (present state):
- no comprehensive theory of radiation damage (RD)
- specific effects are well known, but the main changes are non-specific
Suggestion by Gleb Bourenkov: radiation dose has a linear effect on atomic B-factors
Task: check for linearity, find the reason(s) for deviation
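The linearity check in the task above can be sketched, under simplifying assumptions, as an ordinary least-squares line of B-factor against dose followed by an inspection of the residuals. The function names, units and numbers are made up for illustration; the talk itself does not specify the model at this level of detail.

```python
def fit_line(x, y):
    """Least-squares slope and intercept of y against x."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def residuals(x, y, slope, intercept):
    """Observed minus fitted values; systematic patterns hint at non-linearity."""
    return [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]


# Hypothetical dose steps (arbitrary units) and B-factors of one atom.
dose = [0.0, 1.0, 2.0, 3.0, 4.0]
b_factor = [10.0, 12.1, 13.9, 16.0, 18.0]
slope, intercept = fit_line(dose, b_factor)
res = residuals(dose, b_factor, slope, intercept)
```

Atoms whose residuals are large or show a trend with dose would be candidates for a closer look when searching for the reasons for deviation.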

16 Radiation damage modeling: data (trypsin)

17 Radiation damage modeling: results
r = 0.999, RMSEP = 9.4×10⁻³

18 Conclusions
- Multivariate data analysis has great potential for protein crystallography
- currently its application is episodic and rarely goes beyond PCA
- A method-centric approach would be beneficial: "I have a method, I am looking for problems"

19 X-files
Methods:
- PCA, Factor Analysis
- Multivariate Regression
- MSPC, Design Of Experiment
- Curve Resolution
- Multivariate Image Analysis
- Target Factor Analysis
- PARAFAC, 3(multi)-way
- Wavelet Transform
- SIMCA, PLSD
Crystallography stages:
- crystallization, HTPC
- crystal screening
- crystal auto-mounting
- data collection
- data reduction
- radiation damage
- phasing
- structure solution
- structure refinement

20 Challenge
Critical re-assessment of the entire protein crystallographic workflow with a multivariate approach in mind: an ambitious project for chemometricians?

21 Acknowledgements
Alexander Popov, Gleb Bourenkov, Victor Lamzin

