Integration of Fast Data Collection and Automated Probabilistic Assignment for Protein NMR Spectroscopy Arash Bahrami
Protein Structure Determination by NMR
1. Sample preparation
2. Data collection
3. Peak picking
4. Backbone resonance assignment
5. Sidechain resonance assignment
6. Secondary structure determination
7. NOE data collection and assignment
8. Structure calculation and refinement
On average: 1-4 months and $80k per structure.

Automation in NMR
- Individual software packages have been developed for each step, but no integrated tool is available for the whole process.
- Integration requires interaction among the individual components.
- A probabilistic framework can provide robust interaction among the components.
Individual tools developed in CESG and NMRFAM
- PISTACHIO (Automated resonance assignment) [1]
- PECAN (Secondary structure determination) [2]
- MANI-LACS (Reference correction and outlier detection) [3]
- HIFI-NMR (Fast and adaptive NMR data collection) [4]
- HIFI-C (Adaptive determination of NMR couplings) [5]

[1] Hamid R. Eghbalnia, Arash Bahrami, Liya Wang, Amir Assadi, and John L. Markley (2005) J. Biomol. NMR, 32(3):219-233.
[2] Hamid R. Eghbalnia, Liya Wang, Arash Bahrami, Amir Assadi, and John L. Markley (2005) J. Biomol. NMR, 32(1):71-81.
[3] Liya Wang, Hamid R. Eghbalnia, Arash Bahrami, and John L. Markley (2005) J. Biomol. NMR, 32(1):13-22.
[4] Hamid R. Eghbalnia, Arash Bahrami, Marco Tonelli, Klaus Hallenga, and John L. Markley (2005) J. Am. Chem. Soc., 127(36):12528-12536.
[5] Gabriel Cornilescu, Arash Bahrami, Marco Tonelli, John L. Markley, and Hamid R. Eghbalnia (2007) J. Biomol. NMR, 38(4):341-351.
PISTACHIO
PISTACHIO is a probabilistic method for backbone and sidechain assignment. The input to PISTACHIO can be any subset of the following NMR experiments: HSQC, HNCO, CBCA(CO)NH, HN(CA)CB, C(CO)NH, HBHA(CO)NH, HN(CO)CA, HN(CA)CO, HN(CO)(CA)CB, H(CCO)NH, HCCH-TOCSY, HNCACB, HN(CO)CACB, HNCA.

Native probabilistic PISTACHIO output:

Residue  Name  P(H,N)  H      N       CO    CA     CB     P(H,N)  H      N      P(H,N)  H      N     P(H,N)  H      N     P(no_assignment)
1        MET   0.000   0.000  0.00    0.00  55.29  34.51  0.000   0.000  0.00   0.000   0.000  0.00  0.000   0.000  0.00  0.000
2        ASN   0.730   9.899  125.16  0.00  52.03  40.68  0.210   8.765  123.2  0.000   0.000  0.00  0.000   0.000  0.00  0.060
3        THR   1.000   9.121  116.72  0.00  59.37  63.99  0.000   0.000  0.00   0.000   0.000  0.00  0.000   0.000  0.00  0.000
4        VAL   1.000   7.977  127.97  0.00  61.66  36.07  0.000   0.000  0.00   0.000   0.000  0.00  0.000   0.000  0.00  0.000
5        CYS   1.000   8.310  126.57  0.00  59.14  31.70  0.000   0.000  0.00   0.000   0.000  0.00  0.000   0.000  0.00  0.000

NMR-STAR format:

1  1  MET  CA  C  55.291   1.000  0
2  1  MET  CB  C  34.509   1.000  0
3  2  ASN  N   N  125.160  1.000  0
4  2  ASN  H   H  9.899    1.000  0
5  2  ASN  CA  C  52.031   1.000  0
6  2  ASN  CB  C  40.684   1.000  0
7  3  THR  N   N  116.723  1.000  0

[Figure: overall view of the assignment probabilities]
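As an illustration of how the native probabilistic output might be read, the following minimal sketch parses one row using the column order given in the header above: one (P(H,N), H, N) group, then CO, CA, CB, then three alternative (P(H,N), H, N) groups, then P(no_assignment). The names `ResidueRow`, `parse_row`, and `most_probable` are hypothetical and are not part of PISTACHIO; the whitespace-separated layout is an assumption based on the slide.

```python
from dataclasses import dataclass
from typing import List, Tuple

# One candidate backbone amide assignment: (P(H,N), H shift, N shift).
Candidate = Tuple[float, float, float]


@dataclass
class ResidueRow:
    number: int
    name: str
    candidates: List[Candidate]  # first entry is the leading group in the row
    co: float
    ca: float
    cb: float
    p_no_assignment: float


def parse_row(line: str) -> ResidueRow:
    """Parse one whitespace-separated row (assumed layout, see lead-in)."""
    fields = line.split()
    values = [float(x) for x in fields[2:]]
    if len(values) != 16:
        raise ValueError(f"expected 16 numeric fields, got {len(values)}")
    candidates = [(values[0], values[1], values[2])]
    # Three alternative (P, H, N) groups follow the CO/CA/CB columns.
    candidates += [(values[i], values[i + 1], values[i + 2]) for i in (6, 9, 12)]
    return ResidueRow(
        number=int(fields[0]),
        name=fields[1],
        candidates=candidates,
        co=values[3],
        ca=values[4],
        cb=values[5],
        p_no_assignment=values[15],
    )


def most_probable(row: ResidueRow) -> Candidate:
    """Return the candidate with the highest P(H,N)."""
    return max(row.candidates, key=lambda c: c[0])


# Row for residue 2 (ASN), copied from the slide.
row = parse_row(
    "2 ASN 0.730 9.899 125.16 0.00 52.03 40.68 "
    "0.210 8.765 123.2 0.000 0.000 0.00 0.000 0.000 0.00 0.060"
)
print(most_probable(row))
```

For this row the candidate probabilities and P(no_assignment) sum to 1 (0.730 + 0.210 + 0.060), which is a simple consistency check a reader could apply to each residue.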
PECAN
PECAN optimizes a combination of information sources to yield energetic descriptions of secondary structure and constructs a probabilistic description in which each residue is assigned a probability of belonging to a designated state (e.g., helix, sheet).
[Figure labels: Helix, Extended]
PECAN is available at: http://www.bija.nmrfam.wisc.edu/PECAN
LACS
MANI-LACS [3] (Linear Analysis of Chemical Shifts for reference correction and outlier detection) detects potential outliers using linear analysis of chemical shifts. An outlier may be the result of a misassignment of chemical shifts. MANI-LACS reports probabilities for the presence of outliers.
MANI-LACS is available at: http://www.bija.nmrfam.wisc.edu/MANI-LACS/
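The general idea of flagging outliers against a linear trend can be sketched as follows. This is not the MANI-LACS algorithm, and it produces flags rather than probabilities; it is a generic stand-in that fits a least-squares line to two quantities assumed to be linearly related and flags points whose residuals are large relative to a robust (median-based) scale. The function names, the threshold of 3.5, and the sample data are all assumptions made for illustration.

```python
from statistics import median
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_outliers(
    xs: Sequence[float], ys: Sequence[float], threshold: float = 3.5
) -> List[int]:
    """Return indices whose residual from the fitted line is unusually large.

    Residuals are scaled by 1.4826 * MAD (median absolute deviation), a
    common robust estimate of the standard deviation.
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    center = median(residuals)
    mad = median(abs(r - center) for r in residuals)
    if mad == 0:
        return []
    scale = 1.4826 * mad
    return [i for i, r in enumerate(residuals) if abs(r - center) / scale > threshold]


# Made-up data: y = 2x + 1 with small alternating noise, and one point
# (index 5) displaced by +5 to mimic a misassigned shift.
xs = [float(i) for i in range(10)]
noise = [0.1 if i % 2 == 0 else -0.1 for i in range(10)]
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]
ys[5] = 2 * xs[5] + 1 + 5.0
print(flag_outliers(xs, ys))
```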
HIFI-NMR: High-Resolution Iterative Frequency Identification for NMR
Tilted-plane, reduced-dimensionality data collection that employs on-the-fly peak identification, spectral modeling, and selection of the next data plane to be collected.
[Figure: 2D planes of a 3D CBCA(CO)NNH experiment collected on an 800 MHz Varian Inova spectrometer]
Simplified Description of the HIFI NMR Approach
1. Collect the orthogonal planes (0° and 90°).
2. From the predicted chemical shift distribution, assign a probability, p, of a peak being in a given voxel (probability color map).
3. Find a tilt angle that maximizes a dispersion function f(p). The dispersion function, f(p), measures the dispersion of the putative peaks on the selected tilted plane.
4. Collect the tilted plane at X°.
5. Has the last tilted plane added new information? YES: return to step 3. NO: output the peak list.
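The tilt-angle selection step can be illustrated with a small sketch. The dispersion function used here is a deliberately simple stand-in (the minimum separation between projected peak positions), not the probability-based f(p) used by HIFI-NMR. It assumes that a putative peak with coordinates (x, y) in the two indirect dimensions appears on a plane tilted by angle θ at position x·cos θ + y·sin θ, with both coordinates expressed in the same units. All function names and the sample peaks are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # putative peak position in the two indirect dimensions


def project(peaks: Sequence[Peak], angle_deg: float) -> List[float]:
    """Project peaks onto the axis of a plane tilted by angle_deg."""
    theta = math.radians(angle_deg)
    return [x * math.cos(theta) + y * math.sin(theta) for x, y in peaks]


def min_separation(positions: Sequence[float]) -> float:
    """Stand-in dispersion measure: smallest gap between neighbouring peaks."""
    if len(positions) < 2:
        raise ValueError("need at least two peaks to measure dispersion")
    ordered = sorted(positions)
    return min(b - a for a, b in zip(ordered, ordered[1:]))


def best_tilt_angle(peaks: Sequence[Peak], candidates: Sequence[float]) -> float:
    """Pick the candidate angle whose projection separates the peaks best."""
    return max(candidates, key=lambda a: min_separation(project(peaks, a)))


# Made-up peaks: two of them overlap at 0 degrees and two (nearly) at 90 degrees.
peaks = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
print(best_tilt_angle(peaks, [0.0, 30.0, 90.0]))
```

In this toy case the orthogonal planes each leave two peaks overlapped, while the 30° plane resolves all three, which is the kind of situation in which a tilted plane adds new information.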
HIFI Application to Automated Backbone Assignments

Protein      Length    HIFI data collection time  PINE assignment time  Assignment accuracy
WT Brazzein  53 a.a.   12 h                       5 min                 98%
Ubiquitin    76 a.a.   14 h                       5 min                 98%
Flavodoxin   176 a.a.  48 h                       2 h                   85%
HIFI-C: A Fast and Robust Method for Determining NMR Couplings from Adaptive 3D to 2D Projections
Correlation and RMSD comparison of couplings collected by HIFI-C and 3D. Agreement between the two was within experimental error.
(A) GB3 protein (R = 99.8%, rmsd = 0.03 Hz). The total data collection times were 1.7 h for HIFI-C and 7.9 h for 3D.
(B) PRP24-12 protein (R = 94.0%, rmsd = 0.25 Hz). The total data collection times were 14.6 h for HIFI-C and 44.1 h for 3D.
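The two comparison statistics quoted above, a correlation coefficient R and an RMSD, can be computed as in the following sketch. It assumes R is the Pearson correlation coefficient and that RMSD is the root-mean-square of the pairwise differences; the slide does not state either definition. The coupling values are made up for illustration and are not data from the study.

```python
import math
from typing import Sequence


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rmsd(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square deviation between paired values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical couplings (Hz) measured two ways for the same four sites.
couplings_3d = [1.0, 2.0, 3.0, 4.0]
couplings_projected = [1.1, 1.9, 3.1, 3.9]

print(f"R = {100 * pearson_r(couplings_3d, couplings_projected):.1f}%")
print(f"rmsd = {rmsd(couplings_3d, couplings_projected):.2f} Hz")
```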
Back to Automation Steps in NMR Proteomics
Tools: HIFI-NMR, PISTACHIO, PECAN, MANI-LACS, HIFI-C
Redesign the Individual Tools to Provide Robust Probabilistic Interaction: PINE
MANI-LACS, PISTACHIO, PECAN → PINE
General Overview of Probabilistic Network Defined by PINE
Amino Acid Typing Network
Spin System Generation Network
Table 1. PINE performance results and comparison with PISTACHIO for proteins whose BMRB assignments are available.

                        PINE                                               PISTACHIO
Protein      Number of  CPU time  Assignment  Secondary structure  CPU time  Assignment  Experiments represented in the
designator   residues   (h)       accuracy*   accuracy             (h)       accuracy*   input peak lists (1-8)‡
At2g24940    109        0.2       98%         95%                  1                     **
At1g77540    103        0.2       96%         94%                  0.2       95%         *
At2g23090    86         0.2       100%        92%                  0.1       98%
AAH26994     101        0.2       95%         97%                  0.2       90%         ***
At5g22580    111        1         95%         90%                  5         88%         ***
At3g17210    112        1         94%         90%                  6                     ******
At3g51030    124        1         94%         88%                  5         87%         ******
At5g01610    170        1         80%         83%                  6         70%         ***
At3g16450†   299        1.5       82%         NA                   7         73%         *******
BMRB 5106    70         0.2       95%         90%                  1         95%         **

* Correct assignments are those of the final structure and assignments deposited in the PDB and BMRB.
† Stereo-array isotope labeled (SAIL) protein; isotope shifts due to labeling were not accounted for.
‡ Each data set included an HSQC or HNCO experiment; other experiments are indicated by numbers: 1 CBCA(CO)NH or HN(CO)CACB; 2 HNCACB; 3 HNCA; 4 HN(CO)CA or CA(CO)NH; 5 HN(CA)CO; 6 H(CCO)NH or N15 TOCSY; 7 C(CO)NH; 8 HBHA(CO)NH.
PINE Server Statistics Total Number of jobs submitted since July 2006: 1175 jobs
Iterative HCCH-TOCSY assignment
Experiments: HBHA(CO)NH, C(CO)NH, H(CCO)NH, HCCH-TOCSY
PINE, HIFI and Time Saving in NMR Proteomics

HIFI
- Time saving: 12 hours to 2 days of data collection vs. 1 to 2 weeks with traditional methods.
- Accuracy: 95%-100% of peaks recovered with high probability, depending on the size and complexity of the protein.
- Main cause of possible inaccuracy: some peaks may have very low intensities (at the noise level). They will have lower probabilities in the final peak list.
- What may need to be done manually: manual analysis may be needed to derive the remaining peaks from the lower-probability list.

PINE
- Time saving: full assignment in 5 minutes to 2 hours vs. 1 week to 1 month for manual assignment.
- Accuracy: 85%-100% correct assignments, depending on the size and complexity of the protein.
- Main cause of possible inaccuracy: some of the real peaks are missing from the peak lists.
- What may need to be done manually: manual assignment of the remaining peaks can easily be done by scanning the spectra.
Ongoing project: Integration of HIFI and PINE
- HIFI-NMR: fast data collection and peak identification
- PINE:
  - MANI-LACS: referencing and outlier check
  - PISTACHIO: automated assignment
  - PECAN: secondary structure determination
Probabilistic Analysis of Spectra in HIFI
(A) HNCA (HC plane): 512 zero filling; 0.15 delay in sine window function
(B) HNCA (HC plane): 1024 zero filling; 0.45 delay in sine window function
(C) Difference between spectra (A) and (B)
(D) Probabilistic peak lists are generated for every plane based on different parameter settings and peak volumes.

X    Y    Probability
40   227  0.9846
56   231  0.9844
72   595  0.9846
89   245  0.7622
102  403  0.6541
110  380  0.9851
119  84   0.2486
128  359  0.9871
130  511  0.4452
…    …    …
1. Collect N15-HSQC.
2. Collect the most sensitive orthogonal plane (0°).
3. Spectra analysis: generate a probabilistic peak list (input: predicted chemical shift distribution).
4. Derive the initial probabilistic spin systems.
5. Is the spin system network quality good enough for the assignment process? NO: go to step 7.
6. YES: run PINE to derive the latest assignment and secondary structure. Are the assignment and secondary structure complete? YES: report the final peak lists, chemical shift assignments, and secondary structure. NO: go to step 7.
7. Find the optimum experiment and tilt angle. The optimum is the plane that maximizes the information regarding the ambiguous or missing positions in the spin systems, considering the latest state of the chemical shift assignment.
8. Collect the optimal tilted or orthogonal plane (X°).
9. Spectra analysis: generate a probabilistic peak list, update the probabilistic spin systems, and return to step 5.
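The control flow of this flowchart can be sketched as a loop with pluggable steps. This is only an outline of the decision logic under stated assumptions: every callback name (`collect_and_pick`, `update_spin_systems`, `quality_ok`, `assign`, `is_complete`, `choose_next_plane`) is hypothetical, the real spectrometer control, peak analysis, and PINE calls are not shown, and the `max_planes` safety limit is an addition not present on the slide.

```python
from typing import Any, Callable, List, Tuple


def run_adaptive_loop(
    collect_and_pick: Callable[[str], Any],
    update_spin_systems: Callable[[Any, Any], Any],
    quality_ok: Callable[[Any], bool],
    assign: Callable[[Any], Any],
    is_complete: Callable[[Any], bool],
    choose_next_plane: Callable[[Any, Any], str],
    max_planes: int = 10,
) -> Tuple[List[str], Any]:
    """Collect planes until the assignment is complete or max_planes is hit."""
    # The first two acquisitions are fixed by the flowchart.
    initial = ["15N-HSQC", "orthogonal plane (0 deg)"]
    collected: List[str] = []
    spin_systems: Any = None
    result: Any = None
    while len(collected) < max_planes:
        if len(collected) < len(initial):
            plane = initial[len(collected)]
        else:
            # Pick the plane expected to be most informative given the
            # current spin systems and the latest assignment.
            plane = choose_next_plane(spin_systems, result)
        peaks = collect_and_pick(plane)  # collect and build a probabilistic peak list
        spin_systems = update_spin_systems(spin_systems, peaks)
        collected.append(plane)
        if len(collected) < len(initial):
            continue  # still in the fixed start-up phase
        if quality_ok(spin_systems):
            result = assign(spin_systems)  # stands in for the assignment step
            if is_complete(result):
                break
    return collected, result


# Toy callbacks: the "state" is just a count of peak lists seen so far.
planes, final = run_adaptive_loop(
    collect_and_pick=lambda plane: [plane],
    update_spin_systems=lambda state, peaks: (state or 0) + len(peaks),
    quality_ok=lambda state: state >= 3,
    assign=lambda state: state,
    is_complete=lambda res: res >= 4,
    choose_next_plane=lambda state, res: "tilted plane",
)
print(planes, final)
```

Passing the steps in as callbacks keeps the loop independent of any particular acquisition or assignment software, which matches the slide's emphasis on interaction between separate components.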
- HIFI-NMR: fast data collection and peak identification
- PINE:
  - MANI-LACS: referencing and outlier check
  - PISTACHIO: automated assignment
  - PECAN: secondary structure determination
- NOESY assignment
Acknowledgements
John Markley, Hamid Eghbalnia, Marco Tonelli
All CESG members providing data: Claudia Cornilescu, Shanteri Singh, Jikui Song, Brian Volkman, Francis Peterson, Ziqi Dai, Gabriel Cornilescu, Klaus Hallenga, Milo Westler, Liya Wang, Eldon Ulrich