PSI Data Management and Reporting: Expectations, Standards and Utility J. Michael Sauder Director, Bioinformatics NYSGXRC Project Leader.

NIGMS Expectations
“… a database for deposition of information on experimental outcome data (both successful and unsuccessful).
“These data include … cDNA cloning, expression vector construction, protein production and purification, protein biochemical characterizations, crystallization screening, synchrotron and NMR data collection, etc.
“The PSI Research Network centers will be required to provide plans for the collection, maintenance, and transfer of experimental results into this central data repository. PepcDB … will contain information on these important results and provide a platform for cross-center data mining to capitalize on the PSI investment.”

Protocols vs Results
General protocols are reported by each PSI Center in PepcDB
General protocols have been published in the literature by several Centers
However, one of the real values of PepcDB lies in the detailed experimental trial results for each target
– Which clones were made? (PSI-MR)
– Which constructs yield soluble protein? (Which don’t?)
– What are the fermentation conditions? Purification?
– What was the protein yield? The final concentration? The experimental molecular weight?
– What conditions gave crystals? How many crystal forms? What was the cryoprotectant? Which conditions led to diffraction data? To the structure?

TargetDB/PepcDB Data Mining
TargetDB status is informative, but far more useful would be data about
– Small scale expression/solubility testing
– Large scale purification yield, concentration, oligomeric state
– Conditions that yielded diffracting crystals
Publications
– Overton et al (2008) Bioinformatics 24: “ParCrys: a Parzen window density estimation approach to protein crystallization propensity prediction” (PDB, TargetDB, PepcDB)
– Martin-Galiano et al (2008) Proteins 70: “Predicting experimental properties of integral membrane proteins by a naive Bayes approach” (TargetDB)
– Bannen et al (2007) J Struct Funct Genomics 8: “Effect of low-complexity regions on protein structure determination” (TargetDB/PepcDB)
– Smialowski et al (2007) Bioinformatics 23: “Protein solubility: sequence based prediction and experimental verification” (TargetDB)
– Slabinski et al (2007) Bioinformatics 23: “XtalPred: a web server for prediction of protein crystallizability” (TargetDB)
– Nair & Rost (2004) Nucl Acids Res 32:W517-W521 “LOCnet and LOCtarget: sub-cellular localization for structural genomics targets” (TargetDB)

Process vs Reporting
[Figure: detailed internal process statuses (numbered codes) set against the coarser reported statuses]
Reported statuses: Selected, Cloned, Expressed, Soluble, Purified, Crystallized, Diffr data, In PDB
Internal statuses with legible codes: 10 Active; 140 Fail PCR; 210 Failed transform; 270 Clone completed to ferm; 310 Fermentation voided; 320 Fermentation waiting; 365 Purification on hold; 370 Purification waiting; 390 Purification in progress; 430 Purification technical error; 440 Purification failed; 460 Purification research marginal; 470 Purification research successful; 645 Cryst in optimization; 650 Screening grainy ppt; 950 Structure deposited
Internal statuses whose codes did not survive: Selected; Mol biol in progress; Cloning failed; Failed expresn; Failed solubility; Fermentation on hold; Purified; completed to collaborator; Purification research unsuccessful; Cryst in screening; Crystallization admitted; Optimization grainy ppt; Optimization microcrystals; Optimization crystals; Crystal abandoned; Crystal examined; Crystal waiting collection; Dataset collected; Structure
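The gap between process tracking and reporting can be sketched as a lookup that collapses many internal codes into a few reported statuses. This is an illustrative sketch only: the code/label pairs are taken from the slide, but which reported status each code rolls up to is an assumption, and `INTERNAL_STATUS`, `furthest_reported` and `collapsed_detail` are hypothetical names.

```python
# Code/label pairs are from the slide. The reported status assigned to each
# code is an ASSUMPTION for illustration, not the Center's actual mapping.
INTERNAL_STATUS = {
    210: ("Failed transform", "Selected"),
    270: ("Clone completed to ferm", "Cloned"),
    320: ("Fermentation waiting", "Cloned"),
    390: ("Purification in progress", "Soluble"),
    440: ("Purification failed", "Soluble"),
    470: ("Purification research successful", "Purified"),
    645: ("Cryst in optimization", "Purified"),
    950: ("Structure deposited", "In PDB"),
}

# Reported statuses in pipeline order, as listed on the slide.
REPORTED_ORDER = [
    "Selected", "Cloned", "Expressed", "Soluble",
    "Purified", "Crystallized", "Diffr data", "In PDB",
]


def furthest_reported(codes):
    """Return the most advanced reported status among a target's internal codes."""
    reported = (INTERNAL_STATUS[code][1] for code in codes)
    return max(reported, key=REPORTED_ORDER.index)


def collapsed_detail(status):
    """List the internal labels that all report as the same coarse status."""
    return [label for label, rep in INTERNAL_STATUS.values() if rep == status]
```

Under this assumed mapping, `collapsed_detail("Soluble")` returns both "Purification in progress" and "Purification failed", which is the kind of detail a status-only report cannot convey.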

Need to Consider the Future… Now
How much data are we capturing in our databases compared to how much we are reporting?
What will happen to Center data after PSI-2?
We should ensure that as much as possible of our Center data is publicly accessible in PepcDB

Trial Data Reporting by Center
Center: Experimental trial details reported to PepcDB
JCSG: Protein sequence, cloning vector, fermentation media, purification method, crystallization conditions
MCSG: Protein sequence, cloning vector, expression host, temperature, media
NESGC: Protein sequence
NYSGXRC: DNA and protein sequence, construct boundaries, cloning vector, small scale expression/solubility scores, media, MW, large scale media, volume, induction time/temp, pellet weight, harvest date, SeMet Y/N, purification yield, concentration, purity, MW, oligomeric state, start/end dates, mass spec pass/fail, analysis comments, MW, crystallization conditions, protein concentration, temperature, cryo, harvest/collection dates, anomalous scatterer, diffraction resolution

PepcDB Trial Schema

NYSGXRC
SGX_MOLBIO_PCR
### Molecular Biology - PCR ####
PCR start date: 03/20/2007
PCR last updated: 04/16/2007
Notebook #: 1358
Page: 13
SGX_MOLBIO_TOPO_TRANSFORM
### Molecular Biology - cloning ####
SGX clonename: 10001b2BSt5p1
Vector: pSGX4 (BS)
SGX_MOLBIO_EXPR_SOL
### Small scale expression/solubility ###
Expression score: HIGH
Solubility rating: HIGH
Predicted molecular weight (kDa):
Growth Media (small scale): ZYP-5052
Observed molecular Weight (kDa): 46
Sonication buffer: PLB1
SGX_FERM_ECOLI_ZYP
### Fermentation ###
SGX PID:
Growth Media (large scale): ZYP-5052
Total volume (L): 1
Induction time (hr): 21
Induction temp. (C): 22
Pellet weight (g): 19
Harvest date: 05/17/2006
Selenomet: N
SGX_PURIF_ECOLI_BACT
### Purification ###
SGX PID:
SGX pool: 1
Selenomet: N
Start date: 06/21/2006
Yield (mg): 52.3
Final concentration (mg/ml): 52.3
Observed molecular weight (kDa): 33
Notebook #: 1136
Page: 115
End date: 06/23/2006
Purity (%): 98
Oligomeric state: monomer (1 subunit)
DNA source? Primers? Host cells? Antibiotic resistance? Purification steps? Buffers?
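A trial report of this kind could be turned into structured data for mining with a small parser. The sketch below assumes the report arrives with one `### Section ###` header or one `Field: value` pair per line (the transcript does not preserve the original line breaks); `parse_trial_text` and `SAMPLE` are hypothetical names, and the sample is an excerpt of the values shown on the slide.

```python
import re

# Matches header lines such as "### Fermentation ###".
SECTION_RE = re.compile(r"^#+\s*(.*?)\s*#+$")

# Excerpt of the slide's record, laid out one item per line (an assumption).
SAMPLE = """\
SGX_FERM_ECOLI_ZYP
### Fermentation ###
Growth Media (large scale): ZYP-5052
Total volume (L): 1
Induction time (hr): 21
Induction temp. (C): 22
Selenomet: N
SGX_PURIF_ECOLI_BACT
### Purification ###
Yield (mg): 52.3
Purity (%): 98
Oligomeric state: monomer (1 subunit)
"""


def parse_trial_text(text):
    """Return {section title: {field: value}} from a line-oriented trial report."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = SECTION_RE.match(line)
        if header:
            current = sections.setdefault(header.group(1), {})
        elif ":" in line and current is not None:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
        # Lines without a colon (protocol identifiers such as
        # SGX_FERM_ECOLI_ZYP) are skipped in this sketch.
    return sections


parsed = parse_trial_text(SAMPLE)
```

Fields left blank in a report, such as `SGX PID:` on the slide, would parse to empty strings, which makes unreported values easy to spot downstream.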

NYSGXRC
SGX_MALDI
### Mass Spec - MALDI ###
Mass Spec Status: Passed
SGX_ESI-MS
### Mass Spec - ESI-MS ###
Mass Spec Status: Passed
Observed MW:
SGX_XTAL
### Crystallization ###
SGX XID:
Tray barcode: N
Temperature: 21
Protein concentration (mg/ml): 26
Well location: G 12
Well conditions: [100mM] 1M Hepes pH [25%] 50% PEG [200mM] 1M Magnesium Chloride hexahydrate
Cryoprotectant comment: [20%] 80% Glycerol
Harvest date: 09/05/2006
Collection date: 09/05/2006
APS resolution: 2.3
Crystal status: D-DATASET COLLECTED
Crystal morphology? Space group?

Proposed Data Reporting
Molecular biology
– DNA source, primers, vector, PSI-MR clone ID, host, antibiotic resistance
– Expression and solubility rating (small scale), media, predicted and observed molecular weight
Fermentation
– Media, volume, induction time, temp, selenoMet?
Purification
– Purification steps, final buffer, yield, concentration, molecular weight, purity, oligomeric state
– Accurate MW if mass spec done
Crystallization
– Temperature, protein concentration, well conditions, cryoprotectant and resolution, if applicable
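One way to apply a proposed field list like this is as a completeness check on each trial report before it is submitted. The following is a minimal sketch, not a PepcDB tool: the field names paraphrase the slide's bullets (abbreviated), and `PROPOSED_FIELDS`, `missing_fields` and the example report are hypothetical.

```python
# Field names paraphrase the slide; the structure and checker are hypothetical.
PROPOSED_FIELDS = {
    "Molecular biology": ["DNA source", "primers", "vector", "PSI-MR clone ID",
                          "host", "antibiotic resistance"],
    "Fermentation": ["media", "volume", "induction time", "induction temp",
                     "selenoMet"],
    "Purification": ["purification steps", "final buffer", "yield",
                     "concentration", "molecular weight", "purity",
                     "oligomeric state"],
    "Crystallization": ["temperature", "protein concentration",
                        "well conditions"],
}


def missing_fields(report):
    """Return {stage: [fields absent or blank]} for a {stage: {field: value}} report."""
    missing = {}
    for stage, fields in PROPOSED_FIELDS.items():
        given = report.get(stage, {})
        absent = [f for f in fields if given.get(f) in (None, "")]
        if absent:
            missing[stage] = absent
    return missing


# Example report using values from the NYSGXRC slide (stages abbreviated).
report = {
    "Fermentation": {"media": "ZYP-5052", "volume": "1", "induction time": "21",
                     "induction temp": "22", "selenoMet": "N"},
    "Purification": {"yield": "52.3", "concentration": "52.3",
                     "molecular weight": "33", "purity": "98",
                     "oligomeric state": "monomer (1 subunit)"},
}
gaps = missing_fields(report)
```

For this example the check flags the purification steps and final buffer, the same gaps the earlier slide raises as open questions.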

Alternative mechanism to report experimental data
[Example markup not preserved in the transcript; surviving text: molecular weight, Da]
Examples
– Molecular weight
– Isoelectric point
– Phosphorylation
– Methylation
– Element analysis / stoichiometry
– etc.

Optional tags
PDB-proposed mmCIF-like tags to describe cloning, expression, purification, crystallization, etc.
Examples
– _entity_src_gen_pure.protein_concentration
– _entity_src_gen_pure.protein_yield
– _entity_src_gen_pure.protein_oligomeric_state
– _pdbx_buffer_components.name
– _pdbx_buffer_components.conc
– _exptl_crystal_grow.temp
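To show what tag/value reporting might look like, the sketch below writes a few of the listed tags as aligned tag/value lines in the general style of mmCIF. The tag names are from the slide; the values are borrowed from the NYSGXRC example record, the units each tag expects are not stated in the slides, and `format_tags` is a hypothetical helper with deliberately simplified quoting.

```python
def format_tags(values):
    """Render {tag: value} as aligned 'tag  value' lines.

    Values containing spaces are wrapped in single quotes, as CIF-style
    files do. Values that themselves contain quotes are not handled here.
    """
    width = max(len(tag) for tag in values)
    lines = []
    for tag, value in values.items():
        text = str(value)
        if " " in text:
            text = f"'{text}'"
        lines.append(f"{tag.ljust(width)}  {text}")
    return "\n".join(lines)


# Illustrative values only; units are an open question in this sketch.
example = {
    "_entity_src_gen_pure.protein_yield": 52.3,
    "_entity_src_gen_pure.protein_concentration": 52.3,
    "_entity_src_gen_pure.protein_oligomeric_state": "monomer (1 subunit)",
}
out = format_tags(example)
```

A flat tag/value layout like this keeps each reported quantity individually addressable, which is what makes cross-center mining practical compared with free-text descriptions.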

Recommendation
NYSGXRC plans to further improve its reporting of trial results in 2008
We encourage all PSI Centers to use the PepcDB [...] or [...] tags to report as many experimental trial results as possible in their PepcDB XML updates
See associated poster

Acknowledgements
SGX LIMS development team
– Ryan Allis
– Chris Hansen
– Peter Hillier
– Ken Schwinn
AECOM - Veena Venkatagiriyappa (Fiser lab)
Andrei Kouranov (PDB)
LIMS improvements suggested by SGX protein production, crystallization, and beamline staff
This work was supported by SGX Pharmaceuticals, Inc., and NIH Grant U54 GM074945