Democratization of ‘Omics Data Availability and Review Robert Chalkley UCSF Data Management Editor - MCP.

1 Democratization of ‘Omics Data Availability and Review Robert Chalkley UCSF Data Management Editor - MCP

2 Overview Why should data be shared? What should be shared? Data associated with a publication Who checks the data? What is a publication? How long should data be kept?

3 Guaranteeing the Scientific Record. ‘Scientific journals contain articles that have been peer reviewed, in an attempt to ensure that articles meet the journal's standards of quality and scientific validity. … The publication of the results of research is an essential part of the scientific method. If authors are describing experiments or calculations, they must supply enough details that an independent researcher could repeat the experiment or calculation to verify the results. Each such journal article becomes part of the permanent scientific record.’ Results published in a journal come with certain guarantees about reliability. Results are supposed to exist in perpetuity. Results cannot be edited post-publication. Can ‘omics publications provide these guarantees?

4 Reason for Data Sharing: Publicly Funded Research. Most ‘omics research is funded by government agencies (NIH, EU, …). Agencies want maximum ‘bang for their buck’ (or Euro), so they encourage data re-use. Results reliability: the reviewer generally does not have time to re-evaluate data; the hope is that others may re-analyze and check/confirm results. How would one capture re-analysis information? Is this a new publication?

5 What should be Stored? Metadata. Guidelines and the journal ensure minimal information is supplied for standard proteomic approaches: experimental description and analysis parameters.
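To make the slide's "analysis parameters" concrete, here is a minimal sketch of the kind of search metadata a submission might record. The field names and values are illustrative assumptions, not a formal standard from the guidelines:

```python
import json

# Illustrative analysis-parameter metadata for a proteomics search.
# Field names and values are assumptions for this sketch, not a formal schema.
metadata = {
    "instrument": "Q Exactive",            # hypothetical instrument
    "enzyme": "trypsin",
    "missed_cleavages": 2,
    "precursor_tolerance_ppm": 10,
    "fragment_tolerance_da": 0.02,
    "fixed_mods": ["carbamidomethyl (C)"],
    "variable_mods": ["oxidation (M)"],
}

# Serializing to JSON gives a machine-readable record to deposit alongside results.
print(json.dumps(metadata, indent=2))
```

Capturing parameters like these alongside the raw data is what makes later re-analysis reproducible.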

6 MCP and Data Requirements. 2005: Guidelines for minimum information in manuscripts (the Paris Guidelines) are enforced. 2010: Philadelphia Guidelines introduced; raw data deposition made mandatory. 2011: Peptidome shut down; Tranche undergoing a slow death; moratorium placed on the raw data deposition requirement. 2014: Multiple suitable repositories now available as part of ProteomeXchange; raw data submission recommended again. 2015: Raw data submission required.

7 (figure slide; no recoverable text)

8 What should be Stored? Metadata For data-dependent MS, guidelines are mature. For Targeted MS …

9 What should be Stored? Metadata. DIA? Imaging MS? Metabolomics? Lack of metadata detail is probably the most common factor limiting effective data re-use.

10 What should be Stored? Results. Variety of formats: which formats are acceptable? We do not want to prevent publications because the authors used the ‘wrong’ software. Spreadsheet?

11 What should be Stored? Results. MCP requires annotated spectra for all PTM identifications and all proteins identified by a single peptide. Since MCP wants annotated spectra, it accepts any format in which annotated spectra can be viewed using free software (http://mcponline.org/site/misc/annotated_spectra.xhtml). For a ‘full’ submission to a ProteomeXchange repository, annotated spectra can be provided by the repository. MS-Viewer reads tab-delimited text files (http://prospector.ucsf.edu/prospector/html/misc/viewereg.htm). We spend a fair amount of time helping authors convert results into a supported format.
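Since MS-Viewer consumes tab-delimited text, converting results is mostly a matter of writing one row per identification. A minimal sketch follows; the column names and peptide entries here are illustrative assumptions, so consult the MS-Viewer documentation for the exact headers it expects:

```python
import csv
import io

# Hypothetical peptide identifications. Column names are illustrative only;
# the actual required columns are defined by the MS-Viewer documentation.
rows = [
    {"Fraction": "1", "Spectrum": "2045", "Peptide": "LVNELTEFAK", "Charge": "2", "Score": "35.2"},
    {"Fraction": "1", "Spectrum": "3170", "Peptide": "DLGEEHFK", "Charge": "2", "Score": "28.7"},
]

# Write a tab-delimited results table of the general shape MS-Viewer reads.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), delimiter="\t")
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

In practice the same loop would read an exported results file from the authors' search engine and emit the converted table to disk rather than an in-memory buffer.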

12 Recent Example of MCP Data Submission. Annotated spectra from MaxQuant results can be viewed using the MaxQuant Viewer or by uploading to MS-Viewer. MaxQuant requires all result files; MS-Viewer requires the peak list files and the msms.txt file uploaded to MS-Viewer. When MCP authors contact me with MaxQuant results I always suggest both of these options. Every one of the last five authors has chosen MS-Viewer, probably due to file upload time. Recent example: MaxQuant required 51.7 GB of files; MS-Viewer required 2.5 GB of files.

13 What should be Stored? Raw Data. Usually large: storage space, upload and download time. The journal has received very little author resistance to the raw data requirement. Does all the information need to be captured? Some standard formats (e.g., mzML, mzXML) are larger than instrument raw formats. For some datasets/studies, would an MGF file be sufficient (usually 10x smaller)? Smaller files are faster to read.
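The size difference the slide mentions comes from what each format stores: MGF keeps only centroided peak lists per spectrum, while mzML/mzXML and vendor raw files carry profile data and richer instrument metadata. A minimal sketch of writing one MGF entry (the scan title and peak values are made up for illustration):

```python
# Minimal MGF writer sketch. MGF stores only a centroided peak list per
# spectrum, which is why it is typically much smaller than mzML/mzXML or
# instrument raw files. The scan title and peaks below are invented examples.
def write_mgf_entry(title, pepmass, charge, peaks):
    lines = [
        "BEGIN IONS",
        f"TITLE={title}",
        f"PEPMASS={pepmass}",
        f"CHARGE={charge}+",
    ]
    # Each fragment peak is a bare "m/z intensity" pair -- no profile data.
    lines += [f"{mz:.4f} {intensity:.1f}" for mz, intensity in peaks]
    lines.append("END IONS")
    return "\n".join(lines)

entry = write_mgf_entry("scan=2045", 575.3123, 2, [(175.119, 1200.0), (262.151, 860.5)])
print(entry)
```

Whether this reduced representation is "sufficient" is exactly the slide's question: it supports re-searching the data, but discards information that some re-analyses (e.g., requantification from profile data) would need.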

14 What is a Publication? How useful is this?

15 What is a Publication?

16 How Long Should Data be Kept? Better instrumentation, better methods: at some point it becomes more useful to reacquire data than to reanalyze old data. Some datasets will be downloaded many times; some will never be downloaded. Is it necessary to keep all data online? The journal, as part of the scientific record, is supposed to guarantee access to results in perpetuity. Is the raw data part of the publication?
