Democratization of ‘Omics Data Availability and Review Robert Chalkley UCSF Data Management Editor - MCP.


Overview
- Why should data be shared?
- What should be shared?
- Data associated with a publication
- Who checks the data?
- What is a publication?
- How long should data be kept?

Guaranteeing the Scientific Record
‘Scientific journals contain articles that have been peer reviewed, in an attempt to ensure that articles meet the journal's standards of quality, and scientific validity. …The publication of the results of research is an essential part of the scientific method. If authors are describing experiments or calculations, they must supply enough details that an independent researcher could repeat the experiment or calculation to verify the results. Each such journal article becomes part of the permanent scientific record.’
- Results published in a journal come with certain guarantees about reliability
- Results are supposed to exist in perpetuity
- Results cannot be edited post-publication
- Can ‘omics publications provide these guarantees?

Reason for Data Sharing: Publicly Funded Research
- Most ‘omics research is funded by government (NIH; EU …)
- Agencies want maximum ‘bang for their buck (Euro)’
- Encourage data re-use
- Results reliability
- Reviewer generally does not have time to re-evaluate data
- Hope that others may re-analyze and check/confirm results
- How would one capture re-analysis information? Is this a new publication?

What should be Stored? Metadata
- Journal guidelines ensure that minimal information is supplied for standard proteomic approaches
- Experimental description
- Analysis parameters
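A minimal sketch of how a submission could be checked for the two kinds of metadata named above. The section names follow the slide; every field name inside them is a hypothetical placeholder, not the wording of any journal guideline.

```python
# Hypothetical required fields, grouped by the two sections named on the slide.
# The real required items are defined by the journal guidelines, not by this list.
REQUIRED_FIELDS = {
    "experimental_description": ["sample", "instrument", "acquisition_method"],
    "analysis_parameters": ["search_engine", "database", "mass_tolerance", "fdr_threshold"],
}


def missing_metadata(record):
    """Return 'section.field' names that are absent or empty in the record."""
    missing = []
    for section, fields in REQUIRED_FIELDS.items():
        supplied = record.get(section, {})
        for field in fields:
            if not supplied.get(field):
                missing.append(f"{section}.{field}")
    return missing


# Illustrative record with two analysis parameters left out.
example_record = {
    "experimental_description": {
        "sample": "cell lysate",
        "instrument": "example instrument",
        "acquisition_method": "data-dependent",
    },
    "analysis_parameters": {
        "search_engine": "example engine",
        "mass_tolerance": "10 ppm",
    },
}
gaps = missing_metadata(example_record)
```

A check of this kind only confirms that a field was filled in; it cannot judge whether the description is detailed enough for re-use.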

MCP and Data Requirement
- 2005: Guidelines for minimum information in manuscript (Paris Guidelines) are enforced.
- 2010: Philadelphia Guidelines introduced. Raw data deposition made mandatory.
- 2011: Peptidome shut down; Tranche undergoing slow death. Moratorium placed on raw data deposition requirement.
- 2014: Now multiple suitable repositories as part of ProteomeXchange. Raw data submission recommended again.
- 2015: Raw data submission required.

What should be Stored? Metadata
- For data-dependent MS, guidelines are mature.
- For targeted MS …

What should be Stored? Metadata
- DIA? Imaging MS? Metabolomics?
- Lack of detail in the metadata is probably the most common limiting factor to effective data re-use.

What should be Stored? Results
- Variety of formats
- Which formats are acceptable?
- Do not want to prevent publications because the authors used the ‘wrong’ software
- Spreadsheet

What should be Stored? Results
- MCP requires annotated spectra for all PTM IDs and all proteins identified by a single peptide.
- Where annotated spectra are required, MCP accepts any format for which they can be viewed using free software.
- If a ‘full’ submission is made to a ProteomeXchange repository, then annotated spectra can be provided by the repository.
- MS-Viewer reads tab-delimited text files.
- We spend a fair amount of time helping authors convert results into a supported format.
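The annotated-spectrum rule above (all PTM identifications, plus all proteins identified by a single peptide) can be sketched as a simple filter. This is an illustration only, not MCP's or MS-Viewer's tooling: the record keys `protein`, `peptide` and `modifications` are hypothetical, and the `modifications` list is assumed to hold only the PTMs of interest.

```python
from collections import defaultdict


def needs_annotated_spectrum(psms):
    """Return the identifications that need an annotated spectrum.

    Each identification is a dict with hypothetical keys 'protein',
    'peptide' and 'modifications' (an empty list when unmodified).
    """
    # Collect the distinct peptide sequences seen for each protein.
    peptides_by_protein = defaultdict(set)
    for psm in psms:
        peptides_by_protein[psm["protein"]].add(psm["peptide"])

    flagged = []
    for psm in psms:
        has_ptm = bool(psm["modifications"])
        single_peptide = len(peptides_by_protein[psm["protein"]]) == 1
        if has_ptm or single_peptide:
            flagged.append(psm)
    return flagged


# Illustrative input: P1 has two peptides (one modified), P2 has one peptide.
example_psms = [
    {"protein": "P1", "peptide": "AAAK", "modifications": []},
    {"protein": "P1", "peptide": "CCCR", "modifications": ["Phospho"]},
    {"protein": "P2", "peptide": "DDDK", "modifications": []},
]
flagged = needs_annotated_spectrum(example_psms)
```

In this example the unmodified P1 peptide is not flagged, because P1 is supported by a second peptide and the identification carries no PTM.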

Recent Example of MCP Data Submission
- Annotated spectra from MaxQuant results can be viewed using MaxQuant Viewer or by uploading to MS-Viewer.
- MaxQuant requires all result files.
- MS-Viewer requires the peak list files and the msms.txt file to be uploaded.
- When MCP authors contact me with MaxQuant results I always suggest both of these options. Every one of the last five authors has chosen MS-Viewer, probably due to file upload time.
- Recent example: MaxQuant required 51.7 GB of files; MS-Viewer required 2.5 GB of files.
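To put the two file sizes in perspective, the arithmetic can be worked through as below. The sizes come from the example above; the 20 Mbit/s upload speed is purely an assumption for illustration, and real transfer rates will differ.

```python
def upload_hours(size_gb, mbit_per_s):
    """Estimated transfer time in hours for size_gb gigabytes (decimal GB)."""
    bits = size_gb * 8e9
    seconds = bits / (mbit_per_s * 1e6)
    return seconds / 3600


maxquant_gb = 51.7    # from the example above
msviewer_gb = 2.5     # from the example above
assumed_link = 20.0   # Mbit/s; an assumed upload speed, not a figure from the talk

ratio = maxquant_gb / msviewer_gb
maxquant_h = upload_hours(maxquant_gb, assumed_link)
msviewer_h = upload_hours(msviewer_gb, assumed_link)
```

The full MaxQuant result set is about 20.7 times larger. Under the assumed link speed that is roughly 5.7 hours of upload against about 17 minutes, which fits the suggestion that upload time drives the authors' choice.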

What should be Stored? Raw Data
- Usually large: storage space; upload and download time
- Journal has received very little author resistance to the raw data requirement
- Does all the information need to be captured?
- Some standard formats, e.g. mzML and mzXML, are larger than instrument raw formats
- For some datasets/studies would an MGF file be sufficient (usually 10x smaller)?
- Smaller files are faster to read
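An MGF file is a plain-text list of MS/MS peak lists, which is part of why it is so much smaller than a full raw file: survey scans and most instrument metadata are left out. The sketch below writes one spectrum in the usual MGF block layout. The dictionary keys and the example values are hypothetical, and a real export would normally come from a conversion tool.

```python
import io


def write_mgf(spectra, handle):
    """Write MS/MS peak lists to an open text handle in MGF block layout.

    Each spectrum is a dict with hypothetical keys 'title', 'precursor_mz',
    'charge' and 'peaks' (a list of (m/z, intensity) pairs).
    """
    for spec in spectra:
        handle.write("BEGIN IONS\n")
        handle.write(f"TITLE={spec['title']}\n")
        handle.write(f"PEPMASS={spec['precursor_mz']:.4f}\n")
        handle.write(f"CHARGE={spec['charge']}+\n")
        for mz, intensity in spec["peaks"]:
            handle.write(f"{mz:.4f} {intensity:.1f}\n")
        handle.write("END IONS\n\n")


# Illustrative spectrum with made-up values.
example_spectra = [
    {
        "title": "scan_1",
        "precursor_mz": 500.25,
        "charge": 2,
        "peaks": [(175.119, 1200.0), (276.1666, 850.5)],
    }
]
buffer = io.StringIO()
write_mgf(example_spectra, buffer)
mgf_text = buffer.getvalue()
```

Whether such a file is sufficient depends on the study: anything that relies on precursor-level (MS1) signal cannot be redone from peak lists alone.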

What is a Publication? How useful is this?

What is a Publication?

How Long Should Data be Kept?
- Better instrumentation, better methods
- At some point it becomes more useful to reacquire data rather than reanalyze old data
- Some datasets will be downloaded many times; some will never be downloaded
- Is it necessary to keep all data (online)?
- Journal, as part of the scientific record, is supposed to guarantee access to results in perpetuity
- Is the raw data part of the publication?