Data Publishing Workflows: Strategies and Standards

1 Data Publishing Workflows: Strategies and Standards
Sünje Dallmeier-Tiessen (CERN) for many collaborators at CERN and in the RDA-WDS Data Publishing Workflows Group

2 Outline
- Policy pressure
- Solutions across disciplines
- Standards: persistent identifiers, data citation, quality assurance and peer review, licensing
- Examples in High-Energy Physics (CERN): INSPIRE, Analysis Preservation Framework, Open Data Portal

3 Research data is a first-class citizen
Royal Society, 1665 and 2012

4 Towards Open Science
[Diagram: stages Open Source, Open Access, Open Data & Code, Open Science, with a "We are here now" marker]
Slide provided by Patricia Herterich, CERN

5 Policy pressure: STFC example

6 Policy pressure: DOE example
DMPs should provide a plan for making all research data displayed in publications resulting from the proposed research open, machine-readable, and digitally accessible to the public at the time of publication. …the underlying digital research data used to generate the displayed data should be made as accessible as possible to the public in accordance with the principles stated above.

7 Expectations: PLOS Data Policy

8 Concerns across disciplines
Datasets are often:
- not shared, or lost
- difficult to discover and access
- difficult to understand (context missing)
(Nature, 2009)

9 How this challenge is addressed

10 Example: Dedicated Data Repositories

11 Preserving and promoting data reuse

12 International sharing and curation of data
www.icgc.org

13 ICGC – Data Publication Timeline
Time limits for publication moratoriums: All data shall become free of a publication moratorium when either the data is published by the ICGC member project or one year after a specified quantity of data (e.g. genome dataset from 100 tumors per project) has been released via the ICGC database or other public databases. […] In all cases data shall be free of a publication moratorium two years after its initial release.
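The moratorium rule above reduces to taking the earliest of three dates. A minimal Python sketch, assuming the trigger dates are known and treating "one year" and "two years" as calendar anniversaries (a 29 February start falls back to 28 February); the function and parameter names are hypothetical and not part of any ICGC tool.

```python
from datetime import date
from typing import Optional

def add_years(d: date, years: int) -> date:
    """Same calendar date `years` later; 29 Feb falls back to 28 Feb."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def moratorium_end(initial_release: date,
                   member_publication: Optional[date] = None,
                   quantity_released: Optional[date] = None) -> date:
    """Earliest date on which the data are free of the moratorium (hypothetical helper)."""
    # Applies in all cases: two years after the initial release.
    candidates = [add_years(initial_release, 2)]
    if member_publication is not None:
        # The ICGC member project has published the data.
        candidates.append(member_publication)
    if quantity_released is not None:
        # One year after the specified quantity of data was released.
        candidates.append(add_years(quantity_released, 1))
    return min(candidates)
```

For data first released on 2010-01-15, with the specified quantity released on 2010-06-01, `moratorium_end(date(2010, 1, 15), quantity_released=date(2010, 6, 1))` gives 2011-06-01, earlier than the two-year cap of 2012-01-15.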

14 Zenodo – Data Repository

15 How to find a data repository

16 Example: A dedicated data journal
Nature Scientific Data

17 F1000

18 Connecting articles and data
Tagged GenBank entry (genetic sequence). Slide provided by H. Koers, Elsevier. Article: doi: /j.biortech

19 Towards Open Science
[Diagram: stages Open Source, Open Access, Open Data & Code, Open Science, with a "We are here now" marker]
Slide provided by Patricia Herterich

20 Publish (Citable) Software

21 More and more examples

22 Published Software Papers

23 Standards

24 Licensing
Enable others to reuse your data and software:
- Choose licenses or public domain dedications accordingly, as “open” as possible.
Re-use:
- There are measures to require citation, so that you can track reuse and the impact of your work.
- If you re-use data, cite the dataset yourself.

25 DOIs for datasets
URLs are not persistent (e.g. Wren JD: URL decay in MEDLINE: a 4-year follow-up study. Bioinformatics, Jun 1;24(11):1381-5).
Digital Object Identifiers (DOI names) offer a solution:
- the most widely used identifier for scientific articles
- researchers, authors and publishers know how to use them
- they put datasets on the same playing field as articles
Example dataset: Yancheva et al. (2007). Analyses on sediment of Lake Maar. PANGAEA. doi: /PANGAEA
Slides courtesy of Dr. Jan Brase, DataCite
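A DOI name is a prefix beginning with "10." and a suffix, separated by the first slash, and it is resolved through the doi.org proxy. The sketch below, built around a hypothetical `doi_to_url` helper, normalizes a few common written forms and returns a resolvable link; it checks only this basic shape, not whether the DOI is actually registered.

```python
from urllib.parse import quote

# Written forms sometimes prepended to a DOI name (illustrative, not exhaustive).
_KNOWN_PREFIXES = ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:")

def doi_to_url(doi: str) -> str:
    """Return an https://doi.org/ resolver URL for a DOI name (hypothetical helper)."""
    doi = doi.strip()
    for known in _KNOWN_PREFIXES:
        if doi.lower().startswith(known):
            doi = doi[len(known):].strip()
            break
    # A DOI name is <prefix>/<suffix>, split at the first slash; the prefix starts with "10.".
    registrant, sep, suffix = doi.partition("/")
    if not registrant.startswith("10.") or not sep or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping the slashes.
    return "https://doi.org/" + quote(f"{registrant}/{suffix}", safe="/")
```

For example, `doi_to_url("doi:10.1000/182")` (the DOI of the DOI Handbook) returns `https://doi.org/10.1000/182`. Citing the resolver URL rather than a repository's landing-page URL is what keeps the link stable when the data move.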

26 ORCID iD
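An ORCID iD is a 16-character identifier whose last character is a check character computed with ISO 7064 MOD 11-2 (it may be "X"). A sketch of that check in Python; the function names are hypothetical, and a matching checksum only shows that an iD is well-formed, not that it is registered.

```python
def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """True if `orcid` has 15 ASCII digits plus a matching check character."""
    chars = orcid.replace("-", "")
    if len(chars) != 16 or not all(c in "0123456789" for c in chars[:15]):
        return False
    return orcid_check_digit(chars[:15]) == chars[15].upper()
```

With the sample iD used in ORCID's documentation, `is_valid_orcid("0000-0002-1825-0097")` is `True`; changing the last digit makes it `False`, which catches most typing errors before an author ID is attached to a dataset.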

27 FORCE11 Data Citation Principles
Citation elements: Author, Publication Year, Dataset Title, Data Repository, Version, Unique Identifier
A data citation:
- should include a persistent method for identification that is machine-actionable and globally unique
- should facilitate identification of, access to, and verification of the specific data that support a claim
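The element list above maps directly onto a small record type. A minimal Python sketch that assembles a citation string in that order; the `DatasetCitation` class, its punctuation and the example values are hypothetical (journals and repositories apply their own citation styles), and the identifier is assumed to be a DOI, rendered as a resolvable URL.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DatasetCitation:
    authors: List[str]
    year: int
    title: str
    repository: str
    version: str
    doi: str  # assumed to be a DOI name such as "10.xxxx/suffix"

    def format(self) -> str:
        # Element order follows the slide: Author, Year, Title, Repository, Version, Identifier.
        return (f"{'; '.join(self.authors)} ({self.year}). {self.title}. "
                f"{self.repository}. Version {self.version}. "
                f"https://doi.org/{self.doi}")
```

With made-up values, `DatasetCitation(["Doe, J.", "Roe, R."], 2015, "Example measurements", "Example Repository", "1.0", "10.1234/example").format()` yields `Doe, J.; Roe, R. (2015). Example measurements. Example Repository. Version 1.0. https://doi.org/10.1234/example`. Keeping the version as a separate field is what lets a citation point at the specific data behind a claim.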

28 Data Citation in Practice

29 Quality assurance for data: peer review
Products:
- Data records in data repositories
- Data journals
- Data articles
- Note: standalone vs. supporting materials
QA workflows:
- Standalone or integrated?
- Blind and invited peer review
- Open peer review
- Citable review reports

30 How to publish your data
- Decide which dataset should be preserved, or which dataset might interest others for study or reuse.
- Are there issues that restrict publishing, e.g. confidentiality of patient data?
- Which data product? Do I have enough material for a dedicated data article? Which journal or repository works for me?
- Prepare the documentation/metadata.
- Publish, and let others know you did.
- Cite the dataset in the resulting papers.
- Track who used and cited your data.
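For the "prepare the documentation/metadata" step, repositories that register DOIs generally require a core set of fields. A sketch, assuming a DataCite-style record held as a plain Python dict: the six property names follow the mandatory properties of the DataCite Metadata Schema (version 4), while the flat dict layout and the `missing_mandatory` helper are hypothetical simplifications, not the DataCite API.

```python
from typing import Any, Dict, List

# Mandatory properties in the DataCite Metadata Schema (version 4).
MANDATORY = ["Identifier", "Creator", "Title", "Publisher",
             "PublicationYear", "ResourceType"]

def missing_mandatory(record: Dict[str, Any]) -> List[str]:
    """Return the mandatory properties that are absent or empty in `record`."""
    return [name for name in MANDATORY
            if record.get(name) in (None, "", [])]
```

For a made-up record with every field except the resource type, `missing_mandatory(...)` returns `["ResourceType"]`, flagging what to add before submitting the dataset.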

31 HEP High-Energy Physics

32 Research data in HEP

33 Research Data on INSPIRE: starting from the paper

34 The underlying datasets (HEPdata)

35 Data Citation (Tracking)

36 Referenced Data arXiv:

37 Code snippets

38 Code snippets

39 … and who gets the credit for sharing data?

40 Kyle’s profile on INSPIRE

41 Using author IDs for attributing credit

42 Excerpt from publication list on

43 Excerpt from publication list on
Make data publications count - alongside your articles

44 Focusing on reproducibility and reuse
Two important new tools

45 Capturing the complexity: Analysis Preservation Framework

46 Open it up: CERN Open Data Portal

47 How to publish your data
- Decide which dataset should be preserved, or which dataset might interest others for study or reuse.
- Are there issues that restrict publishing, e.g. confidentiality of patient data?
- Which data product? Do I have enough material for a dedicated data article? Which journal or repository works for me?
- Prepare the documentation/metadata.
- Publish, and let others know you did.
- Cite the dataset in the resulting papers.
- Track who used and cited your data.

48 Conclusions
- Policy pressure nationally and globally: we need data publishing solutions.
- Considerable advancements in many disciplines: we learn from best practices.
- HEP is committed to data preservation and open data releases.
- First tools are available to support data preservation and data publishing.

49 Towards Open Science
[Diagram: stages Open Source, Open Access, Open Data & Code, Open Science, with a "We are here now" marker]
Slide provided by Patricia Herterich