Data Publication (in H2020)

Presentation on theme: "Data Publication (in H2020)"— Presentation transcript:

1 Data Publication (in H2020)
Dr Sünje Dallmeier-Tiessen, CERN
Madrid, November 2016

2 Agenda
Introduction
Research data
Relation to H2020
Data Publishing
Examples, developments and lessons learnt from the “real world”
General purpose and disciplinary repositories
Adding journals to the mix
Adding “reproducibility workflows” to the mix
Lessons learnt

3 Research Data
What is it? What does it look like? Does it hurt?

4 Funders’ policies
WELCOMES Open Access to scientific publications as the option by default for publishing the results of publicly funded research; […]
RECOGNISES that the full scale transition towards Open Access should be based on common principles such as transparency, research integrity, sustainability, fair pricing and economic viability; and […]
CALLS on Member States, the Commission and stakeholders to remove financial and legal barriers, and to take the necessary steps for successful implementation in all scientific domains, including specific measures for disciplines where obstacles hinder its progress.
See for example:

5 Mandatory Data Management Plans (DMPs)

6 Journals’ policies
Springer Nature Data policy

7 Data Publishing Paradigms

8 Data Publishing Concepts
Standalone data (repository)
Traditional article-data linking
Data articles/journals

9 Data Publishing components (RDA endorsed)
[2] DOI: /s

10 Another data publishing perspective: establishing context
[2] DOI: /s

11 The FAIR Guiding Principles I
To be Findable:
F1. (meta)data are assigned a globally unique and persistent identifier
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource
To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are accessible, even when the data are no longer available
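As an illustration of A1, a persistent identifier such as a DOI can be resolved over plain HTTPS, and the resolver can be asked for machine-readable metadata through content negotiation. The following is a minimal sketch that only builds the request and does not send it. The DOI is a made-up placeholder, the helper name `build_metadata_request` is hypothetical, and the CSL JSON media type is assumed to be offered by the registration agency behind the DOI.

```python
from urllib.parse import quote
from urllib.request import Request

# Media type for CSL JSON; assumed to be supported for the DOI in question.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_metadata_request(doi: str, media_type: str = CSL_JSON) -> Request:
    """Build (but do not send) an HTTPS request asking the DOI resolver for metadata."""
    url = "https://doi.org/" + quote(doi, safe="/")
    return Request(url, headers={"Accept": media_type})


# Placeholder DOI for illustration only; it does not come from the slides.
request = build_metadata_request("10.1234/example-dataset")
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; that step is left out here so the sketch stays free of network access.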

12 The FAIR Guiding Principles II
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data
To be Reusable:
R1. (meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with detailed provenance
R1.3. (meta)data meet domain-relevant community standards
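A few of the principles can be checked mechanically on a metadata record. The sketch below is a rough illustration rather than a FAIR assessment tool: the field names (`identifier`, `title`, `description`, `license`, `provenance`) and the function `fair_gaps` are assumptions made for this example, and real repositories use their own metadata schemas.

```python
from __future__ import annotations

# Hypothetical field names; each check maps one principle to a simple presence test.
CHECKS = {
    "F1 persistent identifier": lambda r: bool(r.get("identifier")),
    "F2 rich metadata": lambda r: bool(r.get("title")) and bool(r.get("description")),
    "R1.1 usage license": lambda r: bool(r.get("license")),
    "R1.2 provenance": lambda r: bool(r.get("provenance")),
}


def fair_gaps(record: dict) -> list[str]:
    """Return the labels of the checks that the record does not pass."""
    return [label for label, check in CHECKS.items() if not check(record)]


# Illustrative record, not a real deposit.
record = {
    "identifier": "10.1234/example-dataset",  # placeholder DOI
    "title": "Example dataset",
    "description": "Illustrative record used to demonstrate the checks.",
    "license": "CC-BY-4.0",
}
print(fair_gaps(record))
```

Presence tests like these only catch missing fields; whether the metadata are actually rich, accurate, or community-compliant still needs human curation.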

13 Various solutions
Disciplinary and institutional repositories exist
Choose partners: re3data.org
Article-data linking
Now easier with DataCite and Crossref
Data/software journals already exist
With partner repositories
With repository recommendations
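Article-data linking is typically expressed as a pair of reciprocal statements in the metadata of the two objects. The sketch below uses the `relatedIdentifier`, `relatedIdentifierType`, and `relationType` properties and a small subset of relation types from the DataCite metadata schema. The helper names and both DOIs are hypothetical, and how such entries are submitted to a registration agency is out of scope here.

```python
# Subset of DataCite relationType values, mapped to their inverse.
INVERSE = {
    "IsSupplementTo": "IsSupplementedBy",
    "IsSupplementedBy": "IsSupplementTo",
    "Cites": "IsCitedBy",
    "IsCitedBy": "Cites",
}


def related_identifier(doi: str, relation_type: str) -> dict:
    """Build one relatedIdentifier entry pointing at the given DOI."""
    if relation_type not in INVERSE:
        raise ValueError(f"relation type not covered by this sketch: {relation_type}")
    return {
        "relatedIdentifier": doi,
        "relatedIdentifierType": "DOI",
        "relationType": relation_type,
    }


def link_pair(dataset_doi: str, article_doi: str, relation: str = "IsSupplementTo"):
    """Return (entry for the dataset's metadata, entry for the article's metadata)."""
    on_dataset = related_identifier(article_doi, relation)
    on_article = related_identifier(dataset_doi, INVERSE[relation])
    return on_dataset, on_article


# Placeholder DOIs for illustration only.
on_dataset, on_article = link_pair("10.1234/example-dataset", "10.1234/example-article")
print(on_dataset)
print(on_article)
```

Recording the link on both sides means either object can be used as the entry point for discovering the other.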

14 Data Publishing solutions
Examples, there are more!

15

16 re3data.org

17 Data Publishing Concepts
Standalone data (repository)
Traditional article-data linking
Data articles/journals

18 All disciplines, institutions
Needs replacement with Zenodo3 Zenodo.org

19 All disciplines, institutions
Figshare screenshot Figshare.com

20 Established disciplinary databases: life sciences
EBI database screenshot

21 Established disciplinary databases: earth & environmental sciences
Pangaea screenshot pangaea.de

22 dataverse.org

23 Data Publishing Concepts
Standalone data (repository)
Traditional article-data linking
Data articles/journals

24

25 Discipline specific data journals
Add another data journal, e.g. ESSD?

26

27 Considerations for choosing the “right service”
Future purpose: reuse, reproducibility, preservation
Metadata (standards)
Quality
Dependencies (software, methods)
Versioning
Visibility, discoverability (cf. FAIR principles)
Referencing, data citation capability for all outputs
Persistent links, sustainability

28 In practical terms
Discuss with researchers
What are the needs of the group/community?
Are there existing services or is there a need for more? Re3data.org
Don’t shy away from contacting data centres or services directly
Check out what community publishers do
Recommended repositories?
Discuss with partners in computer centre and/or community meetings
What do they do and plan to do? Is there anything you can contribute to or profit from?

29 Moving beyond the individual elements
Opening data publishing to reproducible workflows

30

31

32 Conditions very discipline specific
Reproducibility; Repeatability; Replicability; Reusability; Repurposing
In order to reuse/repurpose results, you sometimes have to reproduce the original results first (to understand the exact details).
An article about computational science in a scientific publication is not the scholarship itself; it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures. ( stanford.edu/doku.php?id=sep:research:reproducible :seg92)
We can reserve the term "replicability" for the regeneration of published results from author-provided code and data. Reproducibility is a more general term, implying both replication and the regeneration of findings with at least some independence from the code and/or data associated with the original publication. Both refer to the analysis that occurs after publication. A third term, “repeatability,” is sometimes used in place of reproducibility, but this is more typically used as a term of art referring to the sensitivity of results when underlying measurements are retaken.
To summarize, we need replicability, in part, to resolve differences in outcomes that arise from reproduced computational results, regardless of whether the experiments have been repeated.
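One narrow, mechanical reading of replicability as defined above is that rerunning the author-provided code on the author-provided data regenerates the published output exactly. The sketch below compares a SHA-256 digest of a regenerated output with the digest recorded at publication time. The function names and the sample bytes are hypothetical, and a byte-for-byte match is an assumption that is stricter than many disciplines would require (for example where floating-point results differ slightly between platforms).

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the hexadecimal SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def is_replicated(published_digest: str, regenerated_output: bytes) -> bool:
    """True if the regenerated output matches the published digest exactly."""
    return sha256_digest(regenerated_output) == published_digest


# Illustrative bytes standing in for a published result file.
published_output = b"x,y\n1,2\n"
published_digest = sha256_digest(published_output)

print(is_replicated(published_digest, b"x,y\n1,2\n"))
print(is_replicated(published_digest, b"x,y\n1,3\n"))
```

A mismatch does not say which side is wrong; it only flags that the difference needs to be resolved, which is the role the text assigns to replicability.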

33 To reproduce or reuse research results a researcher needs…
More than “just” the article
Context, documentation
Links to related research objects: data, code, workflows
Understandable method, processing, software etc.
Steps taken during the research process (versions)
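The items above can be thought of as a manifest that travels with the article. The sketch below is one possible shape for such a manifest, not an established format: the class names, the list of required kinds, and all identifiers are assumptions made for this example.

```python
from dataclasses import dataclass, field

# Kinds of related objects this sketch expects alongside an article (an assumption).
REQUIRED_KINDS = ("data", "code", "workflow", "documentation")


@dataclass(frozen=True)
class ResearchObject:
    kind: str        # e.g. "data", "code", "workflow", "documentation"
    identifier: str  # ideally a persistent identifier
    version: str     # version used to produce the published results


@dataclass
class Manifest:
    article: str
    objects: list = field(default_factory=list)

    def missing_kinds(self) -> list:
        """Return the required kinds that have no linked object yet."""
        present = {obj.kind for obj in self.objects}
        return [kind for kind in REQUIRED_KINDS if kind not in present]


# Placeholder identifiers for illustration only.
manifest = Manifest(
    article="10.1234/example-article",
    objects=[
        ResearchObject("data", "10.1234/example-dataset", "v1"),
        ResearchObject("code", "10.1234/example-software", "1.0.2"),
    ],
)
print(manifest.missing_kinds())
```

Recording a version for every linked object is what lets a later reader reconstruct the steps taken during the research process rather than only its end state.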

34 Research Lifecycle

35 Seamless integration across the research lifecycle
Who? When? Where?
project/rcn/194927_en.html
Slide credit to Trisha Cruse, DataCite

36 https://benchling.com/
Docker

37 Example from CERN: CERN Open Data and CERN Analysis Preservation
Future purpose: reuse, reproducibility, preservation
What are the components of an analysis (and where are they stored now)?
How much do these components vary within the collaboration?
How is quality defined?
What are the dependencies (software, methods)?
Versioning
Linking
Size (10-15TB per analysis)
See CERN presentation later

38 Future
Big challenge is adoption: it needs all of us to work together
We can help with data curation and services, i.e. guiding researchers to the right services
But we need your expertise to make it an intrinsic process for researchers
Integrated in publishing process
Link objects/resources (DataCite!)
Give data more ❤️ and visibility – make it discoverable

39 Backup slides

40 References
THOR project: https://project-thor.eu/
ORCID: orcid.org
All icons are kindly provided by freeicon via flaticon

41

