Presentation is loading. Please wait.

Presentation is loading. Please wait.

Data Archiving and Networked Services DANS is an institute of KNAW en NWO Data Archiving and Networked Services Introduction to Data Management Planning.

Similar presentations


Presentation on theme: "Data Archiving and Networked Services DANS is an institute of KNAW en NWO Data Archiving and Networked Services Introduction to Data Management Planning."— Presentation transcript:

1 Data Archiving and Networked Services DANS is an institute of KNAW en NWO Data Archiving and Networked Services Introduction to Data Management Planning Peter Doorn, DANS EGI Community Workshop focusing “Managing, computing and preserving big data for research” Amsterdam, 5 th March 2014

2 What is data management? Data management is a general term covering how you organize, structure, store, and care for the information used or generated during a research project It includes: – How you deal with information on a day-to-day basis over the lifetime of a project – What happens to data in the longer term – what you do with it after the project concludes Acronyms: – DMP = Data Management Plan – RDM = Research Data Management First 3 slides based on “Research Data Management Training Materials”, University of Oxford,http://damaro.oucs.ox.ac.uk/training_materials.xmlhttp://damaro.oucs.ox.ac.uk/training_materials.xml

3 Why spend time and effort on this? So you can work efficiently and effectively – Save time and reduce frustration – Highlight patterns or connections that might otherwise be missed Because your data is precious To enable data re-use and sharing To meet funders’ and institutional requirements: good data management is becoming a standard research practice (research integrity, code of conduct)

4 Funders’ requirements Funding bodies are taking an increasing interest in what happens to research data You may be required to make your data publicly available at the end of a project – Check the small print in your grant conditions Many funders require a data management plan as part of grant applications Page 4

5 Data fraud front page news in April 2013 The Mind of a Con Man

6 Niederlande Renommierter Psychologe gesteht Fälschungen September 2011: Diederik Stapel, Social Psychology November 2011: Don Poldermans, Cardiovascular Medicine June 2012: Dirk Smeesters, Experimental Social Psychology October 2012: Mart Bax, Cultural Anthropology

7 The KNAW “Schuyt report” on data practices A lot of variation across and within disciplines Pattern: data management in small-scale research more risky than in big science Risk: missing checks and balances, especially in phase after granting a research proposal and before publication Peer pressure is an important mechanism of checks and balances http://www.knaw.nl/Content/ Internet_KNAW/publicaties/p df/20131009.pdf

8 Increased awareness of need for Data Policies Dutch Academy (KNAW) “supports the free movement of data and results. Taking into account variations across and within scientific disciplines, free availability of data should be the default”. Dutch Universities are developing data policies Dutch research funding organisation NWO: Data Management Plans (DMPs) and data sharing are becoming requirements for funding Science Europe: Data Access Working Group proposes DMPs for member research funders EU research funding programme Horizon 2020: Guidelines on data management EU Vice-President Neelie Kroes: “Data is the new gold” (Riding the Wave report, 2010)

9 Life cycle of research data

10 http://sim4rdm.eu/docs/project- outputs/sim4rdm-landscape-report The SIM4RDM project strives for better policies in the area of managing research data on both the national and European level. The vision of SIM4RDM is to enable researchers to take full advantage of emerging data infrastructures in the European Research Area (ERA) SIM4RDM is a two year ERA-NET project funded by the European Commission's Seventh Framework Programme (FP7).

11 Data Management Policies in the United States The Office of Management and Budget sets forth standards for obtaining consistency and uniformity among federal agencies in the administration of grants to and agreements with institutions of higher education, hospitals, and other non-profit organizations. A number of US funding agencies have drawn up data management policies based on that circular. Since 2011, the National Science Foundation in the United States has made a data management plan mandatory when submitting grants proposals. Proposals must now include a supplementary document of no more than two pages labelled ‘Data Management Plan’ which should include the following information:

12 NSF requirements of DMPs Products of the Research: The types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project. Data Formats: The standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate this should be documented along with any proposed solutions or remedies). Access to Data, Data Sharing Practices and Policies: Policies for access and sharing, including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements. Policies for Re-Use, Re-Distribution and Production of Derivatives Archiving of Data: Plans for archiving data, samples, other research products, and for preservation of access to them

13 Celina Ramjoué, Head of Sector, Digital Science Unit, CONNECT.C3 presentation on “Open Access to Scientific Publications and Data”

14 H2020 Guidelines on RDM Annex 1: Data Management Plan (DMP) template The purpose of the Data Management Plan (DMP) is to provide an analysis of the main elements of the data management policy that will be used by the applicants with regard to all the datasets that will be generated by the project. The DMP is not a fixed document, but evolves during the lifespan of the project. The DMP should address the points below on a dataset by dataset basis and should reflect the current status of reflection within the consortium about the data that will be produced. http://ec.europa.eu/research/participants/data/ref/h2020/grants _manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf

15 Data set reference and name: Identifier for the data set to be produced. Data set description: Description of the data that will be generated or collected, its origin (in case it is collected), nature and scale and to whom it could be useful, and whether it underpins a scientific publication. Information on the existence (or not) of similar data and the possibilities for integration and reuse. Standards and metadata: Reference to existing suitable standards of the discipline. If these do not exist, an outline on how and what metadata will be created. Data sharing: Description of how data will be shared, including access procedures, embargo periods (if any), outlines of technical mechanisms for dissemination and necessary software and other tools for enabling re-use, and definition of whether access will be widely open or restricted to specific groups. Identification of the repository where data will be stored, if already existing and identified, indicating in particular the type of repository (institutional, standard repository for the discipline, etc.). In case the dataset cannot be shared, the reasons for this should be mentioned (e.g. ethical, rules of personal data, intellectual property, commercial, privacy-related, security-related). Archiving and preservation (including storage and backup): Description of the procedures that will be put in place for long-term preservation of the data. Indication of how long the data should be preserved, what is its approximated end volume, what the associated costs are and how these are planned to be covered.

16 Annex 2: Additional guidance for Data Management Plans Scientific research data should be easily: 1.Discoverable: are the data and associated software produced and/or used in the project discoverable (and readily located), identifiable by means of a standard identification mechanism (e.g. Digital Object Identifier)? 2.Accessible: are the data and associated software produced and/or used in the project accessible and in what modalities, scope, licenses (e.g. licencing framework for research and education, embargo periods, commercial exploitation, etc.)? 3.Assessable and intelligible: are the data and associated software produced and/or used in the project assessable for and intelligible to third parties in contexts such as scientific scrutiny and peer review (e.g. are the minimal datasets handled together with scientific papers for the purpose of peer review, are data is provided in a way that judgments can be made about their reliability and the competence of those who created them)?

17 Annex 2: Additional guidance for DMPs (cont’d) 4.Useable beyond the original purpose for which it was collected: are the data and associated software produced and/or used in the project useable by third parties even long time after the collection of the data (e.g. is the data safely stored in certified repositories for long term preservation and curation; is it stored together with the minimum software, metadata and documentation to make it useful; is the data useful for the wider public needs and usable for the likely purposes of non-specialists)? 5.Interoperable to specific quality standards: are the data and associated software produced and/or used in the project interoperable allowing data exchange between researchers, institutions, organisations, countries, etc. (e.g. adhering to standards for data annotation, data exchange, compliant with available software applications, and allowing recombinations with different datasets from different origins)?

18 https://dmp.cdlib.org/

19

20 https://dmponline.dcc.ac.uk/

21 DANS data management plan brochure http://www.dans.knaw.nl/en/content/data-management-plan

22 Preferred formats DANS Preferred and Accepted File Formats DANS guarantees that the files, deposited in a format as mentioned in the list below, will be accessible over time. It is also possible to deposit other file types, but we advise you to contact one of our data managers first at info@dans.knaw.nlinfo@dans.knaw.nl The preferred and accepted formats include the following types of files: 1. Audio-visual files 2. Text files a.Fixed and reusable text, presentations b.Plain text c.Mark-up 3. Spreadsheets 4. Statistic files 5. Databases 6. Cartographic data (CAD) (DWG, DXF) 7. Geographic Information System (GIS) (TAB, SHP)

23 Final remarks & some questions DMP will be mandatory Agreement on basic principles Variations in the degree of detail: – depending on scientific domain, size of project, funding instrument/organisation, institution… Procedures are under construction – data management section in proposal stage – data management plan as part of contract or in first phase of project? – DMP as a living document? Not just a paper (or digital) tiger… – who will check? Sanctions? To what degree will RDM be fundable?

24 Data Archiving and Networked Services DANS is an institute of KNAW en NWO Thank you for your attention www.dans.knaw.nl www.narcis.nl peter.doorn@dans.knaw.nl


Download ppt "Data Archiving and Networked Services DANS is an institute of KNAW en NWO Data Archiving and Networked Services Introduction to Data Management Planning."

Similar presentations


Ads by Google