What is research data and why manage it? An introduction to the issues and drivers, benefits and funder requirements
Martin Donnelly, Digital Curation Centre, University of Edinburgh
University of Stirling, 25 March 2013
Running order:
I. Definitions
II. Drivers
III. Rules and (in)equations
Group exercise (30 mins)
Digital Curation Centre
- Established 2004
- Three partners: Edinburgh, Glasgow and Bath
- Primary funder is JISC
Helping to build capacity, capability and skills in data management and curation across the UK’s higher education research community (DCC Phase 3 Business Plan, www.dcc.ac.uk)
What is research data?
…whatever is produced in research or evidences its outputs.
What kinds of data? Facts, statistics, qualitative, quantitative, unpublished research outputs, discipline-specific.
A data gift?
“Data underpins our economy and our society - data about how much is being spent and where, data about how schools, hospitals and police are performing, data about where things are and data about the weather.” Tim Berners-Lee, director of W3C
What is research data management?
“The active management and appraisal of data over the lifecycle of scholarly and scientific interest.”
Data management is part of good research practice.
The six data-centric phases of the research lifecycle: data is (usually) central to the process.
Why manage HE research data?
- Research integrity (defending findings)
- Research impact (linking data and publications, making data citable)
- Supports and enables reuse, which keeps funders happy
- Maximises value and increases return on investment (ROI), which keeps government happy
- Helps to meet regulatory requirements
- Can control costs (e.g. via capacity planning)
Attitudes / approaches
- The term “research data” means different things to different people in HE.
- Researchers may care enormously about their data, so much so that they worry about it going out into the world on its own.
- Others (e.g. those with responsibility for compliance) may worry about it not going out into the world, or going out when it shouldn’t or ‘underdressed’.
- Some may not recognise the relevance of ‘data’ to what they do…
“While many researchers are positive about sharing data in principle, they are almost universally reluctant in practice… using these data to publish results before anyone else is the primary way of gaining prestige in nearly all disciplines.” INCREMENTAL Project
“Data sharing was more readily discussed by early career researchers.”
Open to all? Case studies of openness in research
Choices are made according to context, with degrees of openness reached according to:
- the kinds of data to be made available
- the stage in the research process
- the groups to whom data will be made available
- the terms and conditions on which it will be provided
Default position of most researchers:
- YES to protocols, software, analysis tools, methods and techniques
- NO to making research data content freely available to everyone
Angus Whyte, RIN/NESTA, 2010
The data deluge: “Surfing the Tsunami”, Science, 11 February 2011
Public good, Preservation, Discovery, Confidentiality, First use, Recognition, Public funding
RCUK Policy and Code of Conduct on the Governance of Good Research Conduct
Unacceptable research conduct includes mismanagement or inadequate preservation of data and/or primary materials, including failure to:
- keep clear and accurate records of the research procedures followed and the results obtained, including interim results;
- hold records securely in paper or electronic form;
- make relevant primary data and research evidence accessible to others for reasonable periods after the completion of the research: data should normally be preserved and accessible for 10 years (in some cases 20 years or longer);
- manage data according to the research funder’s data policy and all relevant legislation;
- wherever possible, deposit data permanently within a national collection.
Responsibility for proper management and preservation of data and primary materials is shared between the researcher and the research organisation.
April 2011: EPSRC letter to Vice-Chancellors
EPSRC expects all those institutions it funds:
- to develop a roadmap that aligns their policies and processes with EPSRC’s expectations by 1 May 2012
- to be fully compliant with these expectations by 1 May 2015
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx
Government pressure…
“6.9 The Research Councils expect the researchers they fund to deposit published articles or conference proceedings in an open access repository at or around the time of publication. But this practice is unevenly enforced. Therefore, as an immediate step, we have asked the Research Councils to ensure the researchers they fund fulfil the current requirements. Additionally, the Research Councils have now agreed to invest £2 million in the development, by 2013, of a UK ‘Gateway to Research’. In the first instance this will allow ready access to Research Council funded research information and related data but it will be designed so that it can also include research funded by others in due course. The Research Councils will work with their partners and users to ensure information is presented in a readily reusable form, using common formats and open standards.”
http://www.bis.gov.uk/assets/biscore/innovation/docs/i/11-1387-innovation-and-research-strategy-for-growth.pdf
Making public data accessible
“We have opened up much public data already, but need to go much further in making this data accessible. We believe publicly funded research should be freely available. We have commissioned independent groups of academics and publishers to review the availability of published research, and to develop action plans for making this freely available.”
The Open Data Institute (ODI) will be the first of its kind, a pioneering centre of innovation, driven by the UK Government’s Open Data policy.
Data for impact
- The Research Excellence Framework (REF) measures researchers’ contributions and their impact.
- It has struggled to extend its breadth beyond paper-based metrics.
- Researchers are wary of spending time on activity that doesn’t count towards the REF.
- REF panels now allow submission of “a substantial, coherent and widely admired data set or research resource”.
Data citation
- Data access raises visibility
- Data with a DOI = citable research output
- Data citations are good for researchers
Rule 1. Don’t Share It All
But! You generally need a reason NOT to share, e.g.
- commercial interests
- ethical concerns
- the Data Protection Act
Various factors at play…
- Law(s) of the land(s) (FOI, DPA)
- Government pressure
- Funder policies (and expectations)
- Publisher policies
- Institutional policies
- Disciplinary norms
- Ethical considerations
- Commercial interests / partnerships
Rule 2. Don’t Keep It All
Why not?
1. We probably can’t afford the costs of storage: data volumes are growing faster than storage hardware costs are falling.
2. We probably can’t afford the time it will take to ensure the data remains accessible and discoverable.
Source: John Gantz and David Reinsel (2011), Extracting Value from Chaos, http://www.emc.com/digital_universe
“Keeping 2018’s data in S3 would cost the entire global GDP”
http://blog.dshr.org/2012/05/lets-just-keep-everything-forever-in.html
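The storage-cost reason under Rule 2 (volumes growing faster than hardware costs fall) can be made explicit with a simple compound-growth model. This is a sketch under stated assumptions, not taken from the Gantz and Reinsel report: suppose the volume of retained data grows by a constant fraction g per year and the cost of storing one unit of data falls by a constant fraction k per year. The symbols V (volume), p (unit cost), C (total annual storage cost), g and k are introduced here for illustration only.

```latex
\begin{aligned}
V_t &= V_0\,(1+g)^t, \qquad p_t = p_0\,(1-k)^t \\
C_t &= V_t\,p_t = C_0\,\bigl[(1+g)(1-k)\bigr]^t, \qquad C_0 = V_0\,p_0 \\
C_t \text{ rises over time} &\iff (1+g)(1-k) > 1 \iff g > \frac{k}{1-k} \qquad (0 \le k < 1)
\end{aligned}
```

With hypothetical rates, if the unit cost of storage falls by 20% a year (k = 0.2), the total bill still rises whenever retained volume grows by more than 25% a year, since 0.2 / 0.8 = 0.25.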
How to decide?
1. Relevance to mission: including any legal/funder requirement to retain the data beyond its immediate use.
2. Scientific or historical value: significance and relationship to publications etc.
3. Uniqueness: can it be found elsewhere? If we don’t preserve it, who will?
4. Potential for redistribution: quality, IP and ethical concerns are addressed.
5. Non-replicability: either impossible to replicate (e.g. atmospheric or social science data) or not financially viable to do so.
6. Economic case: the costs of managing and preserving the resource stack up well against potential future benefits.
7. Full documentation: the surrounding/contextual information necessary to facilitate future discovery, access and reuse is adequate.
How to Appraise & Select Research Data for Curation, Angus Whyte (Digital Curation Centre) and Andrew Wilson (Australian National Data Service), 2010
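The seven appraisal criteria can be recorded as a simple checklist. The sketch below is a hypothetical illustration, not part of the Whyte and Wilson guidance: it assumes each criterion is captured as a yes/no judgement made by a person, and it only reports which criteria a candidate dataset does not yet satisfy. The names `AppraisalRecord` and `unmet_criteria` are invented for this example.

```python
from dataclasses import dataclass, fields


@dataclass
class AppraisalRecord:
    """Hypothetical yes/no judgements against the seven appraisal criteria."""
    relevance_to_mission: bool
    scientific_or_historical_value: bool
    uniqueness: bool
    potential_for_redistribution: bool
    non_replicability: bool
    economic_case: bool
    full_documentation: bool


def unmet_criteria(record: AppraisalRecord) -> list[str]:
    """Return the names of criteria judged not met, in checklist order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


if __name__ == "__main__":
    # Illustrative judgements for an imaginary survey dataset.
    survey = AppraisalRecord(
        relevance_to_mission=True,
        scientific_or_historical_value=True,
        uniqueness=True,
        potential_for_redistribution=False,  # e.g. consent does not cover reuse
        non_replicability=True,
        economic_case=True,
        full_documentation=False,  # e.g. codebook not yet written
    )
    print(unmet_criteria(survey))  # ['potential_for_redistribution', 'full_documentation']
```

The sketch deliberately stops at listing gaps rather than computing a score, since the slide presents these as considerations to weigh rather than a formula.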
All together: institutional engagements
With funding from HEFCE we’re:
- working intensively with c. 20 HEIs to increase RDM capability, with 60 days of effort per HEI drawn from a mix of DCC staff
- deploying DCC and external tools, approaches and best practice
Support varies according to what each institution wants and needs: the institution agrees a schedule of work with the DCC, and each assigns a primary contact / programme manager.
Lessons and examples will be shared with the community: www.dcc.ac.uk/community/institutional-engagements
Data Management Planning: roles and responsibilities for data across the research lifecycle
Group exercise
Martin Donnelly and Jonathan Rans, Digital Curation Centre, University of Edinburgh
University of Stirling, 25 March 2013
DMP Checklist headings
§1: Introduction and Context
§2: Data Types, Formats, Standards and Capture Methods
§3: Ethics and Intellectual Property
§4: Access, Data Sharing and Re-use
§5: Short-Term Storage and Data Management
§6: Deposit and Long-Term Preservation
§7: Resourcing
§8: Adherence and Review
§9: Agreement/Ratification by Stakeholders
§10: Annexes
Source: Checklist for a Data Management Plan (Donnelly and Jones)
Group exercise (20 minutes)
In groups of 4 or 5:
- Select one of the DMP Checklist headings and brainstorm all the stakeholders you think might be involved, and how/why. Be specific!
- Remember to think about the different stages of research: pre-award, in-project and post-project.
- We’ll have a short reporting/discussion session at the end.
Notes
- N.B. There are no ‘right’ or ‘wrong’ answers.
- All research projects are different.
- The DMP will depend upon the nature of the research AND the context (funder, domain, institution(s), etc.).
- DMPs are metadata and communication tools.
Questions and contacts
For more information:
- Visit http://www.dcc.ac.uk
- Email firstname.lastname@example.org@ed.ac.uk
- Twitter @mkdDCC
This work is licensed under a Creative Commons Attribution 2.5 UK: Scotland License.
Credits
Images:
- Slide 3 (Definitions): http://www.flickr.com/photos/dougbelshaw/
- Slide 11 (Feet up): http://www.flickr.com/photos/chaparral/
- Slide 14 (Driver): http://www.flickr.com/photos/rpmarks/
- Slide 26 (Equations): http://www.flickr.com/photos/billburris/
- Slide 28 (Greenhouse): http://www.flickr.com/photos/mykl/
Thanks also to DCC colleagues for their slides: Kevin Ashley, Liz Lyon, Graham Pryor, Sarah Jones, Marieke Guy