The research data lifecycle

1 The research data lifecycle
Research Data Management Workshop 1.4 The research data lifecycle May-19 Learning material produced by RDMRose

2 Session 1.4 overview
- DCC Curation Lifecycle Model
- Background
- Target audience
- The 8 lifecycle actions
- Alternative lifecycle models

3 DCC Curation Lifecycle Model
- The DCC Curation Lifecycle Model is an authoritative, generic model of what the umbrella term RDM covers
- It outlines the activities required to curate research data successfully throughout their entire lifecycle
- The model describes an idealised situation: curation is planned from the very beginning and planned for throughout the lifecycle
- According to the DCC (n.d. c), the model is relevant to:
  - Data creators
  - Data archivists/curators
  - Data (re)users

4 Background
- The DCC Curation Lifecycle Model is based on the OAIS Reference Model
- OAIS = Open Archival Information System (pictured)
- OAIS is a model that defines a generic framework for building a digital archive (Lavoie, 2004; Ball, 2006)

5 Background
- The DCC Curation Lifecycle Model adds to the OAIS Reference Model
- It includes activities that take place outside the archival system, in the research lifecycle
- In particular: the creation of data and the re-use of data by other research projects

6 DCC Curation Lifecycle Model

7 Actions
Three sets of actions:
- Sequential Actions (8): key actions needed as data move through their lifecycle
- Occasional Actions (3): occur only when special conditions are met, so they do not apply to all data
- Full Lifecycle Actions (4): apply to all stages of the lifecycle

8 Actions
Sequential actions (8):
1. Conceptualise
2. Create or receive
3. Appraise and select
4. Ingest
5. Preservation action
6. Store
7. Access, use and reuse
8. Transform

9 Action 1 Conceptualise
Aim: Designing research projects (and grant proposals) with digital curation in mind, so that you produce curation-ready data
Rusbridge (2008): “Repeat after me: curation begins before creation!”

10 Key activities
DCC’s key activities include planning for:
- Data capture and storage in curation-friendly file formats (open standards)
- Recording sufficient information at the time of data capture to assist with the ongoing management and use of those data
- Data storage on appropriate media
- Identification of a safe place for the data, and ensuring that an archive will take them

11 Action 2 Create or receive
Aim: Creating or receiving digital data that is curation-ready
DCC’s key activities:
- Researcher: create data that is curation-ready, including metadata
- LIS professional: receive data, in accordance with documented collecting policies, from data creators, other archives, repositories or data centres and, if required, assign appropriate metadata

12 Data quality
- Authentic: be what it purports to be
- Reliable: have trusted contents which accurately reflect the business transaction documented
- Have integrity: be complete and unaltered
- Usable: can be located, retrieved, presented and interpreted
(ISO )
Note on "Authentic": data are authentic if they were created at the time, and by the person, that the data say they were

13 Metadata
- Descriptive: ensures identification, location and retrieval
- Technical: records the technical infrastructure used to create or access the data
- Administrative: for management of data, such as acquisition, appraisal decisions and IPR
- Use: manages access rights and tracks usage
- Preservation: records preservation actions, such as checksums and migrations
(Based on Higgins, 2012, p. 38.)
- Representation information: metadata that are necessary to make the dataset intelligible to the designated community
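As a rough illustration of how these metadata categories might be kept together for one dataset, the sketch below groups them in a single record. The class name, field contents and example values are all hypothetical; this is not a DCC schema or any published metadata standard.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetMetadata:
    """Hypothetical record with one slot per metadata category on the slide."""
    descriptive: dict = field(default_factory=dict)      # identification, location, retrieval
    technical: dict = field(default_factory=dict)        # infrastructure used to create/access
    administrative: dict = field(default_factory=dict)   # acquisition, appraisal decisions, IPR
    use: dict = field(default_factory=dict)              # access rights, usage tracking
    preservation: dict = field(default_factory=dict)     # checksums, migrations
    representation_information: dict = field(default_factory=dict)

    def missing_categories(self):
        """Return the names of categories that have no entries yet."""
        return [name for name, value in asdict(self).items() if not value]


# Illustrative values only.
record = DatasetMetadata(
    descriptive={"title": "Interview transcripts", "creator": "A. Researcher"},
    technical={"file_format": "text/csv"},
    preservation={"checksum_algorithm": "sha256"},
)
print(record.missing_categories())
```

A check like `missing_categories()` could help a curator spot which kinds of metadata still need to be supplied before a dataset is handed over.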

14 Action 3 Appraise and select
“the process of evaluating material in order to decide which to retain over the long term, which to retain for the meantime, and which to discard.” (Higgins, 2012, p. 28)
DCC’s key activities:
- Evaluate data and select for long-term curation and preservation
- Adhere to documented guidance, policies or legal requirements

15 Why appraise and select?
- Digital content expands (data deluge)
- Backup and mirroring increase costs
- Discovery gets harder
- Managing and preserving is expensive
(Based on Whyte and Wilson, 2010.)

16 Significance
Appraisal = “determination of significance” (Harvey, 2010, p. 132)
- What data do you need/want to keep?
  - Which datasets or digital resources do you want to keep?
  - Which characteristics or elements of these datasets or resources do you want to keep? (Look and feel, structure of dataset, functionality such as hyperlinks or embedded comments, interoperability with other datasets.)
- How long do you need/want to keep the data? E.g. in terms of user requirements (as evidence for verifying conclusions) or risks of not keeping the data.
OAIS has defined:
- Designated community
- Representation information
- Actors: Producer -> (archive) -> consumer
Designated community: the set of consumers who should be able to understand the preserved information. You therefore need to understand their knowledge base: who might the future users be, and what knowledge and understanding would they have? This defines the kind of data and metadata that need preserving.
Representation information: a form of metadata that is necessary to make the dataset intelligible to the designated community. The information in a book is typically expressed by characters (the data) which, when combined with a knowledge of the language used (the Knowledge Base), are converted to more meaningful information. If the recipient does not know the language, then the book needs to be accompanied by a dictionary and grammar (i.e., Representation Information) in a form that is understandable using the recipient’s Knowledge Base (Meghini 2008)

17 Criteria
General appraisal criteria (Whyte & Wilson, 2010):
- Relevance to mission of the repository
- Relevance to research: scientific or historical value (inferring anticipated future use)
- Uniqueness (the only or most complete source? At risk of loss if not accepted?)
- Non-replicability (not feasible or impossible)
- Potential for redistribution (depending on reliability, integrity and usability of the data; legal issues may limit this)
- Full documentation (to facilitate discovery, access, reuse etc.)
- Economic case (costs of long-term maintenance vs potential future benefits, available funding)

18 Action 4 Ingest
DCC’s key activities:
- Transfer data to an archive, repository, data centre or other custodian
- Adhere to documented guidance, policies or legal requirements
The term “Ingest” was introduced by the Open Archival Information System (OAIS) Reference Model

19 Key activities for ingest
Preparing the data for placing in long-term storage could involve:
- Assigning a persistent identifier (such as a DOI)
- Checking that the data does not contain malware
- Extracting, creating and assigning description and representation information
- Creating fixity values (checksums)
- Confirming technical details such as file formats
- Combining the data and their associated metadata into an Archival Information Package
- Migrating data to a different file format
(DCC, n.d. a)
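Two of the ingest activities above, creating fixity values and combining data with their metadata, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the manifest layout, the function names and the placeholder identifier `doi:10.0000/example` are hypothetical, and a real Archival Information Package would follow the receiving archive's own specification.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute a SHA-256 fixity value, reading the file in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path, identifier: str) -> dict:
    """List every file under data_dir with its suffix and checksum.

    The dictionary layout is an assumption made for illustration only.
    """
    files = sorted(p for p in data_dir.rglob("*") if p.is_file())
    return {
        "identifier": identifier,
        "files": [
            {
                "path": p.relative_to(data_dir).as_posix(),
                "suffix": p.suffix.lower(),
                "sha256": sha256_of(p),
            }
            for p in files
        ],
    }


# Demonstration on a throwaway directory with one small file.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "survey.csv").write_bytes(b"id,answer\n1,yes\n")
    manifest = build_manifest(data_dir, identifier="doi:10.0000/example")
    print(json.dumps(manifest, indent=2))
```

Recording the checksum at ingest gives later integrity checks a baseline to compare against.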

20 Action 5 Preservation Action
Aims: To ensure that data remains authentic, reliable and usable while maintaining its integrity (data quality).
DCC’s key activities:
- Undertaking actions to ensure long-term preservation and retention of the authoritative nature of data

21 Preservation actions and strategies
Ongoing preservation actions (Lord and Macdonald, 2003, pp. 30-31):
- Data checking and cleaning (detecting and correcting/removing corrupt or inaccurate data)
- Assigning preservation metadata and representation information
- Ensuring acceptable data structures or file formats (open standards)
- Applying good data management practices
- Implementing secure storage and organisational continuity
Three main families of digital preservation strategies to combat obsolescence of hardware and software:
- Information migration
- Technology emulation
- Technology preservation (‘computer museums’)

22 Action 6 Store
DCC’s key activities:
- Storing the data in a secure manner adhering to relevant standards
This includes:
- the storage facilities themselves, including refreshment of storage media to avoid hardware obsolescence or bit-rot
- the administration of the data storage service with appropriate policies

23 Specific activities (Harvey, 2010)
- Develop, maintain, and apply policies relating to secure data storage
- Ensure that sufficient description and representation information is stored with data
- Use a reliable storage medium, preferably on more than one carrier and with geographically distributed backup systems
- Monitor events that might trigger other preservation actions (e.g., file format migration, file corruption)
- Regularly check to ensure the integrity of the stored data and their description and representation information
- Ensure system and physical security
- Maintain and replace the technical infrastructure as necessary
- Develop, and administer as necessary, data recovery procedures
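The regular integrity check mentioned above is often done by recomputing checksums and comparing them with values recorded earlier. The sketch below illustrates that idea only; the function names and the three status labels are assumptions for this example, not part of Harvey's guidance or of any particular repository software.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum of a (small) file read in one go."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_fixity(storage_dir: Path, recorded: dict) -> dict:
    """Compare stored files with previously recorded checksums.

    Returns a mapping of relative path to "ok", "changed" or "missing".
    """
    report = {}
    for rel_path, expected in recorded.items():
        target = storage_dir / rel_path
        if not target.is_file():
            report[rel_path] = "missing"
        elif sha256_of(target) != expected:
            report[rel_path] = "changed"
        else:
            report[rel_path] = "ok"
    return report


# Demonstration: record checksums, then simulate corruption and a lost file.
with tempfile.TemporaryDirectory() as tmp:
    storage = Path(tmp)
    (storage / "a.csv").write_bytes(b"1,2,3\n")
    (storage / "b.csv").write_bytes(b"4,5,6\n")
    recorded = {name: sha256_of(storage / name) for name in ("a.csv", "b.csv")}
    recorded["c.csv"] = "0" * 64                  # recorded but never stored
    (storage / "b.csv").write_bytes(b"4,5,7\n")   # simulated bit-rot
    report = check_fixity(storage, recorded)
    print(report)
```

A "changed" or "missing" result is the kind of event that might trigger recovery from a backup copy held on another carrier.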

24 Action 7 Access, use and reuse
Aims: Data can be located, used and reused by legitimate users
DCC’s key activities:
- Ensuring that data is accessible to both designated users and re-users, on a day-to-day basis, usually (but not necessarily) in the form of publicly available published information
- Applying robust access controls and authentication procedures where applicable

25 Specific actions
- Ensuring data can be discovered (located) by applying standards that ensure appropriate metadata are present
- Ensuring that the required legal permissions are available for data to be used and reused, and that legal restrictions on the use and reuse of data are adhered to (funding bodies, legislation about confidentiality and privacy, IPR)
- Providing tools that allow collaboration in the use and reuse of data (e.g. annotation)
- Ensuring data is accessible only by authorised users, by applying access controls and authentication procedures

26 Action 8 Transform
DCC’s key activities:
- Create new data from the original data
Methods:
- Creating a subset (by selection or query) to create newly derived results
  - for verification of results
  - as the basis of further research
- Migration into a different format (migration changes data)

27 Activity 1: How usable is the model?
- How is the model different from a library’s typical emphasis on collection development for access (as opposed to preservation)?
- If you were a researcher, how useful would this model be?
- How useful is the DCC Lifecycle Model for you in your role?

28 Other lifecycles
- The DCC lifecycle emphasises data curation, not research
- Other lifecycles incorporate the research lifecycle more fully
- E.g. the UK Data Archive’s research data lifecycle (pictured on the left): Creating data, Processing data, Analysing data, Preserving data, Giving access to data, Re-using data

29 Activity 2: Alternative lifecycle models
- Look at the Review of Data Management Lifecycle Models by A. Ball (2012). The document gives an overview of 8 alternative models, including the DCC Curation Lifecycle Model
- Examine the models. Which of these, if any, would you prefer to use when discussing RDM with a researcher, and why?
- Which of these would be most useful for you in your role, and why?

30 Images, Sources and References

31 Images Slide 4: DCC Curation Lifecycle Model:
DCC Curation Lifecycle Model:

32 Sources
Slides on the DCC Curation Lifecycle Model are based on:
DCC (n.d. a). Digital 101 materials. Edinburgh: Digital Curation Centre. Retrieved from
DCC (n.d. b). DCC Charter and Statement of Principles. Edinburgh: Digital Curation Centre. Retrieved from
DCC (n.d. c). Lifecycle Model FAQ. Edinburgh: Digital Curation Centre. Retrieved from

33 References
Ball, A. (2006). Briefing Paper: the OAIS Reference Model. Retrieved from
Ball, A. (2012). Review of Data Management Lifecycle Models. Bath: University of Bath. Retrieved from
Donnelly, M. (2012). Data management plans and planning. In G. Pryor (Ed.), Managing Research Data (pp ). London: Facet.
Harvey, R. (2010). Digital Curation: A How-To-Do-It Manual. London: Facet.
Higgins, S. (2012). The lifecycle of data management. In G. Pryor (Ed.), Managing Research Data (pp ). London: Facet.

34 References
Lavoie, B.F. (2004). The Open Archival Information System Reference Model: introductory guide. Dublin, Ohio; York: OCLC Online Computer Library Centre; Digital Preservation Coalition. Retrieved from
Lord, P. & Macdonald, A. (2003). Data Curation for e-Science in the UK: An Audit to Establish Requirements for Future Curation and Provision. Twickenham: The Digital Archiving Consultancy.
Michener, W.K., Brunt, J.W., Helly, J.J., Kirchner, T.B., & Stafford, S.G. (1997). Nongeospatial metadata for the ecological sciences. Ecological Applications, 7(1),
Rusbridge, C. (2008). Project data life course. Blogs. Edinburgh: Digital Curation Centre,
Whyte, A., & Wilson, A. (2010). How to Appraise & Select Research Data for Curation. Edinburgh: Digital Curation Centre,

