Presentation on theme: "Data Management Best Practices"— Presentation transcript:
1 Data Management Best Practices According to DataONE, the Data Observation Network for Earth, the goal of data management is to produce self-describing data sets. This is a good goal whether the data are observational, experimental, derived, or simulation based. “The goal of data management is to produce self-describing data sets.” DataONE Primer on Data Management. (Strasser)
2 Data Sharing and Management Snafu in 3 Short Acts: A data management horror story by Karen Hanson, Alisa Surkis and Karen Yacobucci. Don’t be the brown bear.
3 Why manage research data? You can find and understand your data when you need to use it. There is continuity if project staff leave or new researchers join. You can avoid unnecessary duplication, e.g. re-collecting or re-working data. Data underlying publications are maintained, allowing for validation of results. Data sharing leads to more collaboration and advances research. Research is more visible and has greater impact. Other researchers can cite your data so you gain credit. (Jones) In addition to helping you create self-describing data sets, there are some other benefits of creating data management plans.
4 Government Requirements “To the extent feasible and consistent…digitally formatted scientific data resulting from unclassified research supported wholly or in part by Federal funding should be stored and publicly accessible to search, retrieve, and analyze.” OSTP February 22, 2013 memo: Increasing Access to the Results of Federally Funded Scientific Research. DOE Public Access Plan: “To the greatest extent… data sharing should make digital data available to and useful for the scientific community…” Began requiring data management plans for solicitations received after October 1, Many funders, including NIH and NSF, have data management plan requirements, data sharing requirements, or both. In its memo from February of last year, “Increasing Access to the Results of Federally Funded Scientific Research,” the White House Office of Science and Technology Policy directed all federal agencies with over $100 million of R&D funding to ensure the public access to data described in the quote on this slide. The Department of Energy’s Public Access Plan is the first response we have seen from funders.
5 Journal Requirements Data Dryad is a general purpose data repository. It started as infrastructure to support the Joint Data Archiving Policy for journals in the field of evolution. PLOS Data Sharing Policy (updated March 2014): “PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception.”
6 Data Management Best Practices Data Management Plan. File Management. Documentation. Storage and backup. Long term planning. Now that I’ve hopefully instilled a sense of importance and urgency in you about the need to manage your data, let’s get into what you can do to manage your data better. Today I am going to give you some tips and tricks related to these different areas of data management.
7 Data Management Plans (DMPs) DMPs for grant applications are a “light touch”. Should be considered living documents. Can act as standard operating procedure. Can help ensure documentation is complete. Can save time while writing up results for publication. Data management plans should be considered living documents. The data management plan you create for your grant application is a sort of light touch plan to get you started in planning for managing your data. As you move into the project, your data management plan should become more in depth and act as a standard operating procedure. Later it can be used to help ensure documentation is complete, and it will save time while writing up the results for publication.
8 Data Management Plans (DMPs) What types of data will be created? Who will own, have access to, and be responsible for managing these data? What equipment and methods will be used to capture and process data? Where will data be stored during and after? Slide Credit: Module 1: Overview of Research Data Management, New England Collaborative Data Management Curriculum
9 Data management plan help DMPTool: Specific guidance for mostly U.S. funders, customized for Princeton. DMPOnline: From the U.K. https://dmponline.dcc.ac.uk/ MIT Data Planning Checklist. DCC Checklist for a Data Management Plan: From the U.K. Jones, S. (2011). ‘How to Develop a Data Management and Sharing Plan’. DCC How-to Guides. Edinburgh: Digital Curation Centre. Available online: DMP review and consultation services available from Princeton Library’s RDMTeam,
10 File and folder management “Figure 4: A snapshot of data management practices. File names given by students are shown for a sampling of .1sc files, illustrating the variety of naming conventions used.” (Ferguson) File and folder management is an aspect of how you capture and process your data. If you are interested in learning more about this, Carla Zimowsk gives a great presentation devoted solely to file and folder management, but I wanted to take a second to talk about it here. Ferguson, Jen. “Lurking in the Lab: Analysis of Data from Molecular Biology Laboratory Instruments.” Journal of eScience Librarianship 1, no. 3 (March 13, 2013).
11 File Naming Best Practices Files should be named consistently. File names should be descriptive but short (<25 characters). (Briney) Avoid special characters in a file name. Use capitals or underscores instead of periods or spaces. Use date format ISO 8601: YYYYMMDD. Include a version number. (Creamer et al.) Write down naming convention in data management plan. These are some best practices for creating file names. Poorly constructed file names can cause issues when transferring files from one format to another, or to another operating system.
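The naming rules above can be checked mechanically. The following is a minimal sketch, not part of the original presentation: the function name `check_file_name`, the wording of the messages, and the choice to apply the 25-character limit to the name without its extension are all assumptions made for illustration. Hyphens are allowed because the example names later in the deck use them.

```python
import re

MAX_LENGTH = 25  # assumption: the "<25 characters" guideline applies to the name without its extension
SPECIAL = re.compile(r"[^A-Za-z0-9_\-. ]")  # anything other than letters, digits, _, -, period, space


def check_file_name(name: str) -> list:
    """Return a list of problems found in a file name (an empty list means none were found)."""
    stem, dot, _extension = name.rpartition(".")
    if not dot:  # the name has no extension at all
        stem = name
    problems = []
    if len(stem) >= MAX_LENGTH:
        problems.append("name is 25 characters or longer")
    if " " in stem:
        problems.append("contains spaces")
    if "." in stem:
        problems.append("contains periods")
    if SPECIAL.search(stem):
        problems.append("contains special characters")
    return problems
```

For example, `check_file_name("20140422_PikeLake_03.csv")` reports no problems, while a name such as `my data (final).v2.xlsx` is flagged for spaces, periods, and special characters.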
12 File Naming Conventions How? Pick what is most important for your name: Date, Site, Analysis, Sample, Short description. Slide Credit: Briney
13 Example YYYYMMDD_site_sampleNum: 20140422_PikeLake_03, _EastLake_12. Analysis-sample-concentration: UVVis-stilbene-10mM, IR-benzene-pure. Slide credit: Briney
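A convention like YYYYMMDD_site_sampleNum is easiest to keep consistent when names are generated rather than typed. The sketch below is illustrative only: `build_name` and `parse_name` are hypothetical helper names, and it assumes that site names contain no underscores and that sample numbers are zero-padded to two digits, as in the slide's example.

```python
from datetime import date, datetime


def build_name(day: date, site: str, sample_num: int) -> str:
    """Build a name in the YYYYMMDD_site_sampleNum pattern."""
    return f"{day:%Y%m%d}_{site}_{sample_num:02d}"


def parse_name(name: str):
    """Split a YYYYMMDD_site_sampleNum name back into (date, site, sample number).

    Raises ValueError if the name does not have exactly three parts
    or if the first part is not a valid date.
    """
    date_part, site, sample = name.split("_")
    day = datetime.strptime(date_part, "%Y%m%d").date()
    return day, site, int(sample)
```

Because the date comes first and is written as YYYYMMDD, sorting the names alphabetically also sorts them chronologically.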
14 File Organization How? Any system is better than none. Make your system logical for your data. Possibilities: By project, By analysis type, By date… In addition to creating a file naming convention, you can also greatly enhance the overall ease of use of your data by organizing your files into a logical structure. Slide Credit: Briney
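One way to keep a folder system logical is to let a small script decide where files go. This is a sketch under assumptions, not a recommended layout: it nests the three possibilities from the slide as project / analysis type / date, and the function name `folder_for` is hypothetical.

```python
from datetime import date
from pathlib import Path


def folder_for(root, project: str, analysis: str, day: date) -> Path:
    """Create (if needed) and return the folder root/project/analysis/YYYYMMDD."""
    folder = Path(root) / project / analysis / f"{day:%Y%m%d}"
    folder.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
    return folder
```

Any other nesting order works just as well, as long as it is applied consistently and written down.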
16 Documentation The Who, What, When, Where and Why of Your Data Why? Data without notes are unusable. Because you won’t remember everything. For others who may need to use your files. (Briney)
17 Documentation How? Methods, Protocols, Code, Survey, Codebook, Data dictionary. Anything that lets someone reproduce your results. Slide Credit: Briney
18 Documentation How? Take good notes, Metadata schemas, Templates. Templates: Like structured metadata but easier. Decide on a list of information before you collect data. Make sure you record all necessary details. Takes a few minutes upfront, easy to use later. Put in data management plan. Print and post in prominent place or use as worksheet. Slide Credit: Briney
19 Best Practices Describe the contents of data files. Define the parameters and the units on the parameter. Explain the formats for dates, time, geographic coordinates, and other parameters. Define any coded values. Describe quality flags or qualifying values. Define missing values. Here are some best practices for using metadata to help someone make sense of your data. Slide Credit: Module 1: Overview of Research Data Management, New England Collaborative Data Management Curriculum
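These practices are often captured in a data dictionary stored next to the data. The sketch below writes one as CSV text. Everything specific in it is hypothetical and for illustration only: the column headings, the three example variables, the flag codes, and the missing-value markers.

```python
import csv
import io

# Hypothetical column headings covering the practices listed above.
FIELDS = ["variable", "description", "units", "format", "coded_values", "missing_value"]

# Hypothetical entries; replace with the variables in your own data files.
DATA_DICTIONARY = [
    {"variable": "sample_date", "description": "Date the sample was collected",
     "units": "", "format": "YYYYMMDD (ISO 8601)", "coded_values": "", "missing_value": "NA"},
    {"variable": "temp_c", "description": "Water temperature",
     "units": "degrees Celsius", "format": "decimal number", "coded_values": "",
     "missing_value": "-9999"},
    {"variable": "quality_flag", "description": "Quality flag for the measurement",
     "units": "", "format": "integer", "coded_values": "0=good; 1=suspect; 2=bad",
     "missing_value": "NA"},
]


def data_dictionary_csv(entries) -> str:
    """Return the data dictionary as CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()
```

Saving the returned text as a plain .csv file alongside the data keeps the definitions readable without special software.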
20 Best Practices Title, Creator, Identifier, Subject, Funders, Rights, Access information, Language, Dates, Location. Methodology, Data processing, Sources, List of file names, File formats, File structure, Variable list, Code lists, Versions, Checksums. Here is a list of common metadata fields associated with a data set. Slide Credit: Module 1: Overview of Research Data Management, New England Collaborative Data Management Curriculum
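Of the fields above, checksums are the easiest to produce automatically. The following sketch uses SHA-256 from Python's standard `hashlib` module; the choice of SHA-256 and the helper names `sha256_of` and `checksum_manifest` are assumptions for illustration, since the slide does not name an algorithm.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_manifest(folder) -> dict:
    """Map each file under a folder (by relative path) to its checksum."""
    root = Path(folder)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

Recomputing the manifest later and comparing it with the stored one shows whether any file has changed or been corrupted.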
21 Documentation Where? README.txt For digital information, address the questions “What the heck am I looking at?” and “Where do I find X?” Use for project description in main folder. Use to document conventions. Use wherever you need extra clarity. Slide Credit: Briney
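A README.txt is easier to keep up when it starts from a fixed template. This sketch builds the text from a few inputs; the section headings echo the two questions on the slide, while the function name `readme_text` and its parameters are hypothetical.

```python
def readme_text(project: str, description: str, conventions: list, locations: dict) -> str:
    """Return README text answering 'what am I looking at?' and 'where do I find X?'."""
    lines = [f"Project: {project}", "", "What am I looking at?", description, ""]
    lines.append("Conventions:")
    lines += [f"- {item}" for item in conventions]
    lines += ["", "Where do I find X?"]
    lines += [f"- {what}: {where}" for what, where in locations.items()]
    return "\n".join(lines) + "\n"
```

A call such as `readme_text("Lake survey", "Water samples from two lakes.", ["Files are named YYYYMMDD_site_sampleNum"], {"raw data": "data/raw"})` returns text that can be saved as README.txt in the main project folder.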
22 Storage and Backup Good storage practices prevent loss. Make 3 copies (original + external/local + external/remote). (MIT Libraries) Where? Personal computer hard drives: backup available for faculty, staff, and graduate students. External hard drives (available at OIT Tech Depot). Central File Server (H: Drive) – 5 GB, Departmental Storage (M: Drive). Cloud Storage: All undergrads have a 30 GB Google Drive account. Faculty, staff, and graduate students can request.
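The three-copy rule can be scripted so that each copy is also verified. The sketch below copies one file into each backup folder and compares checksums with the original. The function names and the use of SHA-256 are assumptions for illustration, the backup folders stand in for whatever local and remote storage is actually mounted, and the simple checksum helper reads the whole file into memory, which is only reasonable for modest file sizes.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path) -> str:
    """Return the SHA-256 checksum of a file (reads the whole file into memory)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def back_up(source, backup_folders) -> list:
    """Copy a file into each backup folder and verify every copy against the original."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in backup_folders:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps where possible
        if sha256_of(target) != expected:
            raise IOError(f"copy at {target} does not match the original")
        copies.append(target)
    return copies
```

Passing two folders, one on an external local drive and one on remote storage, yields the original plus two verified copies.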
23 Backups… what and when “What will you need to restore in the event of data loss?” In general only backing up data is sufficient. (UK Data Archive) OIT Knowledge Base: What files should I back up? “Backups should be made after every change of data [and]/or at regular intervals.” (UK Data Archive) UK Data Archive help pages on storing, backing up, data security, transmitting and encrypting data, file sharing, and data disposal.
24 Long term planning - Preservation At the completion of a project. Not the same as storage during a project. What are the funder or journal requirements? How long does it need to be preserved? Who is responsible for the data at the end of the project? Does funder or journal specify a repository?
25 Long term planning – Repositories Increases discoverability. Provide persistent unique identifiers and information to aid data citation. Different options available. Many disciplinary repositories available. General repositories: Dataverse, Figshare, Zenodo. DataSpace at Princeton:
26 Future File Usability Why? You may want to use the data in 5 years. Prep for data sharing. May be needed to verify journal article results. Per U.S. Office of Management and Budget Circular A-110, must retain data at least 3 years post-project. Better to retain for >6 years. Whether or not you use a repository, you will need to spend some time making sure you will be able to use your files in the future. Slide Credit: Briney
27 Best Practices Is the file format open (i.e. open source) or closed (i.e. proprietary)? Is a particular software package required to read and work with the data file? If so, the software package, version, and operating system platform should be cited in the metadata… Do multiple files comprise the data file structure? If so, that should be specified in the metadata… When choosing a file format, select a consistent format that can be read well into the future and is independent of changes in applications. Non-proprietary: Open, documented standard, Unencrypted, Uncompressed, ASCII formatted files will be readable into the future. Here are some considerations for making your files available for the long-term. Slide Credit: Module 1: Overview of Research Data Management, New England Collaborative Data Management Curriculum
28 Future File Usability How? Convert file formats. Can you open digital files from 10 years ago? Use open, non-proprietary formats that are in wide use: .docx to .txt, .xlsx to .csv, .jpg to .tif. See National Archives FAQ for more. Save a copy in the old format, just in case. Preserve software if no open file format. Slide Credit: Briney
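Before converting anything, it helps to list which files are candidates. The sketch below only reports suggestions and does not convert files. The mapping contains just the three format pairs named on the slide, and the function name `suggest_open_formats` is hypothetical.

```python
from pathlib import PurePath

# The three pairs named on the slide; extend this for the formats you actually use.
OPEN_ALTERNATIVES = {".docx": ".txt", ".xlsx": ".csv", ".jpg": ".tif"}


def suggest_open_formats(file_names) -> dict:
    """Map each file that has a suggested open alternative to that alternative's extension."""
    suggestions = {}
    for name in file_names:
        suffix = PurePath(name).suffix.lower()  # compare extensions case-insensitively
        if suffix in OPEN_ALTERNATIVES:
            suggestions[name] = OPEN_ALTERNATIVES[suffix]
    return suggestions
```

Files already in an open format, such as .csv, are simply left out of the result.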
29 Future File Usability How? Move to new media. Hardware dies and becomes obsolete. Floppy disks! Expect average lifetime to be 3-5 years. Keep up with technology. Slide Credit: Briney
30 Other Resources “Ten Simple Rules for the Care and Feeding of Scientific Data.” PLoS Computational Biology. Primer on Data Management: What You Always Wanted to Know. DataONE. Data Management General Guidance, DMPTool. https://dmptool.org/dm_guidance Create & Manage Data, UK Data Archive. Guidelines for Responsible Data Management in Scientific Research. U.S. Department of Health and Human Services. How-to Guides, Digital Curation Centre.
31 Attribution This work is a derivative of: Practical Data Management, ACRL DCIG Webinar. April 30, Kristen Briney. CC-BY (http://creativecommons.org/licenses/by/4.0/) and New England Collaborative Data Management Curriculum, Module 1: Overview of Research Data Management. Andrew Creamer et al. CC-BY-NC (http://creativecommons.org/licenses/by-nc/4.0/) Slides used from each presentation are noted at the bottom of the slide.
32 Works Cited Ferguson, Jen. “Lurking in the Lab: Analysis of Data from Molecular Biology Laboratory Instruments.” Journal of eScience Librarianship 1, no. 3 (March 13, 2013). doi:http://dx.doi.org/ /jeslib Jones, S. (2011). ‘How to Develop a Data Management and Sharing Plan’. DCC How-to Guides. Edinburgh: Digital Curation Centre. Available online: “Back-Ups & Security, Data Management.” MIT Libraries. Accessed October 15, Strasser, Carly et al. “Primer on Data Management: What You Always Wanted to Know.”
33 Research data management services website Contact me! Willow Dressel, Plasma Physics and E-Science Librarian