Data Management for Research Aaron Collie, MSU Libraries Lisa Schmidt, University Archives.

Presentation on theme: "Data Management for Research Aaron Collie, MSU Libraries Lisa Schmidt, University Archives."— Presentation transcript:

1 Data Management for Research Aaron Collie, MSU Libraries Lisa Schmidt, University Archives

2 Introductions  Please tell us your name and department  A brief description of your primary research area  What do you consider to be your research data?  Optional:  Experience managing research data?  Experience writing a data management plan? cc http://www.flickr.com/photos/quinnanya/

3 Introductions Background Definitions Upfront Decisions Data Sharing Impacts Fundamentals Practices File Organization Data Documentation Reliable Backup Data Lifecycle Strategy Agenda

4 Why are we here?

5 But why are we really here?  An Impetus: NSF recently released a mandate that all grant applications submitted after January 18th, 2011 must include a supplemental “Data Management Plan”  An Effect: This mandate from NSF has had a domino effect, and many funders now require or state guidelines for data management of grant-funded research  A Challenge: Data management (and oftentimes research methods in general) is an area that has not traditionally received a full treatment in most graduate and doctoral curricula

6 What is meant by “data management”? Fundamental Practices  File Organization  Data Documentation  Reliable Backups Data lifecycle  Digital Sustainability  Scholarly Communication  Data Publishing  Research Impact

7  Effective January 18, 2011  NSF will not evaluate any proposal missing a DMP  May be up to two pages long  PI may state that project will not generate data or samples  DMP is reviewed as part of intellectual merit or broader impacts of application, or both  Costs to implement DMP may be included in proposal’s budget

8 NSF’s Data Management Guidelines  Policies for re-use, re-distribution, and creation of derivatives  Plans for archiving data, samples, and other research outcomes, maintaining access  Types of data, samples, physical collections, software generated  Standards for data and metadata format and content  Access and sharing policies, with stipulations for privacy, confidentiality, security, intellectual property, or other rights or requirements

9 Other Federal Policies NASA “promotes the full and open sharing of all data” “requires that data…be submitted to and archived by designated national data centers.” “expects the timely release and sharing of final research data" "IMLS encourages sharing of research data." “…should describe how the project team will manage and disseminate data generated by the project”

10 Upfront Decisions for Researchers  What is the expected lifespan of the data?  Besides the researcher(s) on the project, who else should be given access to the data?  Does the dataset include any sensitive information?  Who owns or controls the research data?  Should any restrictions be placed on the dataset?  How are the data stored and preserved?

11 Upfront Decisions for Researchers  How might the data be used, reused, and repurposed?  How is the data described and organized?  Who are the expected and potential audiences for the datasets?  What publications or discoveries have resulted from the datasets?  How should the data be made accessible?

12 Data Sharing Impacts  Reinforces open scientific inquiry  Encourages diversity of analysis and opinion  Promotes new research, testing of new or alternative hypotheses and methods of analysis  Supports studies on data collection methods and measurement Cc http://www.flickr.com/photos/pinchof_10/

13 Data Sharing Impacts (cont.)  Facilitates education of new researchers  Enables exploration of topics not envisioned by initial investigators  Permits creation of new datasets by combining data from multiple sources

14 Introductions Background Definitions Upfront Decisions Data Sharing Impacts Fundamentals Practices File Organization Data Documentation Reliable Backup Data Lifecycle Strategy Agenda

15 File Organization Practices: Overview 1. Create a file plan for your research project 2. Design a file naming convention that works for your project 3. Agree on a version control method to assist with file synchronization 4. Carefully choose file formats to maximize usefulness “When I was a freshmen I named my assignments Paper Paperr Paperrr Paperrrr” -Undergrad

16 1. Create a file plan for your research project  File plan as a classification system  Indexed – makes it easier to locate folders/files  Primary subjects – main functions of research project  Secondary subjects – more specific activities of project, including research data Tertiary subjects – limit by date or equivalent – File Name (naming conventions)

17 1. Create a file plan for your research project (cont.) Example documentation of Directory Hierarchy:  /[Project]/[Grant Number]/[Event]/[Date] Example documentation of File Naming Convention:  [investigator]_[method]_[descriptor]_[YYYYMMDD]_[version].[ext]
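The naming convention above can be checked mechanically. The following is a minimal Python sketch, not part of the original deck: the function names `build_name` and `parse_name` are hypothetical, and the version field is assumed to be `v` or `d` followed by three digits, borrowed from the deck's own example file names.

```python
import re
from datetime import date

# Pattern for the convention on the slide:
# [investigator]_[method]_[descriptor]_[YYYYMMDD]_[version].[ext]
# Assumption: the version is "v" or "d" plus three digits (e.g. v002, d001).
NAME_PATTERN = re.compile(
    r"^(?P<investigator>[A-Za-z]+)_"
    r"(?P<method>[A-Za-z0-9]+)_"
    r"(?P<descriptor>[A-Za-z0-9]+)_"
    r"(?P<date>\d{8})_"
    r"(?P<version>[vd]\d{3})\."
    r"(?P<ext>[A-Za-z0-9]+)$"
)


def build_name(investigator, method, descriptor, when, version, ext):
    """Assemble a file name and reject it if it breaks the convention."""
    name = f"{investigator}_{method}_{descriptor}_{when:%Y%m%d}_{version}.{ext}"
    if NAME_PATTERN.match(name) is None:
        raise ValueError(f"name does not follow the convention: {name}")
    return name


def parse_name(name):
    """Return the named parts of a conforming file name, or None."""
    match = NAME_PATTERN.match(name)
    return match.groupdict() if match else None


example = build_name("waltM", "lakeLansing", "fieldNotes",
                     date(2009, 10, 12), "v002", "doc")
print(example)
print(parse_name("KrillData2011.tif"))
```

Running a check like this over a project folder before sharing it is one way to find files that drifted from the agreed convention.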

18 2. Design a file naming convention that works for your project  Why file naming conventions?  Enable better access/retrieval of files  Create logical sequences for file sorting  More easily identify what you’re searching for

19 2. Design a file naming convention that works for your project (cont.)  Meaningful but short (255 character limit)  Descriptive while still making sense  Capital letters or underscores differentiate between words  Surname first followed by initials of first name  More on handout

20 This: sharpeW_krillMicrograph_backscatter3_20110117.tif  Not this: KrillData2011.tif  This: borgesJ_collocation_20080414.xml  Not this: Borges_Textbase.xml

21 3. Agree on a version control method to assist with file synchronization  Version number of record indicated in the file name with “v” followed by the version number  Letter “d” indicates a draft Examples of simple version control: waltM_lakeLansing_fieldNotes_20091012_v002.doc petersK_OrgChart2009_d001.svg
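The simple version scheme above lends itself to a small helper that produces the next file name instead of overwriting the current one. This is a hedged Python sketch: `bump_version` is a hypothetical name, and the three-digit zero-padded counter is an assumption read off the two example file names.

```python
import re

# Matches a trailing "_v###" or "_d###" marker followed by the extension.
# Assumption: version counters are always three zero-padded digits.
VERSION_SUFFIX = re.compile(r"_(?P<kind>[vd])(?P<num>\d{3})(?P<ext>\.[^.]+)$")


def bump_version(filename):
    """Return the file name with its version or draft counter raised by one."""
    match = VERSION_SUFFIX.search(filename)
    if match is None:
        raise ValueError(f"no version marker found in: {filename}")
    number = int(match["num"]) + 1
    if number > 999:
        raise ValueError("three-digit version counter exhausted")
    stem = filename[:match.start()]
    return f"{stem}_{match['kind']}{number:03d}{match['ext']}"


print(bump_version("waltM_lakeLansing_fieldNotes_20091012_v002.doc"))
print(bump_version("petersK_OrgChart2009_d001.svg"))
```

Saving each revision under a new name keeps earlier versions available, which is the point of putting the counter in the file name.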

22 4. Carefully choose file formats to maximize usefulness Non-proprietary Open, documented standard Common usage by research community Standard representation (ASCII, Unicode) Unencrypted Uncompressed

23 Documentation Practices: Overview 1. At minimum create a README file that you can use to document your project 2. Utilize standards for describing data including Metadata Standards 3. If applicable, use in-line code commentary to explain code (cc) Will Scullin

24 1. At minimum create a README file that you can use to document your project  At minimum, store documentation in readme.txt file or equivalent, with data  Resource: http://libraries.mit.edu/guides/subjects/data-management/metadata.html

25 2. Utilize standards for describing data including Metadata Standards  “Data about data”  Standardized way of describing data  Explains who, what, where, when of data creation and methods of use  Provides the essential tools for discovery, such as a bibliographic citation

26 Basic project metadata: Title Language File Formats Creator Dates File Structure Identifier Location Variable List Subject Methodology Code Lists Funders Data Processing Versions Rights Sources Checksums Access Information List of File Names
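A few of the basic metadata fields above can be written into a readme.txt automatically. The following Python sketch is illustrative only: `render_readme` and `sha256_of` are hypothetical helper names, SHA-256 is an assumed choice of checksum, and the field values in the usage are invented placeholders.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def render_readme(metadata, data_dir):
    """Build readme.txt text: metadata fields, then file names with checksums."""
    lines = [f"{field}: {value}" for field, value in metadata.items()]
    lines.append("")
    lines.append("List of File Names (with SHA-256 checksums):")
    for path in sorted(Path(data_dir).iterdir()):
        # Skip the README itself so it does not list its own checksum.
        if path.is_file() and path.name.lower() != "readme.txt":
            lines.append(f"  {path.name}  {sha256_of(path)}")
    return "\n".join(lines) + "\n"
```

A possible use, with placeholder values, is `Path("data/readme.txt").write_text(render_readme({"Title": "Lake survey", "Creator": "Walt, M."}, "data"))`. Recording checksums next to the file list lets a later reader confirm that the files have not changed.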

27 Documentation Practices: Example Metadata Standards  Dublin Core: Easy-to-create-and-maintain descriptive format to facilitate cross-domain resource discovery on the Web  Darwin Core: Facilitates reference and sharing of biological diversity datasets  Data Documentation Initiative (DDI): Methodology for content, presentation, transport, and preservation of metadata about datasets in the social and behavioral sciences

28 Documentation Practices: Example Metadata Standards  Directory Interchange Format: Descriptive format for exchanging information about earth science data  ISO 19115:2003: Describes geographic data such as maps and charts  PBCore: Supports description and exchange of media assets, including both individual clips and full, edited, aired productions

29 Documentation Practices: Example Metadata Standards  Science Data Literacy Project: Metadata for astronomy, biology, ecology and oceanography  VRACore: Data standard for description of works of visual culture as well as images that document them
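As an illustration of what a record in one of these standards can look like, the Python sketch below serializes a few Dublin Core elements as XML. The element names (`title`, `creator`, `date`, `format`) and the namespace URI come from the Dublin Core element set. The wrapping `metadata` element, the function name `dublin_core_record`, and the sample values are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element names and values as XML."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # assumed wrapper element
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Sample values are invented placeholders, not data from the deck.
record = dublin_core_record({
    "title": "Krill micrograph backscatter images",
    "creator": "Sharpe, W.",
    "date": "2011-01-17",
    "format": "image/tiff",
})
print(record)
```

Disciplinary standards such as DDI or Darwin Core define their own richer schemas, so a real project would follow the documentation of whichever standard its community uses.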

30 3. If applicable, use in-line code commentary to explain code Example of R code commentary # Cumulative normal density pnorm(c(-1.96,0,1.96))
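The same commenting habit carries over to other languages. As a rough Python counterpart to the R line above (an illustration added here, using the standard library's `statistics.NormalDist`), the comments state what is computed and why these three points were chosen:

```python
from statistics import NormalDist

# Cumulative probability of the standard normal distribution, P(Z <= z),
# evaluated at three points; this mirrors R's pnorm().
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# -1.96 and 1.96 bound roughly the central 95% of the distribution,
# and 0 is the median.
probabilities = [standard_normal.cdf(z) for z in (-1.96, 0.0, 1.96)]
print(probabilities)
```

A comment that explains intent (why 1.96) tends to age better than one that only restates the function call.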

31 Backup Practices: Overview 1. Avoid single points of failure 2. Understand the different types of storage 3. Ensure data redundancy 4. Aim for geographic distribution of data

32 1. Avoid single points of failure A single point of failure occurs when it would only take one event to destroy all data on a device (e.g. dropped hard drive) Good practices for avoiding single points of failure:  Use managed networked storage whenever possible  Move data off of portable media  Never rely on one copy of data  Do not rely on CD or DVD copies to be readable  Be wary of software lifespans (e.g. Angel)

33 2. Understand the different types of storage Flash Drives Internal Hard Drives External Hard Drives Server and Web Storage Managed Networked Storage Cloud Storage

34 3. Ensure data redundancy Backup Do’s:  Make 3 copies  E.g. original + external/local + external/remote  E.g. original + 2 formats on 2 drives in 2 locations  Geographically distribute and secure  Local vs. remote, depending on needed recovery time  Personal computer, external hard drives, departmental, or university servers may be used
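The "make 3 copies" practice can be scripted so that each copy is verified rather than assumed good. This is a minimal Python sketch under stated assumptions: `back_up` and `sha256_of` are hypothetical names, the destination folders stand in for an external drive and a remote mount, and SHA-256 is an assumed choice of checksum.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy one file into each destination folder and verify every copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise OSError(f"checksum mismatch for copy at {target}")
        copies.append(target)
    return copies
```

With two destinations, for example a local external drive and a university network share, the original plus the two verified copies give the three copies the slide recommends.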

35 3. Ensure data redundancy (cont.) Backup Don’ts:  Do not rely on one copy  Do not use CDs and DVDs  Do not rely on ANGEL (cc) George Ornbo

36 3. Ensure data redundancy (cont.) Backup Maybe:  Cloud storage  Amazon S3  Google  MS Azure  DuraCloud  Rackspace Note that many enterprise cloud storage services charge for data transfers in and out $$$

37 Introductions Background Definitions Upfront Decisions Data Sharing Impacts Fundamentals Practices File Organization Data Documentation Reliable Backup Data Lifecycle Strategy Agenda

38 Research is… Define a question Gather information Form a hypothesis Test the hypothesis Analyze the data Interpret the data Publish results Retest

39 Define a question Gather information Form a hypothesis Test the hypothesis Analyze the data Interpret the data Publish results Retest ?

40 Define a question Gather information Form a hypothesis Test the hypothesis Analyze the data Interpret the data Publish results Retest The scientific method “is often misrepresented as a fixed sequence of steps,” rather than being seen for what it truly is, “a highly variable and creative process” (AAAS 2000:18). Gauch, Hugh G. Scientific Method in Practice. New York: Cambridge University Press, 2010. Print. (Emphasis added)

41 Define a question Gather information Form a hypothesis Test the hypothesis Analyze the data Interpret the data Publish results Retest

42 The Research Depth Chart Scientific Method Research Design Research Method Research Tasks More Specific More Generic

43 Define a question Gather information Form a hypothesis Test the hypothesis Analyze the data Interpret the data Publish results Retest

44 The Data Management Depth Chart Research Data Lifecycle Model Source: DDI Structural Reform Group. “Overview of the DDI Version 3.0 Conceptual Model.” DDI Alliance. 2004. http://opendatafoundation.org/ddi/srg/Papers/DDIModel_v_4.pdf

45 The Data Management Depth Chart Research Data Lifecycle Model Research Data Management Tasks ???

46 The Data Management Depth Chart Research Data Lifecycle Model ??? Data Management Plan Research Data Management Tasks

47  http://www.lib.msu.edu/about/diginfo/ldmp.jsp

48 Data are brainstormed Study Concept

49 Data are brainstormed Data type, purpose & value University Research Council guidelines Research Facilitation and Dissemination Lifecycle Data Management Planning Research Data Management Guidance Start your Data Management Plan!

50 Data are collected and secured Study Concept Data Collection

51 Data are collected Data format, size & short term storage ATS Andrew File System (AFS) Institute for Cyber Enabled Research MSU Libraries Data Services MSU Libraries Campus Data Resources File Plan, File Naming, Backup Plan

52 Data are normalized and processed Study Concept Data Collection Data Processing

53 Data are processed Data transformations & structures LCT Computing Courses High Performance Computing Center Consortium of Research Consulting Services Documentation, Methodology

54 Data are distributed Data Distribution Study Concept Data Collection Data Processing

55 Data are distributed Data sharing, security & rights Human Research Protection Program University Research Council guidelines MSU Libraries Copyright Permissions Center MSU Google Apps Roles, Responsibilities, Resources

56 Data are discoverable Data Distribution Study Concept Data Collection Data Processing Data Discovery

57 Data are discoverable Data publishing & metadata Development of Copyrighted Materials MSU Libraries Data Citation Guide README, Metadata Standard

58 Data are analyzed Data Distribution Data Discovery Data Analysis Study Concept Data Collection Data Processing

59 Data are analyzed Standards & workflow documentation Center for Statistical Training and Consulting Statistical Consulting Services Code Commentary, Documentation

60 Data are stored and preserved Data Distribution Data Discovery Data Analysis Study Concept Data Collection Data Processing Data Archiving

61 Data are preserved Long term storage & management VPRGS Repositories and Archives Lifecycle Data Management Planning Databib.org! Embrace stewardship

62 Data can be used and reused Data Distribution Data Discovery Data Analysis Study Concept Data Collection Data Processing Data Archiving Repurposing

63 Data can be used and reused Broader impact Research Data Management CAFE MSU Research Centers and Institutes MSU Libraries Data Citation Guide Publish your data!

64 Research Data Management Guidance  Face-to-face Advising  Writing Data Management Plans  Planning for Digital Projects  Managing Digital Information  Group Training  New Faculty Orientation  Faculty Seminars  Classroom Instruction lib.msu.edu/about/rdmg

65 In Conclusion…  Upfront Decisions Researchers Need to Make  General Good Practices for Managing Research Data  NSF, NIH, IMLS and Other Funders’ Requirements  Lifecycle of Research Data

66 Contact Lisa M. Schmidt Electronic Records Archivist University Archives & Historical Collections lschmidt@ais.msu.edu Aaron Collie Digital Curation Librarian MSU Libraries collie@msu.edu

