
Data Management for Grant Funded Projects


1 Data Management for Grant Funded Projects
May 10, 2011. Jennifer Eustis, Libraries; Antje Harnisch, OSP; Jila Kazerounian, UITS; David Lowe, Libraries; Carolyn Mills, Libraries

2 Table of Contents
Motivation: What’s behind the new NSF Data Management Plan (DMP) requirement? [DL]
  “Data” as record
  Ulterior concerns
Your plan: How do I meet the new DMP requirement?
  Types of data [JK]
  Standards (formats, metadata) [JK, JE]
  Project storage [DL]
  Access policies [AH]
  Post-project plans [JK, DL]
  Review of UConn examples [AH+]
Resources: Where can I turn for advice in meeting the new DMP requirement?
  OSP [AH]
  UITS [JK]
  Libraries’ Scholarly Communication page+ [CM]

3 Data and the Scientific Record: Purpose
To communicate (findings, hypotheses, insights)
To organize (nomenclature, terminology, disciplines)
To build communities toward collaboration
To document, manage, and resolve controversies
To establish precedence
To be trustworthy
To be reproducible
To perturb assumptions and methods
See Clifford Lynch (2009).

4 Your Data and the DMP
Consider providing access to the data from your project that serves the purposes above effectively and efficiently.
Examples of data like this, not like that (UOregon). Not:
preliminary analyses
drafts of scientific papers
plans for future research
peer reviews
communications with colleagues

5 Motivations for Data Concerns
Fragility of digital data
Data is best managed when like sets/types are kept together
New paradigm of data-intensive discovery, especially cross-disciplinary

6 Motivation: Data fragility (from Gizmodo, via BusinessInsider, 2010)

7 Motivation: Like Data Together
Traditional libraries: maps, AV, oversize, archival material
eScience repositories: ICPSR for social sciences, GenBank for genetic sequencing

8 Motivation: New Possibilities
Visions of “what if”:
Single discipline: Could signs in the data already in hand make better earthquake/tsunami prediction possible?
Cross-discipline: Traffic engineers and communication disorders specialists research shared data to alleviate the “wrong-way driver” problem.
“The future is interdisciplinary.” (Susan Herbst, March 22, 2011)

9 Your DMP
Types of data [JK]
Standards (formats, metadata) [JK, JE]
Project storage [DL]
Access policies [AH]
Post-project plans [JK, DL]
Review of UConn examples [AH+]

10 I. Types of Data
The types of data cover the following points:
Who is the data for, and who controls it? (e.g., the PI, funding agency, university)
Who is your audience for the data? How will they use it?
What kind of data is it? (e.g., numeric, text, modeling, multimedia/image)
[JK]

11 I. Types of Data
Is the data generated from experiments, simulated from models, observed and captured at the time of some event, or derived and compiled from databases, data mining, etc.? [JK]

12 I. Types of Data What is the growth rate of the data?
Are you gathering data by hand or using sophisticated instrumentation that is able to capture a lot of data at once? Will there be more data as time goes on? If so, you will need to plan for the growth. What amounts to enough storage this year may not be sufficient for next year.
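To make the growth question concrete, a rough projection can help when budgeting storage for the life of an award. This is a minimal sketch that assumes compound growth at a fixed annual rate; the function name and the 500 GB / 40% figures are hypothetical, not recommendations.

```python
def project_storage(current_gb: float, annual_growth: float, years: int) -> list[float]:
    """Projected storage (GB) at the end of each year, assuming compound growth."""
    sizes = []
    size = current_gb
    for _ in range(years):
        size *= 1 + annual_growth  # e.g. 0.40 means the collection grows 40% per year
        sizes.append(round(size, 1))
    return sizes


# Hypothetical figures: 500 GB today, growing 40% per year over a 3-year award.
for year, gb in enumerate(project_storage(500, 0.40, 3), start=1):
    print(f"Year {year}: {gb} GB")
```

Real growth is rarely this smooth (a new instrument can cause a step change), so it may be worth revisiting the estimate each year.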

13 II. The Standards for Format
Data could be in one of the following formats:
Text: e.g., Word, PDF
Numeric: e.g., Excel, Access, MySQL
Multimedia/Image: e.g., JPEG, TIFF, DICOM, MPEG, QuickTime
Models: e.g., 3D
Domain-specific: e.g., FITS in astronomy, CIF in chemistry
For more detail on the types of data, refer to the report issued by the UK Data Archive.
[JK]

14 II. The Standards for Format
Formats more likely to be accessible in the future are:
Non-proprietary
Open, with a documented standard
In common usage by the research community
In a standard representation (e.g., plain text)
Unencrypted
Uncompressed
[JK]

15 II. The Standards for Format
Examples of preferred format choices:
PDF/A, not Word
CSV (comma-separated values), not Excel
MPEG-4, not QuickTime
TIFF or JPEG 2000, not GIF or JPG
XML or RDF, not RDBMS
[JK]
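As an illustration of the “CSV, not Excel” choice, Python’s standard `csv` module can write tabular data to a plain-text file. This is a minimal sketch; the helper name `export_csv`, the file name `measurements.csv`, the column names, and the values are all hypothetical.

```python
import csv


def export_csv(path, rows, fieldnames):
    """Write a list of dicts to a plain-text, UTF-8 CSV file with a header row."""
    # newline="" lets the csv module control line endings, as its documentation recommends.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical measurements standing in for data that might otherwise live in a workbook.
rows = [
    {"site": "A", "date": "2011-05-10", "temp_c": 14.2},
    {"site": "B", "date": "2011-05-10", "temp_c": 15.8},
]
export_csv("measurements.csv", rows, ["site", "date", "temp_c"])
```

Any spreadsheet, statistics package, or text editor can open the result, which is the point of preferring an open, documented format.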

16 II. File Version Control
Strategies include:
file-naming conventions
standard file headers (inside the file) listing creation date, version number, and status
log files
[JK]
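One way to implement a standard in-file header is a few `#`-prefixed lines at the top of a text data file. The key names (`project`, `created`, `version`, `status`), the `# key: value` layout, and the helper names are assumptions for illustration, not an established standard; some tools will need to be told to skip comment lines when reading such a file.

```python
from datetime import date


def make_header(project: str, version: int, status: str, created: date) -> str:
    """Build '#'-prefixed lines recording project, creation date, version, and status."""
    fields = {
        "project": project,
        "created": created.isoformat(),  # ISO 8601, e.g. 2011-05-10
        "version": str(version),
        "status": status,  # e.g. draft, final
    }
    return "".join(f"# {key}: {value}\n" for key, value in fields.items())


def parse_header(text: str) -> dict:
    """Read header fields back from the leading '# key: value' lines of a file."""
    fields = {}
    for line in text.splitlines():
        if not line.startswith("# "):
            break  # the header ends at the first line that is not a comment
        key, _, value = line[2:].partition(": ")
        fields[key] = value
    return fields


# Hypothetical project and values.
header = make_header("soilcarbon", 3, "draft", date(2011, 5, 10))
contents = header + "site,date,temp_c\nA,2011-05-10,14.2\n"
print(parse_header(contents))
```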

17 II. File Version Control
Version control software (e.g., SVN [Subversion])
Always record every change to a file, no matter how small.
Discard obsolete versions that are no longer needed, but only after making backups.
[JK]

18 II. File Naming Convention
Reserve the 3-letter file extension for application-specific codes, e.g., formats like WRL, MOV, TIF.
Identify the activity or project in the file name, e.g., use the unique project name or identifier:
Project_name_YYYYMMDD[hh][mm][ss][_extra].ext
[JK]
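The naming pattern above can be generated and checked programmatically. This sketch assumes the project identifier contains only letters, digits, and hyphens, so that underscores unambiguously separate the parts of the name; `make_name`, `is_valid_name`, and the example project `soilcarbon` are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

# Project_name_YYYYMMDD[hh][mm][ss][_extra].ext
# Assumption: the project identifier uses only letters, digits, and hyphens.
NAME_RE = re.compile(
    r"^(?P<project>[A-Za-z0-9-]+)_"
    r"(?P<stamp>\d{8}(?:\d{2}){0,3})"  # YYYYMMDD, then optional hh, mm, ss
    r"(?:_(?P<extra>[A-Za-z0-9-]+))?"  # optional extra qualifier
    r"\.(?P<ext>[A-Za-z0-9]{3})$"      # 3-character extension
)


def is_valid_name(name: str) -> bool:
    """True if name follows the pattern and its date part is a real calendar date."""
    m = NAME_RE.match(name)
    if m is None:
        return False
    try:
        datetime.strptime(m.group("stamp")[:8], "%Y%m%d")  # rejects e.g. 20110231
    except ValueError:
        return False
    return True


def make_name(project: str, when: datetime, ext: str,
              extra: Optional[str] = None, with_time: bool = True) -> str:
    """Build a file name that follows the convention, or raise ValueError."""
    stamp = when.strftime("%Y%m%d%H%M%S" if with_time else "%Y%m%d")
    name = f"{project}_{stamp}" + (f"_{extra}" if extra else "") + f".{ext}"
    if not is_valid_name(name):
        raise ValueError(f"does not follow the naming convention: {name}")
    return name


print(make_name("soilcarbon", datetime(2011, 5, 10, 14, 30), "tif", extra="raw"))
```

Putting the date as YYYYMMDD means an alphabetical directory listing is also a chronological one.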

19 II. File Naming Conventions
Many academic disciplines have specific recommendations, e.g., DOE’s Atmospheric Radiation Measurement (ARM) Program and GIS datasets from Massachusetts State. [JK]

20 II. Metadata
Metadata, in simplest terms, is “data about data”: structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.
A metadata record consists of a set of attributes, or elements, necessary to describe the resource in question.
Metadata helps ensure the data remains accessible.
Metadata can be embedded in the data or stored separately.

21 II. Metadata
Three main types of metadata, addressed in different places in various standards:
Descriptive: describes the resource for identification and discovery
Structural: how compound objects are put together
Administrative: creation date, file type, rights management (who can access the data), and preservation (archiving and preserving)

22 II. Data & Metadata Standards
Examples of scientific metadata standards:
Astronomy Visualization Metadata Standard
Content Standard for Digital Geospatial Metadata
Darwin Core
Data Documentation Initiative
Dublin Core
Ecological Metadata Language

23 II. Dublin Core
The Dublin Core metadata standard is a simple yet effective element set for describing a wide range of networked resources.
Extensibility
Data dictionary
Examples: Invasive plant database; Connecticut History Online images

24 II. Dublin Core elements
Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
See the NSDL metadata guidelines.
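A Dublin Core description can be serialized as XML using the Dublin Core elements namespace (`http://purl.org/dc/elements/1.1/`) and Python’s standard `xml.etree.ElementTree`. In this sketch the `record` wrapper element is a hypothetical choice (the container is usually set by whatever application or protocol exchanges the records), and the helper name and all field values are illustrative only.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
]
ET.register_namespace("dc", DC_NS)  # serialize elements with the familiar dc: prefix


def dublin_core_xml(fields: dict) -> str:
    """Serialize Dublin Core elements (each value a string or a list) to an XML string."""
    root = ET.Element("record")  # hypothetical wrapper; not defined by Dublin Core
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        if isinstance(values, str):
            values = [values]  # every element is optional and repeatable
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record for a hypothetical dataset.
record = dublin_core_xml({
    "title": "Soil carbon measurements, Storrs plots",
    "creator": ["Doe, Jane", "Roe, Richard"],
    "date": "2011-05-10",
    "format": "text/csv",
    "rights": "CC0 1.0",
})
print(record)
```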

25 II. Guidelines for Good Metadata
Format appropriate to the data being collected
Interoperable: can be stored and transmitted if needed
Standard controlled vocabulary that reflects the content
Includes a statement on conditions and terms of use
Supports long-term management
Consistency
Accuracy

26 II. Metadata Creation Tools
Dublin Core tools
Learning Object Metadata Editors (use this tool online)
FGDC Metadata Tools: a list of metadata creation tools and metadata processing software. Each tool makes use of the Federal Geographic Data Committee’s (FGDC) Content Standard for Digital Geospatial Metadata and may support the Biological Data Profile.
OAI-Specific Tools: a list of links to the tools implemented by members of the Open Archives Initiative community.

27 II. Metadata Resources
Consult your professional societies for preferred metadata resources and tools.
National Information Standards Organization (NISO): NISO’s booklet, “What is metadata,” with examples and resources.

28 III. Project Storage (see handout)

29

30 IV. NSF—Dissemination and Access
Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing. General adjustments and, where essential, exceptions to this sharing expectation may be specified by the funding NSF Program or Division/Office for a particular field or discipline to safeguard the rights of individuals and subjects, the validity of results, or the integrity of collections or to accommodate the legitimate interest of investigators. A grantee or investigator also may request a particular adjustment or exception from the cognizant NSF Program Officer.

31 IV. Access Policies and Provisions
Will the data be shared with other researchers? If so, how?
Do you have the right to share the data if you did not produce it?
If the data can be shared, will it be shared with everyone or with a limited number of people?
[JK]

32 Data Policies
UConn-Storrs has no data policy (yet).
Ownership
Health Center policy stipulates:
Generally, research data are owned by the faculty member or graduate student who created the data.
Data generated in projects supported by grants or contracts containing such provisions shall be jointly owned by the Health Center and the PI.
The Health Center shall have an irrevocable right to obtain such data from the PI at any time, even if that individual has left the institution.
Custody of the data will continue to be the responsibility of the PI.
Typical University Data Policy
Consistent with federal policy and prevailing higher education practice, Research Data belong to the University.
Retention: at least 3 years
Institution: Case must retain research data in sufficient detail and for an adequate period of time to enable appropriate responses to questions about accuracy, authenticity, primacy, and compliance with laws and regulations governing the conduct of the research.
PI: custodian, responsible for the collection, retention, and management of research data

33 IV. Data Sharing
Publication: dissemination through articles in scientific journals
Investigator: the scientist responds directly to data requests (mailing a CD-ROM containing data or posting data on a Web site)
Data hosting (local data center or offsite): a controlled, secure environment in which eligible researchers can perform analyses using data resources
Data archive: a place where data can be acquired, manipulated, documented, and distributed
Mixed mode: more than one version of a dataset, each providing a different level of access

34 IV. Limitations to Sharing/Access
What are the issues related to confidentiality and intellectual property?
Does the data contain direct or indirect information that could identify the research subjects?
Is all or part of the data copyrightable? (Copyright could be waived under a CC0 declaration.)
[JK]

35 IV. Privacy Concerns
Do any regulations apply to the data (for example, HIPAA for health-care-related data)?
Are there any ethical issues in data management?
Privileged or confidential information should be released only in a form that protects the privacy of the individuals and subjects involved.
Data-sharing policies for awards that involve human subjects should recognize and address human-subjects protocols and the need to protect privacy and confidentiality.

36 IV. Intellectual Property Concerns
Some proposals may involve proprietary or other restricted data. This applies, for example, to projects having proprietary information that will eventually lead to commercialization, such as Engineering Research Center (ERC), Nanoscale Science and Engineering Center (NSEC), Industry/University Cooperative Research Center (I/UCRC), Small Business Innovation Research (SBIR), Small Business Technology Transfer (STTR), and Grant Opportunities for Academic Liaison with Industry (GOALI) awards. Any such data-management issues should be discussed, as well as the conditions that might prevent or delay the sharing of data. The proposal’s DMP would address the distinction between released and restricted data and how they would be managed. Exceptions to the basic data-management policy should be discussed with the cognizant program officer before submission of such proposals.

37 V. Plans for Eventual Transition or Termination of the Data
Provisions for transition or termination could entail the following:
Do you need to destroy the data after a certain period?
How permanent is the data? Long-term (10 years or more)? Or short-term (3-5 years or less)?
[JK]
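If retention is expressed as a minimum number of years, the earliest permissible destruction date is simple date arithmetic. This sketch assumes the retention clock starts at a known project-end date and moves a February 29 start to February 28 in non-leap years; the function names are hypothetical, and the actual trigger and period come from the applicable policy (slide 32 cites a typical minimum of 3 years) and any legal obligations.

```python
from datetime import date


def retention_end(project_end: date, years: int) -> date:
    """Earliest date the data may be destroyed, given a minimum retention in years."""
    try:
        return project_end.replace(year=project_end.year + years)
    except ValueError:
        # project_end is Feb 29 and the target year is not a leap year
        return project_end.replace(year=project_end.year + years, day=28)


def may_destroy(project_end: date, years: int, today: date) -> bool:
    """True once the minimum retention period has fully elapsed."""
    return today >= retention_end(project_end, years)


# Hypothetical award ending June 30, 2014, with a 3-year minimum retention.
print(retention_end(date(2014, 6, 30), 3))
```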

38 V. Plans for Eventual Transition or Termination of the Data
Do you have the right to do so (Is this your data? Is it copyrighted?)
Do you need to keep all versions? Just the final version? The first and last?
That depends on re-processing costs: if a version can be regenerated by re-processing the data, it may not need to be kept.
[JK]

39 V. Plans for Eventual Transition or Termination of the Data
Are there any legal or ethical obligations for secure removal of data after a certain period (e.g., HIPAA data; HIPAA is the Health Insurance Portability and Accountability Act)?
How do you plan to destroy the data? Options include degaussing (exposing magnetic media to a magnetic field), software tools that wipe disks, and physical destruction of the media (there are companies that do this).
[JK]

40 V. Plans for Eventual Transition or Termination of the Data
Is there a need, and a plan, to migrate your data to other media or another structure after keeping it for a long period of time? [JK]

41 V. Good Practices
Test data restore from backup.
Check documentation and metadata.
Are files still readable? Still accessible at the published URL?
Migrate files to newer formats.
Update software to read/write data.
Weed out obsolete data (and destroy it where appropriate).
[JK]

42 V. Post-Project Plans
Leadership opportunities:
Metadata schema development
Repository development
Collaboration is leadership
Standards context: OAIS reference model (ISO 14721:2003)
Submission/Ingest
Archiving, including “fixity checks” (via checksums)
Dissemination
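Fixity checks of this kind can be done with Python’s standard `hashlib`: record a checksum for every file at ingest, then recompute and compare later. This is a minimal sketch assuming SHA-256 and a manifest held as a Python dict; the function names are hypothetical. The same comparison could be run against a copy restored from backup.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large datasets need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def make_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def fixity_check(root: Path, manifest: dict) -> dict:
    """Compare current checksums with a stored manifest and report differences."""
    current = make_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "added": sorted(set(current) - set(manifest)),
        "altered": sorted(
            k for k in manifest.keys() & current.keys() if manifest[k] != current[k]
        ),
    }
```

In practice the manifest would be saved alongside the data (for example as a text file) at ingest, so later checks have something independent to compare against.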

43 VI. Example DMPs
Example A (gold handout)
Example B (salmon handout)

44 VI. Data Management Plan
Supplementary document entitled “Data Management Plan” (no more than 2 pages)
Not included in the 15-page limit for proposal bodies
May not be needed, e.g., because the project doesn’t deal with data, but that must be stated and justified in the “plan”
Will be reviewed as part of the intellectual merit or broader impacts of the proposal, or both

45 VI. Monitoring/Reporting
Annual reports:
  Progress on data production
  Progress on sharing and dissemination of data
Final project report:
  Data produced during the award
  Data to be retained after the award expires
  How data will be available for sharing
  How data will be disseminated
  Formats used, including any metadata
  Location of data (archive/storage)
Future proposals:
  Data management issues included in “Results of prior NSF support”

46 VI. Examples/Samples/Templates
The DMP depends on the discipline, the types of data, and the nature of the project, so templates are difficult to provide.
The DMP should answer the questions NSF posted; these can be used as headers.
Examples can be found through links provided on the UConn website.

47 UConn Resources Libraries: UITS:

48 Thank you! Questions?
Our contact info:
Library:
  Jennifer Eustis, Catalog/Metadata Librarian
  David Lowe, Digital Preservation Librarian
  Carolyn Mills, Liaison to Biology and Agriculture
Office of Sponsored Programs:
  Antje Harnisch, Assistant Director, Pre-Award and Contract Services
University ITS:
  Jila Kazerounian, Manager of Web Development and Integration Technologies

