21 April 2015. NIH + NSF Data Sharing Policies What is a Data Management Plan Accountability Data Products, Format, and Metadata Storage, Sharing, and.


1 21 April 2015

2
• NIH + NSF Data Sharing Policies
• What is a Data Management Plan
• Accountability
• Data Products, Format, and Metadata
• Storage, Sharing, and Metadata
• Budgeting for Data Management
• Resources

3
• Applies to the sharing of final research data* for research purposes.
• Applies to basic research, clinical studies, surveys, and other types of research supported by NIH, and to research that involves human subjects and laboratory research that does not involve human subjects.
• Applies to applicants seeking $500,000 or more in direct costs in any year of the proposed project period through grants, cooperative agreements, or contracts.
• Applies to research applications submitted beginning October 1, 2003.
* Final Research Data: Recorded factual material commonly accepted in the scientific community as necessary to document and support research findings. This does not mean summary statistics or tables. It means the data on which summary statistics and tables are based. For the purposes of this policy, final research data do not include laboratory notebooks, partial datasets, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as gels or laboratory specimens.

4
• “Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants.”
• “Grantees are expected to encourage and facilitate such sharing.”
• Data refers to any information that can be stored in digital form, including text, numbers, images, video or movies, audio, software, algorithms, equations, animations, models, simulations, etc. Such data may be generated by various means including observation, computation, or experiment.
• Applies to research applications submitted on or after January 18, 2011.

5

6
• Research data collections
  - Products of one or a few focused research projects
• Resource or community data collections
  - Serve a specific research community
  - Typically fall between research and reference data collections in size, scale, funding, community of users, and duration
  - Conform to community standards
• Reference data collections
  - Serve large segments of the research and education communities
  - Conform to robust and comprehensive standards

7
• An opportunity for PIs to articulate how they will conform to the federal data sharing policy for research results.
• The DMP is reviewed as an integral part of the proposal, coming under ‘Intellectual Merit’ or ‘Broader Impacts’ or both, as appropriate for the scientific community of relevance.
• Data management requirements and plans may vary across specific Directorates, Offices, Divisions, Programs, or other NSF/NIH units.

8
• The types of data, samples, physical collections, software, curriculum materials, publications, and other materials to be produced in the course of the project;
• The standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies);
• Policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements;
• Policies and provisions for re-use, re-distribution, and the production of derivatives; and
• Plans for archiving data, samples, and other research products, and for preservation of access to them.

9
Element | Description | ? | NSF Mapping
Data description | A description of the information to be gathered; the nature and scale of the data that will be generated or collected. | Yes | Expected Data
Existing data | A survey of existing data relevant to the project and a discussion of whether and how these data will be integrated. | Yes | Expected Data
Format | Formats in which the data will be generated, maintained, and made available, including a justification for the procedural and archival appropriateness of those formats. | Yes | Data Format and Dissemination
Metadata | A description of the metadata to be provided along with the generated data, and a discussion of the metadata standards used. | Yes | Data Format and Dissemination
Storage and backup | Storage methods and backup procedures for the data, including the physical and cyber resources and facilities that will be used for the effective preservation and storage of the research data. | Yes | Data Storage and Preservation of Access
Security | A description of technical and procedural protections for information, including confidential information, and how permissions, restrictions, and embargoes will be enforced. | Yes | Data Format and Dissemination
Responsibility | Names of the individuals responsible for data management in the research project. | Yes | Roles and Responsibility
Intellectual property rights | Entities or persons who will hold the intellectual property rights to the data, and how IP will be protected if necessary. Any copyright constraints (e.g., copyrighted data collection instruments) should be noted. | Yes | Data Format and Dissemination
Access and sharing | A description of how data will be shared, including access procedures, embargo periods, technical mechanisms for dissemination and whether access will be open or granted only to specific user groups. A timeframe for data sharing and publishing should also be provided. | Yes | Data Storage and Preservation of Access
Audience | The potential secondary users of the data. | Yes | Data Format and Dissemination
Selection and retention periods | A description of how data will be selected for archiving, how long the data will be held, and plans for eventual transition or termination of the data collection in the future. | Yes | Period of Data Retention
Archiving and preservation | The procedures in place or envisioned for long-term archiving and preservation of the data, including succession plans for the data should the expected archiving entity go out of existence. | Yes | Data Storage and Preservation of Access
Ethics and privacy | A discussion of how informed consent will be handled and how privacy will be protected, including any exceptional arrangements that might be needed to protect participant confidentiality, and other ethical issues that may arise. | Yes | Data Format and Dissemination
Budget | The costs of preparing data and documentation for archiving and how these costs will be paid. Requests for funding may be included. | |
Data organization | How the data will be managed during the project, with information about version control, naming conventions, etc. | |
Quality Assurance | Procedures for ensuring data quality during the project. | |
Legal requirements | A listing of all relevant federal or funder requirements for data management and data sharing. | |

10

11
• Explains how the responsibilities regarding the management of your data will be delegated.
  - Time allocations
  - Project management of technical aspects
  - Training requirements
  - Contributions of non-project staff: individuals should be named where possible (custodians of the repository/archive you choose to store your data)

12
• Outlines the staff/organizational roles and responsibilities for implementing this data management plan.
• Who will be responsible for data management and for monitoring the data management plan?
• How will adherence to this data management plan be checked or demonstrated?
• What process is in place for transferring responsibility for the data?
• Who will have responsibility over time for decisions about the data once the original personnel are no longer available?

13
• Is the data regulated by policy or law?
• Are there legal constraints (e.g., HIPAA) on sharing data?
• How will you handle informed consent with respect to communicating to respondents that the information they provide will remain confidential when data are shared or made available for secondary analysis?
• Determine constraints for classified data, specific handling requirements, and IRB/human subjects research.
• If yes, how will you comply with these constraints?
• Write your compliance plan point by point.
• If applicable, how will you manage disclosure risk in the data to be shared and archived?
• Are there intellectual property rights (e.g., patent, copyright) on the datasets?
• Determine restrictions and conditions to share and disseminate.
• Does someone else own the data? What are their conditions for use, sharing, and dissemination?

14
• Determine DMPs as established by any international research consortia or set forth in formal science and technology agreements signed by the United States Government and foreign counterparts.
• This should be addressed with any international research partners when first planning a collaboration.
• Talk to the Program Officer for additional assistance.

15

16
• Inputs and outputs (existing, intermediary, and final datasets)
• Existing data and sources you are using (digital and physical collections)
• Quantitative Social and Economic Data Sets
  - Numeric data sets, geospatial data, spatio-temporal data
• Qualitative Information
  - Microfilms, historical documents, oral interviews, video tapes, handwritten records, transcripts, tables, figures, flowcharts, 3D models, digital audio
• Experimental Research
  - Tabulated data
• Mathematical and Computer Models
  - May include descriptions in published articles or fully documented and robust versions of these models

17
• Determine formats and estimated size, and whether the data will be shared.
  - Formats: RTF text, MS Excel converted to CSV, MATLAB, PNG (images), WAV audio, MPEG video, shapefile, as well as any instrument-specific formats or software
  - Size/amount: Rate produced, e.g., 1 TB/year, 50 GB/experiment
• Metadata should be machine readable for better re-usability and processing.
HINT: Sketching a diagram of the data workflow helps to identify datasets and issues regarding their management.
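The size estimate above can be turned into a rough storage figure for the plan. The following is a minimal sketch, not a prescribed method: the function name, the three-year project length, and the two-copy backup assumption are all illustrative and do not come from the slides.

```python
def estimate_storage_gb(rate_gb_per_year: float, years: float, copies: int = 2) -> float:
    """Return the total storage needed in GB, counting every retained copy.

    All parameters are assumptions for illustration: a constant production
    rate, a fixed project length, and a fixed number of copies (primary
    plus backups).
    """
    if rate_gb_per_year < 0 or years < 0 or copies < 1:
        raise ValueError("rate and years must be non-negative, copies at least 1")
    return rate_gb_per_year * years * copies


# Example: 1 TB/year (taken as 1000 GB) over a hypothetical 3-year project,
# keeping one primary copy and one backup.
total_gb = estimate_storage_gb(1000, 3, copies=2)
print(f"Estimated storage: {total_gb:.0f} GB")
```

A figure like this is only a starting point; instrument-specific formats and intermediate datasets may change the rate considerably.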

18
• Give a short description of what "data" will mean in your research.
• What data will be generated in the research?
• What data types will you be creating or capturing?
• How will you capture or create the data?
• If you will be using existing data, state that fact and include where you got it.
• What is the relationship between the data you are collecting and the existing data?
• What data will be preserved and shared?

19
• “Data about data”
• Typical functions
  - Discovery tool
  - Rights management
  - Version identification
  - Certify authenticity
  - Status indicator
  - Defines content structure
  - Interoperability
  - Situates geospatially
  - Process descriptions
  - Access and transfer
Objectives | Domains | Architecture
Objectives | Discipline | Structure
Principles | Genre | Extent
 | Format | Granularity

20
• What details (metadata) are necessary for others to use your data?
• List standards for formats or metadata for your datasets.
  - Document why you selected them.
• Describe the method by which metadata will be generated.
• Document naming conventions/schema for your data.
• List the data dictionaries/taxonomies/ontologies you will use for your data.
• Describe how you will track versions of the datasets.
• List and describe the tools that are necessary to use the datasets.
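As one way to picture a documented naming convention, version tracking, and machine-readable metadata together, here is a minimal sketch. The file name pattern and every field name in the record are hypothetical and are not taken from any of the standards named in this presentation; a real plan would follow the standard chosen for the discipline.

```python
import json
import re
from datetime import date

# Hypothetical convention: project_dataset_YYYYMMDD_vNN.ext (an assumption, not from the slides).
NAME_PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9]+_\d{8}_v\d{2}\.[a-z0-9]+$")


def build_filename(project: str, dataset: str, created: date, version: int, ext: str) -> str:
    """Compose a file name that encodes project, dataset, date, and version."""
    return f"{project}_{dataset}_{created:%Y%m%d}_v{version:02d}.{ext}"


def build_metadata(filename: str, title: str, creator: str, description: str, version: int) -> dict:
    """Return a minimal machine-readable record; the field names are illustrative only."""
    if not NAME_PATTERN.match(filename):
        raise ValueError(f"file name does not follow the convention: {filename}")
    return {
        "file": filename,
        "title": title,
        "creator": creator,
        "description": description,
        "version": version,
    }


name = build_filename("dmp", "survey", date(2015, 4, 21), 1, "csv")
record = build_metadata(name, "Example survey responses", "Example PI", "Illustrative record only", 1)
print(json.dumps(record, indent=2))
```

Storing the record as JSON next to the data file keeps the metadata machine readable, and bumping the version number in both the file name and the record gives a simple audit trail.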

21
• OAIS, Open Archival Information System
• CSDGM, Content Standard for Digital Geospatial Metadata
• ICPSR, Inter-university Consortium for Political and Social Research
• DDI, Data Documentation Initiative
  - Best practices: data life cycle and longitudinal data
• SDMX, Statistical Data and Metadata eXchange
• XML, Extensible Markup Language

22
• Citation is the preferred form of acknowledgement.
• Should include a DOI to establish the authoritative data source, or a PURL (Persistent Uniform Resource Locator).
Citation: Involuntary Commitment Data, public use dataset [restricted use data, if appropriate]. Produced and distributed by the PSRDC, College of Behavioral and Community Sciences, University of South Florida (year data were downloaded). URL
Acknowledgement: The collection of data used in this study was partly supported by the National Institutes of Health under grant number R01 HD069609 and the National Science Foundation under award number 1157698.
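The citation pattern above can be assembled from its parts so that every dataset is cited consistently. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the year and the example.org locator are placeholders, not real values for this dataset.

```python
def format_citation(title: str, access_level: str, producer: str,
                    year_downloaded: int, locator: str) -> str:
    """Assemble a dataset citation in the pattern shown on the slide.

    `locator` stands for the DOI, PURL, or URL of the dataset.
    """
    return (
        f"{title}, {access_level}. "
        f"Produced and distributed by {producer} ({year_downloaded}). {locator}"
    )


citation = format_citation(
    title="Involuntary Commitment Data",
    access_level="public use dataset",
    producer="the PSRDC, College of Behavioral and Community Sciences, "
             "University of South Florida",
    year_downloaded=2015,                    # placeholder year of download
    locator="https://example.org/dataset",  # placeholder, not a real locator
)
print(citation)
```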

23

24
• Document which of the digital or non-digital datasets listed will NOT be stored or retained during the project.
• Document the type of media and the location(s) where the data will be stored and who is responsible.
• Document how and where the data will be backed up and who is responsible.
• Document any access controls for data and/or data transfers that need to be secured and how these controls will be applied.

25
• Indicate which datasets used or generated will be shared.
• Indicate whether any datasets are in proprietary formats and if they will be converted to a non-proprietary format for sharing.
• Determine the audience who will use the datasets.
• Determine the acknowledgement protocol.
• Determine sharing protocols: open access or release upon request.
• Account for any delay in the accessibility of your data after your research is done.
• Explain details of any embargo periods.
• Determine how long data will be kept beyond the life of the project.
• Will a third-party service be used to archive or release data?
• Set a release date to share the data.
• Describe any restrictions on use, sharing, repurposing, etc. of datasets.
• Include costs of any additional resources (third-party services, etc.) in the budget.

26
• Under the auspices of the PI
• Data archive: A place where machine-readable data are acquired, manipulated, documented, and finally distributed to the scientific community for further analysis.
• Data enclave: A controlled, secure environment in which eligible researchers can perform analyses using restricted data* resources.
• Mixed mode sharing.
* Restricted Data: datasets that cannot be distributed to the general public because of, for example, participant confidentiality concerns, third-party licensing or use agreements, or national security considerations.

27
• Builds upon storage by taking additional steps toward preserving digital files.
• Safeguards data against file corruption and storage media failure.
• Includes updating from obsolete formats.
• Often includes enhanced discovery and access of datasets.
• Includes a preservation strategy and disaster recovery plan.
• Often handled by a third-party archiving service or data repository.
• Check university guidelines.
• Include deposit fees in budget.
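One common way to detect file corruption is a fixity check: record a checksum when a file is archived and recompute it later. The slides do not name a method, so the choice of SHA-256 and the helper names below are assumptions; this is a sketch of the idea rather than a description of any particular repository's process.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, recorded_digest: str) -> bool:
    """Compare the file's current digest with the one recorded at deposit time."""
    return sha256_of(path) == recorded_digest


# Demonstration on a throwaway file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "example.csv"
    data_file.write_bytes(b"id,value\n1,42\n")
    recorded = sha256_of(data_file)          # stored alongside the archived file
    print("unchanged file intact:", is_intact(data_file, recorded))
    data_file.write_bytes(b"id,value\n1,43\n")  # simulate silent corruption
    print("altered file intact:", is_intact(data_file, recorded))
```

Running the check on a schedule, and keeping the recorded digests separately from the data, lets a damaged copy be replaced from backup before the damage spreads.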

28
Example 1
The proposed research will involve a small sample (less than 20 subjects) recruited from clinical facilities in the New York City area with Williams syndrome. This rare craniofacial disorder is associated with distinguishing facial features, as well as mental retardation. Even with the removal of all identifiers, we believe that it would be difficult if not impossible to protect the identities of subjects given the physical characteristics of subjects, the type of clinical data (including imaging) that we will be collecting, and the relatively restricted area from which we are recruiting subjects. Therefore, we are not planning to share the data.
Example 2
The proposed research will include data from approximately 500 subjects being screened for three bacterial sexually transmitted diseases (STDs) at an inner city STD clinic. The final dataset will include self-reported demographic and behavioral data from interviews with the subjects and laboratory data from urine specimens provided. Because the STDs being studied are reportable diseases, we will be collecting identifying information. Even though the final dataset will be stripped of identifiers prior to release for sharing, we believe that there remains the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and associated documentation available to users only under a data-sharing agreement that provides for: (1) a commitment to using the data only for research purposes and not to identify any individual participant; (2) a commitment to securing the data using appropriate computer technology; and (3) a commitment to destroying or returning the data after analyses are completed.
Example 3
This application requests support to collect public-use data from a survey of more than 22,000 Americans over the age of 50 every 2 years. Data products from this study will be made available without cost to researchers and analysts at https://ssl.isr.umich.edu/hrs/. User registration is required in order to access or download files. As part of the registration process, users must agree to the conditions of use governing access to the public release data, including restrictions against attempting to identify study participants, destruction
http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm

29
• It is acceptable to state in the DMP that the project is not anticipated to generate data or samples that require management and/or sharing.
• PIs should note that the statement will be subject to peer review.
• If data you generate is owned by your institution, the data access plan must address the institutional strategy for providing access to relevant data and supporting materials.
• Open-access publishing is not addressed in the implementation of the data management plan requirement.

30

31
Types of Activities Covered
• Documenting
• Preparing
• Publishing
• Disseminating
• Sharing research findings and supporting material
• Data sharing and archiving
NOTE: If the data have been collected already, a competitive or administrative supplement may be available.
• Reports
• Reprints
• Page charges or other journal costs
  - Does not cover costs for prior or early publication
• Illustrations
• Cleanup
• Documentation
• Storage and indexing of data and databases
• Development, documentation and debugging of software
• Storage, preservation, documentation, indexing, etc., of physical specimens, collections or fabricated items.

32

33
• DMPTool (Argonne Laboratories)
• NIH
  - Data Sharing Policy and Implementation Guidance
  - 8.2 Availability of Research Results
• NSF
  - NSF Data Sharing Policy
  - NSF Data Management Plan Requirements
  - NSF Social, Behavioral and Economic (SBE) Directorate-wide Guidance
• ICPSR
  - Effective Data Management
• Databib
  - Registry of Research Data Repositories
• DataONE
  - Best Practices

