Making Data from NIAAA Funded Grants

Making Data from NIAAA Funded Grants Available to the Research Community
Greg Farber
Office of Technology Development and Coordination
National Institute of Mental Health

Why Do We Care About Making Data Available?
Understanding the biological basis of human disease is a very hard problem, and we are NOT making progress quickly enough. There are two reasons why this is hard:
  the underlying biology is complex
  individual variation is surprisingly large
For “simple diseases” (genomic diseases with high penetrance, many diseases caused by an infectious agent), big data is not terribly useful. In these cases, we “just” need to deal with the biology.
Most of what we deal with today are complex diseases where individual variation and the environment are important components. In these cases, we need both to understand the biology and to have data from many individuals in order to understand individual variation and the number of “sub-groups” in a population.

Many Components of NIH are Trying to Make Data from Human Subjects Available
All of Us Research Program (formerly Precision Medicine)
NHLBI makes their clinical trials available via BioLINCC.
NIDA also makes data from the clinical trials they have funded available through the NIDA Data Share web site.
The NIH Genomic Data Sharing policy expects NIH funded investigators to submit data to an appropriate repository.
The new 21st Century Cures Act seems to give the NIH director the authority to require data sharing.

NIMH Data Archive
NIMH has created a data infrastructure to hold data from experiments involving human subjects.
That infrastructure now holds data from nearly 600 NIH funded awards as well as data supported by other funding agencies.
Data types include:
  Clinical assessments
  Imaging and other “complex” data (eye tracking, EEG, PET…)
  Genomics data in the area of autism
The data infrastructure has matured to the point where it is now possible to expand to areas like substance use.

Non-NIH Groups Using the NDA to Store Data

A Brief History
The National Database for Autism Research (NDAR) was started in late 2006, and the first data were received in 2008.
NIMH recently decided to expand NDAR to include data from:
  Clinical Trials (NOT-MH )
  The Research Domain Criteria (RDoC) Initiative (NOT-MH )
  The Adolescent Brain Cognitive Development Study
All of the data is part of a single database (NIMH Data Archive, NDA) with branded web locations (

NIH/NIMH Data Archives Staff

NDA Overview
NDA is a federal data repository.
The NDA only contains data from human subjects.
We have the ability to manage data with different types of consent, but NIMH sites contain data that is broadly consented for use by the research community.
NIMH data are available to the research community through a not-too-difficult application process that involves a data access committee (we currently support 4 independent DACs).
Summary data are available to everyone with a browser.
The data types include demographic data, clinical assessments, imaging, –omic data, and other complex data types (EEG…).
We currently share data from nearly 130,000 subjects with the research community.
~800 TB of imaging, –omic, and other complex experimental data is secured in the Amazon cloud.

NDA Implementation
NDA has deep federation with the following data repositories:
  Autism Tissue Program
  Autism Genetic Resource Exchange
  Interactive Autism Network
  Simons Foundation Autism Research Initiative
  Ontario Brain Institute
This federation allows NDA to query data in those repositories and to return data to the user from multiple repositories simultaneously.
NDA has two key features to allow data standardization and aggregation: data dictionaries and the Global Unique Identifier (GUID).
Generally, NIMH funded investigators are expected to share their data via NDA. Investigators with funding from other sources are also welcome to deposit their data.

NDA Structure
It is best to think of NDA as a large (~130,000 research participants x ~130,000 data dictionary elements), sparse, two-dimensional matrix.
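To make the sparse-matrix picture concrete, here is a minimal sketch in Python. It is an illustration only, not how NDA stores data: the class name, the subject identifiers, and the element names are all hypothetical. Only cells that actually hold a value are stored, which is what "sparse" means in practice when most participants have answers for only a small fraction of the data dictionary elements.

```python
from collections import defaultdict


class SparseSubjectMatrix:
    """Hypothetical subjects-by-elements matrix that stores only filled cells."""

    def __init__(self):
        # Outer key: subject identifier. Inner key: data dictionary element.
        self._rows = defaultdict(dict)

    def set(self, subject_id, element, value):
        self._rows[subject_id][element] = value

    def get(self, subject_id, element):
        # Missing cells return None instead of occupying any storage.
        return self._rows.get(subject_id, {}).get(element)

    def subjects_with(self, element):
        # All subjects that have a value for the given element.
        return sorted(s for s, cells in self._rows.items() if element in cells)

    def density(self, n_subjects, n_elements):
        # Fraction of the full n_subjects x n_elements grid that is filled.
        filled = sum(len(cells) for cells in self._rows.values())
        return filled / (n_subjects * n_elements)


# Illustrative values only; these identifiers and element names are made up.
matrix = SparseSubjectMatrix()
matrix.set("SUBJ_A", "age_months", 96)
matrix.set("SUBJ_A", "fagerstrom_total", 4)
matrix.set("SUBJ_B", "age_months", 120)
```

A query such as `matrix.subjects_with("age_months")` then corresponds to selecting one column of the matrix and listing the rows that are not empty.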

Data Dictionary – The First Building Block
The NDA data dictionary is one of the key building blocks for this repository. It provides a flexible and extensible framework for data definition by the research community.
  1500+ data collection instruments, freely available to anyone
  130,000+ unique data elements (“questions”)
  Data collection instruments are defined by the research community with assistance from NDA staff:
    Clinical
    Genomics/Proteomics
    MRI Modalities
    Other complex data (EEG, Eye Tracking)
  Accommodates any data type and data structure
  Curated by NDA Staff
  Allows investigators to quickly perform quality control tests of their data without submitting data anywhere.
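The idea of checking data against a data dictionary before submitting anything can be sketched as follows. This is a hedged illustration and not the NDA validation tool: the dictionary layout, the element names, and the `validate_record` function are assumptions made up for this example.

```python
# Hypothetical data dictionary: each element has a type, a required flag,
# and optionally a numeric range or a set of allowed values.
DICTIONARY = {
    "subject_id": {"type": str, "required": True},
    "age_months": {"type": int, "required": True, "min": 0, "max": 1440},
    "sex": {"type": str, "required": True, "allowed": {"M", "F"}},
    "smoking_days_past_month": {"type": int, "required": False, "min": 0, "max": 31},
}


def validate_record(record, dictionary=DICTIONARY):
    """Return a list of human-readable problems; an empty list means the record passes."""
    errors = []
    for name, rule in dictionary.items():
        value = record.get(name)
        if value is None:
            if rule["required"]:
                errors.append(f"{name}: missing required element")
            continue
        if not isinstance(value, rule["type"]):
            errors.append(f"{name}: expected {rule['type'].__name__}")
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{name}: value {value!r} is not an allowed value")
        if "min" in rule and value < rule["min"]:
            errors.append(f"{name}: value {value!r} is below the minimum")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{name}: value {value!r} is above the maximum")
    # Elements that the dictionary does not define are also reported.
    for name in record:
        if name not in dictionary:
            errors.append(f"{name}: not defined in the data dictionary")
    return errors


good = {"subject_id": "DEMO_1", "age_months": 96, "sex": "F"}
bad = {"subject_id": "DEMO_2", "age_months": -5, "sex": "X", "favorite_color": "blue"}
good_errors = validate_record(good)
bad_errors = validate_record(bad)
```

Because the check runs locally against a shared definition, an investigator can find out-of-range or undefined values without sending data anywhere, which is the point made in the last bullet above.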

Data Dictionary List (1500+ Measures)

Inside a Data Dictionary

Data Inspection – Available to All

Global Unique Identifier – the Other Building Block
The NDA GUID software allows any researcher to generate a unique identifier using some information from a birth certificate. If the same information is entered in different laboratories, the same GUID will be generated. This strategy allows NDA to aggregate data on the same subject collected in multiple laboratories without holding any of the personally identifiable information about that subject. NDA also assigns unique identifiers that do not allow data aggregation (pseudo-GUID) in cases where the GUID could not be generated. The GUID is now being used in other research communities (see
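The property described here, that the same inputs entered in different laboratories yield the same identifier while the repository never sees the inputs, can be illustrated with a one-way hash. The sketch below is NOT the NDA GUID algorithm, which this talk does not describe; the choice of fields, the normalization, the salt, and the `DEMO_` prefix are all assumptions for illustration.

```python
import hashlib
import unicodedata


def _normalize(text):
    # Strip accents, spaces, and punctuation, then upper-case, so that small
    # differences in data entry do not change the resulting identifier.
    text = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in text if ch.isalnum()).upper()


def make_identifier(first_name, last_name, birth_date, birth_city, salt="demo-salt"):
    """Hypothetical deterministic identifier derived from birth-certificate-style fields."""
    parts = [_normalize(p) for p in (first_name, last_name, birth_date, birth_city)]
    payload = salt + "|" + "|".join(parts)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # Only the digest-derived identifier leaves the laboratory, not the inputs.
    return "DEMO_" + digest[:12].upper()


# Two laboratories entering the same person with different formatting.
lab_one = make_identifier("Ana", "Lopez", "2001-03-04", "Austin")
lab_two = make_identifier(" ana ", "LOPEZ", "2001/03/04", "austin")
other_person = make_identifier("Ana", "Lopez", "2001-03-05", "Austin")
```

In this sketch `lab_one` and `lab_two` are equal, so records from the two laboratories could be joined on the identifier, while `other_person` differs. A production scheme would need to address issues this sketch ignores, such as resistance to guessing inputs from a small space of names and dates.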

General Query – IAN Example – GUID Works

Query for Data by Laboratory/Award

Example of Information from a Particular Lab

Retrieve Data Associated with a Paper

A “Study” – Data Associated with a Publication

By Concept/Phenotype
Results in 1,061 subjects being discovered

Existing Substance Use Data
Not surprisingly, a number of NIMH funded clinical trials and some clinical research studies have data related to substance use.
  Addiction Severity Index, N=65
  Fagerstrom, N=1,034
  Peer Substance Use, N=1,028
  Peer Tolerance of Substance Use, N=1,029
  Smoking History Questionnaire, N=194
  Substance Abuse Disorders Log, N=196
  Substance Use Monthly Form, N=404
  Substance Use Questionnaire, N=2,055
  Substance Use Survey, N=322
We expect that some NIAAA funded researchers will have collected data related to mental health.

How Do NIMH Users Deposit Data?
At the start of the award, a data submission agreement is signed and the data archive creates data dictionaries that will be required for the user to submit data. Every 6 months, data are submitted to the data archive. At that time, data are checked by the validation tool to make sure they conform to the data dictionary. Submitting data is separate from releasing data to the research community (sharing). Sharing happens once a paper is published or one year after the completion of the grant. We have created a cost estimator and we ask our awardees to request data sharing costs when they submit applications. The costs are generally modest.

Researchers ARE using the NIMH Data Archive
[Charts: Number of Registered Users; Number of Collections]

Advantage to NIAAA and Funded Investigators
NIAAA can use the existing infrastructure to get more out of data.
  Secondary data analysis
  Combination of multiple related data sets
  Increasing confidence in conclusions using data measured by a different group
The quality of the data will increase.
The research community can work with NIAAA to establish minimal common data elements which will enhance the ability to merge/compare data from different laboratories.
