Obtaining The Numbers Behind the Translational Imperative Harvard Medical School Center for Biomedical Informatics i2b2 National Center for Biomedical Computing Isaac S. Kohane, MD, PhD John Glaser, PhD Susanne Churchill, PhD
Example: PPARγ Pro12Ala and diabetes. [Forest plot; axes: estimated risk (Ala allele) and sample size. Studies shown: Deeb et al., Mancini et al., Ringel et al., Meirhaeghe et al., Clement et al., Hara et al., Altshuler et al., Hegele et al., Oh et al., Douglas et al., Lei et al., Hasstedt et al., Mori et al., and all studies combined. Annotation: Ala is protective.] Overall P value = 2 x …; odds ratio = 0.79 ( ). Courtesy J. Hirschhorn
And here comes commercialization (MDs not required). Knome has launched the first commercial whole-genome sequencing and analysis service for individuals, at $350,000 per genome. The sequence data will undergo comprehensive analysis from a team of ….
Challenge: Efficiently Reach Large N. High-throughput genotyping; high-throughput phenotyping; high-throughput sample acquisition. The DHHS Secretary’s Advisory Committee on Genetics, Health, and Society (SACGHS) argues for the health value of a 500,000 to 1M subject study. Estimated cost: $3,000,000,000. Cost of the pediatric 100,000 study recently launched: >> $1B + decades.
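To put the SACGHS estimate on a per-subject footing, the arithmetic can be sketched as below. The function name is illustrative only; the dollar figure and cohort sizes are the ones quoted on the slide, and the even split across subjects is an assumption.

```python
# Figures quoted on the slide for the proposed SACGHS study.
ESTIMATED_COST_USD = 3_000_000_000
COHORT_SIZES = (500_000, 1_000_000)


def cost_per_subject(total_usd, n_subjects):
    """Assumes the total cost is spread evenly across all subjects."""
    return total_usd / n_subjects


if __name__ == "__main__":
    for n in COHORT_SIZES:
        per_subject = cost_per_subject(ESTIMATED_COST_USD, n)
        print(f"{n:,} subjects: ${per_subject:,.0f} per subject")
```

Under that assumption the proposal implies roughly $3,000 to $6,000 per subject.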
NLP (and comedy) is not pretty HOSPITAL COURSE:... It was recommended that she receive …We also added Lactinax, oral form of Lactobacillus acidophilus to attempt a repopulation of her gut. SH: widow,lives alone,2 children,no tob/alcohol. BRIEF RESUME OF HOSPITAL COURSE: 63 yo woman with COPD, 50 pack-yr tobacco (quit 3 wks ago), spinal stenosis,... SOCIAL HISTORY: Negative for tobacco, alcohol, and IV drug abuse. SOCIAL HISTORY: The patient is a nonsmoker. No alcohol. SOCIAL HISTORY: The patient is married with four grown daughters, uses tobacco, has wine with dinner. Smoker Non-Smoker SOCIAL HISTORY: The patient lives in rehab, married. Unclear smoking history from the admission note… Past Smoker Hard to pick ???
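A minimal sketch of why these snippets are hard to classify, using a few keyword rules. This is not the i2b2 NLP pipeline: the patterns, their ordering, and the label names are assumptions made for illustration, and real notes would defeat rules this simple.

```python
import re

# Hypothetical keyword rules (not taken from the presentation).
UNCLEAR = re.compile(r"\bunclear smoking\b", re.I)
PAST = re.compile(r"\bquit\b|\bex-?smoker\b|\bformer smoker\b", re.I)
NON = re.compile(r"\bnon-?smoker\b|\bno tob\b|\bnegative for tobacco\b", re.I)
CURRENT = re.compile(r"\buses tobacco\b|\bcurrent smoker\b|\bsmokes\b", re.I)


def smoking_status(note_text):
    """Return a coarse smoking label for one free-text snippet.

    Rule order matters: explicit uncertainty wins, then evidence of
    quitting, then negations, then current-use cues.
    """
    if UNCLEAR.search(note_text):
        return "unknown"
    if PAST.search(note_text):
        return "past smoker"
    if NON.search(note_text):
        return "non-smoker"
    if CURRENT.search(note_text):
        return "smoker"
    return "unknown"  # no smoking information found


if __name__ == "__main__":
    snippets = [
        "SH: widow,lives alone,2 children,no tob/alcohol.",
        "63 yo woman with COPD, 50 pack-yr tobacco (quit 3 wks ago)",
        "SOCIAL HISTORY: The patient is a nonsmoker. No alcohol.",
        "The patient is married with four grown daughters, uses tobacco",
        "Unclear smoking history from the admission note",
    ]
    for s in snippets:
        print(f"{smoking_status(s):12s} <- {s}")
```

Even on the slide's own examples the rules are fragile: a patient who quit three weeks ago is labeled a past smoker here, which a clinician might reasonably dispute.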
But it works: 96,000 asthma patients identified out of 2.5M PHS patients –Stratified by severity, pharmaco-responsiveness and exposures –Now with cases and controls (from extrema) reconsented and biomaterials obtained for genome-wide scans ++ –3 methods of tissue acquisition
The three prongs of High Throughput Instrumentation: $250-$500 for 500,000 SNPs; $50-100K for good quality phenotyping of 100K++ individuals. What about the samples (consented)? –$650/patient, dozens a week –Wait in clinic: $450+/patient. Crimson –Lynn Bry, MD
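The three figures above can be combined into a rough per-subject range. This is a back-of-the-envelope sketch, not the presenters' calculation: it assumes the genotyping price is per subject, that the phenotyping cost is a one-time total spread over exactly 100,000 individuals, and that $450 and $650 bound the sample cost (the slide gives "$450+" and no figure for Crimson).

```python
# Figures from the slide; the interpretation of each is an assumption.
GENOTYPING_USD = (250.0, 500.0)                # per subject, 500,000 SNPs
PHENOTYPING_TOTAL_USD = (50_000.0, 100_000.0)  # total for the whole cohort
PHENOTYPED_SUBJECTS = 100_000                  # slide says "100K++"
SAMPLE_USD = (450.0, 650.0)                    # per consented sample


def per_subject_range():
    """Return (low, high) cost in USD for one genotyped, phenotyped subject."""
    low = (GENOTYPING_USD[0]
           + PHENOTYPING_TOTAL_USD[0] / PHENOTYPED_SUBJECTS
           + SAMPLE_USD[0])
    high = (GENOTYPING_USD[1]
            + PHENOTYPING_TOTAL_USD[1] / PHENOTYPED_SUBJECTS
            + SAMPLE_USD[1])
    return low, high


def cohort_range(n_subjects):
    """Scale the per-subject range linearly to a cohort of n_subjects."""
    low, high = per_subject_range()
    return low * n_subjects, high * n_subjects


if __name__ == "__main__":
    low, high = per_subject_range()
    print(f"Per subject: ${low:,.2f} to ${high:,.2f}")
    c_low, c_high = cohort_range(500_000)
    print(f"500,000 subjects: ${c_low:,.0f} to ${c_high:,.0f}")
```

On these assumptions phenotyping is a negligible share of the per-subject cost; genotyping and sample acquisition dominate.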
The “Sip” Does the data exist? Data Access Aggregate Counts Only Requirements HCTSC Investigator Completed Training
“Deep Drink” Request Patient Details Data Access Identified Patient Data Requirements PI at Each Source Institution Protocol-Specific IRB Approval
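The two access tiers can be read as a simple rule check. The sketch below is an illustration only: the `AccessRequest` type, its field names, and the returned labels are hypothetical, and it assumes that "PI at each source institution" means every institution whose data is requested must have a PI on the protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical description of one investigator's request."""
    is_hctsc_investigator: bool = False
    completed_training: bool = False
    has_protocol_irb_approval: bool = False
    requested_institutions: frozenset = frozenset()  # sources of the data
    pi_institutions: frozenset = frozenset()         # sites with a PI on the protocol


def access_level(req):
    """Return the most detailed data tier the request qualifies for."""
    # "Deep Drink": protocol-specific IRB approval plus a PI at every source.
    covered = (bool(req.requested_institutions)
               and req.requested_institutions <= req.pi_institutions)
    if req.has_protocol_irb_approval and covered:
        return "identified patient data"
    # The "Sip": HCTSC investigator who has completed training.
    if req.is_hctsc_investigator and req.completed_training:
        return "aggregate counts only"
    return "no access"


if __name__ == "__main__":
    sip = AccessRequest(is_hctsc_investigator=True, completed_training=True)
    print(access_level(sip))
```

Whether the deep-drink tier also requires the sip prerequisites is not stated on the slides, so the sketch checks only the requirements listed for each tier.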
HCTSC Source Institutions
i2b2 Hive: A Translational Toolkit. Data Repository (CRC); File Repository; Identity Management; Ontology Management; Data Queries; Data Visualization; Correlation Analysis; De-identification of Data; Natural Language Processing; Annotating Genomic Data; Project Management; Workflow Framework; Visual Term Mapping. https://www.i2b2.org/software/