Large-scale phenome-wide scan in twins using electronic health records. June 29th, 2015. Scott Hebbring, Marshfield Clinic Research Foundation, University of Wisconsin Madison.

Presentation transcript:

Large-scale phenome-wide scan in twins using electronic health records
June 29th, 2015
Scott Hebbring
Marshfield Clinic Research Foundation
University of Wisconsin Madison

Human Genetics

Association studies
GWAS: Thousands of variants associated with a few hundred phenotypes
a. Relatively easy to recruit unrelated individuals
b. Multiple testing challenges
a. Weak effects
b. Difficult to interpret biology
c. Clinical utility?
d. Disease limited
PheWAS: Dramatically increases the number of diseases that can be studied
a. Can start with biologically/clinically relevant variants
b. May be limited to the same challenges of GWAS

Family studies
Linkage, Segregation Analysis, Heritability…
a. Thousands of mutations in thousands of genes causing human diseases
b. Often easier to interpret biology
c. Large effect sizes
d. Clinically relevant
e. Difficult to recruit families
f. One disease at a time

Classical Twin Studies
1. Gold standard for heritability studies
   Unique family/genetic relationships (monozygotic twins)
   Strong shared environmental exposures starting in utero
2. Rare (~20/1,000 births)
3. Difficult to recruit
   Largest twin registries include the Swedish and Danish twin registries (~200,000 twins)
   Others: UK Adult, Australian, Sri Lankan, and Chinese National; Minnesota, Univ-Wash, MI-State, and Mid-Atlantic twin registries
   Sample ascertainment bias
4. Phenotypic data are often acquired by surveys and questionnaires and limited to only a few measurables.
5. Updating data is costly and labor intensive.

2.6 million patients
Twin population
- same last name
- same date of birth
- same billing account
- same home address
- key word "twin"
Marshfield Clinic Twin Cohort (~16,000 patients)
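The matching criteria above can be sketched as a simple grouping step. This is a minimal illustration, not the cohort's actual pipeline: the `Patient` fields and the `candidate_twin_groups` function are hypothetical names, the sketch assumes all four demographic criteria must match at once (the slide does not say how they were combined), and the key word "twin" search is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    # Hypothetical record layout, for illustration only.
    patient_id: str
    last_name: str
    birth_date: str        # ISO date string, e.g. "2001-04-17"
    billing_account: str
    home_address: str


def candidate_twin_groups(patients):
    """Group patients sharing last name, birth date, billing account, and address."""
    groups = defaultdict(list)
    for p in patients:
        key = (
            p.last_name.strip().lower(),
            p.birth_date,
            p.billing_account,
            p.home_address.strip().lower(),
        )
        groups[key].append(p.patient_id)
    # Keep groups of two or more (twins or higher-order multiples).
    return [sorted(ids) for ids in groups.values() if len(ids) >= 2]
```

A rule-based match like this produces candidates only, which is consistent with the ~80% accuracy reported for the cohort.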

Genet Epidemiol Dec;38(8):692-8.

A. MCTC is one of the first cross-sectional twin populations
   ~80% accuracy
B. Methods are easily translatable
   ~12,000 twins have been identified in Mayo's EHR.
C. Little to no zygosity data
D. All patients are uniquely linked to Marshfield Clinic's EHR.
   Phenotypic data is collected in real time
   Not disease limited
   Amenable to phenome-wide strategies?
Genet Epidemiol Dec;38(8):692-8.

Hypothesis: EHR-linked twin cohorts can be used for phenome-wide studies to identify diseases with genetic etiologies.

Methods
Population: MCTC and Mayo twin cohort (28,888 twins)
Phenotypes were defined by collapsing ICD9 coding, e.g., ICD 100.0* → 100.*
For every phenotype/ICD9 code, a p-value was estimated to determine whether the disease co-occurred in twins more frequently than by chance.
For every phenotype/ICD9 code, a relative risk was estimated: the risk of disease if the other twin is affected, relative to the population risk in the twin cohorts.
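The two method steps above, collapsing ICD9 codes and testing whether a disease co-occurs in twin pairs more often than by chance, might look roughly like the sketch below. The slides do not say which statistical test was used, so the permutation test here is an assumption chosen for illustration, and `collapse_icd9`, `concordant_pairs`, and `permutation_p_value` are hypothetical names. A permutation test also cannot reach the very small p-values reported in the deck without an impractical number of permutations.

```python
import random


def collapse_icd9(code):
    """Return a code plus its parent levels, e.g. '382.90' -> ['382.90', '382.9', '382']."""
    levels = [code]
    if "." in code:
        root, decimals = code.split(".", 1)
        # Drop one trailing decimal digit at a time, then the decimal part entirely.
        for n in range(len(decimals) - 1, 0, -1):
            levels.append(f"{root}.{decimals[:n]}")
        levels.append(root)
    return levels


def concordant_pairs(pairs, affected):
    """Count twin pairs in which both members carry the phenotype."""
    return sum(1 for a, b in pairs if a in affected and b in affected)


def permutation_p_value(pairs, affected, n_perm=10000, seed=0):
    """One-sided p-value for excess concordance (assumed test, not the study's)."""
    rng = random.Random(seed)
    individuals = [person for pair in pairs for person in pair]
    observed = concordant_pairs(pairs, affected)
    n_affected = sum(1 for person in individuals if person in affected)
    hits = 0
    for _ in range(n_perm):
        # Reassign the same number of affected labels at random across all twins.
        shuffled = set(rng.sample(individuals, n_affected))
        if concordant_pairs(pairs, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

Each patient would be counted as affected for every level returned by `collapse_icd9`, so a single diagnosis contributes to the specific code and to each broader phenotype.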

9,906 and 5,987 unique phenotypes/ICD9 codes in MCTC and Mayo-TC, respectively
5,598 shared phenotypes/ICD9 codes
Diseases in MCTC were more common than in Mayo-TC


Phenome-wide Scan
A. 1,222 phenotypes/ICD9 codes were statistically enriched for concordance in MCTC (p<8.9E-6)
   929 (76%) were replicated in Mayo-TC (p<0.05)
B. 928 phenotypes/ICD9 codes were statistically enriched for concordance in Mayo-TC
   739 (80%) were replicated in MCTC
C. 1,406 phenotypes were statistically enriched for concordance by combined meta-analysis
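The discovery threshold of p<8.9E-6 is numerically consistent with a Bonferroni correction of 0.05 across the 5,598 shared phenotypes. The slides do not state how the threshold was chosen, so this reading is an inference. The short check below reproduces the threshold and the replication percentages from the counts on the slide.

```python
# Counts taken from the slides; the Bonferroni interpretation is an assumption.
ALPHA = 0.05
SHARED_PHENOTYPES = 5598

bonferroni_threshold = ALPHA / SHARED_PHENOTYPES
print(f"Bonferroni threshold: {bonferroni_threshold:.1e}")

# Replication rates: discovery hits that also reach p<0.05 in the other cohort.
mctc_replicated_pct = round(100 * 929 / 1222)
mayo_replicated_pct = round(100 * 739 / 928)
print(f"MCTC hits replicated in Mayo-TC: {mctc_replicated_pct}%")
print(f"Mayo-TC hits replicated in MCTC: {mayo_replicated_pct}%")
```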

Phenome-wide Scan

Top non V-codes and perinatal codes
Columns: ICD9; Disease; MCTC (Affected, P-value, RR); Mayo-TC (Affected, P-value, RR); Combined P-value

382.9; Unspecified otitis media; MCTC: 4,318 affected, P = 5.0E…; Mayo-TC: …,130 affected, P = 4.4E…
Suppurative and unspecified otitis media; MCTC: 4,514 affected, P = 3.4E…; Mayo-TC: …,275 affected, P = 4.8E…
Acute upper respiratory infections of unspecified site; MCTC: 5,272 affected, P = 1.5E…; Mayo-TC: …,223 affected, P = 8.2E…
Acute upper respiratory infections of multiple or unspecified sites; MCTC: 5,297 affected, P = 1.2E…; Mayo-TC: …,250 affected, P = 2.0E…
Acute pharyngitis; MCTC: 5,202 affected, P = 4.9E…
Disturbances in tooth eruption; MCTC: 1,350 affected, P = 8.3E…
Lack of expected normal physiological development in childhood
Disorders of tooth development and eruption; MCTC: 1,556 affected, P = 7.3E…
Cough; MCTC: 4,245 affected, P = 4.1E…
Acute bronchiolitis
Specific delays in development
Disorders of refraction and accommodation; MCTC: 3,645 affected, P = 8.1E…
Fever and other physiologic disturbances of temperature regulation; MCTC: 2,875 affected, P = 1.6E…
Developmental speech or language disorder
Myopia; MCTC: 2,144 affected, P = 9.5E…; Combined P = …E-158


Relative Risks (RR = relative risk; ADF = average disease frequency)

1,455 phenotypes/ICD9 codes had at least one concordant pair in both cohorts
498 and 139 phenotypes had RRs >10 and >100 in both cohorts, respectively
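To make the relative risk concrete, the sketch below uses one common formulation: probandwise concordance (the share of affected twins whose co-twin is also affected) divided by the disease frequency in the whole twin cohort. The slides do not give the exact formula, so this is an assumption and its output may not match the reported RRs. The function name `relative_risk` and its parameters are hypothetical.

```python
def relative_risk(n_twins, n_affected, n_concordant_pairs):
    """Risk to a twin whose co-twin is affected, relative to the cohort-wide risk.

    Assumed formulation: probandwise concordance / prevalence.
    """
    if n_twins <= 0 or n_affected <= 0:
        raise ValueError("cohort size and affected count must be positive")
    # Each concordant pair contributes two affected twins with an affected co-twin.
    cotwin_risk = 2 * n_concordant_pairs / n_affected
    population_risk = n_affected / n_twins
    return cotwin_risk / population_risk
```

For example, in a hypothetical cohort of 10,000 twins with 100 affected individuals and 10 concordant pairs, the co-twin risk is 20% against a 1% cohort-wide risk, giving an RR of about 20. For rare diseases a single concordant pair yields a very large RR, which fits the pattern in the genetic-disease examples.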

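Genetic diseases with large estimated RRs
Columns: ICD9; Disease; MCTC (Affected, Concordant, P-value, RR); Mayo-TC (Affected, Concordant, P-value, RR); Combined P-value

282.6; Sickle-cell disease; MCTC: 3 affected, 1 concordant, P = 2.7E-04, RR = 2,…; Mayo-TC: P = …E-04, RR = 6,096; Combined P = 8.0E…
Hereditary spherocytosis; MCTC: 3 affected, 1 concordant, P = 2.7E-04, RR = 2,…; Mayo-TC: P = …E-04, RR = 2,032; Combined P = 1.7E…
Peroneal muscular atrophy; MCTC: 3 affected, 1 concordant, P = 2.7E-04, RR = 2,…; Mayo-TC: P = …E-04, RR = 2,032; Combined P = 1.7E…
Other thalassemia; MCTC: 3 affected, 1 concordant, P = 2.7E-04, RR = 2,…
Other cerebellar ataxia; MCTC: 3 affected, 1 concordant, P = 2.7E-04, RR = 2,…
Long QT syndrome; MCTC: 4 affected, 1 concordant, P = 4.9E-04, RR = 1,…; Combined P = …E-19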

Same-Sex Opposite-Sex


Potential limitations
1. Limited by the inherent challenges of ICD9 coding.
2. Parental/familial biases
3. Lack of zygosity data still limits this approach
   NLP or blood types may help enrich for specific twin types.

Conclusions
1. Most diseases are not random events in the twins.
   a. 1,406/5,598 (25%) of phenotypes are statistically enriched in pairs of twins
   b. ~1% of phenotypes have RRs <…
2. Genetics plays an important role in the disease process for thousands of diseases.
3. Family data may be efficiently captured in the EHR and may be used to predict, prevent, and treat human disease for the advancement of "precision medicine."


Precision Medicine: Future of genomic research
Populations, Families, Genome, Phenome

Acknowledgements
Marshfield Clinic: Murray Brilliant, Peggy Peissig, Steven Schrodi, Zhan (Harold) Ye, John Mayer, many more…
Mayo Clinic: Jyotishman Pathak, Yijing Cheng
Funding: NHGRI 1U01HG, NLM K22LM, NCATS 9U54TR, NCRR 1UL1RR, Marshfield Clinic Research Foundation, Marshfield Clinic donors