Genetics for Epidemiologists Lecture 5: Analysis of Genetic Association Studies National Human Genome Research Institute National Institutes of Health.


Teri A. Manolio, M.D., Ph.D.
Director, Office of Population Genomics, and Senior Advisor to the Director, NHGRI, for Population Genomics
National Human Genome Research Institute, National Institutes of Health, U.S. Department of Health and Human Services

Topics to be Covered
- Discrete traits and quantitative traits
- Measures of association
- Detecting/correcting for false positives
- Genotyping quality control
- Quantile-quantile (Q-Q) plots
- Odds ratios: allelic and genotypic
- Models of genetic transmission
- Interactions: gene-gene, gene-environment

Larson, G. The Complete Far Side

Quantitative Genetics
"…concerned with the inheritance of those differences between individuals that are of degree rather than of kind…" (Falconer and Mackay, Quantitative Genetics 1996)

| Quantitative | Qualitative |
|---|---|
| Continuous gradation among individuals from one extreme to the other | Sharply demarcated types with little connection by intermediates |
| Effects of genes are small | Effects of genes are large |
| Usually many genes | Single genes inherited in Mendelian ratios? |

Inheritance Models in Single Gene Trait: alleles A and a

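Inheritance Models in Single Gene Trait
Genotype groups AA, Aa, and aa under three models:
- A is dominant: AA and Aa share the same phenotype; aa differs
- A is recessive: only AA shows the A phenotype; Aa and aa look alike
- A is co-dominant: all three genotypes are distinguishable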
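In association analysis these single-gene models are usually expressed as different numeric codings of the genotype. The sketch below is a hypothetical Python helper; the function names and the choice of "A" as the allele of interest are assumptions, not part of the lecture. A fully co-dominant analysis would instead use two indicator variables (a 2-df genotypic test), which is not shown.

```python
def count_allele(genotype: str, allele: str = "A") -> int:
    """Copies (0, 1 or 2) of `allele` in a two-character genotype such as "Aa"."""
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
    return genotype.count(allele)


def code_genotype(genotype: str, model: str, allele: str = "A") -> int:
    """Numeric coding of a genotype under a single-gene model (hypothetical helper)."""
    n = count_allele(genotype, allele)
    if model == "additive":   # per-allele trend: aa = 0, Aa = 1, AA = 2
        return n
    if model == "dominant":   # A dominant: any copy of A vs none
        return int(n >= 1)
    if model == "recessive":  # A recessive: only AA differs
        return int(n == 2)
    raise ValueError(f"unknown model {model!r}")


if __name__ == "__main__":
    for g in ("AA", "Aa", "aa"):
        print(g, [code_genotype(g, m) for m in ("additive", "dominant", "recessive")])
```

Run as a script, this prints `AA [2, 1, 1]`, `Aa [1, 1, 0]` and `aa [0, 0, 0]`. The dominant and recessive codings collapse three genotypes into two groups, matching the phenotype groupings listed above.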

Inheritance Models in Quantitative Trait: allele A increases height by x; allele a decreases height by x

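Inheritance Models in Quantitative Trait
Mean trait value by genotype on a scale from −x through 0 to +x:
- A is completely dominant: aa at −x; Aa and AA together at +x
- A is partially dominant: aa, then Aa, then AA, with Aa closer to AA
- A is not dominant (co-dominant): aa at −x, Aa at 0, AA at +x
- A is over-dominant: aa, then AA, then Aa, with Aa beyond AA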
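These pictures can be made quantitative with the standard single-locus model of Falconer and Mackay. Their homozygote value a is written here as x to match the slide. The heterozygote's value d (the dominance deviation) and the function names are assumptions introduced for this sketch. With A at frequency p and Hardy-Weinberg proportions, the population mean is M = x(p − q) + 2dpq.

```python
def population_mean(x: float, d: float, p: float) -> float:
    """Mean genotypic value when A has frequency p, genotypes in HWE proportions.

    Genotypic values: AA = +x, Aa = d, aa = -x, with q = 1 - p, so
    M = x*p**2 + d*2*p*q - x*q**2 = x*(p - q) + 2*d*p*q.
    """
    q = 1.0 - p
    return x * (p - q) + 2.0 * d * p * q


def dominance_class(x: float, d: float) -> str:
    """Label the heterozygote's position on the -x..+x scale (requires x > 0)."""
    if x <= 0:
        raise ValueError("x must be positive")
    ratio = abs(d) / x
    toward = "A" if d > 0 else "a"
    if ratio == 0:
        return "not dominant (additive)"
    if ratio < 1:
        return f"{toward} partially dominant"
    if ratio == 1:
        return f"{toward} completely dominant"
    return "over-dominant" if d > 0 else "under-dominant"


if __name__ == "__main__":
    for d in (0.0, 0.5, 1.0, 1.5):
        print(d, dominance_class(1.0, d), round(population_mean(1.0, d, 0.3), 3))
```

The four slide panels correspond to d = 0, 0 < d < x, d = x and d > x. The exact equality test on `ratio` is fine for illustrative inputs but would need a tolerance for estimated values.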

Quantitative Traits with Published GWA Studies
- QT interval
- Lipids and lipoproteins
- Memory
- Nicotine dependence
- ORMDL3 expression
- YKL-40 levels
- Obesity, BMI, waist
- Insulin resistance
- Height
- Bone mineral density
- F-cell distribution
- Fetal hemoglobin levels
- C-reactive protein
- 18 groups of Framingham traits
- Pigmentation
- Uric acid levels
- Recombination rate

Association of Alleles and Genotypes of rs (’3049) with Myocardial Infarction

| Alleles | C, N (%) | G, N (%) | χ² (1 df) | P-value |
|---|---|---|---|---|
| Cases | 2,132 (55.4) | 1,716 (44.6) | … | … |
| Controls | 2,783 (47.4) | 3,089 (52.6) | | |

Allelic odds ratio = 1.38

| Genotypes | CC, N (%) | CG, N (%) | GG, N (%) | χ² (2 df) | P-value |
|---|---|---|---|---|---|
| Cases | 586 (30.5) | 960 (49.9) | 378 (19.6) | … | … |
| Controls | 676 (23.0) | 1,431 (48.7) | 829 (28.2) | | |

Heterozygote odds ratio = 1.47; homozygote odds ratio = 1.90

Samani N et al, N Engl J Med 2007; 357:
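The allelic and genotypic odds ratios in this table follow directly from the counts. The Python sketch below recomputes them. The Woolf (log-scale) confidence interval and the uncorrected Pearson χ² are common textbook choices assumed here, not necessarily what Samani et al. used. The allele-based χ² treats the 2N alleles as independent, which relies on Hardy-Weinberg proportions.

```python
import math

# Genotype counts (CC, CG, GG) transcribed from the table above.
CASES = (586, 960, 378)
CONTROLS = (676, 1431, 829)


def allele_counts(cc: int, cg: int, gg: int) -> tuple[int, int]:
    """(C, G) allele counts: each person contributes two alleles."""
    return 2 * cc + cg, cg + 2 * gg


def odds_ratio(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """OR for the 2x2 table [[a, b], [c, d]] (rows: cases, controls) with a Woolf 95% CI."""
    est = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return est, math.exp(math.log(est) - 1.96 * se), math.exp(math.log(est) + 1.96 * se)


def pearson_chi2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


case_c, case_g = allele_counts(*CASES)
ctrl_c, ctrl_g = allele_counts(*CONTROLS)

# Allelic OR: C vs G alleles in cases vs controls.
allelic_or, ci_low, ci_high = odds_ratio(case_c, case_g, ctrl_c, ctrl_g)
# Genotypic ORs: CG vs GG and CC vs GG, with GG as the reference genotype.
het_or = odds_ratio(CASES[1], CASES[2], CONTROLS[1], CONTROLS[2])[0]
hom_or = odds_ratio(CASES[0], CASES[2], CONTROLS[0], CONTROLS[2])[0]
allelic_chi2 = pearson_chi2(case_c, case_g, ctrl_c, ctrl_g)

print(f"allelic OR {allelic_or:.2f} [{ci_low:.2f}, {ci_high:.2f}], chi2 = {allelic_chi2:.1f}")
print(f"heterozygote OR {het_or:.2f}, homozygote OR {hom_or:.2f}")
```

The genotype counts reproduce the allele counts in the first table (2,132/1,716 and 2,783/3,089). The point estimates round to the slide's 1.38, 1.47 and 1.90.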

−Log10 P Values for SNP Associations with Myocardial Infarction. Samani N et al, N Engl J Med 2007; 357:

Genome-Wide Scan for Type 2 Diabetes in a Scandinavian Cohort

GWA Study of Serum Uric Acid Levels: linear regression of inverse-normalized levels against number of alleles (additive model), with sex, age, and age² as covariates. Li S et al, PLoS Genet 2007; 3:e194.
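The analysis described here (inverse-normal transform, then regression on allele count with sex, age and age² as covariates) can be sketched in Python with NumPy. The slide does not say which inverse-normal variant was used. The rank-based transform with Blom's offset, the function names and the simulated data are all assumptions for illustration.

```python
from statistics import NormalDist

import numpy as np


def inverse_normal_transform(y) -> np.ndarray:
    """Rank-based inverse normal transform with Blom's offset (ties broken by input order)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    ranks = np.empty(n)
    ranks[np.argsort(y, kind="stable")] = np.arange(1, n + 1)
    nd = NormalDist()
    return np.array([nd.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks])


def additive_effect(trait, dosage, sex, age) -> float:
    """Per-allele OLS coefficient of the transformed trait, adjusted for sex, age and age^2."""
    z = inverse_normal_transform(trait)
    age = np.asarray(age, dtype=float)
    X = np.column_stack([np.ones(len(z)), dosage, sex, age, age ** 2])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return float(beta[1])  # column 1 holds the number of alleles


# Simulated, hypothetical data (not from Li et al.).
rng = np.random.default_rng(0)
n = 2000
dosage = rng.binomial(2, 0.3, n)  # 0/1/2 copies of the effect allele
sex = rng.integers(0, 2, n)
age = rng.uniform(20, 80, n)
uric_acid = 5.0 + 0.3 * dosage + 1.0 * sex + 0.01 * age + rng.normal(0.0, 1.2, n)
print(f"per-allele effect (SD units): {additive_effect(uric_acid, dosage, sex, age):.3f}")
```

Because the trait is transformed to a standard normal scale, the per-allele coefficient is in approximate standard-deviation units rather than mg/dl.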

Association of rs and Uric Acid Levels (Li S et al, PLoS Genet 2007; 3:e194)

| Cohort | Additive effect | AA (mg/dl) | AG (mg/dl) | GG (mg/dl) |
|---|---|---|---|---|
| SardiNIA | … | … (1.51) | 4.48 (1.59) | 4.02 (1.63) |
| InCHIANTI | … | … (1.44) | 4.94 (1.31) | 4.33 (1.37) |

Association Methods for Quantitative Traits
- Linear regression of multivariable-adjusted residuals against number of alleles (Kathiresan, Nat Genet 2008; 40:189-97)
- Linear regression of log-transformed or centralized BMI against genotype (Frayling, Science 2007; 316:889-94)
- Variance-components-based Z-score analysis of quantile-normalized height (Sanna, Nat Genet 2008; 40:)

Ways of Dealing with Multiple Testing
- Control the family-wise error rate (FWER) for n tests: Bonferroni, α′ = α/n, or Šidák, α′ = 1 − (1 − α)^(1/n)
- False discovery rate (FDR): the expected proportion of significant associations that are actually false positives
- False positive report probability (FPRP): the probability that the null hypothesis is true, given a statistically significant finding
- Bayes factors: avoid the need to assess genome-wide error rates, but require specifying a reasonable alternative model
Hoggart CJ et al, Genet Epidemiol 2008; 32:
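The two FWER thresholds and the best-known FDR procedure are short enough to write out. The Python sketch below uses Benjamini-Hochberg as the FDR method; that choice is an assumption, since the slide names the criterion but not a procedure. For example, 0.05 / 1,000,000 tests gives 5 × 10⁻⁸, the conventional genome-wide significance threshold.

```python
import math


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold controlling the FWER at alpha: alpha / n."""
    return alpha / n_tests


def sidak_alpha(alpha: float, n_tests: int) -> float:
    """Sidak per-test threshold 1 - (1 - alpha)**(1/n), computed stably for large n."""
    return -math.expm1(math.log1p(-alpha) / n_tests)


def benjamini_hochberg(pvalues: list[float], q: float = 0.05) -> list[int]:
    """Indices of tests rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank  # largest rank whose p-value clears its threshold
    return sorted(order[:k_max])


print(bonferroni_alpha(0.05, 1_000_000), sidak_alpha(0.05, 1_000_000))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Šidák is marginally less strict than Bonferroni (about 5.13 × 10⁻⁸ vs 5 × 10⁻⁸ for a million tests). Both assume nothing about correlation between tests, which makes them conservative for SNPs in linkage disequilibrium.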

Larson, G. The Complete Far Side

Quality Control of SNP Genotyping: Samples
- Identity checks with forensic markers (Identifiler)
- Blind duplicates
- Gender checks
- Cryptic relatedness or unsuspected twinning
- Degradation/fragmentation
- Call rate (> 80-90%)
- Heterozygosity: outliers
- Plate/batch calling effects
Chanock et al, Nature 2007; Manolio et al, Nat Genet 2007

Quality Control of SNP Genotyping: SNPs
- Duplicate concordance (CEPH samples)
- Mendelian errors (typically < 1)
- Hardy-Weinberg errors (often > …)
- Heterozygosity (outliers)
- Call rate (typically > 98%)
- Minor allele frequency (often > 1%)
- Validation of the most critical results on an independent genotyping platform
Chanock et al, Nature 2007; Manolio et al, Nat Genet 2007
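The call-rate and minor-allele-frequency filters from the two QC slides can be sketched in Python. The defaults follow the slides' typical values: SNP call rate > 98%, MAF > 1%, and a sample call rate at the 90% end of the 80-90% range. The genotype encoding (0/1/2 copies, `None` for no call) and the function names are hypothetical.

```python
from typing import Optional, Sequence

Call = Optional[int]  # copies of the coded allele (0, 1, 2); None = no call


def call_rate(calls: Sequence[Call]) -> float:
    """Fraction of non-missing genotype calls."""
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Call]) -> float:
    """MAF among called genotypes (0.0 if nothing was called)."""
    called = [g for g in calls if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def passes_snp_qc(calls: Sequence[Call], min_call_rate: float = 0.98, min_maf: float = 0.01) -> bool:
    """Keep a SNP only if call rate > min_call_rate and MAF > min_maf."""
    return call_rate(calls) > min_call_rate and minor_allele_frequency(calls) > min_maf


def failing_samples(matrix: Sequence[Sequence[Call]], min_call_rate: float = 0.90) -> list[int]:
    """Row (sample) indices whose call rate across SNPs is at or below the threshold."""
    return [i for i, row in enumerate(matrix) if call_rate(row) <= min_call_rate]


if __name__ == "__main__":
    snp = [0] * 90 + [1] * 10
    print(call_rate(snp), minor_allele_frequency(snp), passes_snp_qc(snp))
```

Sample filters are normally applied before SNP filters, so that a few badly genotyped samples do not drag down every SNP's call rate.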

Hardy-Weinberg Equilibrium
The occurrences of the two alleles of a SNP in the same individual are two independent events.
Ideal conditions:
- random mating
- no selection (equal survival)
- no migration
- no mutation
- no inbreeding
- large population size
- gene frequencies equal in males and females
If alleles A and a of SNP rs1234 have frequencies p and 1 − p, the expected frequencies of the three genotypes are:
Freq(AA) = p², Freq(Aa) = 2p(1 − p), Freq(aa) = (1 − p)²
After G. Thomas, NCI
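A minimal Python sketch of the usual Hardy-Weinberg check follows. It estimates p from the observed genotypes, forms the expected counts n·p², 2n·p(1 − p) and n·(1 − p)², and applies a 1-df Pearson goodness-of-fit test. The function names are hypothetical. For rare SNPs an exact test is generally preferred over this χ² approximation.

```python
import math


def hwe_expected(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float, float]:
    """Expected genotype counts under HWE, using p estimated from the sample."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    if p in (0.0, 1.0):
        raise ValueError("monomorphic SNP: HWE test undefined")
    q = 1.0 - p
    return n * p * p, 2 * n * p * q, n * q * q


def hwe_chi2_test(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Pearson goodness-of-fit test for HWE (1 df); returns (chi2, p-value)."""
    observed = (n_AA, n_Aa, n_aa)
    expected = hwe_expected(n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square(1): P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Control genotype counts (CC, CG, GG) from the myocardial infarction table.
    print(hwe_chi2_test(676, 1431, 829))
```

The test has one degree of freedom: three genotype classes, minus one for the fixed total, minus one for the estimated allele frequency.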

Coverage, Call Rates, and Concordance of Perlegen and Affymetrix Platforms on HapMap Phase II (GAIN Collaborative Group, Nat Genet 2007; 39:)
- Number of SNPs: Perlegen 480,744; Affymetrix/Broad 439,249
- Coverage (single-marker and multi-marker) in CEU, CHB + JPT, and YRI: …
- Average call rate: Perlegen 98.9%; Affymetrix/Broad 99.3%
- Concordance of homozygous genotypes: Perlegen 99.8%; Affymetrix/Broad 99.9%
- Concordance of heterozygous genotypes: 99.8%

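Sample and SNP QC Metrics for Affymetrix 5.0 and 6.0 Platforms in GAIN (courtesy, J Paschall, NCBI)

| Metric | 5.0 | % fail | 6.0 | % fail |
|---|---|---|---|---|
| Total samples | 1,829 | -- | 2,289 | -- |
| Samples passing QC | 1,… | … | … | … |
| > 98% call rate | 1,… | … | … | … |
| Total SNPs | 457,… | -- | …,660 | -- |
| SNPs passing QC | 429,… | … | … | … |
| MAF > 1% | 457,… | … | … | … |
| > 98% call rate | 419,… | … | … | … |
| > 95% call rate | 439,… | … | … | … |
| HWE < … | … | … | … | … |
| < 1 Mendel error | 417,… | … | … | … |
| < 1 duplicate error | 454,… | … | … | … |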

Sample Heterozygosity in GAIN Courtesy, J Paschall, NCBI

Signal Intensity Plots for rs in AREDS

Signal Intensity Plots for CD44 SNP rs Clayton DG et al, Nat Genet 2005; 37:

Principal Component Analysis of Structured Population: First to Third Components. Courtesy, G. Thomas, NCI

Principal Component Analysis of Structured Population: Fourth and Fifth Components. Courtesy, G. Thomas, NCI

Influence of Relatedness on Principal Component Analysis. Courtesy, G. Thomas, NCI

Summary Points: Genotyping Quality Control
- Sample checks for identity, gender error, and cryptic relatedness
- Sample handling differences can introduce artifacts, but these can probably be adjusted for
- Association analysis is often the quickest way to find genotyping errors
- Low-MAF SNPs are the most difficult to call
- Inspection of genotyping cluster plots is crucial!

Quantile-Quantile Plot for Test Statistics: 390 Breast Cancer Cases, 364 Controls, 205,586 SNPs (λ = 1.03). Easton D et al, Nature 2007; 447:
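The λ reported with a Q-Q plot is the genomic inflation factor: the median observed 1-df χ² statistic divided by its expected median under the null (about 0.455). The Python sketch below computes λ, the points of a −log10 P Q-Q plot, and the genomic-control adjustment that divides statistics by λ when λ > 1. The function names and the (i + 0.5)/n plotting positions are assumptions; other plotting positions are also in use.

```python
import math
import statistics
from statistics import NormalDist

# Median of chi-square(1) = (75th percentile of N(0, 1))**2, about 0.455.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def chi2_from_p(p: float) -> float:
    """1-df chi-square statistic corresponding to a two-sided p-value."""
    return NormalDist().inv_cdf(p / 2.0) ** 2


def genomic_inflation(chi2_stats) -> float:
    """Lambda: median observed 1-df chi-square divided by its null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats) -> list[float]:
    """Divide every statistic by lambda, but never inflate (lambda floored at 1)."""
    lam = max(genomic_inflation(chi2_stats), 1.0)
    return [x / lam for x in chi2_stats]


def qq_points(pvalues) -> list[tuple[float, float]]:
    """(expected, observed) -log10 p pairs for a Q-Q plot, most significant first."""
    n = len(pvalues)
    return [(-math.log10((i + 0.5) / n), -math.log10(p)) for i, p in enumerate(sorted(pvalues))]


if __name__ == "__main__":
    pvals = [0.9, 0.5, 0.2, 0.04, 1e-6]
    stats = [chi2_from_p(p) for p in pvals]
    print(round(genomic_inflation(stats), 3), qq_points(pvals)[0])
```

A λ near 1, such as the 1.03 above, suggests little systematic inflation from population stratification or genotyping artifacts.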

Observed and Expected Associations after Stage 2 of Breast Cancer GWA Study (columns: significance, observed, observed adjusted, expected, ratio). For all p < 0.05: 1,956 observed, 1,792 observed adjusted, 1,… expected. Easton D et al, Nature 2007; 447:

Q-Q Plot for Multiple Sclerosis; Effect of MHC Hafler D et al, N Engl J Med 2007; 357:

Q-Q Plot for Prostate Cancer, all SNPs Gudmundsson J et al, Nat Genet 2007; 39:

Q-Q Plot for Prostate Cancer, excluding Chromosome 8 Gudmundsson J et al, Nat Genet 2007; 39:

Q-Q Plot for Myocardial Infarction (x-axis: expected chi-squared statistic; y-axis: observed chi-squared statistic). Samani N et al, N Engl J Med 2007; 357:

SNP Associations with 1,928 MI Cases and 2,938 Controls from UK Samani N et al, N Engl J Med 2007; 357:

Association Signal for Coronary Artery Disease on Chromosome 9 ’3049 Samani N et al, N Engl J Med 2007; 357:

Winner’s Curse: Odds Ratios for CHD Associated with LTA Genotypes in Multiple Studies Clarke et al, PLoS Genet 2006; 2:e107.

Genome-Wide Scan for Alzheimer’s Disease in 861 Cases and 550 Controls Reiman E et al, Neuron 2007; 54:

Genome-Wide Scan for Alzheimer’s Disease in APOE*e4 Carriers. Reiman E et al, Neuron 2007; 54:

Late-Onset Alzheimer’s Disease (LOAD) Odds Ratios Associated with rs GG by APOE*e4 Status (Reiman et al, Neuron 2007; 54:)

| APOE*e4 group | APOE*e4 OR [95% CI] | rs OR [95% CI] |
|---|---|---|
| APOE*e… | | … [0.82, 1.53] |
| APOE*e… | | … [1.90, 4.36] |
| All | 6.07 […] | 1.34 [1.06, 1.70] |

P Values of GWA Scan for Age-Related Macular Degeneration. Klein et al, Science 2005; 308:

Odds Ratios and Population Attributable Risks for AMD (Klein et al, Science 2005; 308:)

| Attribute (SNP) | rs (C/G) | rs (C/T) |
|---|---|---|
| Risk allele | C | C |
| Allelic association χ² P value | 4.1 × 10⁻⁸ | 1.4 × 10⁻⁶ |
| Odds ratio (dominant) | 4.6 [2.0-11] | 4.7 [1.0-22] |
| Frequency in HapMap CEU | … | … |
| Population attributable risk | 70% [42-84%] | 80% [0-96%] |
| Odds ratio (recessive) | 7.4 [2.9-19] | 6.2 [2.9-13] |
| Frequency in HapMap CEU | … | … |
| Population attributable risk | 46% [31-57%] | 61% [43-73%] |
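Population attributable risk combines the strength of association with how common the risk genotype is. The Python sketch below uses Levin's formula, PAR = f(RR − 1) / (1 + f(RR − 1)), where f is the fraction of the population carrying the risk genotype under Hardy-Weinberg proportions. Using the odds ratio in place of RR, the function names, and the 0.4 allele frequency are all assumptions. Because AMD is not rare in older populations, OR ≈ RR is only approximate, so this need not reproduce Klein et al.'s figures.

```python
def exposed_fraction(risk_allele_freq: float, model: str) -> float:
    """Population fraction carrying the at-risk genotype, assuming HWE."""
    if model == "dominant":   # one or two copies of the risk allele
        return 1.0 - (1.0 - risk_allele_freq) ** 2
    if model == "recessive":  # two copies of the risk allele
        return risk_allele_freq ** 2
    raise ValueError(f"unknown model {model!r}")


def levin_par(exposed: float, relative_risk: float) -> float:
    """Levin's population attributable risk: f(RR - 1) / (1 + f(RR - 1))."""
    excess = exposed * (relative_risk - 1.0)
    return excess / (1.0 + excess)


if __name__ == "__main__":
    # Hypothetical risk-allele frequency (0.4), not the HapMap CEU value from the table.
    for model, odds in (("dominant", 4.6), ("recessive", 7.4)):
        f = exposed_fraction(0.4, model)
        print(model, f"{levin_par(f, odds):.0%}")
```

The recessive model has a larger OR but a smaller exposed fraction, which is why its PAR can be lower than the dominant model's, as in the table.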

Risk of Developing AMD by CFH Y402H and Modifiable Risk Factors (Schaumberg DA et al, Arch Ophthalmol 2007; 125:)

| Risk factor | CFH Y402H YY | YH | HH |
|---|---|---|---|
| BMI < 30 kg/m² | … | … […] | 3.96 […] |
| BMI > 30 kg/m² | … […] | 2.19 […] | … […] |
| Non-smoker | … | … […] | 4.23 […] |
| Current smoker | 2.34 […] | 3.20 […] | 8.69 […] |
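Joint odds ratios like these, all relative to one reference group with neither the risk genotype nor the exposure, can be summarized as interaction on two scales. The multiplicative ratio is OR₁₁ / (OR₁₀ · OR₀₁). The additive scale uses the relative excess risk due to interaction, RERI = OR₁₁ − OR₁₀ − OR₀₁ + 1, with ORs standing in for relative risks. The Python sketch below uses hypothetical ORs rather than values from the table, and the function name is an assumption.

```python
def interaction_measures(or_g: float, or_e: float, or_ge: float) -> dict[str, float]:
    """Interaction on multiplicative and additive scales from joint ORs.

    All ORs are relative to a common reference group with neither the risk
    genotype nor the exposure (OR = 1): or_g = genotype only, or_e = exposure
    only, or_ge = both.
    """
    return {
        "multiplicative": or_ge / (or_g * or_e),  # 1.0 means no multiplicative interaction
        "RERI": or_ge - or_g - or_e + 1.0,        # 0.0 means no additive interaction
    }


# Hypothetical ORs, not taken from the table above.
print(interaction_measures(or_g=3.0, or_e=2.0, or_ge=6.0))
```

This prints `{'multiplicative': 1.0, 'RERI': 2.0}`. The joint effect is exactly multiplicative, yet there is positive interaction on the additive scale, so whether "interaction" is present depends on the scale chosen.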

Interaction: Is LIPC Genotype Related to HDL-C? (figure by genotype: CC, CT, TT). Ordovas et al, Circulation 2002; 106:

Inverse Relation between Endotoxin Exposure and Allergic Sensitization by CD14 Genotype Simpson A et al, Am J Respir Crit Care Med 2006;174:

Challenges in Studying Gene-Environment Interactions

| Challenge | Genes | Environment |
|---|---|---|
| Ease of measure | Pretty easy | Often hard |
| Variability over time | Low/none | High |
| Recall bias | None | Possible |
| Temporal relation to disease | Easy | Hard |

Larson, G. The Complete Far Side