Statistical methods for genetic association studies

1 Statistical methods for genetic association studies

2 A tutorial on statistical methods for population association studies. David Balding, Nature Reviews Genetics (2006) 7:781-791

3 [Diagram labels: Genetics; Environment; G×E interaction; Health outcome or ?]

4 Recombination
[Diagram: gametophytes (gamete-producing cells) with haplotypes A X and a x; after recombination, gametes a X and A x; B and b shown alongside]
X/x: unobserved causative mutation
A/a: distant marker
B/b: linked marker

5 Approaches to finding disease genes
Population-based association study
–unrelated subjects
Family-based association study
–nuclear families
Admixture mapping
–recently admixed population
Linkage mapping
–large pedigrees
Darvasi & Shifman (2005) Nature Genetics

6 Types of population association study
Candidate causative polymorphism
–SNP (single nucleotide polymorphism), deletion, duplication
Candidate causative gene (5-50 marker SNPs)
–evidence from linkage study or function
Candidate causative region (100s of marker SNPs)
–evidence from linkage study
Genome-wide (>300,000 marker SNPs)
–no prior evidence required

7 Common disease common variant (CDCV) hypothesis

8 Preliminary analysis: data quality
Assuming mating is random and the population is large, HWE (Hardy-Weinberg equilibrium) genotype frequencies will apply
Allele frequencies: P(X) = p, P(x) = q
HWE genotype frequencies: P(XX) = p^2, P(Xx) = 2pq, P(xx) = q^2
Useful data quality check:
–chi-squared or exact test
–log QQ plot
But excluding SNPs that deviate from HWE can discard causative mutations
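
The chi-squared check mentioned on this slide can be sketched as follows. This is a minimal illustration, not the presenter's code: the function name `hwe_chi_square` and its arguments are hypothetical, and it assumes both alleles are observed (0 < p < 1) and that counts are large enough for the chi-squared approximation. For small counts the exact test named on the slide would be the safer choice.

```python
import math

def hwe_chi_square(n_XX, n_Xx, n_xx):
    """Chi-squared test (1 df) of genotype counts against HWE expectations.

    Hypothetical helper for illustration; assumes 0 < p < 1.
    """
    n = n_XX + n_Xx + n_xx
    p = (2 * n_XX + n_Xx) / (2 * n)  # estimated frequency of allele X
    q = 1 - p                        # estimated frequency of allele x
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_XX, n_Xx, n_xx]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-squared variable with 1 df
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Made-up counts: a heterozygote deficit relative to HWE
stat, p_value = hwe_chi_square(30, 40, 30)
```

One degree of freedom is used because there are three genotype classes, one constraint from the total count, and one allele frequency estimated from the data.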

9 Log QQ plot

10 Preliminary analysis: dealing with missing data
Imputation
–various methods: maximum likelihood; probabilistic; hot-deck; regression modelling
–test for independence of missingness and case-control status

11 Choice of inheritance model



14 Tests of association: single SNP
Case-control
–Treat genotype as factor with 3 levels, perform 2x3 goodness-of-fit test. Loses power if effect is additive
–Count alleles rather than individuals, perform 2x2 goodness-of-fit test. Out of favour because it is sensitive to deviation from HWE and its risk estimates are not interpretable
[Table of genotype counts. Rows: Case, Control. Columns: Major allele homozygote (0), Heterozygote (1), Minor allele homozygote (2)]
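
The two tests on this slide can be sketched on one table of genotype counts. The sketch computes both as Pearson chi-squared statistics on the contingency table, which is an assumption about the statistic the slide means by "goodness-of-fit". The function names and the counts are hypothetical, made up for illustration.

```python
import math

def pearson_chi_square(table):
    """Pearson chi-squared statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count under no association
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi2_sf(stat, df):
    """Upper-tail chi-squared probability; this sketch handles df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("only df = 1 or 2 handled in this sketch")

def allele_counts(genotype_row):
    """Convert (n0, n1, n2) genotype counts into [major, minor] allele counts."""
    n0, n1, n2 = genotype_row
    return [2 * n0 + n1, n1 + 2 * n2]

# Made-up genotype counts, ordered by minor allele count 0, 1, 2
cases = [50, 40, 10]
controls = [60, 35, 5]

# 2x3 genotype test (2 df)
geno_stat, geno_df = pearson_chi_square([cases, controls])
# 2x2 allelic test (1 df): counts chromosomes rather than individuals
allele_stat, allele_df = pearson_chi_square(
    [allele_counts(cases), allele_counts(controls)]
)
geno_p = chi2_sf(geno_stat, geno_df)
allele_p = chi2_sf(allele_stat, allele_df)
```

The allelic test doubles the apparent sample size by treating the two alleles of each person as independent, which is why it is sensitive to deviation from HWE.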

15 Tests of association: single SNP
Case-control
–Cochran-Armitage test: loses power if the additivity assumption is wrong
[Figure: Cochran-Armitage test]
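
A minimal sketch of the Cochran-Armitage trend test, assuming the usual additive scores 0, 1, 2 for the number of minor alleles and a 1 df chi-squared reference distribution. The function name and the counts are hypothetical.

```python
import math

def cochran_armitage(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test on genotype counts.

    cases, controls: counts per genotype, ordered by minor allele count.
    scores: genotype scores; (0, 1, 2) encodes an additive model (assumed here).
    Returns the chi-squared statistic (1 df) and its p-value.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]
    # Trend statistic: score-weighted difference between cases and controls
    t_stat = sum(t * (r * S - s * R) for t, r, s in zip(scores, cases, controls))
    sum_tn = sum(t * n for t, n in zip(scores, totals))
    sum_t2n = sum(t * t * n for t, n in zip(scores, totals))
    # Variance of the trend statistic under the null of no association
    variance = (R * S / N) * (N * sum_t2n - sum_tn ** 2)
    chi2 = t_stat ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Made-up counts for illustration
chi2, p_value = cochran_armitage([50, 40, 10], [60, 35, 5])
```

Changing `scores` to, for example, `(0, 1, 1)` would target a dominant effect instead. The test loses power when the chosen scores do not match the true inheritance pattern.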

16 Tests of association: single SNP
Case-control
–Armitage or goodness-of-fit? Depends on:
Prior knowledge of inheritance (additive, dominant, etc)
Genotype frequencies, e.g. use Armitage test when minor allele is rare, goodness-of-fit test otherwise

17 Tests of association: single SNP
Case-control
–Logistic regression
Easily incorporates inheritance model (additive, dominant, etc)
But treats the phenotype, not the genotype, as the outcome variable, so it is easier to justify for prospective studies
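
One way to see how a regression "incorporates the inheritance model" is through the coding of the genotype covariate. The sketch below is a hypothetical helper: the names `encode_genotype` and `model` are assumptions, and the recessive and general codings are standard choices not listed on the slide. It only builds the covariates; fitting the logistic regression itself would be left to a statistics package.

```python
def encode_genotype(minor_allele_count, model="additive"):
    """Return regression covariates for one genotype under an inheritance model.

    minor_allele_count: 0, 1 or 2 copies of the minor allele.
    """
    g = minor_allele_count
    if g not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2")
    if model == "additive":
        return [g]                    # each extra copy shifts the log-odds equally
    if model == "dominant":
        return [1 if g >= 1 else 0]   # one copy is enough for the full effect
    if model == "recessive":
        return [1 if g == 2 else 0]   # two copies are needed for any effect
    if model == "general":
        # Separate indicators for heterozygote and minor homozygote (2 df)
        return [1 if g == 1 else 0, 1 if g == 2 else 0]
    raise ValueError("unknown model: " + model)

# Covariates for a small made-up sample under the dominant model
genotypes = [0, 1, 2, 1, 0]
design = [encode_genotype(g, "dominant") for g in genotypes]
```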

18 Tests of association: single SNP
Continuous outcome
–Linear regression
Ordered categorical outcomes
–Multinomial regression

19 Problems: population stratification
[Figure: Cases]

20 Correcting for population stratification
Genomic control
–Genotype null SNPs and use them to calculate the background inflation in the test statistic due to population stratification
–Limited to simple single-SNP analyses
–Can over- or under-correct
Other approaches using null SNPs
–Regression, principal components analysis, model underlying demography
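
A minimal sketch of genomic control, assuming the test statistics are 1 df chi-squared values (for example from the Armitage test) and that the inflation factor is estimated from the median of the null-SNP statistics. The function name is hypothetical, and flooring the factor at 1 is a common convention rather than something stated on the slide.

```python
import statistics

# Median of a chi-squared distribution with 1 df
CHI2_1DF_MEDIAN = 0.4549364

def genomic_control(null_stats, test_stats):
    """Estimate the inflation factor from null SNPs and rescale test statistics.

    null_stats: 1 df chi-squared statistics at SNPs assumed to be unassociated.
    test_stats: 1 df chi-squared statistics at the SNPs of interest.
    """
    lam = statistics.median(null_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # assumed convention: never inflate the statistics
    corrected = [s / lam for s in test_stats]
    return lam, corrected

# Made-up statistics: the null median is twice its expected value
lam, corrected = genomic_control([0.1, 0.5, 0.9098728, 2.0, 3.0], [4.0, 10.0])
```

Because every statistic is divided by the same factor, the correction cannot account for SNPs whose allele frequencies differ more or less than average between subpopulations, which is one way it can over- or under-correct.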

21 Problems: multiple testing
Bonferroni correction
–conservative when SNPs are linked
Permutation
–computationally demanding
False discovery rate
Bayesian approaches
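
Two of the corrections listed here can be sketched in a few lines. The slide names "false discovery rate" without a specific procedure. The Benjamini-Hochberg step-up procedure is used below as one common choice, which is an assumption. Function names and p-values are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values from four SNPs
p_values = [0.01, 0.04, 0.03, 0.5]
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
```

Bonferroni treats the tests as independent, which is why it is conservative when SNPs are linked. A permutation procedure would instead preserve the correlation between SNPs at a higher computational cost.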

22 Tests of association: multiple SNPs
Advantages
–Many SNPs may be linked to a gene, but individually may not have a significant effect
–Interactions between SNPs can be modelled
–Tag SNPs can reduce testing of redundant linked SNPs
Methods
–Linear regression, logistic regression
–Armitage test
Haplotype-based methods
–Natural interpretation
–But power reduced due to multiple alleles

23 Haplotypes. Nature Genetics 37, 915-916 (2005)


25 Inferring haplotype phase

26 ?




30 Inferring haplotype phase
Phase cases and controls separately or pooled?
–Separating can give inflated type I error
–Pooling can reduce power
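
The surviving slides do not say which phasing method was presented. As an illustration only, the sketch below uses a simple EM algorithm for two biallelic SNPs, where the only phase-ambiguous genotype is the double heterozygote. The function name, the data layout, and the counts are all hypothetical. Pooled phasing would pass combined case and control counts to one call; separate phasing would call it once per group.

```python
def em_haplotype_frequencies(genotype_counts, n_iter=100):
    """Estimate two-SNP haplotype frequencies by EM (illustrative sketch).

    genotype_counts: dict mapping (g1, g2) -> number of individuals, where
    g1 and g2 are minor allele counts (0, 1 or 2) at SNP 1 and SNP 2.
    Haplotypes are coded (a1, a2) with 0 = major allele, 1 = minor allele.
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freq = {h: 0.25 for h in haps}  # uniform starting values
    n_chrom = 2 * sum(genotype_counts.values())
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for (g1, g2), n in genotype_counts.items():
            if g1 == 1 and g2 == 1:
                # E-step: split double heterozygotes between the two phasings
                cis = freq[(0, 0)] * freq[(1, 1)]
                trans = freq[(0, 1)] * freq[(1, 0)]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts[(0, 0)] += n * w
                counts[(1, 1)] += n * w
                counts[(0, 1)] += n * (1 - w)
                counts[(1, 0)] += n * (1 - w)
            else:
                # Phase is unambiguous when at most one SNP is heterozygous
                h1 = (g1 // 2, g2 // 2)
                h2 = ((g1 + 1) // 2, (g2 + 1) // 2)
                counts[h1] += n
                counts[h2] += n
        # M-step: update frequencies from expected haplotype counts
        freq = {h: counts[h] / n_chrom for h in haps}
    return freq

# Made-up pooled genotype counts
freq = em_haplotype_frequencies({(0, 0): 40, (2, 2): 40, (1, 1): 20})
```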
