Published by Audrey Holmes. Modified over 2 years ago.

1
Statistical methods for genetic association studies

2
A tutorial on statistical methods for population association studies. David Balding, Nature Reviews Genetics (2006) 7:

3
Diagram labels: Environment, G×E interaction, Genetics, Health outcome or ?

4
Recombination
Diagram labels: gametophytes (gamete-producing cells), gametes, recombination
X/x: unobserved causative mutation
A/a: distant marker
B/b: linked marker

5
Approaches to finding disease genes
Population-based association study
–unrelated subjects
Family-based association study
–nuclear families
Admixture mapping
–recently admixed population
Linkage mapping
–large pedigrees
Darvasi & Shifman (2005) Nature Genetics

6
Types of population association study
Candidate causative polymorphism
–SNP (single nucleotide polymorphism), deletion, duplication
Candidate causative gene (5-50 marker SNPs)
–evidence from linkage study or function
Candidate causative region (100s of marker SNPs)
–evidence from linkage study
Genome-wide (>300,000 marker SNPs)
–no prior evidence required

7
Common disease common variant (CDCV) hypothesis

8
Preliminary analysis: data quality
Assuming mating is random and the population is large, Hardy-Weinberg equilibrium (HWE) genotype frequencies will apply.
Allele frequencies: P(X) = p, P(x) = q
HWE genotype frequencies: P(XX) = p^2, P(Xx) = 2pq, P(xx) = q^2
Testing for deviation from HWE is a useful data quality check:
–chi-squared or exact test
–log QQ plot
But discarding SNPs that deviate from HWE can also discard causative mutations.
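
The chi-squared version of the HWE check can be sketched as follows. This is a minimal illustration, not the tutorial's own code: the function name `hwe_chi_squared` and the example counts are hypothetical, and it assumes both alleles are observed so that no expected count is zero. For small counts an exact test would be preferable.

```python
import math

def hwe_chi_squared(n_XX, n_Xx, n_xx):
    """Chi-squared test (1 df) of genotype counts against HWE expectations.

    Hypothetical helper; assumes both alleles are present in the sample.
    """
    n = n_XX + n_Xx + n_xx
    p = (2 * n_XX + n_Xx) / (2 * n)   # estimated frequency of allele X
    q = 1.0 - p                       # estimated frequency of allele x
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_XX, n_Xx, n_xx)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-squared distribution with 1 df
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Made-up counts: exactly at HWE, then a complete lack of heterozygotes
print(hwe_chi_squared(25, 50, 25))
print(hwe_chi_squared(50, 0, 50))
```

A SNP with a very small p-value here would be flagged for a genotyping check rather than dropped automatically, in line with the caution above about causative mutations.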

9
Log QQ plot

10
Preliminary analysis: dealing with missing data
Imputation
–various methods: maximum likelihood; probabilistic; hot-deck; regression modelling
–test for independence of missingness and case-control status

11
Choice of inheritance model

12

13

14
Tests of association: single SNP
Case-control
–Treat genotype as a factor with 3 levels and perform a 2x3 goodness-of-fit test. Loses power if the effect is additive.
–Count alleles rather than individuals and perform a 2x2 goodness-of-fit test. Out of favour because it is sensitive to deviation from HWE and its risk estimates are not interpretable.
Table (counts not recovered): rows Case, Control; columns Major allele homozygote (0), Heterozygote (1), Minor allele homozygote (2)
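
The two tests on this slide can be sketched with a Pearson chi-squared statistic computed on the 2x3 genotype table and on the 2x2 allele-count table. The function names and the example counts below are hypothetical; genotype counts are assumed to be ordered as (major homozygote, heterozygote, minor homozygote), and all expected counts are assumed to be non-zero.

```python
import math

def chi_squared_stat(table):
    """Pearson chi-squared statistic for a 2 x k table (rows: cases, controls)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def genotype_test(cases, controls):
    """2x3 test on genotype counts (2 df)."""
    stat = chi_squared_stat([list(cases), list(controls)])
    return stat, math.exp(-stat / 2.0)  # chi-squared survival function, 2 df

def allele_test(cases, controls):
    """2x2 test on allele counts (1 df); each individual contributes two alleles."""
    def alleles(g):
        return [2 * g[0] + g[1], g[1] + 2 * g[2]]
    stat = chi_squared_stat([alleles(cases), alleles(controls)])
    return stat, math.erfc(math.sqrt(stat / 2.0))  # survival function, 1 df

# Made-up genotype counts (0, 1, 2 copies of the minor allele)
cases, controls = (10, 30, 60), (60, 30, 10)
print(genotype_test(cases, controls))
print(allele_test(cases, controls))
```

Note that the allele test doubles the apparent sample size by treating the two alleles of an individual as independent, which is the reason it is sensitive to deviation from HWE.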

15
Tests of association: single SNP
Case-control
–Cochran-Armitage test: loses power if the additivity assumption is wrong
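
A minimal sketch of the Cochran-Armitage trend test on a 2x3 genotype table is given below, using the standard textbook form of the statistic. The function name, the default additive weights (0, 1, 2), and the example counts are assumptions made for illustration.

```python
import math

def cochran_armitage(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test (1 df) for a 2 x k table of genotype counts.

    weights encode the assumed dose-response; (0, 1, 2) corresponds to an
    additive effect of the minor allele.
    """
    R, S = sum(cases), sum(controls)            # row totals
    N = R + S
    cols = [r + s for r, s in zip(cases, controls)]  # column totals
    # Weighted difference between case and control counts
    T = sum(w * (r * S - s * R) for w, r, s in zip(weights, cases, controls))
    # Variance of T under the null hypothesis of no association
    var = sum(w * w * n * (N - n) for w, n in zip(weights, cols))
    var -= 2 * sum(
        weights[i] * weights[j] * cols[i] * cols[j]
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    )
    var *= R * S / N
    stat = T * T / var
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-squared survival, 1 df
    return stat, p_value

# Made-up genotype counts (0, 1, 2 copies of the minor allele)
print(cochran_armitage((10, 30, 60), (60, 30, 10)))
```

Changing `weights` to (0, 1, 1) or (0, 0, 1) would instead target a dominant or recessive effect, which illustrates why the test loses power when the assumed trend does not match the true inheritance model.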

16
Tests of association: single SNP
Case-control
–Armitage or goodness-of-fit? Depends on:
  –Prior knowledge of inheritance (additive, dominant, etc.)
  –Genotype frequencies, e.g. use the Armitage test when the minor allele is rare, goodness-of-fit test otherwise

17
Tests of association: single SNP
Case-control
–Logistic regression
  –Easily incorporates the inheritance model (additive, dominant, etc.)
  –But assumes that phenotype, not genotype, is the outcome variable, so it is easier to justify for prospective studies
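
To illustrate how logistic regression incorporates the inheritance model, the sketch below recodes genotype as a single covariate and fits an intercept plus one slope by Newton-Raphson. Everything here is a hypothetical illustration (the `CODINGS` table, function names, and counts are made up); a real analysis would normally use an established statistics package and adjust for further covariates.

```python
import math

# Covariate value for 0, 1, 2 copies of the minor allele under each model
CODINGS = {
    "additive": (0, 1, 2),
    "dominant": (0, 1, 1),
    "recessive": (0, 0, 1),
}

def expand_counts(cases, controls, coding):
    """Turn genotype counts into per-individual covariate x and outcome y."""
    x, y = [], []
    for code, n_case, n_control in zip(coding, cases, controls):
        x += [code] * (n_case + n_control)
        y += [1] * n_case + [0] * n_control
    return x, y

def fit_logistic(x, y, iterations=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # gradient, intercept
            g1 += (yi - mu) * xi     # gradient, slope
            h00 += w                 # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up genotype counts (0, 1, 2 copies of the minor allele)
cases, controls = (10, 30, 60), (60, 30, 10)
for model, coding in CODINGS.items():
    b0, b1 = fit_logistic(*expand_counts(cases, controls, coding))
    print(model, "odds ratio per unit of coding:", math.exp(b1))
```

Under the dominant and recessive codings the covariate is binary, so the fitted odds ratio equals the odds ratio of the corresponding collapsed 2x2 table.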

18
Tests of association: single SNP
Continuous outcome
–Linear regression
Ordered categorical outcomes
–Multinomial regression

19
Problems: population stratification
[Figure not recovered; surviving label: Cases]

20
Correcting for population stratification
Genomic control
–Genotype null SNPs and use them to calculate the background inflation in the test statistic due to population stratification
–Limited to simple single-SNP analyses
–Can over- or under-correct
Other approaches using null SNPs
–Regression, principal components analysis, modelling the underlying demography
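
Genomic control is commonly implemented by comparing the median of the 1-df chi-squared statistics at the null SNPs with the median of the chi-squared distribution with 1 df (about 0.455), then dividing each test statistic by the resulting inflation factor. The sketch below follows that convention; the function names and example values are hypothetical, and flooring the factor at 1 is a common choice rather than something stated on the slide.

```python
import statistics

# Median of the chi-squared distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364

def inflation_factor(null_stats):
    """Estimate the inflation factor (lambda) from null-SNP test statistics."""
    lam = statistics.median(null_stats) / CHI2_1DF_MEDIAN
    return max(1.0, lam)  # assumption: never inflate statistics (lambda >= 1)

def genomic_control(stat, lam):
    """Deflate a 1-df association statistic by the inflation factor."""
    return stat / lam

# Made-up null-SNP statistics and one candidate-SNP statistic
null_stats = [0.05, 0.3, 0.7, 1.1, 2.4, 0.9, 0.2]
lam = inflation_factor(null_stats)
print(lam, genomic_control(12.0, lam))
```

Because a single factor rescales every statistic, the correction is uniform across SNPs, which is one way to see why it can over- or under-correct at individual loci.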

21
Problems: multiple testing
Bonferroni correction
–conservative when SNPs are linked
Permutation
–computationally demanding
False discovery rate
Bayesian approaches
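
Two of the corrections listed above can be sketched briefly. The Bonferroni function compares each p-value with alpha divided by the number of tests. For the false discovery rate, the sketch uses the Benjamini-Hochberg step-up procedure, which is one common FDR method and is an assumption here, since the slide does not name a specific procedure. Function names and p-values are made up.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject tests whose p-value is at most alpha / (number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected

# Made-up p-values for five SNPs
p_values = [0.001, 0.012, 0.025, 0.2, 0.5]
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

With these values Bonferroni rejects only the first SNP while Benjamini-Hochberg rejects the first three, showing how the FDR criterion is less stringent than family-wise error control.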

22
Tests of association: multiple SNPs
Advantages
–Many SNPs may be linked to a gene, but individually may not have a significant effect
–Interactions between SNPs can be modelled
–Tag SNPs can reduce testing of redundant linked SNPs
Methods
–Linear regression, logistic regression
–Armitage test
Haplotype-based methods
–Natural interpretation
–But power reduced due to multiple alleles

23
Haplotypes Nature Genetics 37, (2005)

24

25
Inferring haplotype phase

26
?

27

28

29
Inferring haplotype phase
Methods & software
PHASE, FASTPHASE, EH+, FBAT, HAPLOTYPER, EM-DECODER, PLEM, HAP, HAPLORE, Haplo.stat, SNPEM, PEDPHASE, SNPHAP, TDTHAP

30
Inferring haplotype phase
Phase cases and controls separately or pooled?
–Separating can give inflated type I error
–Pooling can reduce power
