Statistical methods for genetic association studies

1 Statistical methods for genetic association studies

2 A tutorial on statistical methods for population association studies
David Balding, Nature Reviews Genetics (2006) 7:
This talk is based on a review by David Balding of Imperial College London. It covers only one kind of association study: the population-based study.

3 [Diagram: Genetics, Environment and G×E interaction leading to Health outcome]
We want to know why two people with the same environmental exposure differ in their susceptibility to disease, particularly for common complex diseases such as heart disease and diabetes. California: cholesterol levels 50-90%. Scandinavia: mortality due to heart disease 50-60%. So we look at the DNA. We might be able to genotype subjects for one strong candidate mutation, but usually we will have little or no idea what’s going on. This is the approach I’m going to talk about today.

4 Recombination
X/x: unobserved causative mutation
A/a: distant marker
B/b: linked marker
[Diagram: gametophytes (gamete-producing cells) carrying A X / a x and B / b undergo recombination to produce gametes]
To understand association, it is crucial to understand the process of recombination. If you look in almost any cell of your body you’ll find two sets of chromosomes, 23 from each parent. When we produce our own germ cells, sperm or eggs, each cell has just one copy. This process involves recombination, which is crucial because it breaks down the statistical association between markers.

5 Approaches to finding disease genes
Population-based association study: “unrelated” subjects
Family-based association study: nuclear families
Admixture mapping: recently admixed population
Linkage mapping: large pedigrees
Darvasi & Shifman (2005) Nature Genetics

6 Types of population association study
Candidate causative polymorphism: SNP (single nucleotide polymorphism), deletion, duplication
Candidate causative gene (5-50 marker SNPs): evidence from linkage study or function
Candidate causative region (100s of marker SNPs): evidence from linkage study
Genome-wide (>300,000 marker SNPs): no prior evidence required

7 Common disease common variant (CDCV) hypothesis

8 Preliminary analysis: data quality
Assuming mating is random and the population is large, Hardy-Weinberg equilibrium (HWE) genotype frequencies will apply.
Allele frequencies: P(X) = p, P(x) = q
HWE genotype frequencies: P(XX) = p², P(Xx) = 2pq, P(xx) = q²
[Punnett square: gamete frequencies p and q give cells p², pq, pq, q²]
Useful data quality check: chi-squared or exact test; log QQ plot
But can discard causative mutations
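The HWE check above can be sketched in a few lines. This is a minimal illustration, not the procedure used in the talk: the function name `hwe_chi_squared` and the example counts are hypothetical, and it assumes both alleles are observed so that no expected count is zero. It implements the chi-squared version only, with 1 degree of freedom; for small counts an exact test would be preferred.

```python
import math

def hwe_chi_squared(n_XX, n_Xx, n_xx):
    """Chi-squared test (1 df) of genotype counts against HWE expectations.

    Hypothetical helper for illustration; assumes both alleles are present.
    """
    n = n_XX + n_Xx + n_xx
    p = (2 * n_XX + n_Xx) / (2 * n)      # estimated frequency of allele X
    q = 1.0 - p                          # estimated frequency of allele x
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_XX, n_Xx, n_xx)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Made-up counts: a SNP with a strong heterozygote deficit.
stat, p_value = hwe_chi_squared(50, 0, 50)
print(stat, p_value)
```

A SNP that fails this check is often flagged as a possible genotyping error, but as the slide notes, filtering on it can also discard a genuine causative mutation.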

9 Log QQ plot

10 Preliminary analysis: dealing with missing data
Imputation
Various methods: maximum likelihood; probabilistic; ‘hot-deck’; regression modelling
Test for independence of ‘missingness’ and case-control status

11 Choice of inheritance model
Snapdragons Antirrhinum majus

12 Choice of inheritance model
Snapdragons Antirrhinum majus

13 Choice of inheritance model
Snapdragons Antirrhinum majus

14 Tests of association: single SNP
Case-control
Treat genotype as a factor with 3 levels and perform a 2x3 goodness-of-fit test. Loses power if the effect is additive.
Count alleles rather than individuals and perform a 2x2 goodness-of-fit test. Out of favour because it is sensitive to deviation from HWE and its risk estimates are not interpretable.
Table layout: rows Case, Control; columns Major allele homozygote (0), Heterozygote (1), Minor allele homozygote (2)
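The two tests on this slide can be sketched as Pearson chi-squared tests on the genotype table and on the derived allele table. This is an illustrative sketch only: the function names and counts are hypothetical, genotype counts are assumed to be ordered (major homozygote, heterozygote, minor homozygote), and every row and column total is assumed to be non-zero.

```python
import math

def pearson_statistic(table):
    """Pearson chi-squared statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def genotype_test(cases, controls):
    """2x3 test on genotype counts (2 df)."""
    stat = pearson_statistic([list(cases), list(controls)])
    return stat, math.exp(-stat / 2.0)   # chi-squared survival function, 2 df

def allele_test(cases, controls):
    """2x2 test on allele counts (1 df); each individual contributes 2 alleles."""
    def alleles(g):
        return [2 * g[0] + g[1], g[1] + 2 * g[2]]   # (major, minor) allele counts
    stat = pearson_statistic([alleles(cases), alleles(controls)])
    return stat, math.erfc(math.sqrt(stat / 2.0))   # survival function, 1 df

# Made-up genotype counts for 100 cases and 100 controls.
print(genotype_test((10, 20, 70), (70, 20, 10)))
print(allele_test((10, 20, 70), (70, 20, 10)))
```

The allele test doubles the apparent sample size by treating the two alleles of one person as independent observations, which is one way to see why it is sensitive to departures from HWE.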

15 Tests of association: single SNP
Case-control
Cochran-Armitage test: loses power if the additivity assumption is wrong
For complex traits, additivity is often thought to be a good model
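As a rough sketch of the Cochran-Armitage trend test, the code below uses the standard textbook form of the statistic with genotype weights (0, 1, 2), which corresponds to the additive model. The function name and the counts are hypothetical, and the sketch assumes at least two genotype classes are observed so that the variance is non-zero.

```python
import math

def cochran_armitage(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on a 2x3 genotype table (1 df).

    cases/controls: genotype counts ordered by minor-allele count 0, 1, 2.
    weights (0, 1, 2) encode an additive (trend) model.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]
    # Trend score: weighted difference between case and control counts.
    T = sum(w * (r * S - s * R) for w, r, s in zip(weights, cases, controls))
    sum_w2 = sum(w * w * n for w, n in zip(weights, totals))
    sum_w = sum(w * n for w, n in zip(weights, totals))
    # Variance of T under the null hypothesis of no association.
    variance = (R * S / N) * (N * sum_w2 - sum_w ** 2)
    stat = T * T / variance
    p_value = math.erfc(math.sqrt(stat / 2.0))   # chi-squared survival, 1 df
    return stat, p_value

# Made-up genotype counts for 100 cases and 100 controls.
print(cochran_armitage((10, 20, 70), (70, 20, 10)))
```

Changing `weights` to (0, 1, 1) or (0, 0, 1) would give tests tuned to dominant or recessive effects of the minor allele; the slide's warning about lost power applies whenever the chosen weights do not match the true mode of inheritance.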

16 Tests of association: single SNP
Case-control
Armitage or goodness-of-fit? Depends on:
Prior knowledge of inheritance (additive, dominant, etc.)
Genotype frequencies, e.g. use the Armitage test when the minor allele is rare, the goodness-of-fit test otherwise
For complex traits, additivity is often thought to be a good model

17 Tests of association: single SNP
Case-control
Logistic regression
Easily incorporates the inheritance model (additive, dominant, etc.)
But assumes that phenotype, not genotype, is the outcome variable, so it is easier to justify for prospective studies
For complex traits, additivity is often thought to be a good model
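One way to see how a regression model incorporates the inheritance model is through the coding of the genotype covariate. The sketch below is illustrative only: `encode_genotype` is a hypothetical helper, genotypes are assumed to be stored as minor-allele counts (0, 1, 2), dominance is taken with respect to the minor allele, and the recessive option is added here for completeness rather than taken from the slide.

```python
def encode_genotype(minor_allele_count, model="additive"):
    """Map a genotype (0, 1 or 2 minor alleles) to a regression covariate.

    Hypothetical helper; dominance is defined relative to the minor allele.
    """
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2 minor alleles")
    if model == "additive":
        return minor_allele_count                    # each allele adds the same effect
    if model == "dominant":
        return 1 if minor_allele_count >= 1 else 0   # one copy is enough
    if model == "recessive":
        return 1 if minor_allele_count == 2 else 0   # two copies are required
    raise ValueError("unknown inheritance model: " + model)

genotypes = [0, 1, 2, 1, 0]
print([encode_genotype(g, "additive") for g in genotypes])
print([encode_genotype(g, "dominant") for g in genotypes])
print([encode_genotype(g, "recessive") for g in genotypes])
```

The encoded column would then be passed as the predictor to whichever logistic regression routine is in use, with case-control status as the outcome.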

18 Tests of association: single SNP
Continuous outcome: linear regression, but the trait must be approximately normal with equal variance across genotypes
Ordered categorical outcomes: multinomial regression

19 Problems: population stratification

20 Correcting for population stratification
Genomic control
Genotype null SNPs and use them to calculate the background inflation in the test statistic due to population stratification
Limited to simple single-SNP analyses
Can over- or under-correct
Other approaches using null SNPs
Regression, principal components analysis, modelling the underlying demography
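A minimal sketch of genomic control follows, assuming the commonly used median-based estimate of the inflation factor (the slide does not say which estimator was intended). The function names and the example statistics are hypothetical; the inputs are 1 df chi-squared statistics, for example from the Armitage test, computed at null SNPs.

```python
import statistics

# Median of the chi-squared distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def inflation_factor(null_stats):
    """Estimate the inflation factor from test statistics at null SNPs.

    Floored at 1 so that the correction never increases a test statistic.
    """
    lam = statistics.median(null_stats) / CHI2_1DF_MEDIAN
    return max(lam, 1.0)

def genomic_control(stat, null_stats):
    """Deflate a candidate SNP's 1 df chi-squared statistic by the inflation factor."""
    return stat / inflation_factor(null_stats)

# Made-up null statistics whose median is about twice the expected value.
null_stats = [0.2, 0.6, 0.9098728, 1.4, 3.1]
print(inflation_factor(null_stats))
print(genomic_control(10.0, null_stats))
```

Because one factor is applied uniformly to every SNP, the correction is too strong for SNPs whose allele frequencies differ little between subpopulations and too weak for those that differ a lot, which is the over- or under-correction the slide mentions.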

21 Problems: multiple testing
Bonferroni correction: conservative when SNPs are linked
Permutation: computationally demanding
False discovery rate
Bayesian approaches
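Two of these corrections are simple enough to sketch directly. The Bonferroni function follows the slide; for the false discovery rate, the sketch assumes the Benjamini-Hochberg step-up procedure, which is one common choice and is not named in the talk. Function names and p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject where p <= alpha / m, controlling the family-wise error rate."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure, controlling the FDR at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by ascending p
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff_rank = rank        # largest rank that meets its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected

# Made-up p-values for four SNPs.
p_values = [0.001, 0.012, 0.03, 0.2]
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

With these made-up values the FDR procedure rejects one more hypothesis than Bonferroni, reflecting the different error rate being controlled rather than any gain in evidence.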

22 Tests of association: multiple SNPs
Advantages
Many SNPs may be linked to a gene, but individually may not have a significant effect
Interactions between SNPs can be modelled
‘Tag’ SNPs can reduce testing of redundant linked SNPs
Methods
Linear regression, logistic regression
Armitage test
Haplotype-based methods: natural interpretation, but power reduced due to multiple alleles

23 Haplotypes Nature Genetics  37, (2005)

24 Crucially, any stretch of recombining DNA can be divided into regions of high linkage disequilibrium (LD), known as haplotype blocks, and the history of each block can be represented as a tree. Tag SNPs times fewer loci.

25 Inferring haplotype phase

26 Inferring haplotype phase

27 Inferring haplotype phase

28 Inferring haplotype phase

29 Inferring haplotype phase

30 Inferring haplotype phase
Phase cases and controls separately or pooled?
Separating can give inflated type I error
Pooling can reduce power
