1 Statistical methods for genetic association studies
2 A tutorial on statistical methods for population association studies. David Balding, Nature Reviews Genetics (2006) 7. This talk is based on a review by David Balding from Imperial College London. It covers only one kind of association study, a population based
3 [Diagram: Genetics, Environment, G×E interaction, Health outcome] We want to know why two people with the same environmental exposure differ in their susceptibility to disease, particularly for common complex diseases such as heart disease and diabetes. California: cholesterol levels 50-90%. Scandinavia: mortality due to heart disease 50-60%. So we look at the DNA. We might be able to genotype subjects for one strong candidate mutation, but usually we will have little or no idea what’s going on. This is the approach I’m going to talk about today.
4 Recombination. X/x: unobserved causative mutation; A/a: distant marker; B/b: linked marker. [Diagram: gametophytes (gamete-producing cells), recombination, gametes] To understand association, it is crucial to understand the process of recombination. If you look in almost any cell of your body you’ll find two sets of chromosomes, 23 from each parent. When we produce our own germ cells, sperm or eggs, each cell has just one copy. The process involves recombination. This is crucial because it breaks down statistical association between markers.
5 Approaches to finding disease genes. Population-based association study: “unrelated” subjects. Family-based association study: nuclear families. Admixture mapping: recently admixed population. Linkage mapping: large pedigrees. Darvasi & Shifman (2005) Nature Genetics
6 Types of population association study. Candidate causative polymorphism: SNP (single nucleotide polymorphism), deletion, duplication. Candidate causative gene (5-50 marker SNPs): evidence from linkage study or function. Candidate causative region (100s of marker SNPs): evidence from linkage study. Genome-wide (>300,000 marker SNPs): no prior evidence required.
8 Preliminary analysis: data quality. Assuming mating is random and the population is large, Hardy-Weinberg equilibrium (HWE) genotype frequencies will apply. Allele frequencies: P(X) = p, P(x) = q. HWE genotype frequencies: P(XX) = p^2, P(Xx) = 2pq, P(xx) = q^2. Useful data quality check: chi-squared or exact test; log QQ plot. But this check can discard causative mutations.
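The chi-squared version of the HWE check on this slide can be sketched as follows. This is a minimal illustration, not the procedure used in the review: the function name `hwe_chi_square` and the genotype counts are hypothetical, and the sketch assumes both alleles are observed so that no expected count is zero.

```python
import math

def hwe_chi_square(n_XX, n_Xx, n_xx):
    """Chi-squared test (1 df) of observed genotype counts against HWE expectations."""
    n = n_XX + n_Xx + n_xx
    p = (2 * n_XX + n_Xx) / (2 * n)  # estimated frequency of allele X
    q = 1 - p                        # estimated frequency of allele x
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_XX, n_Xx, n_xx)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical genotype counts for one SNP (illustration only)
chi2, p_value = hwe_chi_square(298, 489, 213)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

With small counts or rare alleles the exact test mentioned on the slide is usually the safer choice; the chi-squared approximation is shown here only because it is short.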
10 Preliminary analysis: dealing with missing data. Imputation: various methods (maximum likelihood; probabilistic; ‘hot-deck’; regression modelling). Test for independence of ‘missingness’ and case-control status.
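One simple way to test whether missingness is independent of case-control status is a 2x2 chi-squared test on the counts of missing and called genotypes in each group. The sketch below is one possible approach under that assumption; the function name and the counts are hypothetical, and all row and column totals are assumed to be non-zero.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-squared (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts for one SNP:
# rows = cases, controls; columns = genotype missing, genotype called
chi2, p_value = chi_square_2x2(30, 970, 10, 990)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

A small p-value here would suggest that genotypes are missing more often in one group than the other, which can bias the downstream association tests.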
11 Choice of inheritance model. Snapdragons (Antirrhinum majus)
14 Tests of association: single SNP. Case-control. Treat genotype as a factor with 3 levels and perform a 2x3 goodness-of-fit test; this loses power if the effect is additive. Alternatively, count alleles rather than individuals and perform a 2x2 goodness-of-fit test; this is out of favour because it is sensitive to deviation from HWE and its risk estimates are not interpretable. [Table: rows Case, Control; columns Major allele homozygote (0), Heterozygote (1), Minor allele homozygote (2)]
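The two tests on this slide can be sketched with a single Pearson chi-squared routine applied first to the 2x3 genotype table and then to the 2x2 allele table. The helper names and the counts are hypothetical, and the sketch assumes no row or column total is zero.

```python
import math

def chi_square_table(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def p_value(chi2, df):
    """Upper-tail chi-squared probability, using the closed forms for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")

def allele_counts(genotype_row):
    """Convert (n0, n1, n2) genotype counts into [major, minor] allele counts."""
    n0, n1, n2 = genotype_row
    return [2 * n0 + n1, n1 + 2 * n2]

# Hypothetical genotype counts: columns are genotypes 0, 1, 2
cases = [60, 30, 10]
controls = [75, 22, 3]

geno_chi2, geno_df = chi_square_table([cases, controls])
allele_chi2, allele_df = chi_square_table([allele_counts(cases), allele_counts(controls)])
print(f"genotype test: chi2 = {geno_chi2:.2f}, df = {geno_df}, p = {p_value(geno_chi2, geno_df):.4f}")
print(f"allele test:   chi2 = {allele_chi2:.2f}, df = {allele_df}, p = {p_value(allele_chi2, allele_df):.4f}")
```

The allele table doubles the sample size by treating the two alleles of each person as independent, which is why the slide flags its sensitivity to deviation from HWE.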
15 Tests of association: single SNP. Case-control. Cochran-Armitage test: loses power if the additivity assumption is wrong. For complex traits additivity is often thought to be a good model.
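A minimal sketch of the Cochran-Armitage trend test, assuming the usual additive weights 0, 1, 2 for the three genotypes. The function name, the default weights and the counts are illustrative assumptions rather than anything taken from the slides.

```python
import math

def cochran_armitage(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test (1 df) for a 2 x k table of genotype counts."""
    R = sum(cases)      # total number of cases
    S = sum(controls)   # total number of controls
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]
    sum_tr = sum(t * r for t, r in zip(weights, cases))
    sum_tn = sum(t * n for t, n in zip(weights, totals))
    sum_t2n = sum(t * t * n for t, n in zip(weights, totals))
    numerator = N * (N * sum_tr - R * sum_tn) ** 2
    denominator = R * S * (N * sum_t2n - sum_tn ** 2)
    chi2 = numerator / denominator
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical genotype counts: columns are genotypes 0, 1, 2
chi2, p_value = cochran_armitage([60, 30, 10], [75, 22, 3])
print(f"trend chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Changing `weights` to (0, 1, 1) or (0, 0, 1) would give tests aimed at dominant or recessive effects, at the cost of power when the true model is additive.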
16 Tests of association: single SNP. Case-control. Armitage or goodness-of-fit? Depends on: prior knowledge of inheritance (additive, dominant, etc.); genotype frequencies, e.g. use the Armitage test when the minor allele is rare, the goodness-of-fit test otherwise.
17 Tests of association: single SNP. Case-control. Logistic regression: easily incorporates the inheritance model (additive, dominant, etc.), but assumes the phenotype, not the genotype, is the outcome variable, so it is easier to justify for prospective studies.
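To show how the inheritance model enters a logistic regression, the sketch below fits a one-predictor model by Newton-Raphson and switches between additive, dominant and recessive genotype codings. Everything here is a hypothetical illustration: the helper names, the coding tuples and the counts are assumptions, and a real analysis would normally use a statistics package and include covariates.

```python
import math

# Genotype codings for genotypes 0, 1, 2 (number of minor alleles)
ADDITIVE = (0, 1, 2)
DOMINANT = (0, 1, 1)
RECESSIVE = (0, 0, 1)

def expand(cases, controls, coding):
    """Turn genotype count rows into per-individual predictor and outcome lists."""
    x, y = [], []
    for genotype, (n_case, n_ctrl) in enumerate(zip(cases, controls)):
        x += [coding[genotype]] * (n_case + n_ctrl)
        y += [1] * n_case + [0] * n_ctrl
    return x, y

def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01  # assumes x is not constant
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical genotype counts: columns are genotypes 0, 1, 2
cases = [60, 30, 10]
controls = [75, 22, 3]

x, y = expand(cases, controls, ADDITIVE)
b0, b1 = fit_logistic(x, y)
print(f"additive model: odds ratio per minor allele = {math.exp(b1):.2f}")
```

Under the additive coding, `exp(b1)` is the odds ratio per copy of the minor allele; under the dominant coding it is the odds ratio for carrying at least one copy.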
18 Tests of association: single SNP. Continuous outcome: linear regression, but the trait must be normally distributed with equal variance. Ordered categorical outcomes: multinomial regression.
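For a continuous outcome, the linear regression on this slide amounts to regressing the trait on the genotype code. The sketch below assumes an additive 0/1/2 coding and ordinary least squares; the function name and the trait values are hypothetical.

```python
import math

def genotype_regression(genotypes, trait):
    """Least-squares fit of trait = intercept + slope * genotype; returns (intercept, slope, t)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, trait))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    rss = sum((y - intercept - slope * g) ** 2 for g, y in zip(genotypes, trait))
    # Standard error of the slope, assuming normal errors with equal variance
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope / se_slope

# Hypothetical data: genotype coded as number of minor alleles, trait in arbitrary units
genotypes = [0, 0, 1, 1, 2, 2]
trait = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2]
intercept, slope, t_stat = genotype_regression(genotypes, trait)
print(f"slope per minor allele = {slope:.2f}, t = {t_stat:.2f}")
```

The t statistic would be compared with a t distribution on n - 2 degrees of freedom, which is only valid under the normality and equal-variance conditions the slide mentions.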
20 Correcting for population stratification. Genomic control: genotype null SNPs and use them to calculate the background inflation in the test statistic due to population stratification. Limited to simple single-SNP analyses; can over- or under-correct. Other approaches using null SNPs: regression, principal components analysis, modelling the underlying demography.
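Genomic control can be sketched as follows, assuming 1-df chi-squared test statistics, an inflation factor estimated from the median of the null-SNP statistics, and the common convention of not letting that factor fall below 1. The function names and the statistics are hypothetical.

```python
import statistics

# Median of a chi-squared distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364

def inflation_factor(null_statistics):
    """Estimate the inflation factor from 1-df test statistics at null SNPs."""
    lam = statistics.median(null_statistics) / CHI2_1DF_MEDIAN
    # Assumed convention: never deflate, so the factor is bounded below by 1
    return max(lam, 1.0)

def genomic_control(statistic, lam):
    """Rescale a 1-df test statistic by the estimated inflation factor."""
    return statistic / lam

# Hypothetical test statistics at null SNPs (a real study would use many more)
null_stats = [0.1, 0.5, 0.91, 2.0, 3.0]
lam = inflation_factor(null_stats)
print(f"lambda = {lam:.2f}, corrected statistic = {genomic_control(10.0, lam):.2f}")
```

Because one factor rescales every SNP equally, this is where the over- or under-correction noted on the slide comes from: stratification does not inflate all SNPs by the same amount.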
21 Problems: multiple testing. Bonferroni correction: conservative when SNPs are linked. Permutation: computationally demanding. False discovery rate. Bayesian approaches.
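Two of the corrections listed here are easy to sketch. The Bonferroni function compares each p-value with alpha divided by the number of tests; for the false discovery rate, the sketch assumes the Benjamini-Hochberg step-up procedure, which the slide does not name. The function names and p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag tests that pass the Bonferroni-corrected threshold alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Flag tests rejected by the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is below its rank-specific threshold
        if p_values[i] <= rank * alpha / m:
            cutoff_rank = rank
    reject = [False] * m
    for i in order[:cutoff_rank]:
        reject[i] = True
    return reject

# Hypothetical single-SNP p-values
p_values = [0.5, 0.001, 0.035, 0.012, 0.025]
print("Bonferroni:        ", bonferroni(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg(p_values))
```

Both functions return flags in the original SNP order, so they can be lined up directly against a list of SNP identifiers.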
22 Tests of association: multiple SNPs. Advantages: many SNPs may be linked to a gene, but individually may not have a significant effect; interactions between SNPs can be modelled; ‘tag’ SNPs can reduce testing of redundant linked SNPs. Methods: linear regression, logistic regression; Armitage test; haplotype-based methods (natural interpretation, but power reduced due to multiple alleles).