Statistical methods for genetic association studies

Name: Statistical methods for genetic association studies
Uploaded: 2017-07-25T01:08:25+00:00
Duration: PTM9S25
Channel: Audrey Holmes
Description: Statistical methods for genetic association studies

A tutorial on statistical methods for population association studies
David Balding, Nature Reviews Genetics (2006) 7:
This talk is based on a review by David Balding of Imperial College London. It covers only one kind of association study, the population-based study.

Diagram labels: Genetics (?), G×E interaction, Environment, Health outcome
We want to know why two people with the same environmental exposure differ in their susceptibility to disease, particularly for common complex diseases such as heart disease and diabetes. California: cholesterol levels 50-90%. Scandinavia: mortality due to heart disease 50-60%. So we look at the DNA. We might be able to genotype subjects for one strong candidate mutation, but usually we will have little or no idea what’s going on. This is the approach I’m going to talk about today.

Recombination
X/x: unobserved causative mutation
A/a: distant marker
B/b: linked marker
Diagram labels: gametophytes (gamete-producing cells), gametes, recombination
To understand association, it is crucial to understand the process of recombination. If you look in almost any cell of your body you’ll find two sets of chromosomes, 23 from each parent. When we produce our own germ cells, sperm or eggs, each cell has just one copy. The process involves recombination, which is crucial because it breaks down statistical association between markers.

Approaches to finding disease genes
Population-based association study: “unrelated” subjects
Family-based association study: nuclear families
Admixture mapping: recently admixed population
Linkage mapping: large pedigrees
Darvasi & Shifman (2005) Nature Genetics

Types of population association study
Candidate causative polymorphism: SNP (single nucleotide polymorphism), deletion, duplication
Candidate causative gene (5-50 marker SNPs): evidence from linkage study or function
Candidate causative region (100s of marker SNPs): evidence from linkage study
Genome-wide (>300,000 marker SNPs): no prior evidence required

Common disease common variant (CDCV) hypothesis

Preliminary analysis: data quality
Assuming mating is random and the population is large, Hardy-Weinberg equilibrium (HWE) genotype frequencies will apply.
Allele frequencies: P(X) = p, P(x) = q
HWE genotype frequencies: P(XX) = p^2, P(Xx) = 2pq, P(xx) = q^2
Useful data quality check: chi-squared or exact test; log QQ plot
But discarding SNPs that deviate from HWE can discard causative mutations
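The chi-squared check against HWE can be sketched in a few lines. This is a minimal illustration, not the procedure used in the review: the function name `hwe_chi_squared` and the example counts are assumptions made here, and the sketch assumes both alleles are observed (otherwise an expected count is zero).

```python
import math

def hwe_chi_squared(n_XX, n_Xx, n_xx):
    """Chi-squared test (1 df) of observed genotype counts against HWE.

    Hypothetical helper for illustration; assumes both alleles are present.
    """
    n = n_XX + n_Xx + n_xx
    p = (2 * n_XX + n_Xx) / (2 * n)  # estimated frequency of allele X
    q = 1.0 - p                      # estimated frequency of allele x
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic SNP: HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_XX, n_Xx, n_xx)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-squared distribution with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: perfect HWE proportions, then a complete lack of heterozygotes
chi2_ok, p_ok = hwe_chi_squared(25, 50, 25)
chi2_bad, p_bad = hwe_chi_squared(50, 0, 50)
```

For small counts or rare alleles the chi-squared approximation is poor, which is why the slide also lists an exact test.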

Log QQ plot

Preliminary analysis: dealing with missing data
Imputation
Various methods: maximum likelihood; probabilistic; ‘hot-deck’; regression modelling
Test for independence of ‘missingness’ and case-control status

Choice of inheritance model
Snapdragons Antirrhinum majus

Tests of association: single SNP
Case-control
Treat genotype as a factor with 3 levels and perform a 2x3 goodness-of-fit test. Loses power if the effect is additive.
Count alleles rather than individuals and perform a 2x2 goodness-of-fit test. Out of favour because it is sensitive to deviation from HWE and its risk estimates are not interpretable.

          Major allele homozygote (0)   Heterozygote (1)   Minor allele homozygote (2)
Case
Control
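Both tests on this slide are Pearson chi-squared tests on a table of counts, so one helper covers them. The sketch below is illustrative only: the counts are made up, the function names are assumptions introduced here, and every row and column total is assumed to be non-zero.

```python
import math

def pearson_chi2(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # expected count under independence
            chi2 += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def chi2_sf(x, df):
    """Upper-tail probability; closed forms are used for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch supports df = 1 or 2 only")

def allele_table(genotype_table):
    """Collapse genotype counts [n0, n1, n2] into [major, minor] allele counts."""
    return [[2 * n0 + n1, n1 + 2 * n2] for n0, n1, n2 in genotype_table]

# Hypothetical counts: rows = case, control; columns = genotypes 0, 1, 2
counts = [[30, 50, 20], [50, 40, 10]]

geno_chi2, geno_df = pearson_chi2(counts)                    # 2x3 genotype test
allele_chi2, allele_df = pearson_chi2(allele_table(counts))  # 2x2 allele test
geno_p = chi2_sf(geno_chi2, geno_df)
allele_p = chi2_sf(allele_chi2, allele_df)
```

The allele test doubles the apparent sample size by treating the two alleles of one person as independent, which is the source of its sensitivity to departures from HWE.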

Case-control
Cochran-Armitage test
Loses power if the additivity assumption is wrong
For complex traits additivity is often thought to be a good model
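A minimal sketch of the Cochran-Armitage trend test follows, assuming additive scores 0, 1, 2 for the three genotypes. It uses the form chi2 = N * r^2, where r is the correlation between genotype score and case status. The function name and the counts are assumptions for illustration.

```python
import math

def cochran_armitage(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test (1 df) for genotype counts in cases and controls."""
    n = [r + s for r, s in zip(cases, controls)]  # genotype totals
    R, S = sum(cases), sum(controls)
    N = R + S
    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * k for x, k in zip(scores, n))
    sum_x2n = sum(x * x * k for x, k in zip(scores, n))
    # N times the squared correlation between score and case status
    numerator = N * (N * sum_xr - R * sum_xn) ** 2
    denominator = R * S * (N * sum_x2n - sum_xn ** 2)
    chi2 = numerator / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-squared upper tail, 1 df
    return chi2, p_value

# Hypothetical counts for genotypes 0, 1, 2
trend_chi2, trend_p = cochran_armitage([30, 50, 20], [50, 40, 10])
```

Changing `scores` to (0, 1, 1) or (0, 0, 1) would test a dominant or recessive trend instead, at the cost of power if that model is wrong.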

Case-control
Armitage or goodness-of-fit? Depends on:
Prior knowledge of inheritance (additive, dominant, etc.)
Genotype frequencies, e.g. use the Armitage test when the minor allele is rare, the goodness-of-fit test otherwise
For complex traits additivity is often thought to be a good model

Case-control
Logistic regression
Easily incorporates the inheritance model (additive, dominant, etc.)
But assumes that phenotype, not genotype, is the outcome variable, so it is easier to justify for prospective studies
For complex traits additivity is often thought to be a good model
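To show how the inheritance model enters a logistic regression, the sketch below recodes the genotype and fits a one-predictor model by Newton-Raphson. It is a bare-bones illustration with made-up counts: `code_genotype` and `fit_logistic` are hypothetical helpers, there are no covariates, standard errors, or convergence checks, and a real analysis would use a statistics package.

```python
import math

def code_genotype(g, model="additive"):
    """Map minor-allele count g in {0, 1, 2} to a predictor under an inheritance model."""
    if model == "additive":
        return float(g)
    if model == "dominant":
        return 1.0 if g >= 1 else 0.0
    if model == "recessive":
        return 1.0 if g == 2 else 0.0
    raise ValueError("unknown model: " + model)

def fit_logistic(x, y, n_iter=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu          # score for the intercept
            g1 += (yi - mu) * xi   # score for the slope
            h00 += w               # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical genotype counts (genotypes 0, 1, 2) keyed by case status
counts = {1: [30, 50, 20], 0: [50, 40, 10]}
genotypes, status = [], []
for y, row in counts.items():
    for g, k in enumerate(row):
        genotypes += [g] * k
        status += [y] * k

x_dom = [code_genotype(g, "dominant") for g in genotypes]
b0_dom, b1_dom = fit_logistic(x_dom, status)
odds_ratio_dom = math.exp(b1_dom)  # odds ratio for carrying at least one minor allele

x_add = [code_genotype(g, "additive") for g in genotypes]
b0_add, b1_add = fit_logistic(x_add, status)  # exp(b1_add) is the per-allele odds ratio
```

With a single binary predictor the fitted odds ratio equals the cross-product ratio of the collapsed 2x2 table, which gives a quick sanity check on the dominant fit.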

Continuous outcome: linear regression
Ordered categorical outcomes: multinomial regression
But for linear regression the outcome must be normally distributed with equal variance

Problems: population stratification
Cases

Correcting for population stratification
Genomic control
Genotype null SNPs and use them to calculate the background inflation in the test statistic due to population stratification
Limited to simple single-SNP analyses
Can over- or under-correct
Other approaches using null SNPs
Regression, principal components analysis, modelling the underlying demography
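The genomic control idea can be sketched as follows, assuming 1-df chi-squared statistics (for example from the Armitage test) at the null SNPs. The inflation factor is estimated as the median null statistic divided by the median of a 1-df chi-squared distribution, and flooring it at 1 is a common convention assumed here. The function names and numbers are hypothetical.

```python
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-squared distribution with 1 df

def genomic_control_lambda(null_stats):
    """Estimate the inflation factor from 1-df test statistics at null SNPs."""
    lam = statistics.median(null_stats) / CHI2_1DF_MEDIAN
    return max(lam, 1.0)  # never inflate statistics when there is no excess

def gc_adjust(stat, lam):
    """Deflate a candidate SNP's 1-df statistic by the estimated inflation."""
    return stat / lam

# Hypothetical null-SNP statistics and one candidate statistic
null_stats = [0.2, 0.5, 0.9098728, 1.5, 3.0]
lam = genomic_control_lambda(null_stats)
adjusted = gc_adjust(9.6, lam)
```

Because a single factor rescales every SNP, the correction is too strong for some SNPs and too weak for others, which matches the over- and under-correction caveat on the slide.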

Problems: multiple testing
Bonferroni correction: conservative when SNPs are linked
Permutation: computationally demanding
False discovery rate
Bayesian approaches
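Two of these corrections are simple enough to sketch directly. The slide does not say which false discovery rate procedure is meant; the Benjamini-Hochberg step-up procedure is used below as one common choice, and that choice is an assumption. The p-values are made up.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical single-SNP p-values
p_values = [0.01, 0.04, 0.03, 0.5]
p_bonf = bonferroni(p_values)
p_bh = benjamini_hochberg(p_values)
```

Bonferroni treats the tests as independent, so with many SNPs in linkage disequilibrium the effective number of tests is smaller than `m` and the correction is stricter than necessary.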

Tests of association: multiple SNPs
Advantages:
Many SNPs may be linked to a gene, but individually may not have a significant effect
Interactions between SNPs can be modelled
‘Tag’ SNPs can reduce testing of redundant linked SNPs
Methods:
Linear regression, logistic regression
Armitage test
Haplotype-based methods: natural interpretation, but power reduced due to multiple alleles

Haplotypes Nature Genetics 37, (2005)

Crucially, any stretch of recombining DNA can be divided into regions of high LD (haplotype blocks), and the history of each haplotype can be represented as a tree.
Tag SNPs times fewer loci.

Inferring haplotype phase

Methods & software: PHASE, FASTPHASE, EH+, FBAT, HAPLOTYPER, EM-DECODER, PLEM, HAP, HAPLORE, Haplo.stat, SNPEM, PEDPHASE, SNPHAP, TDTHAP

Phase cases and controls separately or pooled?
Separating can give inflated type I error
Pooling can reduce power
