Statistical methods for genetic association studies


Statistical methods for genetic association studies http://www.stats.gla.ac.uk/~paulj/assoc_study_stats.ppt

A tutorial on statistical methods for population association studies
David Balding, Nature Reviews Genetics (2006) 7:781-791
This talk is based on a review by David Balding of Imperial College London. It covers only one kind of association study, the population-based study.

Diagram labels: Genetics, G×E interaction, Environment, Health outcome.
We want to know why two people with the same environmental exposure differ in their susceptibility to disease, particularly for common complex diseases such as heart disease and diabetes. California: cholesterol levels 50-90%. Scandinavia: mortality due to heart disease 50-60%. So we look at the DNA. We might be able to genotype subjects for one strong candidate mutation, but usually we will have little or no idea which variants are involved. This is the approach I'm going to talk about today.

Recombination
X/x: unobserved causative mutation
A/a: distant marker
B/b: linked marker
Diagram labels: Gametophytes (gamete-producing cells), Gametes, Recombination.
To understand association it is crucial to understand the process of recombination. If you look in almost any cell of your body you will find two sets of chromosomes, 23 from each parent. When we produce our own germ cells, sperm or eggs, each cell has just one copy. This process involves recombination, which is crucial because it breaks down the statistical association between markers.

Approaches to finding disease genes
Population-based association study: “unrelated” subjects
Family-based association study: nuclear families
Admixture mapping: recently admixed population
Linkage mapping: large pedigrees
Source: Darvasi & Shifman (2005) Nature Genetics

Types of population association study
Candidate causative polymorphism: SNP (single nucleotide polymorphism), deletion, duplication
Candidate causative gene (5-50 marker SNPs): evidence from linkage study or function
Candidate causative region (100s of marker SNPs): evidence from linkage study
Genome-wide (>300,000 marker SNPs): no prior evidence required

Common disease common variant (CDCV) hypothesis

Preliminary analysis: data quality
Assuming mating is random and the population is large, Hardy-Weinberg equilibrium (HWE) genotype frequencies will apply.
Allele frequencies: P(X) = p, P(x) = q
HWE genotype frequencies: P(XX) = p^2, P(Xx) = 2pq, P(xx) = q^2
Useful data quality checks: chi-squared or exact test; log QQ plot
But discarding SNPs that fail the HWE check can discard causative mutations.
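
The chi-squared version of the HWE check above can be sketched as follows. This is a minimal illustration, not the method of any particular package: the function name `hwe_chi2_test` is hypothetical, the p-value uses the closed form for a chi-squared distribution with 1 degree of freedom, and it assumes both alleles are observed at least once.

```python
import math

def hwe_chi2_test(n_XX, n_Xx, n_xx):
    """Chi-squared test (1 df) of genotype counts against HWE expectations."""
    n = n_XX + n_Xx + n_xx
    p = (2 * n_XX + n_Xx) / (2 * n)   # estimated frequency of allele X
    q = 1.0 - p                       # estimated frequency of allele x
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_XX, n_Xx, n_xx)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail probability of a chi-squared variable with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For small samples or rare alleles the exact test mentioned on the slide would usually be preferred over this approximation.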

Log QQ plot

Preliminary analysis: dealing with missing data
Imputation, various methods: maximum likelihood; probabilistic; ‘hot-deck’; regression modelling
Test for independence of ‘missingness’ and case-control status
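
One simple way to test whether missingness is independent of case-control status is a 2x2 chi-squared test on missing versus called genotypes. The sketch below assumes that choice of test; the function name and argument names are hypothetical, and it assumes every row and column total is non-zero.

```python
import math

def missingness_test(missing_cases, n_cases, missing_controls, n_controls):
    """2x2 chi-squared test (1 df): missing/called genotype by case/control."""
    a, b = missing_cases, n_cases - missing_cases
    c, d = missing_controls, n_controls - missing_controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-squared upper tail, 1 df
    return stat, p_value
```

A small p-value would suggest that genotypes are missing more often in one group, which can itself create a spurious association.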

Choice of inheritance model Snapdragons Antirrhinum majus

Tests of association: single SNP
Case-control
Treat genotype as a factor with 3 levels and perform a 2x3 goodness-of-fit test. Loses power if the effect is additive.
Count alleles rather than individuals and perform a 2x2 goodness-of-fit test. Out of favour because it is sensitive to deviation from HWE and its risk estimates are not interpretable.
Genotype table: rows Case, Control; columns Major allele homozygote (0), Heterozygote (1), Minor allele homozygote (2).
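
The two tests on this slide can be sketched with one Pearson chi-squared helper applied first to the 2x3 genotype table and then to the 2x2 allele table. The counts below are made up for illustration, and all function names are hypothetical.

```python
import math

def pearson_chi2(table):
    """Pearson chi-squared statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df

def chi2_p_value(stat, df):
    """Upper tail probability; closed forms exist for 1 and 2 df."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")

def allele_counts(genotype_counts):
    """Convert (n0, n1, n2) genotype counts to (major, minor) allele counts."""
    n0, n1, n2 = genotype_counts
    return (2 * n0 + n1, n1 + 2 * n2)

# Illustrative counts: genotypes coded 0, 1, 2 minor alleles.
cases = (40, 40, 20)
controls = (60, 30, 10)

genotype_stat, genotype_df = pearson_chi2([cases, controls])
allelic_stat, allelic_df = pearson_chi2(
    [allele_counts(cases), allele_counts(controls)]
)
```

The allele table doubles the sample size by treating each chromosome as an independent observation, which is why the allelic test is sensitive to departures from HWE.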

Tests of association: single SNP
Case-control
Cochran-Armitage test: loses power if the additivity assumption is wrong
For complex traits, additivity is often thought to be a good model
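
A minimal sketch of the Cochran-Armitage trend test, assuming the usual additive scores 0, 1, 2 for the three genotypes. The statistic is written in its chi-squared form with 1 degree of freedom; the function name is hypothetical.

```python
import math

def cochran_armitage(cases, controls, scores=(0, 1, 2)):
    """Trend test across genotype categories; returns (statistic, p-value)."""
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    sum_wr = sum(w * r for w, r in zip(scores, cases))
    sum_wn = sum(w * t for w, t in zip(scores, totals))
    sum_w2n = sum(w * w * t for w, t in zip(scores, totals))
    numerator = n * (n * sum_wr - n_cases * sum_wn) ** 2
    denominator = n_cases * (n - n_cases) * (n * sum_w2n - sum_wn ** 2)
    stat = numerator / denominator
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-squared upper tail, 1 df
    return stat, p_value
```

Changing `scores` to (0, 1, 1) or (0, 0, 1) gives the corresponding tests for a dominant or recessive model of the minor allele.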

Tests of association: single SNP
Case-control
Armitage or goodness-of-fit? Depends on:
Prior knowledge of inheritance (additive, dominant, etc.)
Genotype frequencies, e.g. use the Armitage test when the minor allele is rare, goodness-of-fit test otherwise
For complex traits, additivity is often thought to be a good model

Tests of association: single SNP
Case-control
Logistic regression
Easily incorporates the inheritance model (additive, dominant, etc.)
But assumes the phenotype, not the genotype, is the outcome variable, so it is easier to justify for prospective studies
For complex traits, additivity is often thought to be a good model
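
As a rough illustration of logistic regression under an additive model, the sketch below fits logit P(case) = a + b * g, where g is the number of minor alleles, by Newton-Raphson on counts grouped by genotype. The function name, the fixed iteration count and the example counts are all assumptions made for this sketch; in practice a statistics package would be used.

```python
import math

def fit_additive_logistic(cases, totals, n_iter=25):
    """Fit logit P(case) = a + b * g for g = 0, 1, 2 by Newton-Raphson.

    cases[g] is the number of cases and totals[g] the number of subjects
    with genotype g. Returns (a, b); exp(b) is the per-allele odds ratio.
    """
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        ua = ub = iaa = iab = ibb = 0.0
        for g, (r, n) in enumerate(zip(cases, totals)):
            p = 1.0 / (1.0 + math.exp(-(a + b * g)))
            w = n * p * (1.0 - p)
            ua += r - n * p            # score for the intercept
            ub += g * (r - n * p)      # score for the slope
            iaa += w                   # information matrix entries
            iab += g * w
            ibb += g * g * w
        det = iaa * ibb - iab * iab
        a += (ibb * ua - iab * ub) / det
        b += (iaa * ub - iab * ua) / det
    return a, b

# Illustrative counts in which the odds of disease double with each allele.
a_hat, b_hat = fit_additive_logistic(cases=(50, 60, 40), totals=(100, 90, 50))
per_allele_odds_ratio = math.exp(b_hat)
```

A dominant or recessive model would be obtained by recoding g before fitting, which is the flexibility the slide refers to.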

Tests of association: single SNP
Continuous outcome: linear regression, but the trait must be normally distributed with equal variance
Ordered categorical outcomes: multinomial regression
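
For a continuous outcome, the additive model amounts to a least-squares regression of the trait on the minor allele count. A minimal sketch, with a hypothetical function name and assuming at least two different genotypes are present:

```python
def additive_linear_fit(genotypes, traits):
    """Least-squares fit of trait = intercept + slope * genotype (0, 1 or 2)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(traits) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, traits))
    slope = sxy / sxx                     # change in trait per minor allele
    intercept = mean_y - slope * mean_g   # mean trait for genotype 0
    return intercept, slope
```

The slope estimates the mean change in the trait per copy of the minor allele; testing it against zero relies on the normality and equal-variance assumptions noted on the slide.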

Problems: population stratification

Correcting for population stratification
Genomic control
Genotype null SNPs and use them to calculate the background inflation in the test statistic due to population stratification
Limited to simple single-SNP analyses
Can over- or under-correct
Other approaches using null SNPs
Regression, principal components analysis, model underlying demography
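
Genomic control can be sketched as follows, assuming the common median-based estimate of the inflation factor: the median of the 1-df chi-squared statistics at the null SNPs is divided by the median of a chi-squared distribution with 1 degree of freedom, and each test statistic is then divided by that factor. Function names are hypothetical.

```python
import statistics

# Median of a chi-squared distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_control_lambda(null_stats):
    """Inflation factor estimated from 1-df chi-squared statistics at null SNPs."""
    return statistics.median(null_stats) / CHI2_1DF_MEDIAN

def genomic_control_adjust(stat, lam):
    """Deflate a single-SNP 1-df test statistic by the inflation factor."""
    return stat / lam
```

Because a single factor is applied to every SNP, the correction can be too strong for some SNPs and too weak for others, as the slide notes.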

Problems: multiple testing
Bonferroni correction: conservative when SNPs are linked
Permutation: computationally demanding
False discovery rate
Bayesian approaches
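
Two of the approaches listed above are easy to sketch. The Bonferroni correction compares each p-value with alpha divided by the number of tests. For the false discovery rate, the sketch assumes the Benjamini-Hochberg step-up procedure, which the slide does not name. Function names are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject where p <= alpha / m, controlling the family-wise error rate."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for false discovery rate control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            largest_k = rank   # keep the largest rank that passes
    rejected = [False] * m
    for i in order[:largest_k]:
        rejected[i] = True
    return rejected
```

Both functions return one boolean per input p-value, in the original order.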

Tests of association: multiple SNPs
Advantages
Many SNPs may be linked to a gene, but individually may not have a significant effect
Interactions between SNPs can be modelled
‘Tag’ SNPs can reduce testing of redundant linked SNPs
Methods
Linear regression, logistic regression
Armitage test
Haplotype-based methods: natural interpretation, but power reduced due to multiple alleles

Haplotypes
Nature Genetics 37, 915-916 (2005)

Crucially, any stretch of recombining DNA can be divided into regions of high linkage disequilibrium (LD), the haplotypes, and the history of each haplotype can be represented as a tree. Tag SNPs: 5-10 times fewer loci need to be genotyped.

Inferring haplotype phase

Inferring haplotype phase
Methods & software: PHASE, FASTPHASE, EH+, FBAT, HAPLOTYPER, EM-DECODER, PLEM, HAP, HAPLORE, Haplo.stat, SNPEM, PEDPHASE, SNPHAP, TDTHAP
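
Some of these programs (for example SNPHAP and PLEM) are based on an expectation-maximisation (EM) algorithm. The sketch below is not the code of any listed program; it is a toy EM for the simplest case of two biallelic SNPs, where only the double heterozygotes have ambiguous phase. The function name and the example counts are hypothetical.

```python
def em_two_snp_haplotypes(counts, n_iter=100):
    """Estimate frequencies of the four two-SNP haplotypes by EM.

    counts[g1][g2] is the number of individuals carrying g1 minor alleles
    at SNP 1 and g2 minor alleles at SNP 2 (each 0, 1 or 2).
    Haplotypes are keyed as (allele at SNP 1, allele at SNP 2), 1 = minor.
    """
    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}
    # Haplotype counts from individuals whose phase is unambiguous.
    fixed = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
    for g1 in range(3):
        for g2 in range(3):
            if (g1, g2) == (1, 1):
                continue  # double heterozygotes are handled in the E-step
            for a, b in zip(alleles[g1], alleles[g2]):
                fixed[(a, b)] += counts[g1][g2]
    n_double_het = counts[1][1]
    total = 2 * sum(sum(row) for row in counts)
    freq = {h: 0.25 for h in fixed}
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote carries 00 + 11.
        cis = freq[(0, 0)] * freq[(1, 1)]
        trans = freq[(0, 1)] * freq[(1, 0)]
        p_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
        expected = dict(fixed)
        expected[(0, 0)] += n_double_het * p_cis
        expected[(1, 1)] += n_double_het * p_cis
        expected[(0, 1)] += n_double_het * (1 - p_cis)
        expected[(1, 0)] += n_double_het * (1 - p_cis)
        # M-step: haplotype frequencies from expected counts.
        freq = {h: c / total for h, c in expected.items()}
    return freq

# Illustrative genotype counts with 50 double heterozygotes.
example_counts = [[40, 0, 0], [0, 50, 0], [0, 0, 10]]
example_freq = em_two_snp_haplotypes(example_counts)
```

Real phasing software handles many SNPs, missing genotypes and uncertainty in the estimates, none of which this toy version attempts.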

Inferring haplotype phase
Phase cases and controls separately or pooled?
Separating can give inflated type I error
Pooling can reduce power