Imputation for GWAS 6 December 2012.


Introduction
Imputation is the process of predicting genotypes that have not been directly typed in a sample of individuals: (i) missing genotypes at typed variants; and (ii) genotypes at un-typed variants that are present in an external high-density “reference panel” of phased haplotypes.
The resulting in silico genotypes can be tested for association within a standard generalised linear regression framework.

How does imputation work?

What is the purpose of imputation?
Increased power. The reference panel is more likely than a GWAS array to contain the causal variant (or a better tag for it).
Fine-mapping. Imputation provides a high-resolution view of an association signal across a locus.
Meta-analysis. Imputation allows GWAS typed on different arrays to be combined at all variants in the reference panel.

Increased power and improved fine-mapping resolution

IMPUTEv2 and minimac
Pre-phasing. Haplotypes are estimated at the variants typed in the study sample (the scaffold).
Haploid imputation. Each study sample haplotype is treated as an unknown path through the haplotypes of the reference panel.
Hidden Markov model (HMM). The probability of switching between reference haplotypes depends on the recombination rate. Allelic mismatches between reference and observed haplotypes are accommodated by allowing a low rate of mutation.
This is less computationally demanding than diploid imputation, which phases and imputes jointly (IMPUTEv1 and MaCH).

Reference panels
Large-scale genotyping and re-sequencing reference panels have been made available through the HapMap Consortium and the 1000 Genomes Project.
HapMap2. 60 CEU, 60 YRI and 90 CHB/JPT individuals typed for ~3M variants.
HapMap3. 1011 individuals from multiple ethnic groups typed for ~1.6M variants.
1000 Genomes. The most recent release includes 1094 individuals from multiple ethnic groups typed for ~30M variants (including indels).

Choice of reference panel
Imputation software has been designed for use with 1000 Genomes reference panels, but imputation remains computationally demanding.
Using the “all ancestries” reference panel (rather than an ethnic-specific reference panel) improves imputation accuracy for rare variants.
Formatted reference panels for IMPUTEv2 and minimac can be downloaded from the software websites.

Factors affecting imputation accuracy
Scaffold. Number of individuals and the GWAS array used for genotyping (coverage of variation).
Reference panel. Number of individuals, density of typing, and similarity of ancestry with the study sample.
Minor allele frequency.
Pre-phasing or diploid imputation (minimal effect).

Imputation accuracy

Imputation quality control
Pre-imputation. It is essential that the GWAS scaffold excludes poor-quality variants. It is common to exclude variants with MAF<1%.
Post-imputation. Imputation quality is assessed by “information measures” in the range 0-1: IMPUTEv2 reports an “info score” and minimac reports r̂². A variant with information measure α in a scaffold of N individuals gives power equivalent to αN perfectly genotyped individuals. It is typical to filter SNPs by α, excluding those below a chosen threshold (for example α<0.4 or α<0.8).
In loci identified through imputation, it is important to check the quality of the typed SNPs in the scaffold across the region by visual inspection of cluster plots.
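
The interpretation of the information measure as an effective sample size can be sketched as follows. The helper names are hypothetical. The variance-ratio function is only one commonly described form of such a measure (observed dosage variance divided by the binomial variance expected under Hardy-Weinberg equilibrium); it is an assumption that it approximates, rather than reproduces, the values reported by IMPUTEv2 or minimac.

```python
import numpy as np

def effective_sample_size(info, n_individuals):
    """alpha * N: number of perfectly genotyped individuals giving equal power."""
    return info * n_individuals

def dosage_variance_ratio(dosages):
    """Observed variance of expected allele counts over the HWE binomial variance.

    Assumed illustrative quality measure, not the exact IMPUTEv2 or minimac formula.
    """
    d = np.asarray(dosages, dtype=float)
    p = d.mean() / 2.0                  # estimated allele frequency
    expected = 2.0 * p * (1.0 - p)      # variance of a perfectly typed genotype
    if expected == 0.0:
        return 0.0                      # monomorphic variant: no information
    return float(d.var() / expected)

def passes_filter(info, threshold=0.4):
    """Keep a variant only if its information measure reaches the threshold."""
    return info >= threshold
```

For example, a variant with α = 0.4 in a scaffold of 5,000 individuals carries roughly the information of 2,000 perfectly genotyped individuals, which is why lenient thresholds trade power per variant for coverage.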

Analysis of imputed genotypes
For each individual, imputation provides a probability distribution over the possible genotypes at each un-typed variant from the reference panel.
Using the best-guess genotype, or filtering on the probability of the best-guess genotype, can increase false positives and reduce power.
Alternatives: convert the probabilities to an “expected allele count”, i.e. p1+2p2, where p1 and p2 are the probabilities of carrying one and two copies of the allele; or take full account of the imputation uncertainty in a “missing data likelihood”.
Software: SNPTEST2 (for IMPUTEv2) and Mach2Dat (for minimac).
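
The difference between the expected allele count and a best-guess call can be shown with a short sketch. The function names and the `min_prob` parameter are hypothetical; the input is assumed to be an array of genotype probabilities (p0, p1, p2) per individual.

```python
import numpy as np

def expected_allele_count(probs):
    """Dosage p1 + 2*p2 from rows of genotype probabilities (p0, p1, p2)."""
    probs = np.asarray(probs, dtype=float)
    return probs[..., 1] + 2.0 * probs[..., 2]

def best_guess(probs, min_prob=0.0):
    """Most probable genotype per row, or -1 where its probability is below min_prob."""
    probs = np.asarray(probs, dtype=float)
    calls = probs.argmax(axis=-1)
    calls[probs.max(axis=-1) < min_prob] = -1   # treated as missing
    return calls
```

An individual with probabilities (0.1, 0.6, 0.3) has an expected allele count of 1.2, whereas the best-guess call is 1 and would be discarded altogether under a 0.9 probability filter.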

Rare variants and complex disease
Rare variants are likely to have arisen from founder effects in the last few generations.
Rare variants are expected to have larger effects on complex traits than common variants.
Statistical methods focus on the accumulation of minor alleles at rare variants (mutational load) within the same functional unit.

GRANVIL
Test of association of the phenotype with the proportion of rare variants at which an individual carries minor alleles, p_i.
Model the disease phenotype via regression on p_i and any other covariates in a GLM framework.
Example: an individual with carrier indicators 1 0 0 0 0 1 0 0 0 1 across ten rare variants has p_i = 3/10.
Reedik Magi, http://www.well.ox.ac.uk/GRANVIL/
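
The per-individual proportion used as the regression predictor can be computed as below. This is a minimal sketch with a hypothetical function name, not code from the GRANVIL software; it takes one 0/1 indicator per rare variant in the functional unit.

```python
def rare_allele_proportion(carrier_flags):
    """Proportion of rare variants at which an individual carries a minor allele."""
    flags = list(carrier_flags)
    if not flags:
        raise ValueError("at least one rare variant is required")
    return sum(flags) / len(flags)

# Carrier indicators from the slide: minor alleles at 3 of 10 rare variants.
example = rare_allele_proportion([1, 0, 0, 0, 0, 1, 0, 0, 0, 1])
```

Here `example` equals 3/10, matching the slide. The resulting value would then enter a GLM as a single covariate alongside any other adjustments.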

Assaying rare genetic variation
The gold-standard approach to assaying rare genetic variation is re-sequencing, which is expensive on the scale of the whole genome.
GWAS genotyping arrays are inexpensive, but are not designed to capture rare genetic variation.
Large-scale reference panels of whole-genome re-sequencing data are increasingly available: the 1000 Genomes Project and the UK10K Project.
Imputing GWAS scaffolds up to these reference panels recovers genotypes at rare variants at no additional cost other than computing.

GRANVIL: imputed variants
Test of association of the phenotype with the proportion of rare variants at which an individual carries minor alleles, p_i.
Direct genotypes are replaced with the posterior probability of a heterozygous or rare homozygous call from imputation.
Model the disease phenotype via regression on p_i and any other covariates in a GLM framework.
Example: an individual with posterior probabilities 0.9 0.1 0.2 0.1 0.1 0.8 0.1 0.1 0.1 0.6 across ten rare variants has p_i = 3.1/10.
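
For imputed data, the same proportion can be sketched from genotype probabilities. The function names and the example probabilities are hypothetical; each row is assumed to hold (p0, p1, p2) for one rare variant in one individual, and the carrier probability is taken as p1 + p2, i.e. the posterior probability of a heterozygous or rare homozygous call.

```python
import numpy as np

def carrier_probability(probs):
    """P(heterozygous or rare homozygous) = p1 + p2 for each variant."""
    probs = np.asarray(probs, dtype=float)
    return probs[:, 1] + probs[:, 2]

def imputed_rare_allele_proportion(probs):
    """Average carrier probability across the rare variants of one individual."""
    return float(carrier_probability(probs).mean())

# Hypothetical genotype probabilities at four rare variants.
probs = [
    [0.1, 0.8, 0.1],
    [0.9, 0.1, 0.0],
    [1.0, 0.0, 0.0],
    [0.5, 0.4, 0.1],
]
proportion = imputed_rare_allele_proportion(probs)
```

Unlike the directly genotyped case, the result need not be a multiple of 1/(number of variants): uncertain calls contribute fractionally rather than being rounded to 0 or 1.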

Application to WTCCC
GWAS of seven complex human diseases from the UK, with 2000 cases each and 3000 shared controls from the 1958 British Birth Cohort and the National Blood Service: bipolar disorder (BD), coronary artery disease (CAD), Crohn’s disease (CD), hypertension (HT), rheumatoid arthritis (RA), type 1 diabetes (T1D) and type 2 diabetes (T2D).
Individuals were genotyped using the Affymetrix GeneChip 500K Mapping Array Set.
After quality control, 16,179 samples and 391,060 autosomal SNPs (MAF>1%) were carried forward for analysis.

Fine-scale UK population structure
Fine-scale population structure may have a greater impact on rare variants than on common SNPs because of recent founder effects.
EIGENSTRAT was used to construct principal components representing axes of genetic variation across the UK, based on 27,770 high-quality, LD-pruned (r²<0.2) common autosomal SNPs (MAF>5%).
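
The idea of deriving principal components from a genotype matrix can be sketched with a plain SVD. This is an assumption-laden illustration, not the EIGENSTRAT algorithm itself: the function name is hypothetical, genotypes are assumed to be coded 0/1/2 with no missing values, and each SNP is simply centred and scaled by its binomial standard deviation.

```python
import numpy as np

def principal_components(genotypes, n_components=3):
    """Scores of each individual on the leading axes of genetic variation.

    genotypes: (individuals, SNPs) array of 0/1/2 allele counts, no missing data.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                 # allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)             # drop monomorphic SNPs
    z = (g[:, keep] - 2.0 * p[keep]) / np.sqrt(2.0 * p[keep] * (1.0 - p[keep]))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

The returned columns could then be included as covariates in the association model, which is how the three principal components are used in the rare variant analysis later in the talk.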

Fine-scale UK population structure

Imputation
SNPs were mapped to NCBI build 37 of the human genome.
Samples were imputed up to the 1000 Genomes Phase 1 cosmopolitan reference panel (June 2011 interim release).
8.23M imputed autosomal rare variants (MAF<1%) were polymorphic in WTCCC.
5.38M (65.3%) were “well-imputed” (i.e. info score > 0.4) and carried forward for analysis.
The mean info score was 0.618, and 17.3% had an info score > 0.8.

Rare variant analysis
Each disease was tested for association with the accumulation of rare variants (MAF<1%) within genes using GRANVIL.
Gene boundaries were defined from the UCSC human genome database (build 37).
Analyses were adjusted for three principal components to account for fine-scale UK population structure.
Genome-wide significance threshold: p<1.7 × 10^-6, a Bonferroni adjustment for 30,000 genes.
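
The significance threshold follows from dividing a family-wise error rate by the number of genes tested. The sketch below assumes a family-wise rate of 0.05, which the slide does not state explicitly but which reproduces the quoted figure after rounding; the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

# Assumed family-wise error rate of 0.05 across 30,000 genes.
threshold = bonferroni_threshold(0.05, 30_000)
```

The unrounded value is about 1.67 × 10^-6, which rounds to the 1.7 × 10^-6 quoted on the slide.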

No evidence of residual population structure

Rare variant association with T1D
Genome-wide significant evidence of association of T1D with rare variants in multiple genes from the MHC.
The strongest signal of association was observed for HLA-DRA (p=2.0 × 10^-13).
The gene contains 23 well-imputed rare variants with a mean MAF of 0.32%.
Accumulations of minor alleles across these variants were associated with decreased risk of disease: odds ratio 0.556 (0.476-0.650) per minor allele.

T1D association across the MHC
Ten genes achieve genome-wide significant evidence of rare variant association with T1D: HLA-DRA, SLC44A4, HLA-DRB5, PBX2, TNXA, PBMUCL2, EHMT2, AGPAT1, C6orf10 and NCR3.

T1D association across the MHC
Results after additional adjustment for the additive effect of the lead GWAS common variant from the MHC (rs9268645).
Genes labelled: PBX2, HLA-DRA, HLA-DRB5, SLC44A4, SKIVL2, HLA-DMA, PBMUCL2, EHMT2, AGPAT1 and TNXB.

T1D association across the MHC

Comments
GRANVIL assumes that all rare variants within the functional unit have the same direction of effect on the trait.
Methods that allow for different directions of effect of rare variants are well established for re-sequencing data, and are being generalised to allow for imputation.
The most powerful rare variant test will depend on the underlying genetic architecture of the trait.