Analysis of imputed rare variants

Andrew Morris. Advanced Topics in GWAS, Toronto, 30 May 2012.

Introduction
GWAS have been successful in detecting novel loci for complex traits. These loci are typically characterised by common variants of modest effect, which together explain relatively little of the heritability. Low-frequency and rare variation may contribute to the “missing” genetic component of complex traits, for example IFIH1 and type 1 diabetes, or MYH6 and sick sinus syndrome.

Rare variants and complex disease
Rare variants are likely to have arisen from founder effects in the last few generations. Rare variants are expected to have larger effects on complex traits than common variants. Statistical methods focus on the accumulation of minor alleles at rare variants (mutational load) within the same functional unit.

GRANVIL
Test of association of phenotype with the proportion of rare variants at which individuals carry minor alleles. Model disease phenotype via regression on pi and any other covariates in a GLM framework.
Example: carrier indicators 1 0 0 0 0 1 0 0 0 1 across ten rare variants give pi = 3/10.
Reedik Magi, http://www.well.ox.ac.uk/GRANVIL/
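
The proportion pi on this slide can be sketched in a few lines. This is only an illustration of the quantity described above, not GRANVIL's own code; the function name `rare_variant_proportion` and the coding of genotypes as minor-allele counts are assumptions.

```python
def rare_variant_proportion(genotypes):
    """Proportion of rare variants at which one individual carries a minor allele.

    `genotypes` is a list of minor-allele counts (0, 1 or 2), one per rare
    variant in the functional unit. This coding is an assumption for the sketch.
    """
    if not genotypes:
        raise ValueError("need at least one rare variant")
    carriers = sum(1 for g in genotypes if g > 0)
    return carriers / len(genotypes)


# Carrier indicators from the slide: minor alleles at 3 of 10 rare variants.
slide_example = [1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
p_i = rare_variant_proportion(slide_example)
print(p_i)
```

The resulting value would then enter a regression model as a single predictor alongside any covariates.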

Assaying rare genetic variation
The gold-standard approach to assaying rare genetic variation is re-sequencing, which is expensive on the scale of the whole genome. GWAS genotyping arrays are inexpensive, but are not designed to capture rare genetic variation. Large-scale reference panels of whole-genome re-sequencing data are increasingly available: the 1000 Genomes Project and the UK10K Project. Imputing into GWAS scaffolds up to these reference panels recovers genotypes at rare variants at no additional cost, other than computing.

GRANVIL: imputed variants
Test of association of phenotype with the proportion of rare variants at which individuals carry minor alleles. Replace direct genotypes with the posterior probability of a heterozygous or rare homozygous call from imputation. Model disease phenotype via regression on pi and any other covariates in a GLM framework.
Example from the slide: posterior probabilities 0.9 0.1 0.2 0.1 0.1 0.8 0.1 0.1 0.1 0.6; pi = 3.0/10.
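
A minimal sketch of the imputed version of pi, assuming each variant comes with three posterior genotype probabilities (major homozygote, heterozygote, minor homozygote). The function name and the input values are hypothetical and are not taken from the slide or from GRANVIL.

```python
def imputed_rare_variant_proportion(genotype_probs):
    """Expected proportion of rare variants at which an individual carries a minor allele.

    `genotype_probs` is a list of (P(major hom), P(het), P(minor hom)) triples,
    one per rare variant. The carrier probability at each variant is
    P(het) + P(minor hom), replacing the 0/1 indicator used for direct genotypes.
    """
    if not genotype_probs:
        raise ValueError("need at least one rare variant")
    total = 0.0
    for _p_hom_major, p_het, p_hom_minor in genotype_probs:
        total += p_het + p_hom_minor
    return total / len(genotype_probs)


# Hypothetical posterior probabilities for four rare variants.
probs = [(0.1, 0.9, 0.0), (0.9, 0.1, 0.0), (1.0, 0.0, 0.0), (0.2, 0.7, 0.1)]
p_i_imputed = imputed_rare_variant_proportion(probs)
```

With perfectly confident calls (probabilities of 0 or 1) this reduces to the proportion computed from direct genotypes.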

Study question
Can we make use of imputation into GWAS scaffolds up to re-sequencing reference panels to detect rare variant associations with complex traits? A simulation study was performed to compare power to detect association using GRANVIL for four alternative strategies for assaying rare genetic variation.

Design and analysis strategies
Figure: schematic of the phased reference panel and the analysis cohort, repeated for each strategy.
Strategy 1. Re-sequence the analysis cohort.
Strategy 2. Genotype the analysis cohort for variants in the reference panel. Variants not present in the reference panel will be missed.
Strategy 3. Genotype the analysis cohort with a GWAS chip.
Strategy 4. Genotype the analysis cohort with a GWAS chip and impute variants on the reference panel. Rare variants on the reference panel are recovered by imputation.

Simulation study
Simulate a 1050kb region of genome containing a 50kb gene in a phased reference panel (120, 500 or 4000 individuals) and an analysis cohort (2000 individuals). Select causal variants within the gene subject to maximum MAF and total MAF. Simulate a quantitative trait for the analysis cohort given the causal variants and their contribution to overall trait variance.
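
The trait-simulation step could look roughly like the sketch below. The slides do not give the effect-size model, so this assumes an equal effect per causal minor allele and normally distributed residuals; `simulate_trait` and its arguments are hypothetical names.

```python
import math
import random
import statistics


def simulate_trait(causal_allele_counts, variance_explained, rng):
    """Simulate a unit-variance quantitative trait for each individual.

    Assumption for this sketch: every causal minor allele has the same effect,
    so the genetic component is the standardised count of causal minor alleles,
    scaled to explain `variance_explained` of the trait variance.
    """
    mean = statistics.fmean(causal_allele_counts)
    sd = statistics.pstdev(causal_allele_counts)
    if sd == 0:
        raise ValueError("no variation in causal allele counts")
    genetic_sd = math.sqrt(variance_explained)
    noise_sd = math.sqrt(1.0 - variance_explained)
    return [
        genetic_sd * (count - mean) / sd + rng.gauss(0.0, noise_sd)
        for count in causal_allele_counts
    ]


# 2000 individuals; each carries a causal minor allele with probability 0.05
# (an arbitrary illustrative value), and the gene explains 5% of trait variance.
rng = random.Random(2012)
counts = [1 if rng.random() < 0.05 else 0 for _ in range(2000)]
trait = simulate_trait(counts, 0.05, rng)
```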

Simulation study (continued)
Apply each strategy and test for association of rare variants (MAF<1% in analysis cohort) with the quantitative trait using GRANVIL. Strategies 3 and 4: GWAS Illumina 660K chip. Strategy 4: imputation performed using IMPUTEv2 allowing a “buffer” of 500kb, with low-quality imputed variants (info score < 0.4) excluded from analysis. Assess power to detect association at a nominal 5% significance threshold.
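
The variant filter and the power estimate described on this slide can be sketched as follows. The thresholds come from the slide; the function names, and the idea of estimating power as the fraction of simulation replicates reaching significance, are assumptions of this sketch.

```python
def keep_for_analysis(maf, info, maf_max=0.01, info_min=0.4):
    """Keep a polymorphic rare variant (MAF < 1%) that is not low quality (info >= 0.4)."""
    return 0.0 < maf < maf_max and info >= info_min


def empirical_power(p_values, alpha=0.05):
    """Fraction of simulation replicates with a p-value below the nominal threshold."""
    if not p_values:
        raise ValueError("need at least one replicate")
    return sum(1 for p in p_values if p < alpha) / len(p_values)


# Hypothetical p-values from four replicates of one simulation setting.
power = empirical_power([0.01, 0.20, 0.04, 0.50])
```

For directly genotyped or re-sequenced variants the info-score filter does not apply, so only the MAF condition would be used.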

Figure: power at a nominal 5% significance threshold, assuming a 5% contribution to trait variance. Maximum MAF of causal variant: 1%; total MAF of causal variants: 5%.

Figure: power at a nominal 5% significance threshold, assuming a 5% contribution to trait variance. Maximum MAF of causal variant: 0.5%; total MAF of causal variants: 2%.

Comments
We can recover up to 80% of the power to detect rare variant associations attained through re-sequencing by imputation into GWAS data. It is essential to include a “buffer” for imputation. As the MAF of causal variants decreases, larger reference panels offer greater power.
Limiting assumptions of the simulation study: no re-sequencing or phasing errors in the reference panel; no miscalled or missing genotypes in the analysis cohort; reference panel ascertained from the same population as the analysis cohort.

Application to WTCCC
GWAS of seven complex human diseases from the UK (2000 cases each and 3000 shared controls from the 1958 British Birth Cohort and National Blood Service): bipolar disorder (BD), coronary artery disease (CAD), Crohn’s disease (CD), hypertension (HT), rheumatoid arthritis (RA), type 1 diabetes (T1D) and type 2 diabetes (T2D). Individuals were genotyped using the Affymetrix GeneChip 500K Mapping Array Set.

Quality control
Samples were excluded on the basis of mismatch with external data, low call rate, outlying heterozygosity, duplication, relatedness, and non-European ancestry. SNPs were excluded on the basis of: call rate <95% (<99% if MAF <5%); extreme deviation from HWE (exact p<5.7x10^-7); MAF <1%.

Cohort     Samples passing QC
Controls   2,938
BD         1,868
CAD        1,926
CD         1,748
HT         1,952
RA         1,860
T1D        1,963
T2D        1,924

A total of 16,179 samples and 391,060 high-quality autosomal SNPs were carried forward for analysis.
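
The SNP exclusion rules can be written as one predicate, shown here as a sketch. The thresholds are the ones listed on the slide; the function name `snp_passes_qc` and its inputs are hypothetical, and the sample counts are copied from the table to check the stated total.

```python
def snp_passes_qc(call_rate, maf, hwe_p):
    """Apply the three SNP exclusion rules from the quality-control slide."""
    # Stricter call-rate requirement for SNPs with MAF below 5%.
    min_call_rate = 0.99 if maf < 0.05 else 0.95
    if call_rate < min_call_rate:
        return False
    if hwe_p < 5.7e-7:  # extreme deviation from Hardy-Weinberg equilibrium
        return False
    if maf < 0.01:  # rare SNPs on the array are dropped before imputation
        return False
    return True


# Samples passing QC per cohort, as listed on the slide.
samples_passing_qc = {
    "Controls": 2938, "BD": 1868, "CAD": 1926, "CD": 1748,
    "HT": 1952, "RA": 1860, "T1D": 1963, "T2D": 1924,
}
total_samples = sum(samples_passing_qc.values())
print(total_samples)
```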

Fine-scale UK population structure
Fine-scale population structure may have greater impact on rare variants than on common SNPs because of recent founder effects. EIGENSTRAT was used to construct principal components representing axes of genetic variation across the UK, based on 27,770 high-quality, LD-pruned (r^2<0.2), common autosomal SNPs (MAF>5%).

Figure: fine-scale UK population structure.

Imputation
SNPs were mapped to NCBI build 37 of the human genome. Samples were imputed up to the 1000 Genomes Phase 1 cosmopolitan reference panel (June 2011 interim release). 8.23M imputed autosomal rare variants (MAF<1%) were polymorphic in WTCCC. 5.38M (65.3%) were “well-imputed” (i.e. info score > 0.4) and carried forward for analysis. Mean info score was 0.618, and 17.3% had info score > 0.8.

Rare variant analysis
Test for association of each disease with accumulation of rare variants (MAF<1%) within genes using GRANVIL. Gene boundaries were defined from the UCSC human genome database (build 37). Analyses included three principal components as covariates to account for fine-scale UK population structure. Genome-wide significance threshold p<1.7x10^-6: Bonferroni adjustment for 30,000 genes.
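
The genome-wide threshold follows from a Bonferroni adjustment, as the short check below illustrates. The family-wise error rate of 0.05 is an assumption; the slide states only the number of genes and the resulting threshold.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumed family-wise error rate of 5% across 30,000 gene-based tests.
threshold = bonferroni_threshold(0.05, 30_000)
print(f"{threshold:.1e}")  # 1.7e-06
```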

No evidence of residual population structure

Rare variant association with CAD
Genome-wide significant evidence of association of CAD with rare variants in the gene PRDM10 (p=4.9x10^-8). The gene contains 122 well-imputed rare variants with mean MAF of 0.23%. Accumulations of minor alleles across these variants were associated with decreased risk of disease: odds ratio 0.828 (0.774-0.886) per minor allele.

Rare variant association with T1D
Genome-wide significant evidence of association of T1D with rare variants in multiple genes from the MHC. The strongest signal of association was observed for HLA-DRA (p=2.0x10^-13). The gene contains 23 well-imputed rare variants with mean MAF of 0.32%. Accumulations of minor alleles across these variants were associated with decreased risk of disease: odds ratio 0.556 (0.476-0.650) per minor allele.

T1D association across the MHC
Ten genes achieve genome-wide significant evidence of rare variant association with T1D: HLA-DRA, SLC44A4, HLA-DRB5, PBX2, TNXA, PBMUCL2, EHMT2, AGPAT1, C6orf10, NCR3.

T1D association across the MHC
Results after additional adjustment for the additive effect of the lead GWAS common variant from the MHC (rs9268645). Genes labelled on the slide: PBX2, HLA-DRA, HLA-DRB5, SLC44A4, SKIVL2, HLA-DMA, PBMUCL2, EHMT2, AGPAT1, TNXB.

Figure: T1D association across the MHC.

Comments
GRANVIL assumes the same direction of effect on the trait for all rare variants within the functional unit. Methods allowing for different directions of effect of rare variants are well established for re-sequencing data, and are being generalised to allow for imputation. The most powerful rare variant test will depend on the underlying genetic architecture of the trait.

Summary
Simulations suggest that we can recover up to 80% of the power to detect rare variant associations attained through re-sequencing by imputation into GWAS data. This requires no additional cost, other than computation, which is not trivial! Imputation up to the 1000 Genomes reference panel into GWAS data from WTCCC highlighted: a novel association of rare genetic variation in PRDM10 with CAD; a complex genetic architecture underlying T1D association across the MHC, involving multiple genes.

Lab practical
Use GRANVIL to test for association of T1D with imputed rare variants within genes across the MHC, using data from the WTCCC. Investigate the impact on results of: the MAF threshold for inclusion of rare variants in the analysis; filtering rare variants on the basis of annotation; gene boundary definition.