Presentation on theme: "Analysis of imputed rare variants"— Presentation transcript:
1 Analysis of imputed rare variants
Andrew Morris
Advanced Topics in GWAS, Toronto, 30 May 2012
2 Introduction
GWAS have been successful in detecting novel loci for complex traits: typically characterised by common variants of modest effect; together explain relatively little of the heritability.
Low-frequency and rare variation may contribute to the “missing” genetic component of complex traits: IFIH1 and type 1 diabetes; MYH6 and sick sinus syndrome.
3 Rare variants and complex disease
Rare variants are likely to have arisen from founder effects in the last few generations.
Rare variants are expected to have larger effects on complex traits than common variants.
Statistical methods focus on the accumulation of minor alleles at rare variants (mutational load) within the same functional unit.
4 GRANVIL
Test of association of phenotype with the proportion of rare variants at which individuals carry minor alleles (pi).
Model disease phenotype via regression on pi and any other covariates in a GLM framework.
Example: pi = 3/10.
(Reedik Magi)
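As a minimal sketch of the statistic on this slide, the proportion can be computed from hard genotype calls as below. The function name and the coding of genotypes as minor-allele counts (0, 1 or 2) are assumptions for illustration, not GRANVIL's actual interface.

```python
def rare_variant_load(genotypes):
    """Proportion of rare variants at which one individual carries a minor allele.

    genotypes: minor-allele counts (0, 1 or 2), one per rare variant in the
    functional unit (hypothetical input format).
    """
    if not genotypes:
        raise ValueError("need at least one rare variant")
    carried = sum(1 for g in genotypes if g > 0)
    return carried / len(genotypes)


# Ten rare variants; minor alleles carried at three of them, as in the slide example.
example_genotypes = [0, 1, 0, 0, 2, 0, 0, 1, 0, 0]
proportion = rare_variant_load(example_genotypes)
```

The resulting proportion would then enter the regression as a covariate alongside any others.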
5 Assaying rare genetic variation
The gold-standard approach to assaying rare genetic variation is re-sequencing, which is expensive on the scale of the whole genome.
GWAS genotyping arrays are inexpensive, but are not designed to capture rare genetic variation.
Large-scale reference panels of whole-genome re-sequencing data are increasingly available: the 1000 Genomes Project and the UK10K Project.
Impute into GWAS scaffolds up to these reference panels to recover genotypes at rare variants at no additional cost, other than computing.
6 GRANVIL: imputed variants
Test of association of phenotype with the proportion of rare variants at which individuals carry minor alleles.
Replace direct genotypes with the posterior probability of a heterozygous or rare homozygous call from imputation.
Model disease phenotype via regression on pi and any other covariates in a GLM framework.
Example: pi = 3.0/10.
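The replacement of direct genotypes by imputation posteriors might be sketched as follows. The input format (one triple of genotype probabilities per variant) and the function name are assumptions for illustration; the slides do not describe GRANVIL's actual input handling.

```python
def expected_rare_variant_load(posteriors):
    """Expected proportion of rare variants at which an individual carries a minor allele.

    posteriors: one (p_common_hom, p_het, p_rare_hom) triple per rare variant,
    as produced by imputation (hypothetical input format).
    """
    if not posteriors:
        raise ValueError("need at least one rare variant")
    total = 0.0
    for _p_common_hom, p_het, p_rare_hom in posteriors:
        # Posterior probability of a heterozygous or rare homozygous call.
        total += p_het + p_rare_hom
    return total / len(posteriors)


# Ten imputed rare variants whose carrier probabilities sum to about 3.0.
carrier_probs = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0]
posteriors = [(1.0 - p, p, 0.0) for p in carrier_probs]
proportion = expected_rare_variant_load(posteriors)
```

With hard calls (probabilities of exactly 0 or 1) this reduces to the count-based proportion used for directly genotyped variants.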
7 Study question
Can we make use of imputation into GWAS scaffolds up to re-sequencing reference panels to detect rare variant associations with complex traits?
A simulation study was performed to compare power to detect association using GRANVIL for four alternative strategies for assaying rare genetic variation.
8 Design and analysis strategies
[Figure: phased reference panel and analysis cohort]
12 Strategy 4. Genotype analysis cohort with GWAS chip and impute variants on reference panel
[Figure: phased reference panel and analysis cohort; recovery of rare variants on reference panel by imputation]
13 Simulation study
Simulate a 1050kb region of the genome containing a 50kb gene in a phased reference panel (120, 500 or 4000 individuals) and an analysis cohort (2000 individuals).
Select causal variants within the gene subject to maximum MAF and total MAF.
Simulate a quantitative trait for the analysis cohort given the causal variants and their contribution to overall trait variance.
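The trait-simulation step could look roughly like the sketch below. It is far simpler than the study itself: it assumes independent causal variants, equal effects in the same direction, and Hardy-Weinberg proportions, none of which the slides specify. The function name, seed and MAF values are illustrative only.

```python
import random
import statistics


def simulate_trait(causal_counts, variance_explained, rng):
    """Return (genetic_component, trait) with the causal burden explaining
    the requested share of trait variance (equal-effect assumption)."""
    mean = statistics.fmean(causal_counts)
    sd = statistics.pstdev(causal_counts)
    if sd == 0:
        raise ValueError("no variation in causal allele counts")
    scale = variance_explained ** 0.5
    residual_sd = (1.0 - variance_explained) ** 0.5
    genetic = [scale * (c - mean) / sd for c in causal_counts]
    trait = [g + rng.gauss(0.0, residual_sd) for g in genetic]
    return genetic, trait


rng = random.Random(2012)
# Five hypothetical causal variants: maximum MAF 1%, total MAF 5%.
causal_mafs = [0.01] * 5
# Minor-allele count per individual, drawing two alleles per variant.
causal_counts = [
    sum((rng.random() < maf) + (rng.random() < maf) for maf in causal_mafs)
    for _ in range(2000)
]
genetic, trait = simulate_trait(causal_counts, 0.05, rng)
```

Scaling the standardised burden by the square root of the target proportion fixes the genetic variance at 5% of a unit-variance trait.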
14 Simulation study
Apply each strategy and test for association of rare variants (MAF<1% in analysis cohort) with the quantitative trait using GRANVIL.
Strategies 3 and 4: GWAS Illumina 660K chip.
Strategy 4: imputation performed using IMPUTEv2 allowing a “buffer” of 500kb, with low-quality imputed variants (info score < 0.4) excluded from analysis.
Assess power to detect association at a nominal 5% significance threshold.
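The variant filter and the power estimate described on this slide can be sketched as below. Function names are hypothetical, and requiring MAF above zero (a polymorphic variant) is an added assumption.

```python
def keep_variant(maf, info, maf_max=0.01, info_min=0.4):
    """Keep a polymorphic rare variant (MAF < 1%) unless its info score is below 0.4."""
    return 0.0 < maf < maf_max and info >= info_min


def empirical_power(p_values, alpha=0.05):
    """Share of simulation replicates with p below the nominal threshold."""
    if not p_values:
        raise ValueError("need at least one replicate")
    return sum(p < alpha for p in p_values) / len(p_values)


# Hypothetical (MAF, info score) pairs for four imputed variants.
variants = [(0.005, 0.62), (0.005, 0.30), (0.020, 0.90), (0.002, 0.40)]
kept = [v for v in variants if keep_variant(*v)]

# Hypothetical p-values from four simulation replicates.
power = empirical_power([0.01, 0.20, 0.04, 0.50])
```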
15 Maximum MAF of causal variant: 1%. Total MAF of causal variants: 5%. Power at nominal 5% significance threshold, assuming 5% contribution to trait variance.
16 Maximum MAF of causal variant: 0.5%. Total MAF of causal variants: 2%. Power at nominal 5% significance threshold, assuming 5% contribution to trait variance.
17 Comments
Imputation into GWAS data can recover up to 80% of the power to detect rare variant associations attained through re-sequencing.
It is essential to include a “buffer” for imputation.
As the MAF of causal variants decreases, larger reference panels offer greater power.
Limiting assumptions of the simulation study: no re-sequencing or phasing errors in the reference panel, and no miscalled or missing genotypes in the analysis cohort; reference panel ascertained from the same population as the analysis cohort.
18 Application to WTCCC
GWAS of seven complex human diseases from the UK (2000 cases each and 3000 shared controls from the 1958 British Birth Cohort and National Blood Service): bipolar disease (BD), coronary artery disease (CAD), Crohn’s disease (CD), hypertension (HT), rheumatoid arthritis (RA), type 1 diabetes (T1D) and type 2 diabetes (T2D).
Individuals genotyped using the Affymetrix GeneChip 500K Mapping Array Set.
19 Quality control
Samples excluded on the basis of mismatch with external data, low call rate, outlying heterozygosity, duplication, relatedness, and non-European ancestry.
SNPs excluded on the basis of: call rate <95% (<99% if MAF <5%); extreme deviation from HWE (exact p<5.7x10^-7); MAF <1%.
Cohort      Samples passing QC
Controls    2,938
BD          1,868
CAD         1,926
CD          1,748
HT          1,952
RA          1,860
T1D         1,963
T2D         1,924
A total of 16,179 samples and 391,060 high-quality autosomal SNPs carried forward for analysis.
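The SNP exclusion rules above translate into a simple predicate, sketched here with a hypothetical function name. The second part only re-adds the per-cohort sample counts from the table to confirm the stated total.

```python
def snp_passes_qc(call_rate, maf, hwe_p):
    """Apply the three SNP exclusion criteria listed on the slide."""
    # Stricter call-rate requirement for SNPs with MAF below 5%.
    min_call_rate = 0.99 if maf < 0.05 else 0.95
    if call_rate < min_call_rate:
        return False
    if hwe_p < 5.7e-7:  # extreme deviation from HWE (exact test p-value)
        return False
    if maf < 0.01:
        return False
    return True


samples_passing_qc = {
    "Controls": 2938, "BD": 1868, "CAD": 1926, "CD": 1748,
    "HT": 1952, "RA": 1860, "T1D": 1963, "T2D": 1924,
}
total_samples = sum(samples_passing_qc.values())  # 16179, matching the slide
```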
20 Fine-scale UK population structure
Fine-scale population structure may have greater impact on rare variants than on common SNPs because of recent founder effects.
Utilised EIGENSTRAT to construct principal components to represent axes of genetic variation across the UK: 27,770 high-quality LD-pruned (r^2<0.2) common autosomal SNPs (MAF>5%).
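The slides do not say how LD pruning was carried out. As an illustration only, a greedy pairwise prune at r^2 < 0.2 might look like the sketch below; real tools typically work within sliding windows along the genome, and every name and genotype here is hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as 0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def ld_prune(snps, threshold=0.2):
    """Greedily keep SNPs whose r^2 with every SNP kept so far is below threshold."""
    kept = []
    for name, genotypes in snps:
        if all(r_squared(genotypes, g) < threshold for _, g in kept):
            kept.append((name, genotypes))
    return [name for name, _ in kept]


snps = [
    ("snpA", [0, 1, 2, 0, 1, 2]),
    ("snpB", [0, 1, 2, 0, 1, 2]),  # identical to snpA, so r^2 = 1
    ("snpC", [1, 1, 0, 2, 0, 2]),  # r^2 with snpA is 1/16
]
pruned = ld_prune(snps)
```

Here snpB is dropped because it is in perfect LD with snpA, while snpC is retained.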
22 Imputation
SNPs mapped to NCBI build 37 of the human genome.
Samples imputed up to the 1000 Genomes Phase 1 cosmopolitan reference panel (June 2011 interim release).
8.23M imputed autosomal rare variants (MAF<1%) were polymorphic in WTCCC.
5.38M (65.3%) were “well-imputed” (i.e. info score > 0.4) and carried forward for analysis.
Mean info score was 0.618, and 17.3% had info score > 0.8.
23 Rare variant analysis
Test for association of each disease with accumulation of rare variants (MAF<1%) within genes using GRANVIL.
Gene boundaries defined from the UCSC human genome database (build 37).
Analyses adjusted for three principal components to account for fine-scale UK population structure.
Genome-wide significance threshold p<1.7x10^-6: Bonferroni adjustment for 30,000 genes.
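The threshold on this slide follows from a one-line calculation, assuming a 5% family-wise error rate (the slide gives only the result and the number of genes).

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni adjustment."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# 0.05 is an assumed family-wise error rate; 30,000 genes is from the slide.
threshold = bonferroni_threshold(0.05, 30000)  # about 1.67e-6

# P-values reported later in the talk for PRDM10 (CAD) and HLA-DRA (T1D).
reported = {"PRDM10": 4.9e-8, "HLA-DRA": 2.0e-13}
significant = {gene: p < threshold for gene, p in reported.items()}
```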
25 Rare variant association with CAD
Genome-wide significant evidence of association of CAD with rare variants in the gene PRDM10 (p=4.9x10^-8).
The gene contains 122 well-imputed rare variants with mean MAF of 0.23%.
Accumulations of minor alleles across these variants were associated with decreased risk of disease: odds ratio ( ) per minor allele.
26 Rare variant association with T1D
Genome-wide significant evidence of association of T1D with rare variants in multiple genes from the MHC.
Strongest signal of association observed for HLA-DRA (p=2.0x10^-13).
The gene contains 23 well-imputed rare variants with mean MAF of 0.32%.
Accumulations of minor alleles across these variants were associated with decreased risk of disease: odds ratio ( ) per minor allele.
27 T1D association across the MHC
Ten genes achieve genome-wide significant evidence of rare variant association with T1D.
Genes: HLA-DRA, SLC44A4, HLA-DRB5, PBX2, TNXA, PBMUCL2, EHMT2, AGPAT1, C6orf10, NCR3.
28 T1D association across the MHC
After additional adjustment for additive effect of the lead GWAS common variant from the MHC (rs ).
Genes: PBX2, HLA-DRA, HLA-DRB5, SLC44A4, SKIVL2, HLA-DMA, PBMUCL2, EHMT2, AGPAT1, TNXB.
30 Comments
GRANVIL assumes the same direction of effect on the trait of all rare variants within the functional unit.
Methods allowing for different directions of effect of rare variants are well established for re-sequencing data, and are being generalised to allow for imputation.
The most powerful rare variant test will depend on the underlying genetic architecture of the trait.
31 Summary
Simulations suggest that we can recover up to 80% of the power to detect rare variant associations attained through re-sequencing by imputation into GWAS data.
This requires no additional cost, other than computation, which is not trivial!
Imputation up to the 1000 Genomes reference panel into GWAS data from WTCCC highlighted: novel association of rare genetic variation in PRDM10 with CAD; complex genetic architecture underlying T1D association across the MHC involving multiple genes.
32 Lab practical
Use GRANVIL to test for association of T1D with imputed rare variants within genes across the MHC, using data from the WTCCC.
Investigate the impact on results of: the MAF threshold for inclusion of rare variants in the analysis; filtering rare variants on the basis of annotation; gene boundary definition.