Why I chose: First reading results seemed counterintuitive Introduction full of references I didn’t know Useful? Or Gee Whizz so what?...Needed to read.

1 Why I chose: On first reading, the results seemed counterintuitive. The introduction was full of references I didn’t know. Useful? Or "Gee whizz, so what?" ... Needed to read in detail. Seemed relevant to our MND study (GWAS + imputation + sequencing). Nicely laid out for journal club presentation.

2 Localisation success rate = probability that the causal SNP is top-ranked within an associated region. Consider 2 SNPs: one causal SNP from sequencing or imputation (imperfect genotyping accuracy) and one tag SNP from GWAS (perfect genotyping accuracy). MAF of both SNPs = 0.12. Causal SNP OR = 1.25. Selection at the tag SNP is based on p-value < 0.05 in 1000 cases and 1000 controls. Parameters: correlation between actual and estimated genotype at the causal SNP; correlation between actual genotypes at the causal and genotyped SNPs; call rate at the causal SNP. Generates Fig 1-3. The association test statistic at the causal or genotyped SNP depends on the joint effects of selection based on p-value, tagging and genotyping accuracy.
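The two-SNP setup on this slide can be mimicked with a small Monte Carlo sketch. This is not the paper's simulation code: it assumes the two test statistics are jointly normal z-scores, that imperfect accuracy rho_C scales the causal SNP's expected z-score, and that the correlation between the two z-scores is r_CG x rho_C. The function names, the expected-z approximation and the value r_CG = 0.9 are illustrative assumptions.

```python
import math
import random


def expected_z(odds_ratio, maf, n_cases, n_controls):
    """Approximate expected z-score for an additive log-odds effect (assumed formula)."""
    n = n_cases + n_controls
    case_fraction = n_cases / n
    information = 2.0 * maf * (1.0 - maf) * n * case_fraction * (1.0 - case_fraction)
    return math.log(odds_ratio) * math.sqrt(information)


def localization_success(lam, r_cg, rho_c, select_tag, z_crit=1.96, n_sim=20000, seed=1):
    """Fraction of simulated regions in which the causal SNP outranks the tag SNP."""
    rng = random.Random(seed)
    corr = r_cg * rho_c  # assumed correlation between the two z-scores
    kept = wins = 0
    for _ in range(n_sim):
        e_tag = rng.gauss(0.0, 1.0)
        e_other = rng.gauss(0.0, 1.0)
        z_tag = r_cg * lam + e_tag  # tag SNP is genotyped without error
        z_causal = rho_c * lam + corr * e_tag + math.sqrt(1.0 - corr ** 2) * e_other
        if select_tag and abs(z_tag) < z_crit:
            continue  # region not followed up: tag SNP has p >= 0.05
        kept += 1
        wins += abs(z_causal) > abs(z_tag)
    return wins / kept


# Slide settings: MAF 0.12, OR 1.25, 1000 cases and 1000 controls.
lam = expected_z(odds_ratio=1.25, maf=0.12, n_cases=1000, n_controls=1000)
for rho_c in (1.0, 0.9, 0.7):
    no_sel = localization_success(lam, r_cg=0.9, rho_c=rho_c, select_tag=False)
    with_sel = localization_success(lam, r_cg=0.9, rho_c=rho_c, select_tag=True)
    print(f"rho_C={rho_c:.1f}  no selection={no_sel:.3f}  tag selected={with_sel:.3f}")
```

Under these assumptions the success rate drops both when regions are selected on the tag p-value and when rho_C falls, which is the qualitative pattern the slide attributes to Fig 1-3.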

3 Figure 1. Tagging effect decreases localization success rates with or without the selection effect. A & B: Tight linkage disequilibrium between SNPs can obscure the causal SNP. C & D: Selection at the tag SNP inflates the association evidence at the tag, increasing the probability that it outranks the causal SNP. Localisation success rate = probability that the causal SNP is top-ranked within an associated region. Settings: causal MAF 0.12; correlation between causal and non-causal sequencing SNP 0.9; OR = 1.25; perfect genotyping accuracy; tag MAF 0.12.

4 Fig S8: Tagging effect decreases localization success rates with or without the selection effect, 3 SNPs: 1 tag, 1 causal, 1 non-causal sequencing SNP. Fig S9: Tagging effect decreases localization success rates with or without the selection effect, 5 SNPs: 1 tag, 1 causal, 3 non-causal sequencing SNPs. Settings: causal MAF 0.12; correlation between causal and non-causal sequencing SNP 0.9; OR = 1.25; perfect genotyping accuracy; tag MAF 0.12. Settings: causal MAF 0.02; correlation between causal and non-causal sequencing SNP 0.9; OR = 1.5; perfect genotyping accuracy; tag MAF 0.02. Settings: causal MAF 0.02; correlation between causal and non-causal sequencing SNP 0.9; OR = 1.5; perfect genotyping accuracy; tag MAF 0.02.

5 Figure 2. Low genotyping accuracy at causal SNP further reduces localization success rates with or without the selection effect. Sequencing or imputation error decreases the localization success rate, with or without tag selection. Settings: causal MAF 0.12; OR = 1.25; tag MAF 0.12; perfect genotyping accuracy for tag SNP.

6 S4. Low genotyping accuracy at causal SNP further reduces localization success rates with or without the selection effect: rare causal SNP. Settings: causal MAF 0.02; OR = 1.5; tag MAF 0.02; perfect genotyping accuracy for tag SNP.

7 S5. Low genotyping accuracy at causal SNP further reduces localization success rates with or without the selection effect: common causal SNP. Settings: causal MAF 0.25; OR = 1.25; tag MAF 0.25; perfect genotyping accuracy for tag SNP.

8 Figure 3. Counter-intuitively, sample size can reduce localization success rate. Well-tagged causal SNPs sequenced with low accuracy are unlikely to be correctly identified even as sample size increases. When the causal SNP is less accurately genotyped than one of its highly correlated proxies (i.e. ρC < ρG and rCG is large), the proxy SNP may capture the association better than the causal SNP. As a result, this proxy SNP will out-rank the causal SNP more than 50% of the time. Settings: causal MAF 0.12; correlation between causal and non-causal sequencing SNP 0.9; OR = 1.25; perfect genotyping accuracy; tag MAF 0.12.
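The sample-size effect on this slide can be illustrated with a closed-form normal approximation. As a rough sketch (not taken from the paper), assume the causal and proxy z-scores are jointly normal with expected values rho_C x lambda and rho_G x r_CG x lambda, and correlation rho_C x rho_G x r_CG, where lambda grows with the square root of the sample size. The function names, the one-sided comparison and the specific accuracy values below are illustrative assumptions.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_z(odds_ratio, maf, n_per_group):
    """Approximate expected z-score with equal numbers of cases and controls (assumed formula)."""
    information = 2.0 * maf * (1.0 - maf) * (2 * n_per_group) * 0.25
    return math.log(odds_ratio) * math.sqrt(information)


def prob_causal_outranks_proxy(n_per_group, rho_c, rho_g, r_cg, odds_ratio=1.25, maf=0.12):
    """P(z_causal > z_proxy) under the assumed bivariate normal model (one-sided)."""
    lam = expected_z(odds_ratio, maf, n_per_group)
    corr = rho_c * rho_g * r_cg
    mean_diff = (rho_c - rho_g * r_cg) * lam
    sd_diff = math.sqrt(2.0 * (1.0 - corr))
    return normal_cdf(mean_diff / sd_diff)


for n in (500, 1000, 2000, 5000):
    poor = prob_causal_outranks_proxy(n, rho_c=0.8, rho_g=1.0, r_cg=0.9)
    good = prob_causal_outranks_proxy(n, rho_c=1.0, rho_g=1.0, r_cg=0.9)
    print(f"n per group={n:5d}  rho_C=0.8: {poor:.3f}  rho_C=1.0: {good:.3f}")
```

In this sketch, whenever rho_C is below rho_G x r_CG the probability stays under 50% and shrinks as the sample grows, whereas with perfect accuracy at the causal SNP more samples help.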

9 Panels: MAF = 0.02, MAF = 0.12, MAF = 0.25. Results so far demonstrate the need to correct for the joint effects of selection, tagging and genotyping accuracy on the localization success rate. How to correct?

10 Test statistic at sequenced SNP. Call rates, i.e. missingness: joint vs individual; G = tag, S = seq. Correlation between genotyped and sequenced in sample when no errors. Estimate of selection bias of genetic effect at tag SNP (a form of winner’s curse). Correlation between true genotype and sequenced genotype in the sample. Revised test statistic at sequenced SNP. When low, difference between test statistic and revised test statistic increases. Missingness rate. Is zero if independent samples are used for sequencing and identification of tag SNP.

11 G = genotyped, C = causal, rCG = correlation between genotyped and causal SNPs. Selection effect is most pronounced when power at the tag SNP is low.

12 The higher the correlation between the genotyped and sequenced SNP, the higher the test statistic at the sequenced SNP and the lower its variance. Unconditional expected association at the sequenced SNP. Distortion due to the tag SNP selection is propagated through correlation. SNPs in high LD with the tag are more likely to be top-ranked = “tagging effect”.

13 Bootstrap resampling at the genome-wide level: incorporates information across the whole genome to account for effects of LD and rank on bias. Counts of missingness. Estimate from sample: mean posterior genotype (e.g. MACH ratio-of-variance estimate) or full genotype posterior probabilities (e.g. BEAGLE r2).
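To make the bootstrap idea concrete, here is a deliberately simplified single-SNP sketch of a generic bootstrap estimate of selection bias (winner's curse). It is not the genome-wide procedure described on the slide: each bootstrap sample plays the role of the discovery data, the out-of-bag individuals give a less selected estimate, and the average gap among "significant" replicates is taken as the bias. The allelic log odds ratio, the helper names and the toy genotype counts are all illustrative assumptions.

```python
import math
import random


def log_odds_ratio(cases, controls):
    """Allelic log odds ratio and standard error (0.5 added to each cell)."""
    a = sum(cases) + 0.5                        # minor alleles in cases
    b = 2 * len(cases) - sum(cases) + 0.5       # major alleles in cases
    c = sum(controls) + 0.5                     # minor alleles in controls
    d = 2 * len(controls) - sum(controls) + 0.5 # major alleles in controls
    return math.log(a * d / (b * c)), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def resample(genotypes, rng):
    """Return (bootstrap sample, out-of-bag sample) for one group."""
    n = len(genotypes)
    picked = [rng.randrange(n) for _ in range(n)]
    in_bag = set(picked)
    boot = [genotypes[i] for i in picked]
    oob = [g for i, g in enumerate(genotypes) if i not in in_bag]
    return boot, oob


def bootstrap_selection_bias(cases, controls, z_crit=1.96, n_boot=300, seed=7):
    """Mean (bootstrap estimate - out-of-bag estimate) over significant replicates."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        boot_ca, oob_ca = resample(cases, rng)
        boot_co, oob_co = resample(controls, rng)
        boot_est, boot_se = log_odds_ratio(boot_ca, boot_co)
        if abs(boot_est / boot_se) < z_crit:
            continue  # replicate would not have been selected
        oob_est, _ = log_odds_ratio(oob_ca, oob_co)
        sign = 1.0 if boot_est > 0 else -1.0
        diffs.append(sign * (boot_est - oob_est))
    bias = sum(diffs) / len(diffs) if diffs else 0.0
    return bias, len(diffs)


# Toy data (hypothetical): genotype = minor-allele count per person.
cases = [1] * 250 + [0] * 750
controls = [1] * 230 + [0] * 770
naive, se = log_odds_ratio(cases, controls)
bias, n_selected = bootstrap_selection_bias(cases, controls)
print(f"naive log OR={naive:.3f}  estimated bias={bias:.3f}  replicates selected={n_selected}")
```

The estimated bias is largest when few replicates pass the threshold, which matches the later slide's point that the selection effect is most pronounced when power at the tag SNP is low.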

14 Scenario 1: GWAS used for discovery, and sequencing/imputation used for fine-mapping around GWAS ‘‘hits’’ using the same GWAS sample. GWAS-focused design based on the WTCCC Type 1 Diabetes. A significant region is identified by a significant GWAS tag SNP (p < 5x10^-7) and followed by fine-mapping with post-GWAS data (sequenced or imputed SNPs) in the region surrounding the tag SNP. The SNP with the largest test statistic in the region is selected as the best candidate causal SNP. Scenario 2: All GWAS and sequenced/imputed SNPs used for discovery and fine-mapping in the same dataset. Scenario 3: Discovery and fine-mapping using different datasets. Scenario 4: Discovery and fine-mapping using different datasets + multiple causal SNPs. Scenario 5: Discovery and fine-mapping using different datasets + missing data (imperfect call rate).

15 Table 2. Parameters and parameter values of the main simulation studies.

16 Table 3. Localization success rates for simulation Scenarios 1, 2, 3, 4. Scenario 1: GWAS used for discovery, and sequencing/imputation used for fine-mapping around GWAS ‘‘hits’’ using the same GWAS sample. Adverse effects of tagging (down table) and genotyping accuracy (across table) are highest when the causal SNP is well tagged (larger r) and less accurately sequenced (low rho), e.g. high density GWAS followed by low density sequencing. Well-tagged causal SNPs suffer lower localisation success rates because the perfectly genotyped tag captures the association better than the imperfectly sequenced/imputed causal SNP. No good if tag is causal. After re-ranking, localisation success rate is “similar” to when tag is not causal. “Minor tradeoff” as GWAS SNP unlikely to be causal.

17 Table 3. Localization success rates for simulation Scenarios 1, 2, 3, 4. Scenario 2: All GWAS and sequenced/imputed SNPs used for discovery and fine-mapping in the same dataset, i.e. significance is not required at the GWAS SNP. Impact of sample size; correlation between tag and causal SNP fixed. Genotyping accuracy alone has an impact. Big impact of re-ranking when sequencing coverage is low and sample size is large.

18 Table 3. Localization success rates for simulation Scenarios 1, 2, 3, 4. Scenario 3: Discovery and fine-mapping using different datasets. Very similar rates to Scenario 2.

19 Table 3. Localization success rates for simulation Scenarios 1, 2, 3, 4. Scenario 4: Discovery and fine-mapping using different datasets (as 3) + multiple causal SNPs. Improves re-ranking for both causal SNPs.

20 Table 4. Localization success rates for simulation Scenarios 5a. Scenario 5: Discovery and fine-mapping using different datasets + missing data (imperfect call rate) (across table changed). Missing data affect localisation success rates in a similar manner to imperfect genotyping accuracy.

21 Summary from simulation: GWAS-based region selection or moderate genotype error substantially reduces the probability of correctly identifying the causal SNP. Proposed re-ranking can recover lost power, increasing localisation success rates by 1.5 to 3 times. When genotyping accuracy is high, power lost due to tagging is small, so re-ranking has no effect.

22 Figure 4. Naïve test statistics and re-ranking statistics for regions surrounding rs in the 8q24.21 region for association with prostate cancer risk. Michaela et al. Prostate cancer Consortium. Different genotyping platforms. Imputed to 1000 Genomes. Fixed-effect meta-analysis. Cohorts excluded from association analysis if imputation r2 < 0.8. Report 5 statistically independent regions within 8q24.21 locus plus 11q13.3 and 17q24.3. Selected all SNPs in LD r2 > 0.2 with index SNP. Didn’t exclude studies based on imputation r2. Only correct for imputation accuracy, i.e. deltaG = 0. New top SNPs for 8q24.21 and 17q24.3. 8q24.21: 2 SNPs move from lower ranks to top 10%.

23 Figure 5. Naïve test statistics and re-ranking statistics for regions surrounding rs in the 17q24.3 region for association with prostate cancer risk. 8 SNPs move from lower ranks to top 10%. SNPs naively ranked in top 10% stay highly ranked. When most SNPs are well genotyped, re-ranking only makes subtle changes. One poorly imputed SNP (yellow) moves from rank 245 to 16. Association driven by one study (rank 10); when removed, SNP rank is 306, changing to 106.

24 DISCUSSION Tagging and genotyping accuracy are non-trivial sources of bias that could obscure association evidence at the causal SNP. Proposed re-ranking is simple to implement and can substantially increase the probability of identifying the causal SNP. For low coverage sequencing we recommend the re-ranking method. For imputation and high coverage sequencing we recommend that unfiltered SNPs in associated regions be used with the re-ranking method. Large changes in rank should be carefully examined for heterogeneity between studies. Re-ranking is most beneficial when genotyping accuracy is low. High density genotyping followed by low density sequencing can generate misleading results. Don’t do it. Imputation and sequencing software output accurate estimates of rho needed for the re-ranking.
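As an illustration of the kind of accuracy estimate that imputation software reports, the MACH-style ratio-of-variance r2 divides the observed variance of the imputed dosages by the variance expected for perfectly known genotypes, 2p(1-p). The sketch below is a minimal version of that calculation, not the software's own code; the function name and toy dosages are hypothetical, and treating the square root of r2 as a stand-in for rho is an assumption rather than something the slides state.

```python
def dosage_r2(dosages):
    """Ratio of observed dosage variance to the binomial variance 2p(1-p)."""
    n = len(dosages)
    mean = sum(dosages) / n
    allele_freq = mean / 2.0
    expected_var = 2.0 * allele_freq * (1.0 - allele_freq)
    if expected_var == 0.0:
        return float("nan")  # monomorphic SNP: the ratio is undefined
    observed_var = sum((d - mean) ** 2 for d in dosages) / n
    return observed_var / expected_var


# Toy example: hard calls in exact Hardy-Weinberg proportions for p = 0.1,
# and the same calls shrunk halfway towards the mean to mimic uncertainty.
hard_calls = [0.0] * 81 + [1.0] * 18 + [2.0] * 1
mean_dosage = sum(hard_calls) / len(hard_calls)
shrunk = [mean_dosage + 0.5 * (d - mean_dosage) for d in hard_calls]

print(dosage_r2(hard_calls), dosage_r2(shrunk))
```

Dosages that are pulled towards the mean have less variance than hard calls, so the ratio falls below 1 as imputation becomes less certain.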

25 DISCUSSION Re-ranking is important when study-specific factors exacerbate GWAS-based selection and genotyping error: high genetic diversity, so sequence reads are difficult to align; low LD among SNPs or lack of a population-specific reference panel, so poor imputation; low MAF SNPs, which tend to suffer from both low power and high genotyping error. When genotyping accuracy is very poor, re-ranking may not be able to generate useful results; first consider accuracy thresholds recommended by the genotype calling or imputation algorithm. Re-ranking only improves localization success when applied to SNPs under the alternative, i.e. SNPs that are themselves causal or in LD with a causal SNP.

26 Existing methods that incorporate genotyping uncertainty into tests for association do not completely recover lost power. This paper considered frequentist and Bayesian methods of incorporating uncertainty. We anticipate that re-ranking to correct for the adverse effects of selection, tagging and differential genotyping accuracy rates will continue to be important because cost-effective designs use low coverage with large sample sizes.

