C2BAT: Using the same data set for screening and testing
A testing strategy for genome-wide association studies in case/control design
Matt McQueen, Jessica Su, Nan Laird and Christoph Lange
Harvard School of Public Health

Genome-wide association studies
– Limitations of linkage analysis and the potential of association analysis => genome-wide association studies (Risch & Merikangas 1996)
– More than 100,000 SNPs are tested for association with the phenotype.
– Statistical roadblock: a severe multiple-testing problem!
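To put a number on the roadblock, here is a minimal sketch of the Bonferroni arithmetic for a 100,000-SNP scan. The 5% family-wise level is an assumed convention, not a value given on the slide, and the unadjusted error rate assumes independent tests.

```python
def bonferroni_level(alpha: float, m: int) -> float:
    """Per-SNP significance level that keeps the family-wise error rate <= alpha over m tests."""
    return alpha / m


def unadjusted_fwer(alpha: float, m: int) -> float:
    """Chance of at least one false positive when m independent null SNPs are each tested at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


ALPHA = 0.05   # assumed family-wise error rate
M = 100_000    # number of SNPs in the scan

print(bonferroni_level(ALPHA, M))  # about 5e-07 per SNP
print(unadjusted_fwer(ALPHA, M))   # essentially 1: a false positive is all but certain
```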

“Using the same data set for screening and testing”
Testing strategy:
– Assess the evidence for association for all SNPs based on a screening statistic S (screening step).
– Select a small subset of N markers (10-200).
– Compute the association test statistic T conditional on S and adjust for N comparisons (testing step).
– If the screening step and the testing step are statistically independent, we can look at the data in the screening step without paying a “statistical price” for it.
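The screening/testing strategy can be sketched as a two-stage function. This is a minimal illustration, not the C2BAT implementation: `screen_then_test`, the toy SNP names and all scores and p-values are hypothetical, and the Bonferroni step over N is only valid if S and T are independent under the null, as the slide requires.

```python
from typing import Dict, Mapping


def screen_then_test(
    screening_scores: Mapping[str, float],
    testing_pvalues: Mapping[str, float],
    n_select: int,
    alpha: float = 0.05,
) -> Dict[str, float]:
    """Hypothetical two-stage procedure: screen all SNPs with S, then test only the top N with T.

    Assumes S and T are statistically independent under the null hypothesis, so the
    screening step does not bias the p-values used in the testing step.
    """
    # Screening step: rank every SNP by its screening score (larger = more promising).
    ranked = sorted(screening_scores, key=screening_scores.get, reverse=True)
    selected = ranked[:n_select]

    # Testing step: Bonferroni adjustment for the N selected SNPs only, not for all SNPs.
    threshold = alpha / n_select
    significant = {}
    for snp in selected:
        p_value = testing_pvalues[snp]
        if p_value <= threshold:
            significant[snp] = p_value
    return significant


# Toy example with three hypothetical SNPs.
scores = {"snp_a": 5.0, "snp_b": 0.1, "snp_c": 3.0}
pvalues = {"snp_a": 0.001, "snp_b": 1e-9, "snp_c": 0.2}
print(screen_then_test(scores, pvalues, n_select=2))  # {'snp_a': 0.001}
```

In the toy example `snp_b` has the smallest p-value but is never tested because it fails the screen: the gain from dividing alpha by N instead of by all SNPs only pays off when S actually carries information about the association.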

“Using the same data set for screening and testing”
General concept proposed by Laird and Lange (2006, Nat Rev Genet)
Decomposition of the joint likelihood:
P(phenotype, genotype) = P(phenotype, genotype | S(phenotype, genotype)) × P(S(phenotype, genotype))
– The conditional factor P(phenotype, genotype | S(phenotype, genotype)) is the basis of the testing step.
– The marginal factor P(S(phenotype, genotype)) is the basis of the screening step.
S = “summary test statistic to assess the evidence for association”
Requirements for S:
– The association test has to condition on S.
– S also has to contain information about the potential association.
Testing strategy: as above, and now the screening step and the testing step are statistically independent!

“Using the same data set for screening and testing”
Application to family-based association tests (Van Steen et al. (2005))
Decomposition of the joint likelihood:
P(phenotype, genotype, parental genotype) = P(phenotype, genotype | phenotype, parental genotype) × P(phenotype, parental genotype)
S = “phenotype and parental genotype / sufficient statistic”
– Screening step: based on the conditional mean model (Lange et al. (2003)); this is the between-family component (Fulker et al. (1999)).
– Testing step: based on FBAT (Laird et al. (2000)); this is the within-family component (Fulker et al. (1999)).
Properties of the testing strategy:
– Outperforms standard adjustments for multiple comparisons by factors of up to 40
– Additional power boost from complex phenotypes such as longitudinal data: discovery of INSIG2 in a 100K scan in the Framingham Heart Study, the first replicable association for BMI/obesity (Herbert et al. (2006, Science))
Alternative approach:
– Instead of using the between-family component (screening step) and the within-family component (testing step) in a two-stage testing strategy, one could include both components in the test statistic, e.g. QTDT (Abecasis et al. (2000))
– Disadvantages:
  – Only marginal power gains (5%) over the FBAT statistic when a single SNP is tested (Abecasis et al. (2001))
  – Lack of robustness against population admixture (Yu et al. (2006))

“Using the same data set for screening and testing”
Can we translate this concept to association studies in unrelated cases and controls?
– χ² tests and Armitage trend tests are conditional tests that condition on the margins of the table.
=> The data-partitioning statistic S consists of the margins of the table.

COMPLETE SET
Number of alleles     0      1      2   Total
Cases               125    265    110     500
Controls            173    241     86     500
Total               298    506    196    1000
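For concreteness, here is one common form of the Armitage trend test (scores 0, 1, 2 for the number of alleles), applied to the complete set. It is a sketch for illustration, not C2BAT code, and `armitage_trend_test` is a hypothetical helper name. Apart from the case counts, the statistic uses only the row and column margins of the table, which is why conditioning on the margins is natural.

```python
import math
from typing import Sequence, Tuple


def armitage_trend_test(
    cases: Sequence[int],
    controls: Sequence[int],
    scores: Sequence[float] = (0, 1, 2),
) -> Tuple[float, float]:
    """Chi-square (1 df) Armitage trend statistic and its two-sided p-value.

    Uses the variance with N in the denominator; some texts use N - 1 instead.
    """
    column_totals = [r + s for r, s in zip(cases, controls)]  # margins per genotype
    n_cases, n_controls = sum(cases), sum(controls)            # row margins
    n = n_cases + n_controls

    sum_t_cases = sum(t * r for t, r in zip(scores, cases))
    sum_t_cols = sum(t * c for t, c in zip(scores, column_totals))
    sum_t2_cols = sum(t * t * c for t, c in zip(scores, column_totals))

    numerator = n * (n * sum_t_cases - n_cases * sum_t_cols) ** 2
    denominator = n_cases * n_controls * (n * sum_t2_cols - sum_t_cols ** 2)
    stat = numerator / denominator
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p = armitage_trend_test([125, 265, 110], [173, 241, 86])
print(f"trend statistic = {stat:.2f}, p = {p:.4f}")  # statistic close to 10.7, p close to 0.001
```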

The complete set can be split in different ways, for example:

Equal split
ESTIMATION SET
Number of alleles     0      1      2   Total
Cases                63    133     55     250
Controls             85    120     43     250
Total               148    253     98     500

TESTING SET
Number of alleles     0      1      2   Total
Cases                63    133     55     250
Controls             85    120     43     250
Total               148    253     98     500

Unequal split (percentages: share of each column of the complete set)
ESTIMATION SET
Number of alleles     0      1      2   Total
Cases                94    133     28     255
Controls            130    120     21     271
Total               224    253     49     526
                   (75%)  (50%)  (25%)

TESTING SET
Number of alleles     0      1      2   Total
Cases                31    132     82     245
Controls             43    121     65     229
Total                74    253    147     474
                   (25%)  (50%)  (75%)

The estimation set feeds the screening step; the testing set feeds the testing step.
Testing strategy:
1.) Divide the table into a “screening table” and a “testing table”.
2.) For each SNP, use the “screening table” and the margins of the “testing table” to assess the evidence for association (screening step).
3.) Select the most promising N SNPs and test them for association based on the data of the testing table (testing step).
How can we obtain information about an association from the margins?
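Step 1 can be sketched as a random split at the level of individual subjects. The slides do not say how the split is drawn: here each subject is assumed to go to the screening table independently with probability `fraction`, and `split_table` and `margins` are hypothetical helpers. The second function returns exactly the part of the testing table that step 2 is allowed to use.

```python
import random
from typing import List, Tuple

Table = List[List[int]]  # rows: cases, controls; columns: 0, 1, 2 copies of the allele


def split_table(table: Table, fraction: float, rng: random.Random) -> Tuple[Table, Table]:
    """Randomly split a case/control genotype table into a screening and a testing table.

    Each subject is assigned to the screening table with probability `fraction`
    (an assumption for this sketch), so both tables add up cell by cell to the original.
    """
    screening: Table = []
    testing: Table = []
    for row in table:
        screening_row, testing_row = [], []
        for count in row:
            # Binomial draw: how many of the `count` subjects in this cell go to screening.
            k = sum(1 for _ in range(count) if rng.random() < fraction)
            screening_row.append(k)
            testing_row.append(count - k)
        screening.append(screening_row)
        testing.append(testing_row)
    return screening, testing


def margins(table: Table) -> Tuple[List[int], List[int]]:
    """Row totals (cases, controls) and column totals (0, 1, 2 alleles) of a table."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    return row_totals, column_totals


complete = [[125, 265, 110], [173, 241, 86]]
screening_table, testing_table = split_table(complete, fraction=0.5, rng=random.Random(2006))
print(screening_table)
print(margins(testing_table))  # the only part of the testing table used for screening
```

Because the split is random, a different seed gives different tables; this is the dependence on the split that the resampling step addresses.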

Constructing the screening set (using the unequal split):

NON-INFORMATIVE SET
Number of alleles     0      1      2   Total
Cases                94    133     28     255
Controls            130    120     21     271
Total               224    253     49     526

TESTING SET
Number of alleles     0      1      2   Total
Cases                31    132     82     245
Controls             43    121     65     229
Total                74    253    147     474

MARGINAL SET (only the margins of the testing set)
Number of alleles     0      1      2   Total
Cases                 .      .      .     245
Controls              .      .      .     229
Total                74    253    147     474

IMPUTED SET (same margins as the testing set)
Number of alleles     0      1      2   Total
Cases                31    131     83     245
Controls             43    122     64     229
Total                74    253    147   N(T) = 474

NON-INFORMATIVE SET + IMPUTED SET = SCREENING SET
Number of alleles     0      1      2   Total
Cases               125    264    111     500
Controls            173    242     85     500
Total               298    506    196    1000

Results will depend on the actual random split of the tables!
Solution:
1.) Re-sampling of the tables
2.) p-value for the testing set based on p(data) = p(data | S(data)) × p(S(data)) and Monte Carlo simulations
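The relationships between the tables in this example can be checked directly. The sketch below only verifies the arithmetic shown on the slide: the imputed set has the same margins as the testing set, and the non-informative set plus the imputed set gives the screening set. The slide does not describe how the imputed cell counts themselves are generated, so no imputation rule is implemented here; `add_tables` and `margins` are hypothetical helpers.

```python
from typing import List, Tuple

Table = List[List[int]]  # rows: cases, controls; columns: 0, 1, 2 copies of the allele

# Counts copied from the slide.
complete = [[125, 265, 110], [173, 241, 86]]
non_informative = [[94, 133, 28], [130, 120, 21]]
testing = [[31, 132, 82], [43, 121, 65]]
imputed = [[31, 131, 83], [43, 122, 64]]


def add_tables(a: Table, b: Table) -> Table:
    """Cell-by-cell sum of two tables of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def margins(table: Table) -> Tuple[List[int], List[int]]:
    """Row totals and column totals of a table."""
    return [sum(row) for row in table], [sum(col) for col in zip(*table)]


# The complete set is the non-informative set plus the testing set.
assert add_tables(non_informative, testing) == complete
# The imputed set keeps only the margins of the testing set (the marginal set).
assert margins(imputed) == margins(testing)
# The screening set is the non-informative set plus the imputed set.
screening = add_tables(non_informative, imputed)
print(screening)           # [[125, 264, 111], [173, 242, 85]]
print(margins(screening))  # ([500, 500], [298, 506, 196])
```

The screening set has the same margins as the complete set, but the cells taken from the testing set enter only through their margins.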

Simulation study
Cases/Controls  OR    SNPs     Method    Allele frequency
                                         0.10  0.20  0.30  0.40
500             1.50  100,000  C2BAT     0.06  0.18  0.25  0.29
                               Standard  0.01  0.09  0.14  0.15
500             1.60  100,000  C2BAT     0.14  0.36  0.49  0.46
                               Standard  0.02  0.19  0.34  0.30
700             1.50  100,000  C2BAT     0.13  0.36  0.56  0.53
                               Standard  0.03  0.18  0.42  0.31
700             1.60  100,000  C2BAT     0.27  0.57  0.84  0.85
                               Standard  0.09  0.35  0.64  0.68

Can C2BAT find INSIG2 in the 100K scan of the Framingham Heart Study again?
– 1400 probands in about 300 families
– Randomly select 150 unrelated cases/controls (BMI > 28 = “affected”)
=> Apply the standard analysis (p-value adjusted by Bonferroni correction) and C2BAT to see whether INSIG2 reaches genome-wide significance.
For 1000 replicates:
– Power of the standard analysis to detect INSIG2: 5%
– Power of C2BAT to detect INSIG2: 17%
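The power figures are proportions over 1000 simulated replicates. As a rough back-of-the-envelope check (an addition, not part of the original analysis), the binomial standard error sqrt(p(1 - p)/B) indicates how precisely B = 1000 replicates pin down each estimate; `power_standard_error` is a hypothetical helper and the normal-approximation interval is an assumption.

```python
import math


def power_standard_error(power: float, replicates: int) -> float:
    """Binomial (Monte Carlo) standard error of a power estimate from `replicates` simulations."""
    return math.sqrt(power * (1.0 - power) / replicates)


B = 1000  # replicates, as on the slide
for method, power in [("standard", 0.05), ("C2BAT", 0.17)]:
    se = power_standard_error(power, B)
    # Normal-approximation 95% interval half-width for each power estimate.
    print(f"{method}: power {power:.2f} +/- {1.96 * se:.3f}")
```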

Future work:
1.) Extension to quantitative traits => expression analysis
2.) Gene-gene interactions
Software: www.c2bat.com
