Cross-site and Cross-platform Concordance of Microarray Analysis Improved by Variance Stabilization. Pan Du, Simon Lin. Robert H. Lurie Comprehensive Cancer Center. 2/23/2019
Outline: Why Variance Stabilization? How to Stabilize Variance? (Illumina, Affymetrix) Does it work?
Introduction to Microarray Studies. [Figure: two panels, "Biomedical Applications" (normal vs. cancer, Array x vs. Array y) and "Quality Control Studies" (sample A vs. sample A, Array x vs. Array y).] (Johnson and Lin, Nature 411:885, 2001)
Evaluation criterion of reproducibility: Concordance. Lab A and Lab B each produce a gene list (Gene list A, Gene list B). Anything in common? [Figure: % in common (up to 100) vs. number of genes selected, with curves labeled ideal, better, and worse.] FDA-led Quality Control Study: cross-time, cross-site, cross-platform (Tong et al., Nature Biotech 24:1132, 2006)
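The concordance criterion above can be sketched in a few lines. This is a minimal illustration, assuming each lab supplies a list of gene identifiers ranked from most to least significant; the function name `concordance_curve` and the toy gene names are hypothetical and do not come from the cited study.

```python
def concordance_curve(ranked_a, ranked_b, sizes):
    """Percent of genes shared by the top-n entries of two ranked gene lists.

    Assumes every n in `sizes` is at most the length of both lists.
    """
    curve = []
    for n in sizes:
        top_a = set(ranked_a[:n])
        top_b = set(ranked_b[:n])
        curve.append(100.0 * len(top_a & top_b) / n)
    return curve


# Toy ranked lists (hypothetical gene names), most significant first.
lab_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
lab_b = ["g2", "g1", "g5", "g3", "g7", "g4"]

curve = concordance_curve(lab_a, lab_b, [2, 4, 6])
print(curve)
```

An ideal curve stays at 100 for every list size; a curve that sits higher across list sizes indicates better reproducibility.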
General Microarray Analysis Procedure: Sample preparation; Microarray experiment and data collection; Background adjustment; Transformation (log2); Normalization; Gene identification.
Why Variance Stabilization? [Figure: x-y plots and mean-variance plots for the ideal case, raw x, log2(x), and log2(x + offset).]
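Why do we care? Statistical tests applied to microarray data generally assume that variance is independent of intensity. Gene A: 7 (normal) → 8 (cancer). Gene B: 13 (normal) → 14 (cancer). Both genes change by one unit, but the two changes are comparable only if the variance is the same at both intensity levels.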
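The dependence of variance on intensity can be illustrated with simulated replicates. The sketch below assumes a measurement with both additive and multiplicative noise; every parameter value (offset, noise levels, the two intensity levels) is an arbitrary choice for illustration, not an estimate from real arrays.

```python
import math
import random
import statistics


def simulate(mu, n=2000, offset=20.0, sd_mult=0.1, sd_add=15.0, seed=0):
    """Simulated replicate intensities: offset + mu * exp(eta) + eps (assumed noise model)."""
    rng = random.Random(seed)
    return [
        offset + mu * math.exp(rng.gauss(0.0, sd_mult)) + rng.gauss(0.0, sd_add)
        for _ in range(n)
    ]


def spread(values):
    """Standard deviation on the raw scale and on the log2 scale."""
    raw_sd = statistics.stdev(values)
    # Guard against non-positive values before taking logs (illustrative floor of 1.0).
    log_sd = statistics.stdev([math.log2(max(v, 1.0)) for v in values])
    return raw_sd, log_sd


low = spread(simulate(mu=50.0))       # weakly expressed gene
high = spread(simulate(mu=20000.0))   # strongly expressed gene
print("raw sd:  low=%.1f high=%.1f" % (low[0], high[0]))
print("log2 sd: low=%.3f high=%.3f" % (low[1], high[1]))
```

On the raw scale the spread grows with intensity; after log2 the spread is instead larger at low intensity. Neither scale gives a variance that is independent of intensity, which is the problem a variance-stabilizing transformation addresses.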
Variance Stabilization: the model. A mathematical model of microarray hybridization (Rocke and Durbin, Bioinformatics 19:996, 2003)
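The slide's equation is not preserved in the text. The two-component error model usually associated with Rocke and Durbin can be written as below; the symbols are chosen here for illustration and may differ from the slide's notation. Here y is the measured intensity, \alpha the background offset, \mu the true expression level, \eta the multiplicative error, and \varepsilon the additive error.

```latex
y = \alpha + \mu e^{\eta} + \varepsilon,
\qquad \eta \sim N(0, \sigma_\eta^2),
\quad \varepsilon \sim N(0, \sigma_\varepsilon^2)

% Mean and variance implied by the model, with
% m_\eta = E[e^{\eta}] = e^{\sigma_\eta^2/2} and
% s_\eta^2 = \mathrm{Var}(e^{\eta}) = e^{\sigma_\eta^2}\,(e^{\sigma_\eta^2} - 1):
E[y] = \alpha + \mu\, m_\eta,
\qquad
\mathrm{Var}(y) = \mu^2 s_\eta^2 + \sigma_\varepsilon^2
```

Under this form the variance is roughly constant at low expression, where the additive term dominates, and grows with the square of the mean at high expression.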
Variance Stabilization: deriving h(y). An asymptotic variance-stabilizing transformation h(y) can be obtained from the mean-variance relationship (Tibshirani, JASA, 1988)
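The formula on this slide is not preserved in the text. The standard delta-method argument is sketched below, writing u for the mean of y and v(u) for its variance. The quadratic form of v(u) and the constants c_1, c_2, c_3 are assumptions introduced here for the worked example, with c_1 > 0 and c_3 > 0.

```latex
% Delta method: a smooth transformation h changes the variance approximately as
\mathrm{Var}\big(h(y)\big) \approx h'(u)^2 \, v(u)

% Requiring the right-hand side to equal a constant (taken as 1) gives
h'(u) = \frac{1}{\sqrt{v(u)}}
\quad\Longrightarrow\quad
h(y) = \int^{y} \frac{du}{\sqrt{v(u)}}

% Worked example for an assumed quadratic variance function
v(u) = (c_1 u + c_2)^2 + c_3 :

h(y) = \int^{y} \frac{du}{\sqrt{(c_1 u + c_2)^2 + c_3}}
     = \frac{1}{c_1}\,\operatorname{arcsinh}\!\left(\frac{c_1 y + c_2}{\sqrt{c_3}}\right)
```

The substitution w = c_1 u + c_2 reduces the integral to the standard form \int dw / \sqrt{w^2 + c_3}, which is where the arcsinh comes from; the constant of integration is dropped.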
VSN (Variance Stabilizing Normalization): Huber's Solution (2002). Estimate the mean and variance from a set of arrays. Assume most genes are not differentially expressed. Technically challenging because the normalization between arrays has to be considered. Practically challenging because usually only 2 to 6 arrays are available (Huber et al., Bioinformatics, 2002)
Illumina BeadArray Technology. More than 30 technical replicates are on each array. Beads are randomly assembled and held in microwells. Multiple arrays on the same slide. Cost: < $200
Variance Stabilizing Transformation (VST). Fit the relations between mean and standard deviation. Relations between log2 and VST (arcsinh). (Lin, Du, Huber, and Kibbe, 2007)
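The two steps on this slide, fitting the mean and standard deviation relation and then applying an arcsinh transform, can be sketched as follows. This is an illustration under stated assumptions, not the lumi implementation: it assumes the standard deviation is linear in the mean above a background cutoff (sd = c1 * mean + c2), takes the background variance c3 as the mean squared standard deviation of probes at or below that cutoff, and requires c1 > 0 and c3 > 0. The names `fit_line`, `make_vst`, and `background_cutoff`, and all numbers, are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def make_vst(means, sds, background_cutoff):
    """Build h(y) = arcsinh((c1*y + c2) / sqrt(c3)) / c1 from per-probe mean and sd."""
    high = [(m, s) for m, s in zip(means, sds) if m > background_cutoff]
    low_sds = [s for m, s in zip(means, sds) if m <= background_cutoff]
    c1, c2 = fit_line([m for m, _ in high], [s for _, s in high])
    c3 = sum(s * s for s in low_sds) / len(low_sds)  # assumed background variance

    def h(y):
        return math.asinh((c1 * y + c2) / math.sqrt(c3)) / c1

    return h, (c1, c2, c3)


# Toy per-probe summaries (made-up numbers): three background probes,
# then probes whose sd follows 0.1 * mean + 5.
means = [10.0, 20.0, 30.0, 1000.0, 2000.0, 4000.0, 8000.0]
sds = [12.0, 11.0, 13.0] + [0.1 * m + 5.0 for m in means[3:]]

h, (c1, c2, c3) = make_vst(means, sds, background_cutoff=100.0)
print(c1, c2, c3)
print(h(8000.0), h(16000.0))
```

At high intensity arcsinh(x) behaves like ln(2x), so doubling the intensity shifts h by about ln(2)/c1, the same constant step that log2 gives up to a scale factor. The two transforms differ mainly at low intensity, where arcsinh stays finite and close to linear.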
Variance Stabilization of the Technical Replicates
Comparison of Log2, VSN and VST
Evaluation Data Sets. Barnes data (Barnes, M., et al., 2005): measured a dilution series (two replicates and six dilution ratios) of two human tissues, blood and placenta. MAQC-I (Shippy, R., et al., 2006): similar dilution series, conducted at more than one microarray facility using both Illumina and Affymetrix platforms.
Experiment Design
Cross-site concordance evaluation. MAQC data. VST improves the cross-site concordance.
VST for Affymetrix. Hypothesis: VST also works for Affymetrix arrays. Treat each pixel as a technical replicate. Model the mean and variance the same way.
Cross-site concordance for Affymetrix
Cross-platform: Affymetrix and Illumina. Evaluation procedure: compare samples C and D in the MAQC study; the probe IDs were first mapped to Entrez IDs. Legend notation: “Current”: RMA (Affymetrix), Log2+Quantile (Illumina); “Improved”: VST+RMA (Affymetrix), VST+Quantile (Illumina).
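The mapping step in the evaluation procedure can be sketched as below. This assumes each platform provides a ranked probe list and a probe-to-Entrez lookup table; keeping only the best-ranked probe per gene is one possible choice, not necessarily the one used in the study. All identifiers and function names are hypothetical placeholders.

```python
def to_gene_ranking(ranked_probes, probe_to_entrez):
    """Convert a ranked probe list to a ranked gene list, keeping each gene's best-ranked probe."""
    seen = set()
    genes = []
    for probe in ranked_probes:
        gene = probe_to_entrez.get(probe)
        if gene is None or gene in seen:
            continue  # skip unmapped probes and lower-ranked duplicates
        seen.add(gene)
        genes.append(gene)
    return genes


def restrict_to_common(genes_a, genes_b):
    """Keep only genes measured on both platforms, preserving each platform's ranking."""
    common = set(genes_a) & set(genes_b)
    return [g for g in genes_a if g in common], [g for g in genes_b if g in common]


# Toy inputs with placeholder probe and gene identifiers.
affy_ranked = ["a1", "a2", "a3", "a4"]
affy_map = {"a1": "E1", "a2": "E2", "a3": "E1", "a4": "E3"}
illumina_ranked = ["i1", "i2", "i3"]
illumina_map = {"i1": "E2", "i2": "E4", "i3": "E1"}

affy_genes = to_gene_ranking(affy_ranked, affy_map)
illumina_genes = to_gene_ranking(illumina_ranked, illumina_map)
affy_common, illumina_common = restrict_to_common(affy_genes, illumina_genes)
print(affy_common, illumina_common)
```

Once both rankings are expressed over the same set of Entrez IDs, the overlap of their top-n genes can be compared across platforms in the same way as across sites.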
Bioconductor lumi package. The VST and related algorithms are included in the Bioconductor lumi package. Bioconductor: http://www.bioconductor.org
Acknowledgements. Robert H. Lurie Comprehensive Cancer Center, Northwestern University: Warren A. Kibbe and other members of the Bioinformatics group; Denise Scholtens, Biostatistics. European Bioinformatics Institute: Wolfgang Huber. The Walter and Eliza Hall Institute of Medical Research, Australia: Gordon Smyth.