Presentation is loading. Please wait.

Presentation is loading. Please wait.

Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.

Similar presentations


Presentation on theme: "Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei."— Presentation transcript:

1 Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei Tumori, Milano NETTAB 2003 Bologna, 28th November 2003

2 Outline Biostatistics and microarrays Study objectives Design of microarray experiments A case study: a designed experiment

3 Biostatistics and microarrays Microarray research: unique challenge for interdisciplinary collaboration Can biostatisticians be useful in microarray research? Are available software tools a valid substitution for collaboration with biostatisticians?

4 What can biostatisticians do? Active collaboration with researchers from biomedical and bioinformatics fields –to develop and critically evaluate methods for design of microarray experiments analysis of data –to perform data-analysis –to develop software tools and train biomedical researchers to use them

5 Italian inter-university research group Statistical issues in design and analysis of microarray data MIUR grant 2003-2005 –Firenze (Annibale Biggeri) –Milano (Giuseppe Gallus) –Padova (Monica Chiogna) –Torino (Mauro Gasparini) –Udine(Corrado Lagazio)

6 Collaborations Milano Istituto Nazionale per lo Studio e la Cura dei Tumori, Milano (statisticians, biologists, molecular oncologists) IFOM, Milano (biologists, bioinformatics) Biometric Research Branch, NCI, Bethesda (statisticians) Edo Tempia Foundation, Biella Bioconductor poject (software development)

7 Study objectives Class comparison (supervised) –establish differences in gene expression between predetermined classes Class prediction (supervised) –prediction of phenotype using gene expression data Class discovery (unsupervised) –discover groups of samples or genes with similar expression

8 Design of microarray experiments Design of arrays Allocation of samples –Replication –Labeling of samples (cDNA) reference design balanced block design loop design

9 Levels of replication Biological replicates –multiple samples from different populations Technical replicates –multiple samples from the same subject –multiple samples from the same mRNA –multiple clones or probes of the same gene on the array

10 How many replicates? Biological replicates essential to make inference about population Technical replicates useful for quality control and for increasing precision How to determine sample size? –Problem-dependent –simple methods available for class comparison problems –not yet clear what to use for class discovery

11 Common pitfalls in microarray experiments Too little or no replication Use of replication at the wrong level Experiments with cell lines: assuming no variability among cell lines of the same type Inappropriate use of pooling –ok: use of multiple independent pools –but: is it useful? –individual information lost

12 Case study: a designed experiment Biological aim: assess the effect of Toremifen on MCF-7 breast cancer cell line, in terms of gene expression

13 BATCH A1A2A3 Control B1B2B3 Control Treatment CDNA Affymetrix CDNA Affymetrix CDNA Affymetrix CDNA Affymetrix CDNA Affymetrix CDNA Affymetrix POOL CDNA AffymetrixCDNAAffymetrix Week 1

14 BATCH A1A2A3 Control B1B2B3 Control Treatment POOL CDNA AffymetrixCDNAAffymetrix Week 2 and 3

15 Statistical aims comparison of microarray platforms (cDNA vs Affymetrix) hybridization of individual samples vs pools variability of cell lines robustness of commonly used statistical methods

16 Data available (so far) Hybridizations from Affymetrix HGU133 Chips Summary measure of intensities: MAS5 (Affymetrix, 2002) most commonly used, but other possibilities available –Robust Multichip Analysis (Irizarry et al., 2002) –Model-Based Expression Index (Li and Wong, 2001) (at least 16 chips!)

17 Brief summary of data HGU133A: –chipA : 22.283 probe sets –chipB : 22.645 probe sets Present –chipA: 48.5% –chipB: 38.2% pm { "@context": "http://schema.org", "@type": "ImageObject", "contentUrl": "http://images.slideplayer.com/2/710191/slides/slide_17.jpg", "name": "Brief summary of data HGU133A: –chipA : 22.283 probe sets –chipB : 22.645 probe sets Present –chipA: 48.5% –chipB: 38.2% pm

18 Methods for exploring reproducibility among arrays Pearsons coefficient of correlation (common, but wrong!) Coefficient of variation Distribution of differences of intensities Altman and Blands plot (MA plot)

19

20 Class comparison Identification of differentially expressed genes between treated and not treated cell lines t-tests (adjusting for multiple comparisons) –all arrays –only pooled arrays –only individual arrays ANOVA (linear) model –estimation of treatment effect, adjusting for pool effect and week effect

21 Some results... Pooled variance t-test on whole data –treated versus controls: chipA –1948 p 2) –240 with Bonferroni correction chipB –743 p 2) –76 with Bonferroni correction

22 Some results... Pooled variance t-test on pooled data –treated versus controls: chipA –204 p<0.001 »189/(204, 1948) common to overall analysis –82 p 2 »82/(82, 356) common to overall analysis chipB –80 p<0.001 »69/(80, 743) common to overall analysis –37 p 2 »37/(37, 143) common to overall analysis

23 Some results... Pooled variance t-test on individual data –treated versus controls: chipA –669 p<0.001 »594/(669, 1948) common to overall analysis –226 p 2 »221/(226, 356) common to overall analysis chipB –245 p<0.001 »196(245, 743) common to overall analysis –80 p 2 »77/(80, 143) common to overall analysis

24 Some results... Pooled variance ANOVA results –treated versus controls: chipA –1913 p<0.001 »1624/(1913, 1948) common to overall analysis –343 p 2 »339/(343, 356) common to overall analysis chipB –245 p<0.001 »196(245, 743) common to overall analysis –80 p 2 »77/(80, 143) common to overall analysis

25

26 Conclusions ????????? So far no evidence for the usefulness of pooling data from cell lines… no evidence of decreased variability … but need to further investigate the differences in the individual versus pooled results Need for a plan of biological (quantitative) validations of expression measures


Download ppt "Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei."

Similar presentations


Ads by Google