# Supervised microarray data analysis

Mark van de Wiel, Department of Mathematics and Computer Science


**Quality control**

- Protocols.
- Perform a small-scale, well-controlled experiment to assess the influence of experimental factors (microarrays from different batches, printing tips, dyes, linearity of the scanner, etc.).
- Continuous factors (temperature, humidity, spot size over time, intensity of a control spot over time) can be monitored with standard control chart techniques.
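As an illustration of the control chart idea, here is a minimal Python sketch of a Shewhart-style chart for a continuous factor such as the intensity of a control spot. The readings, the function names and the 3-sigma rule are assumptions made for the example; the slides do not prescribe a particular chart. Charts for individual readings often estimate the spread from moving ranges, so the sample standard deviation used here is a simplification.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, centre, upper) limits from in-control baseline readings."""
    centre = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline
    return centre - k * spread, centre, centre + k * spread


def out_of_control(readings, limits):
    """Return the indices of readings that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(readings) if x < lower or x > upper]


# Hypothetical control-spot intensities (log scale) from well-behaved runs.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
limits = control_limits(baseline)

# New runs to monitor; the fourth value is deliberately far off.
new_runs = [10.0, 10.2, 9.9, 11.5, 10.1]
flagged = out_of_control(new_runs, limits)
print("limits:", limits)
print("flagged run indices:", flagged)
```

A flagged run is a prompt to inspect the protocol for that array, not proof that the array is unusable.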

**Design of the experiment**

- Think very carefully about what the biological goals are.
- What software do you have at your disposal to analyse the data?
- Do we need a reference or not?
- ‘Biological design’: which tissues to combine on an array (cDNA)?
- More than one biological factor: factorial design.
- Dye bias: dye swap.
- Design on the array: negative/positive controls, repeats, how many genes? Run a pilot study first, distributing the repeats over experimental factors (spatial, printing tips, etc.).
- Save some space on the (cDNA) microarray for assessing variability due to experimental factors (e.g. print the same control gene with several printing tips).
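The reason a dye swap handles dye bias can be shown with a small worked example. The sketch below assumes the bias is additive on the log-ratio scale and identical on both arrays of the pair; the numbers and names are hypothetical.

```python
def dye_swap_estimate(m_forward, m_swapped):
    """Combine the log-ratios of a dye-swap pair.

    m_forward: log-ratio (dye 1 / dye 2) with sample A labelled with dye 1.
    m_swapped: log-ratio (dye 1 / dye 2) with sample A labelled with dye 2.
    An additive dye bias appears with the same sign in both, so it cancels.
    """
    return (m_forward - m_swapped) / 2.0


# Hypothetical values: true log-ratio A/B of 1.0 and a dye bias of 0.3.
true_log_ratio = 1.0
dye_bias = 0.3
m_forward = true_log_ratio + dye_bias    # biased upwards
m_swapped = -true_log_ratio + dye_bias   # sign of the signal flips, bias does not
estimate = dye_swap_estimate(m_forward, m_swapped)
print(estimate)
```

If the bias depends on the gene or on intensity, the cancellation only holds to the extent that the bias is the same on both arrays.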

**Analysis: multiple testing (after normalization)**

- Objective: control the number of falsely selected genes.
- FWE: family-wise error rate.
  - Weak FWE control: P(falsely select any gene i, i = 1, ..., 20,000 | no gene truly expressed) ≤ α
  - Strong FWE control: P(falsely select any gene i, i = 1, ..., 20,000 | some genes expressed, some genes not expressed) ≤ α
- FDR: false discovery rate.
  - F: expected number of false rejections when no genes are expressed; T: total number of rejections.
  - FDR control: F/T ≤ α
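To make the difference between the two criteria concrete, the sketch below applies two standard procedures to the same list of p-values: Bonferroni (strong FWE control) and Benjamini-Hochberg (FDR control under independence or positive dependence). Neither procedure is named in the slides; they are chosen here only as familiar representatives, and the p-values are made up.

```python
def bonferroni_select(pvalues, alpha=0.05):
    """Select genes with p <= alpha / m (strong FWE control)."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / m]


def benjamini_hochberg_select(pvalues, alpha=0.05):
    """Select genes with the Benjamini-Hochberg step-up rule (FDR control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is below its own threshold.
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    return sorted(order[:k_max])


# Hypothetical p-values for ten genes.
pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print("Bonferroni:", bonferroni_select(pvalues))
print("Benjamini-Hochberg:", benjamini_hochberg_select(pvalues))
```

On this input the FDR rule selects one gene more than the FWE rule, which is the typical direction of the difference.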

**Multiple testing: FWE vs FDR**

- Control of the FDR implies weak control of the FWE.
- Advantage of strong control of the FWE: the significance level α is controlled under all situations.
- Disadvantage: less power than FDR control. FWE-based procedures tend to select fewer genes than FDR-based procedures.
- Software:
  - Bioconductor: step-down Westfall-Young (Dudoit et al.), control of FDR and FWE.
  - SAM: permutation-based ‘control’ of the FDR.
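A rough sketch of how a permutation-based FWE procedure works is given below. It implements the simpler single-step maxT adjustment, not the step-down Westfall-Young variant mentioned for Bioconductor, and it is not the Bioconductor implementation. The data, the Welch t-statistic and the function names are assumptions for the example. All relabellings are enumerated, which is only feasible for small sample sizes.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Two-sample t-statistic allowing unequal variances."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def maxt_adjusted_pvalues(data, n_first):
    """Single-step maxT adjusted p-values over all group relabellings.

    data: one list of expression values per gene; in the observed labelling
    the first n_first columns form group 1 and the rest form group 2.
    """
    n = len(data[0])

    def abs_stats(group1):
        group2 = [j for j in range(n) if j not in group1]
        return [abs(welch_t([row[j] for j in group1],
                            [row[j] for j in group2])) for row in data]

    observed = abs_stats(tuple(range(n_first)))
    # For each relabelling, record the largest statistic over all genes.
    max_null = [max(abs_stats(g)) for g in combinations(range(n), n_first)]
    return [sum(m >= t for m in max_null) / len(max_null) for t in observed]


# Hypothetical data: 3 genes, 4 arrays per group; only gene 0 differs.
data = [
    [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],
    [2.0, 2.3, 1.8, 2.1, 2.2, 1.9, 2.0, 2.4],
    [0.5, 0.7, 0.4, 0.9, 0.6, 0.8, 0.45, 0.75],
]
adjusted = maxt_adjusted_pvalues(data, n_first=4)
print(adjusted)
```

Because each gene is compared with the null distribution of the maximum over all genes, the adjusted p-values account for the number of genes tested.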

**SAM**

- Developed at Stanford by Tibshirani et al. (paper: Tusher et al., PNAS 98, 5116-5121).
- Claim is FDR control.
- Plus:
  1. Ease of use; add-in to Excel.
  2. Allows asymmetric cut-offs.
- Minus:
  1. The distribution under the null hypothesis (‘no expression’) needs to be the same for all genes to guarantee FDR control.
  2. Combination with a k-fold rule: no control of the FDR anymore.
- Solutions:
  - Use (normal) rank scores and a simple rank statistic.
  - Explicitly test on k-fold expression; combine with the FDR criterion.
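The first proposed solution can be sketched as follows. Replacing each gene's values by normal rank scores gives every gene the same set of scores (assuming no ties), so a statistic built from them has the same permutation null distribution for all genes. The particular statistic below, the sum of scores in group 1, and the function names are assumptions; the slides only say "a simple rank statistic".

```python
from statistics import NormalDist


def normal_rank_scores(values):
    """Map values to normal scores: the rank r becomes the r/(n+1) quantile.

    Ties are not handled; the values are assumed to be distinct.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = NormalDist().inv_cdf(rank / (n + 1))
    return scores


def rank_score_statistic(values, in_group1):
    """Sum of the normal scores of the observations in group 1."""
    scores = normal_rank_scores(values)
    return sum(s for s, g in zip(scores, in_group1) if g)


# Two hypothetical genes on very different scales get the same set of scores.
gene_a = [5.0, 1.0, 3.0]
gene_b = [10.0, 2000.0, 30.0]
print(sorted(normal_rank_scores(gene_a)))
print(sorted(normal_rank_scores(gene_b)))
print(rank_score_statistic(gene_a, [True, False, False]))
```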

**Modelling vs normalisation + testing**

- Modelling forces you to state what the assumptions are (linearity, normality, independence, etc.).
- Normalisation steps may not be commutative.
- Non-linearities can be dealt with by normalisation methods.
- Advanced modelling requires the help of a statistician/bioinformatician.
- Standard approach to modelling: ANOVA. The model has two levels:
  1. Normalisation level, which includes linear corrections for dye and microarray effects.
  2. Gene expression level, which includes effects at the gene level, including interactions (the interaction of interest is usually gene*variety).
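One common way to write such a two-level ANOVA model is sketched below. The notation is an assumption and does not come from the slides: y is the measured intensity for gene g on array a with dye d and variety v, A and D are array and dye effects, G is the gene effect, and the bracketed terms are gene-specific interactions. Which interaction terms to include is a modelling choice.

$$
\begin{aligned}
\log_2 y_{gadv} &= \mu + A_a + D_d + r_{gadv} && \text{(normalisation level)} \\
r_{gadv} &= G_g + (GA)_{ga} + (GD)_{gd} + (GV)_{gv} + \varepsilon_{gadv} && \text{(gene expression level)}
\end{aligned}
$$

In this notation, testing whether gene g is differentially expressed between varieties v and v' amounts to testing whether $(GV)_{gv} - (GV)_{gv'} = 0$.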

**Software**

- Freeware: SAM, Bioconductor.
- Specialized commercial software: Spotfire, Genespring, Genesight, Rosetta.
- Most contain: normalisation, variance-stabilizing transformations, ANOVA, testing (most do not yet include the advanced multiple testing criteria).
- Statistical software: SAS, S-Plus, SPSS. Much more debugged, long history, better documentation. (It is often very unclear what the specialized packages really do.)
- Advantages of specialized software: user-friendly, visualisation (nice pictures), links with databases, annotation.
- Try several!

**Bayesian models**

- Plus:
  - Natural translation to networks (pathways).
  - Complex models (linearity is not necessary, interactions).
  - Prior biological knowledge can be included.
  - Nesting of the models (image analysis + normalisation + gene expression).
  - Inference for complex functions of gene expression data is relatively easy.
- Minus:
  - No ‘easy’ software.
  - Computational methods may take time to find reliable estimates.

[Figure: example network]

**Validation**

- Cross-validation: leave some data out and see how well the left-out values are predicted by the model. (Note that for normalisation procedures it may be harder to predict the data from the normalized data.)
- Biological validation (spikes: known concentrations). Very useful for validating the normalisation procedure or the model:
  1. Pretend that spikes with equal concentrations that are used under different conditions (different dyes, microarray batches) are different quantities.
  2. Estimate the ratio of the two estimates after normalisation or modelling.
  3. The ratio should be approximately equal to 1.
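The three-step spike check can be sketched in a few lines. The estimates, the tolerance of 0.2 on the log2 scale and the function names are hypothetical; in practice the estimates would come from the normalisation procedure or the fitted model.

```python
from math import log2
from statistics import mean


def spike_log_ratio(estimates_a, estimates_b):
    """Mean log2 ratio of paired spike estimates under conditions A and B."""
    return mean(log2(a / b) for a, b in zip(estimates_a, estimates_b))


def passes_validation(estimates_a, estimates_b, tolerance=0.2):
    """True if the ratio is close to 1, i.e. the log2 ratio is close to 0."""
    return abs(spike_log_ratio(estimates_a, estimates_b)) <= tolerance


# Hypothetical estimates for three spikes, each added at equal concentration
# under two conditions (for example two dyes) and treated as separate quantities.
dye1 = [100.0, 200.0, 400.0]
dye2_good = [105.0, 190.0, 410.0]   # normalisation removed the dye effect
dye2_bad = [150.0, 300.0, 600.0]    # a systematic dye effect remains

print(passes_validation(dye1, dye2_good))
print(passes_validation(dye1, dye2_bad))
```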

**Comparison and meta-analysis**

- Objective comparisons between methods are very much needed!
- Simulations may help (because we then know the truth). Setting up realistic simulations may be hard!
- Competition between several methods (CAMDA ’03: lung cancer).
- Future goals:
  - Methods that allow for combining data from several experiments.
  - From relative quantities to absolute quantities. Absolute quantities allow for direct comparison between labs (otherwise, this is possible only if labs have used the same reference material, etc.).
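A toy version of such a simulation is sketched below. Because the truly expressed genes are known by construction, the realised proportion of false selections can be computed for any selection rule. Everything here is an assumption for the example: independent normal data, the effect size, the t-like score and the cutoff. A realistic simulation would need to mimic array-specific noise and dependence between genes, which is the hard part.

```python
import random
from math import sqrt
from statistics import mean, variance


def simulate(n_genes=200, n_true=20, n_per_group=5, effect=3.0, seed=1):
    """Simulate two-group data; the first n_true genes are truly expressed."""
    rng = random.Random(seed)
    truth, scores = [], []
    for g in range(n_genes):
        is_true = g < n_true
        shift = effect if is_true else 0.0
        x = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        y = [rng.gauss(shift, 1.0) for _ in range(n_per_group)]
        se = sqrt(variance(x) / n_per_group + variance(y) / n_per_group)
        scores.append(abs(mean(y) - mean(x)) / se)
        truth.append(is_true)
    return truth, scores


def false_discovery_proportion(truth, scores, cutoff):
    """Fraction of selected genes (score >= cutoff) that are not truly expressed."""
    selected = [t for t, s in zip(truth, scores) if s >= cutoff]
    if not selected:
        return 0.0
    return sum(not t for t in selected) / len(selected)


truth, scores = simulate()
fdp = false_discovery_proportion(truth, scores, cutoff=3.0)
print("realised false discovery proportion:", fdp)
```

Running competing selection methods on the same simulated data and comparing their realised error rates and power is one way to make the comparison objective.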

**Useful overview papers, books**

- Design: Churchill, G.A. (2002) Fundamentals of experimental design for cDNA microarrays. Nature Genet. 32, 490-495.
- Analysis: Slonim, D.K. (2002) From patterns to pathways: gene expression data analysis comes of age. Nature Genet. 32, 502-508.
- Normalisation: Quackenbush, J. (2002) Microarray data normalization and transformation. Nature Genet. 32, 496-501.
- Pitfalls: Simon, R. et al. (2003) Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J. Natl Cancer Inst. 95, 14-18.
- Books:
  - Baldi & Hatfield (2002) DNA Microarrays and Gene Expression. Cambridge University Press.
  - Speed, T. (2003) Statistical Analysis of Gene Expression Microarray Data. Chapman & Hall.

Acknowledgement: Nicola Armstrong (EURANDOM)
