Published by Anabel Beasley. Modified over 9 years ago.
2
WORKSHOP: SPOTTED 2-CHANNEL ARRAYS, DATA PROCESSING AND QUALITY CONTROL
Eugenia Migliavacca and Mauro Delorenzi, ISREC, December 11, 2003
3
AIMS
- Discussion
- Information
- Introduction to the use of the webpage for automated normalization
- Interface between experimentalists and analysts
- Feedback
- Resource allocation
4
Acknowledgments: some slides were originally provided by Terry Speed (Berkeley / WEHI), Sandrine Dudoit (Berkeley), Yee Hwa Yang (Berkeley), Natalie Thorne (WEHI), Otto Hagenbuechle, Eugenia Migliavacca, Darlene Goldstein and others.
5
Preparation
RNA ISOLATION, (AMPLIFICATION) AND LABELING WITH FLUORESCENT DYES
6
Hybridisation
Binding the labelled samples (targets) to complementary probes on a slide.
Mix the labelled samples, apply them to the slide under a cover slip, hybridise for 5-12 hours, then wash.
7
Scanning
Adjust the scanner parameters; frequently one can adapt:
1. the excitation (laser) intensity;
2. the "gain" (amplification) of the photon detection system.
8
Human 10K cDNA Array
How to extract data? How to recognize problems?
9
Part of the image of one channel, false-coloured on a scale running from white (very high) through red (high), yellow and green (medium) and blue (low) to black.
Figure: spots in the scanner image.
10
Steps of a Microarray Experiment
RNA preparation and labeling -> Hybridisation -> Slide scanning -> Image analysis -> Normalization -> Data for further analysis
Why perform an experiment? What is the aim? Which conclusions do you want to reach?
First: DESIGN!
11
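mRNA abundance
Composition of total RNA: rRNA approx. 80%, tRNA, mRNA approx. 1%.
Approx. 300'000 mRNA molecules per cell, from approx. 10-20'000 different genes; abundance classes of 1-50, 50-500 and 500+ copies per cell.
What do you want to measure? The RNA mass is different in different cells.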
12
Relative vs Absolute changes
Cell type 1: 200'000 mRNA molecules/cell, 200 of them for gene X (0.1%)
Cell type 2: 400'000 mRNA molecules/cell, 400 of them for gene X (0.1%)
Is gene X differentially expressed?
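The question can be made concrete with the slide's own numbers. A minimal sketch, assuming that equal amounts of mRNA from the two samples are labelled, so that the array reports each gene as a fraction of all mRNA rather than as copies per cell; the variable names are illustrative only:

```python
import math

# Numbers from the slide: two cell types with different total mRNA content.
cell_1 = {"total_mrna": 200_000, "gene_x": 200}
cell_2 = {"total_mrna": 400_000, "gene_x": 400}

# Absolute change: copies of gene X per cell.
absolute_ratio = cell_2["gene_x"] / cell_1["gene_x"]

# Relative change: gene X as a fraction of all mRNA in the cell.
frac_1 = cell_1["gene_x"] / cell_1["total_mrna"]
frac_2 = cell_2["gene_x"] / cell_2["total_mrna"]
relative_ratio = frac_2 / frac_1

print(absolute_ratio, math.log2(absolute_ratio))  # 2.0 1.0
print(relative_ratio, math.log2(relative_ratio))  # 1.0 0.0
```

Gene X doubles in copies per cell but keeps the same 0.1% share, so under the stated assumption the two-channel measurement would show no change (M = 0).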
13
Steps of a Microarray Experiment
RNA preparation and labeling -> Hybridisation -> Slide scanning (16-bit TIFF files) -> Image analysis ((Rfg, Rbg), (Gfg, Gbg), etc.) -> Normalization (R, G, M, A, etc.) -> Data for further analysis
What is needed for high quality data? Which are the critical steps?
14
Steps of a Microarray Experiment
RNA preparation and labeling: spike in RNA at known concentrations and ratios.
Slide scanning: approximately adjust and balance the two channels; avoid saturation.
Normalization: check the normalized and unnormalized data, both of the experimental RNA and of the spiked RNA.
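Saturation can be checked directly on the pixel values. A minimal sketch, assuming the pixel values of each spot in one channel are available as a list of integers; the 65535 ceiling follows from the 16-bit TIFF format, while the 5% threshold and the function names are hypothetical choices for illustration:

```python
# Largest value a 16-bit TIFF pixel can hold; saturated pixels sit at this ceiling.
MAX_16BIT = 2**16 - 1  # 65535

def saturated_fraction(pixels):
    """Fraction of a spot's pixels that are at the 16-bit ceiling."""
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if p >= MAX_16BIT) / len(pixels)

def flag_saturated(spots, threshold=0.05):
    """Ids of spots whose saturated-pixel fraction exceeds `threshold`.

    `spots` maps a spot id to its foreground pixel values in one channel.
    The 5% default is a hypothetical value, not a recommendation.
    """
    return sorted(sid for sid, px in spots.items() if saturated_fraction(px) > threshold)

spots = {
    "spot_1": [1200, 1350, 1280, 1310],
    "spot_2": [65535, 65535, 64000, 65535],
}
print(flag_saturated(spots))  # ['spot_2']
```

Once a spot saturates, its measured intensity stops growing with the amount of bound target, so its ratio R/G is compressed toward 1.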
15
Why avoid saturation? Why balance the channels? Why perform "normalization"? What to check before and after normalization? Why calculate ratios? Why calculate log ratios?
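One short answer to the last question: ratios treat up- and down-regulation asymmetrically, while log ratios do not. A small illustration with two hypothetical genes that change twofold in opposite directions:

```python
import math

# Two hypothetical genes with the same fold change in opposite directions.
ratios = {"up_2fold": 2.0, "down_2fold": 0.5}

for name, r in ratios.items():
    # Raw scale: distances from "no change" (1) differ (1.0 vs -0.5).
    # log2 scale: the changes are symmetric around 0 (+1 vs -1).
    print(name, r - 1.0, math.log2(r))
# up_2fold 1.0 1.0
# down_2fold -0.5 -1.0
```

On the log2 scale one unit is one doubling, which is why M values are usually reported in base 2.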
16
Aim: Gene Expression Data
Gene expression data on p genes (rows) for n samples (slides, columns). Each entry is M = log2(Red intensity / Green intensity); for example, the gene expression level of gene 5 in slide 4 is 1.09.
Gene   slide 1   slide 2   slide 3   slide 4   slide 5   ...
1       0.46      0.30      0.80      1.51      0.90     ...
2      -0.10      0.49      0.24      0.06      0.46     ...
3       0.15      0.74      0.04      0.10      0.20     ...
4      -0.45     -1.03     -0.79     -0.56     -0.32     ...
5      -0.06      1.06      1.35      1.09     -1.09     ...
These values are conventionally displayed on a colour scale running from red (>0) through yellow (0) to green (<0).
17
Objectives for high quality
Important aspects include:
- separating, as far as possible, systematic sources of variation ("artefacts"), which bias the results, from random sources of variation ("noise"), which obscure the truth;
- removing the former as well as possible and quantifying the latter.
Only if this is done can we hope to reach good quality and make valid statements about the confidence in the results.
18
Typical Statistical Approach
Measured value = real value + systematic errors + noise
Corrected value = real value + noise
Analysis of corrected values => (unbiased) CONCLUSIONS
Estimation of noise => quality of the CONCLUSIONS: statistical significance (level of confidence) of the conclusions
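The decomposition can be simulated in a few lines. A minimal sketch in which the true value, the bias and the noise level are all hypothetical, and the bias is assumed to be already known; in practice it has to be estimated, which is what the normalisation steps attempt:

```python
import random
import statistics

random.seed(1)

real_value = 1.0        # hypothetical true log-ratio of one gene
systematic_error = 0.3  # hypothetical bias shared by all replicates (e.g. a dye effect)
noise_sd = 0.2          # hypothetical random measurement noise

# Measured value = real value + systematic error + noise (one value per replicate).
measured = [real_value + systematic_error + random.gauss(0, noise_sd) for _ in range(8)]

# Removing the (assumed known) bias leaves: corrected value = real value + noise.
corrected = [m - systematic_error for m in measured]

# Analysis of corrected values gives the conclusion; the noise estimate gives its confidence.
estimate = statistics.mean(corrected)
std_error = statistics.stdev(corrected) / len(corrected) ** 0.5
print(f"estimate {estimate:.2f} +/- {std_error:.2f}")
```

Without the correction, the mean of `measured` would sit near 1.3 however many replicates were taken: more replicates shrink the noise, not the bias.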
19
Step 1: a) Background correction; b) Calculation of (log) ratios
Image analysis => Rfg, Rbg, Gfg, Gbg (fg = foreground, bg = background).
For each spot on the slide calculate:
Red intensity = R = Rfg - Rbg
Green intensity = G = Gfg - Gbg
M = log2(Red intensity / Green intensity)
Subtraction of background values assumes an additive background model in which the background is locally constant.
Sources of background: probe sticking unspecifically to the slide, an irregular or dirty slide surface, dust, and noise or errors in the scanner measurement.
Not included: real cross-hybridisation and unspecific hybridisation to the probe.
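Step 1 for a single spot, as a minimal sketch. The function name and the `subtract_background` switch are illustrative; returning `None` when a corrected intensity is not positive is an assumption of this sketch, since the slides do not say how such spots are handled:

```python
import math

def log_ratio(r_fg, r_bg, g_fg, g_bg, subtract_background=True):
    """log2 ratio M for one spot from the four image-analysis values.

    With subtract_background=True this follows Step 1: R = Rfg - Rbg and
    G = Gfg - Gbg. Returns None when R or G is not positive, because the
    log is then undefined.
    """
    if subtract_background:
        r, g = r_fg - r_bg, g_fg - g_bg
    else:
        r, g = r_fg, g_fg
    if r <= 0 or g <= 0:
        return None
    return math.log2(r / g)

print(log_ratio(2200, 200, 1200, 200))         # 1.0  (R = 2000, G = 1000)
print(log_ratio(300, 400, 900, 100))           # None (background above foreground)
print(log_ratio(2200, 200, 1200, 200, False))  # log2(2200/1200), about 0.87
```

The third call shows that skipping subtraction pulls M toward 0 for this spot, because the same background is added to both channels.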
20
Comment on Background Correction
Subtraction of background has frequently been shown not to improve performance: while it brings the average of many measurements closer to the true values (reduced bias, or systematic error), it causes higher variability (lower reproducibility).
Figure: A. high variance, unbiased estimator; B. low variance, biased estimator (average vs single measurement).
21
Which is better?
A. High variance, unbiased estimator: when you take many measurements, the average will more frequently be closer to the true value.
B. Low variance, slightly biased estimator: when you take one or a few measurements, the result will more frequently be closer to the true value.
DAF Microarrays 2002: we preferred no subtraction; this should be re-evaluated with the Agilent scanner (and GenePix IAS).
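The trade-off between A and B can be checked by simulation. A minimal sketch in which both measurement schemes are hypothetical: A is unbiased with standard deviation 1.0, and B has a bias of +0.2 with standard deviation 0.3. The slides link A to background subtraction and B to no subtraction; the numbers here are made up:

```python
import random

random.seed(0)
TRUE_VALUE = 1.0

def measure_a():
    """Hypothetical scheme A: unbiased but noisy (sd 1.0)."""
    return TRUE_VALUE + random.gauss(0.0, 1.0)

def measure_b():
    """Hypothetical scheme B: bias +0.2 but less noisy (sd 0.3)."""
    return TRUE_VALUE + 0.2 + random.gauss(0.0, 0.3)

def fraction_a_closer(n_meas, trials=1000):
    """Share of trials in which the average of n_meas A-measurements is closer
    to the true value than the average of n_meas B-measurements."""
    wins = 0
    for _ in range(trials):
        avg_a = sum(measure_a() for _ in range(n_meas)) / n_meas
        avg_b = sum(measure_b() for _ in range(n_meas)) / n_meas
        if abs(avg_a - TRUE_VALUE) < abs(avg_b - TRUE_VALUE):
            wins += 1
    return wins / trials

print("single measurement, A closer in", fraction_a_closer(1))
print("average of 100, A closer in", fraction_a_closer(100))
```

Averaging shrinks A's noise, but B's bias never averages away. A therefore loses most single comparisons and wins most comparisons of large averages.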
22
A reminder on logarithms
23
A numerical example
24
Step 2: An M vs A (MVA) Plot
M = log2(R/G) = log2 R - log2 G
A = (log2 R + log2 G) / 2
Plot legend: positive controls (spotted in varying concentrations), negative controls, blanks, lowess curve.
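The transformation to M and A, and its inverse, as a minimal sketch with hypothetical intensities. It shows that no information is lost: (M, A) is a rotated and rescaled version of (log2 G, log2 R).

```python
import math

def ma_values(r, g):
    """M (log-ratio) and A (average log-intensity) for one spot, base 2."""
    m = math.log2(r) - math.log2(g)        # M = log2 R - log2 G
    a = (math.log2(r) + math.log2(g)) / 2  # A = (log2 R + log2 G) / 2
    return m, a

def back_to_intensities(m, a):
    """Invert the transform: log2 R = A + M/2 and log2 G = A - M/2."""
    return 2 ** (a + m / 2), 2 ** (a - m / 2)

m, a = ma_values(4096, 1024)
print(m, a)                       # 2.0 11.0
print(back_to_intensities(m, a))  # (4096.0, 1024.0)
```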
25
Why use an M vs A plot?
1. Logs stretch out the region we are most interested in.
2. Features of the data, such as intensity-dependent variation and dye bias, can be seen more clearly.
3. Differentially expressed genes are more easily identified.
4. The interpretation is intuitive.
26
MVA plot: looking at data
S1.n. Control slide: dye effect, spread. (Plot labels: lowess curve, spot identifier.)
27
Step 3: Normalisation - global median centering
Assumption: changes are roughly symmetric.
First panel: smoothed densities of log2 G and log2 R.
Second panel: M vs A plot with the common median set to zero.
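Global median centering reduces to one subtraction per slide. A minimal sketch with hypothetical M values that carry a constant dye offset:

```python
import statistics

def global_median_center(m_values):
    """Global median normalisation: subtract the slide-wide median of M.

    Relies on the assumption that changes are roughly symmetric, so that a
    typical gene should have M close to 0 after normalisation.
    """
    c = statistics.median(m_values)
    return [m - c for m in m_values]

m_raw = [0.5, 0.7, 0.3, 1.4, -0.2]  # hypothetical M values with a dye offset
m_norm = global_median_center(m_raw)
print(statistics.median(m_norm))     # 0.0
```

A single constant is removed from every spot, so any intensity-dependent curvature in the MVA plot stays in place.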
28
Step 4: Normalisation - lowess - local median centering
Assumption: changes are roughly symmetric at all intensities.
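A crude version of "local median centering" subtracts, from each spot, the median M of spots with similar A. This is a hypothetical running-median stand-in for illustration, not the lowess fit the slides use; the window size is arbitrary:

```python
import statistics

def local_median_center(m_values, a_values, half_window=2):
    """Centre each M on the median M of its neighbours in A.

    Spots are sorted by A; each spot is compared with itself and up to
    `half_window` neighbours on either side (a hypothetical window size).
    """
    order = sorted(range(len(a_values)), key=lambda i: a_values[i])
    corrected = [0.0] * len(m_values)
    for pos, i in enumerate(order):
        lo = max(0, pos - half_window)
        hi = min(len(order), pos + half_window + 1)
        local = [m_values[order[k]] for k in range(lo, hi)]
        corrected[i] = m_values[i] - statistics.median(local)
    return corrected

a = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0]
m = [0.1 * x - 0.5 for x in a]  # hypothetical dye bias that grows with A
print([round(c, 2) for c in local_median_center(m, a)])
```

In the middle of the intensity range the bias is removed exactly. At the extremes of A the windows become one-sided, so some bias remains there.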
29
What is this normalization doing?
30
Local regression
Classical (global) regression fits a single line to the entire set of points.
Local regression draws a curve through noisy data by smoothing.
Lowess (LOcally WEighted Scatterplot Smoothing) is a type of local regression.
Both print-tip and intensity-dependent bias can be corrected with lowess fits to the data within print-tip groups.
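The idea can be sketched without a statistics library. This is a simplified, single-pass lowess: for each point, a weighted straight line is fitted to its nearest neighbours with tricube weights. R's `lowess()` additionally runs robustness iterations (`iter = 3` by default) and uses `f = 2/3` by default. The `frac=0.5` here and the function names are assumptions of this sketch:

```python
import math

def lowess_fit(x, y, frac=0.5):
    """Single-pass lowess: a weighted straight-line fit around every x value.

    For each x0, the distance to the ceil(frac * n)-th nearest point sets the
    bandwidth h; points get tricube weights (1 - (d/h)^3)^3 and a weighted
    least-squares line evaluated at x0 gives the fitted value.
    """
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth; guard against division by zero
        w = [max(0.0, 1 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ym + slope * (x0 - xm))
    return fitted

def lowess_normalise(m_values, a_values, frac=0.5):
    """Intensity-dependent normalisation: subtract the lowess curve of M on A."""
    curve = lowess_fit(a_values, m_values, frac)
    return [m - c for m, c in zip(m_values, curve)]

a = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
m = [0.05 * (ai - 10.0) ** 2 for ai in a]  # hypothetical curved dye bias
print([round(v, 2) for v in lowess_normalise(m, a)])
```

Applying `lowess_normalise` separately to the spots of each print-tip group would give the within-group variant mentioned above.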
31
Local regression illustrated
32
Lowess line
33
Step 5: Normalisation - spatial corrections
Log-ratios by print-tip group after within-slide global lowess normalization: the remaining differences between print-tip groups are likely to be a spatial effect.
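A simplified spatial (location) correction centres each spot on the median M of its own print-tip group. This removes a constant offset per group. It is a sketch under that simplification: the slides' method fits a lowess curve within each group. The data are hypothetical:

```python
import statistics
from collections import defaultdict

def printtip_median_center(m_values, groups):
    """Location normalisation within print-tip groups (median version).

    Each spot's M is centred on the median M of its own print-tip group.
    """
    by_group = defaultdict(list)
    for m, g in zip(m_values, groups):
        by_group[g].append(m)
    medians = {g: statistics.median(ms) for g, ms in by_group.items()}
    return [m - medians[g] for m, g in zip(m_values, groups)]

m = [0.4, 0.6, 0.5, -0.1, 0.1, 0.0]  # hypothetical M values
tips = [1, 1, 1, 2, 2, 2]            # print-tip group of each spot
print([round(v, 2) for v in printtip_median_center(m, tips)])
# [-0.1, 0.1, 0.0, -0.1, 0.1, 0.0]
```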
34
Normalization between groups (ctd)
Log-ratios by print-tip group after print-tip location and scale normalization: the normalized values look nice, but...
35
Effects of Location Normalisation (example): before and after
36
Identifying sub-array effects
Boxplots of log-ratios by pin group; lowess lines through the points from each pin group.
37
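Step 6: Rescaling (Spread Normalisation)
Taking varying scale into account.
Assumption: all (print-tip) groups should have the same spread in M.
The true ratio is $\lambda_{ij}$, where $i$ indexes the (print-tip) groups and $j$ the spots. The observed value is $M_{ij} = a_i \log_2(\lambda_{ij})$.
A robust estimate of $a_i$, with $I$ the number of groups, is
$$\hat{a}_i = \frac{\mathrm{MAD}_i}{\left(\prod_{k=1}^{I} \mathrm{MAD}_k\right)^{1/I}}, \qquad \mathrm{MAD}_i = \operatorname{median}_j \left| M_{ij} - \operatorname{median}_j(M_{ij}) \right|.$$
Corrected values are calculated as $M^{*}_{ij} = M_{ij} / \hat{a}_i$.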
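The rescaling step in code, as a minimal sketch. It assumes location normalisation has already been done and that every group has MAD > 0. `statistics.geometric_mean` requires Python 3.8 or later. The data and function names are hypothetical:

```python
import statistics
from collections import defaultdict

def mad(values):
    """Median absolute deviation from the median (no consistency constant)."""
    values = list(values)
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)

def scale_normalise(m_values, groups):
    """Divide each group's M by a_i = MAD_i / geometric mean of all group MADs."""
    by_group = defaultdict(list)
    for m, g in zip(m_values, groups):
        by_group[g].append(m)
    mads = {g: mad(ms) for g, ms in by_group.items()}
    geo_mean = statistics.geometric_mean(mads.values())
    a_hat = {g: v / geo_mean for g, v in mads.items()}
    return [m / a_hat[g] for m, g in zip(m_values, groups)]

m = [-0.2, 0.0, 0.2, -0.8, 0.0, 0.8]  # hypothetical location-normalised M values
tips = [1, 1, 1, 2, 2, 2]             # print-tip group of each spot
out = scale_normalise(m, tips)
print(mad(out[:3]), mad(out[3:]))     # both close to 0.4
```

Dividing by the geometric mean keeps the overall spread of the slide roughly unchanged while equalising the groups. Here the MADs 0.2 and 0.8 both become 0.4.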
38
Illustration: print-tip-group normalisation
Assumption: for every print-tip group, changes are roughly symmetric at all intensities.
Glass slide with an array of bound cDNA probes: 4x4 blocks = 16 pin groups.
39
Which normalization to use?
Case 1: a few genes that are likely to change and / or a random large collection of genes (expect as many up as down):
Each slide per se:
– Location: print-tip-group lowess normalization.
– Scale: for all print-tip groups, adjust the MAD to equal the geometric mean of the MADs of all print-tip groups.
Case 2: a non-random gene collection and / or many genes do change appreciably:
– USE DYE-SWAP APPROACH
– Self-normalization: take the difference of the two log-ratios.
– Check using controls or known information.
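Why self-normalization works for a dye-swap pair: the dye bias enters both slides with the same sign, while the biological ratio flips sign, so the bias cancels in the difference. A minimal sketch, assuming the same bias on both slides. Dividing the difference by 2 is a convention added here to return to the scale of a single log-ratio. The data are hypothetical:

```python
def dye_swap_log_ratio(m_forward, m_reverse):
    """Self-normalisation for a dye-swap pair of slides.

    Forward slide (sample T red, C green): M1 = log2(T/C) + dye bias.
    Reverse slide (dyes swapped):          M2 = log2(C/T) + dye bias.
    With equal bias on both slides, (M1 - M2) / 2 estimates log2(T/C).
    """
    return [(m1 - m2) / 2 for m1, m2 in zip(m_forward, m_reverse)]

bias = 0.3                # hypothetical dye bias, identical on both slides
true = [1.0, -0.5, 0.0]   # hypothetical log2(T/C) for three genes
m1 = [t + bias for t in true]
m2 = [-t + bias for t in true]
print([round(v, 2) for v in dye_swap_log_ratio(m1, m2)])
```

If the bias differs between the two slides, only part of it cancels. This is why the last point above recommends checking with controls or known information.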
40
MVA plots: what to look at? How to use the spikes?
Points: signal intensity, background, saturation, homogeneity, normalizability, problem diagnosis.
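Spike-in RNA added at known ratios offers a direct check: after normalisation, the spots of a spike should have an M close to log2 of its known ratio. A minimal sketch; the function name, the 3:1 spike and its observed values are hypothetical:

```python
import math
import statistics

def spike_check(observed_m, expected_ratio):
    """Compare the observed M values of one spike-in control with its known ratio.

    Returns (expected M, median observed M, difference). A large difference
    suggests a problem with the data or with the normalisation.
    """
    expected_m = math.log2(expected_ratio)
    med = statistics.median(observed_m)
    return expected_m, med, med - expected_m

# Hypothetical spike added at a 3:1 red:green ratio and spotted four times.
print(spike_check([1.4, 1.6, 1.5, 1.7], 3.0))
```

Running the same check on unnormalized and normalized values shows whether the normalization moved the spikes toward or away from their known ratios.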
41
Webpage
How to use the plots? Use of the different options.
42
Quality control before normalization (?)
Choice of normalization
54
END
Questions