Normalization of Microarray Data - how to do it! Henrik Bengtsson Terry Speed


1 Normalization of Microarray Data - how to do it! Henrik Bengtsson (hb@maths.lth.se) Terry Speed (terry@stat.berkeley.edu)

2 Outline
- The X Data Set
- (R,G) → (M,A) Transformation
- Background correction or not?
- Within slide normalization
- Across slide normalization
- Identifying differentially expressed genes
- The X2 Data Set

3 The X Data Set
All slides are replicates and contain 5184 spots/genes. Three identical RNA preparations were done; (a) was hybridized to slides 1-3, (b) to slides 4-6, and (c) to slides 7-9. All data were collected with the GenePix™ scanner and software. The following analysis was done using [R] and the sma library by the Terry Speed group.

Slide  Title                          Name
1      Mutant (a) vs. Reference (a)   dUDG558
2      Mutant (a) vs. Reference (a)   dUDG409
3      Mutant (a) vs. Reference (a)   dUDG405
4      Mutant (b) vs. Reference (b)   dUDG411
5      Mutant (b) vs. Reference (b)   dUDG412
6      Mutant (b) vs. Reference (b)   dUDG414
7      Mutant (c) vs. Reference (c)   dUDG413
8      Mutant (c) vs. Reference (c)   dUDG415
9      Mutant (c) vs. Reference (c)   dUDG813

4 (R,G) → (M,A) Transformation
"Observed" data $\{(R,G)\}_{n=1,\dots,5184}$: $R$ = red channel signal, $G$ = green channel signal (background corrected or not).
Transformed data $\{(M,A)\}_{n=1,\dots,5184}$: $M = \log_2(R/G)$ (ratio), $A = \log_2 (R \cdot G)^{1/2} = \tfrac{1}{2} \log_2(R \cdot G)$ (intensity)
⇔ $R = (2^{2A+M})^{1/2}$, $G = (2^{2A-M})^{1/2}$
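As a minimal sketch of the transformation on this slide, the following Python maps one spot's red and green signals to (M, A) and back again. The function names `to_ma` and `to_rg` and the example intensities are illustrative assumptions, not part of the sma library.

```python
import math

def to_ma(r, g):
    """Map red/green signals to the log-ratio M and the log-intensity A."""
    m = math.log2(r / g)            # M = log2(R/G)
    a = 0.5 * math.log2(r * g)      # A = 1/2 * log2(R*G)
    return m, a

def to_rg(m, a):
    """Invert the transformation: R = 2^(A + M/2), G = 2^(A - M/2)."""
    r = 2.0 ** (a + m / 2.0)
    g = 2.0 ** (a - m / 2.0)
    return r, g

# Hypothetical spot: red signal four times the green signal.
m, a = to_ma(4096.0, 1024.0)
r, g = to_rg(m, a)
```

The inverse uses $2^{A+M/2}$, which is the same quantity as $(2^{2A+M})^{1/2}$ on the slide, so a round trip returns the original signals up to floating-point error.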

5 Background correction or not? Decision 1: No background correction

6 Within Slide Normalization
Question: What kind of normalization should be applied:
1. No normalization, or
2. Global (lowess) normalization, or
3. Print-tip normalization, or
4. Scaled print-tip normalization?

7 No Normalization
Non-normalized data $\{(M,A)\}_{n=1,\dots,5184}$: $M = \log_2(R/G)$

8 Global (lowess) Normalization
Global normalized data $\{(M,A)\}_{n=1,\dots,5184}$: $M_{\text{norm}} = M - c(A)$, where $c(A)$ is an intensity-dependent function.
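The idea of subtracting an intensity-dependent curve c(A) can be sketched in plain Python as below. This is a simplified stand-in for lowess, not the sma implementation: it fits a tricube-weighted straight line around each observed A value and omits the robustness iterations of full lowess. The function names, the `frac` default, and the synthetic data are all assumptions made for illustration.

```python
import math

def tricube(u):
    """Tricube kernel: weight 1 at u = 0, falling to 0 for |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def local_linear_fit(a_values, m_values, a0, frac=0.4):
    """Estimate c(a0) with a tricube-weighted line fit of M on A.

    a0 is expected to be one of the observed A values, so at least
    one point always receives weight 1.
    """
    n = len(a_values)
    k = max(2, math.ceil(frac * n))              # neighbours per local fit
    dists = sorted(abs(a - a0) for a in a_values)
    h = max(dists[min(k, n) - 1], 1e-12)         # bandwidth: k-th nearest distance
    sw = swx = swy = swxx = swxy = 0.0
    for a, m in zip(a_values, m_values):
        x = a - a0                               # centre on a0: intercept equals c(a0)
        w = tricube(x / h)
        sw += w
        swx += w * x
        swy += w * m
        swxx += w * x * x
        swxy += w * x * m
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:                       # degenerate neighbourhood
        return swy / sw                          # fall back to the weighted mean
    return (swxx * swy - swx * swxy) / denom     # intercept of the weighted fit

def lowess_normalize(a_values, m_values, frac=0.4):
    """Return M_norm = M - c(A), with c evaluated at every observed A."""
    return [m - local_linear_fit(a_values, m_values, a, frac)
            for a, m in zip(a_values, m_values)]

# Synthetic spots whose log-ratio depends only on intensity (pure dye bias).
a_values = [6.0 + 0.25 * i for i in range(33)]
m_values = [0.5 * a - 3.0 for a in a_values]
m_norm = lowess_normalize(a_values, m_values)
```

Because the synthetic bias is exactly linear in A, the local fits recover it and the normalized values are zero up to rounding. Each call scans all spots, so the sketch is quadratic in the number of spots; it is meant to show the idea rather than to be fast.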

9 Print-tip Normalization
Print-tip normalized data $\{(M,A)\}_{n=1,\dots,5184}$: $M_{p,\text{norm}} = M_p - c_p(A)$; $p$ = print tip (1-16), where $c_p(A)$ is an intensity-dependent function for print tip $p$.

Print-tip layout:
 1  2  3  4
 5  6  7  8
 9 10 11 12
13 14 15 16

10 Scaled Print-tip Normalization
Scaled print-tip normalized data $\{(M,A)\}_{n=1,\dots,5184}$: $M_{p,\text{norm}} = s_p \cdot (M_p - c_p(A))$; $p$ = print tip (1-16), where $s_p$ is a scale factor for print tip $p$, based on the median absolute deviation (MAD).
[Figure panels: after print-tip normalization; after scaled print-tip normalization]
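The slide does not spell out how the scale factor is computed from the MAD. A minimal sketch, assuming that each print tip's values are rescaled so that every tip ends up with the same MAD (the geometric mean of the per-tip MADs), could look like this. The function names and the two-tip example data are hypothetical, and the input is assumed to be already location-normalized, i.e. $M_p - c_p(A)$.

```python
import math
from statistics import median

def mad(values):
    """Median absolute deviation from the median (no consistency constant)."""
    med = median(values)
    return median([abs(v - med) for v in values])

def scale_print_tips(m_by_tip):
    """Rescale each print tip so that all tips share a common MAD.

    Assumption: s_p = (geometric mean of all MADs) / MAD_p.
    Every tip must have a non-zero MAD, otherwise the log fails.
    """
    mads = {tip: mad(vals) for tip, vals in m_by_tip.items()}
    log_mean = sum(math.log(x) for x in mads.values()) / len(mads)
    target = math.exp(log_mean)
    return {tip: [target / mads[tip] * v for v in vals]
            for tip, vals in m_by_tip.items()}

# Hypothetical location-normalized log-ratios for two print tips;
# tip 2 has four times the spread of tip 1.
m_by_tip = {
    1: [-0.2, -0.1, 0.0, 0.1, 0.2],
    2: [-0.8, -0.4, 0.0, 0.4, 0.8],
}
scaled = scale_print_tips(m_by_tip)
scaled_mads = {tip: mad(vals) for tip, vals in scaled.items()}
```

After scaling, both tips have the same spread, so a given |M| means the same thing regardless of which print tip produced the spot.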

11 Spatial Effects
[Figure panels: no normalization; global normalization; print-tip normalization; scaled print-tip normalization]

12 Another Quick Example Scaled print-tip normalization:

13 Within Slide Normalization Summary
Question: What kind of normalization should be applied:
1. No normalization, or
2. Global (lowess) normalization, or
3. Print-tip normalization, or
4. Scaled print-tip normalization?
Decision 2: Scaled print-tip normalization.

14 Across Slides Normalization
Steps shown: scaled print-tip normalization; median absolute deviation (MAD) scaling; averaging.

15 Average Over All Slides The “average” slide:

16 Cutoff by M values Top 5% of the absolute M values (|M| > 0.56):

17 Cutoff by T values Top 5% of the absolute T values (|T|>8.6) s.t. SE(M) > 0.03:
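The slide does not define T or SE(M). A minimal sketch, assuming T is the across-slide average of M divided by its standard error and SE(M) is the sample standard deviation over the square root of the number of slides, is shown below. The function names and the three example genes are hypothetical; the thresholds 8.6 and 0.03 are the ones quoted on the slide.

```python
import math
from statistics import mean, stdev

def t_statistics(m_by_gene):
    """Return {gene: (average M, SE(M), T)} from replicate M values.

    Assumptions: SE(M) = sd / sqrt(n) and T = average M / SE(M).
    """
    stats = {}
    for gene, ms in m_by_gene.items():
        avg = mean(ms)
        se = stdev(ms) / math.sqrt(len(ms))
        t = avg / se if se > 0 else float("nan")
        stats[gene] = (avg, se, t)
    return stats

def select_by_t(stats, t_cut=8.6, min_se=0.03):
    """Keep genes with |T| above the cutoff whose SE is not suspiciously small."""
    return sorted(gene for gene, (_, se, t) in stats.items()
                  if se > min_se and abs(t) > t_cut)

# Hypothetical normalized log-ratios for three genes on three slides.
m_by_gene = {
    "a": [-2.0, -2.2, -1.8],    # large, consistent change
    "b": [0.1, -0.1, 0.0],      # no change
    "c": [0.50, 0.51, 0.49],    # huge T, but only because the SE is tiny
}
stats = t_statistics(m_by_gene)
selected = select_by_t(stats)
```

Gene "c" shows why the SE condition is there: its T value is very large only because its replicates agree almost perfectly, so the SE floor keeps it out of the list.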

18 SE Cutoff Level
In this data set, the number of genes found is insensitive to the SE cutoff level. About 1000 of the genes with the smallest SE can be cut off before the final results are affected.

19 103 Differentially Expressed Genes Top 5% of the absolute T values (|T|>8.6) s.t. SE(M) > 0.03, and top 5% of the absolute M values (|M|>0.56):

20 Location of Differentially Expressed Genes
Location on the microarray (4x4 print-tip grid)

21 25 Differentially Expressed Genes
Top 2% of the absolute T values (|T|>11) s.t. SE(M) > 0.03 and top 2% of the absolute M values (|M|>0.9):

Gene   M avg   A avg       T      SE
 1     -2.26     9.9   -18.0   0.125
 2     -1.97    10.3   -14.5   0.136
 3     -1.50     9.6   -14.7   0.102
 4     -1.47     9.8   -12.2   0.121
 5     -1.40     9.3   -11.9   0.118
 6     -1.30     9.9   -14.4   0.090
 7     -1.29     9.7   -14.6   0.088
 8     -1.28    10.0   -12.7   0.101
 9     -1.27     9.2   -13.6   0.094
10     -1.19    10.7   -13.7   0.087
11     -1.18     9.8   -11.4   0.103
12     -1.17     9.9   -20.7   0.057
13      1.12    11.3    13.5   0.083
14     -1.07    11.4   -13.3   0.080
15     -1.05     9.6   -12.8   0.081
16     -1.02     9.9   -12.0   0.085
17     -1.01     9.3   -11.8   0.086
18     -0.99    11.0   -13.6   0.073
19     -0.99     9.8   -11.4   0.087
20     -0.97    10.5   -13.8   0.070
21     -0.96     9.6   -12.5   0.077
22      0.95    11.5    11.6   0.082
23     -0.94    10.3   -25.0   0.038
24     -0.93     9.8   -13.5   0.068
25     -0.90    11.6   -12.0   0.075

22 The X2 Data Set
All slides are replicates and contain 5184 spots/genes. Three identical RNA preparations were done; (a) was hybridized to slides 1 & 2, (b) to slides 3 & 4, and (c) to slides 5 & 6.

Slide  Title                          Name
1      Mutant (a) vs. Reference (a)   dUDG816
2      Mutant (a) vs. Reference (a)   dUDG817
3      Mutant (b) vs. Reference (b)   dUDG818
4      Mutant (b) vs. Reference (b)   dUDG820
5      Mutant (c) vs. Reference (c)   dUDG821
6      Mutant (c) vs. Reference (c)   dUDG822

23 93 Differentially Expressed Genes
Top 5% of the absolute T values (|T|>5.6) s.t. SE(M) > 0.03, and top 5% of the absolute M values (|M|>0.38):

24 25 Differentially Expressed Genes
Top 2% of the absolute T values (|T|>7.1) s.t. SE(M) > 0.03 and top 2% of the absolute M values (|M|>0.53):

Gene   M avg   A avg       T      SE
 1      1.97    12.5     8.3   0.237
 2      1.27     9.7    18.2   0.070
 3      1.23    13.2     7.5   0.164
 4      1.12    12.3    19.2   0.058
 5      0.93    14.2     7.7   0.122
 6      0.86    13.7    10.2   0.085
 7     -0.86    12.5    -8.1   0.106
 8     -0.85    13.0   -17.0   0.050
 9     -0.81    12.7   -16.3   0.050
10     -0.75    11.1    -8.6   0.088
11     -0.72    11.4   -11.4   0.063
12     -0.71    13.9   -15.6   0.045
13      0.66    10.0     9.4   0.071
14      0.66    10.8     9.2   0.072
15     -0.64    12.5   -15.2   0.042
16      0.64     9.6     7.9   0.081
17     -0.61    12.5    -7.5   0.081
18     -0.60    12.8   -18.2   0.033
19      0.59    11.4     8.3   0.071
20     -0.59    13.7    -8.3   0.071
21     -0.58    10.5    -7.2   0.081
22     -0.56    12.0   -12.5   0.045
23      0.55    11.7     9.1   0.061
24     -0.54    12.6    -7.6   0.071
25      0.53    11.2     9.5   0.056

25 Acknowledgement
Thanks to: Jean Yee Hwa Yang
[R] Software (free): http://www.r-project.org/
The Statistical Microarray Analysis (sma) library (free): http://www.stat.berkeley.edu/users/terry/zarray/Software/smacode.html

