EPP 245 Statistical Analysis of Laboratory Data

12/19/2019 Two Color Microarrays EPP 245 Statistical Analysis of Laboratory Data

Two-Color Arrays
Two-color arrays are designed to account for variability in slides and spots by using two samples on each slide, each labeled with a different dye.
If a spot is too large, for example, both signals will be too large, and taking the difference or ratio of the two signals removes that source of variability.
November 15, 2007

Dyes
The most common dye set is Cy3 (green) and Cy5 (red), which absorb maximally at approximately 550 nm and 649 nm and emit at approximately 570 nm and 670 nm, respectively (for comparison, green light is ~550 nm and red light is ~700 nm).
The dyes are excited with lasers at 532 nm (Cy3, green) and 635 nm (Cy5, red).
The emissions are read via filters using a CCD device.


File Format
A slide scanned with Axon GenePix produces a file with extension .gpr that contains the results.
In our example files, the .gpr file has 29 rows of headers followed by 43 columns of data.
For a full analysis, one may also need a .gal file that describes the layout of the arrays.
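A minimal sketch of reading such a file in R, assuming the file is tab-delimited and that the row of column names begins with "Block". The helper name read_gpr and the toy file written below are hypothetical, for illustration only. Finding the column-name row by its content avoids hard-coding the number of header rows, which may differ from file to file.

```r
# Hypothetical helper: read the data table from a GenePix .gpr file.
# Assumes a tab-delimited file whose column-name row starts with "Block".
read_gpr <- function(path) {
  lines <- readLines(path)
  hdr <- grep("^\"?Block\"?\t", lines)[1]   # line number of the column names
  if (is.na(hdr)) stop("no column-name row starting with 'Block' found")
  read.delim(path, skip = hdr - 1, header = TRUE,
             check.names = FALSE, stringsAsFactors = FALSE)
}

# Toy file (made up) with two header rows and a handful of columns
tmp <- tempfile(fileext = ".gpr")
writeLines(c(
  "ATF\t1.0",
  "\"Type=GenePix Results\"",
  "\"Block\"\t\"Name\"\t\"F635 Median\"\t\"B635 Median\"\t\"F532 Median\"\t\"B532 Median\"",
  "1\t\"geneA\"\t1200\t100\t800\t90",
  "1\t\"geneB\"\t300\t110\t450\t95"
), tmp)

gpr <- read_gpr(tmp)
# Background-corrected red-channel medians
red <- gpr[["F635 Median"]] - gpr[["B635 Median"]]
red
```

Setting check.names = FALSE keeps column names such as "F635 Median" exactly as they appear in the file, so they can be used with [[ ]] indexing.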

"Block" "Column" "Row" "Name" "ID" "X" "Y" "Dia." "F635 Median" "F635 Mean" "F635 SD" "B635 Median" "B635 Mean" "B635 SD" "% > B635+1SD" "% > B635+2SD" "F635 % Sat." "F532 Median" "F532 Mean" "F532 SD" "B532 Median" "B532 Mean" "B532 SD" "% > B532+1SD" "% > B532+2SD" "F532 % Sat." "Ratio of Medians (635/532)" "Ratio of Means (635/532)" "Median of Ratios (635/532)" "Mean of Ratios (635/532)" "Ratios SD (635/532)" "Rgn Ratio (635/532)" "Rgn R² (635/532)" "F Pixels" "B Pixels" "Sum of Medians" "Sum of Means" "Log Ratio (635/532)" "F635 Median - B635" "F532 Median - B532" "F635 Mean - B635" "F532 Mean - B532" "Flags"

Analysis Choices
Mean or median foreground intensity.
Background corrected or not.
Log transform (base 2, e, or 10) or glog transform.
Log is compatible only with no background correction, because background-corrected intensities can be zero or negative.
Glog is best with background correction.
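The difference between the two transforms can be sketched in a few lines of R. The slides do not define glog, so the form used here, log(y + sqrt(y^2 + lambda)), is an assumption (it is one common form of the generalized log). The value of lambda and the intensities are made up for illustration.

```r
# One common form of the generalized log (an assumption, not taken from the slides);
# lambda is a hypothetical tuning constant.
glog <- function(y, lambda) log(y + sqrt(y^2 + lambda))

# Made-up background-corrected intensities, including negative and zero values
bc <- c(-50, 0, 20, 500, 20000)

suppressWarnings(log(bc))    # undefined or infinite for the first two values
glog(bc, lambda = 100^2)     # finite for every value

# For large intensities, glog(y) is very close to log(2 * y)
glog(20000, 100^2) - log(2 * 20000)
```

At high intensities glog therefore behaves like an ordinary log up to an additive constant, while remaining defined for the low or negative values that background correction can produce.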

Array normalization
Array normalization is meant to increase the precision of comparisons by adjusting for variations that affect entire arrays.
Without normalization, the analysis would still be valid, but possibly less sensitive.
However, a poor normalization method will be worse than none at all.

Possible normalization methods
We can equalize the mean or median intensity across arrays by adding or multiplying a correction term.
We can use different normalizations at different intensity levels (intensity-based normalization), for example by lowess or quantiles.
We can normalize for other factors, such as print tips.
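Quantile normalization is mentioned above but not worked through in the slides. The following is a minimal base-R sketch of the usual idea, in which every array is given the same distribution, namely the mean of the sorted columns. The function name quantile_normalize is hypothetical, and ties are broken by position, which is a simplification.

```r
# Quantile normalization sketch (hypothetical helper, not from the slides):
# give every array (column) the same distribution, the mean of the sorted columns.
quantile_normalize <- function(mat) {
  mat <- as.matrix(mat)
  sorted <- apply(mat, 2, sort)                    # each column sorted ascending
  target <- rowMeans(sorted)                       # reference distribution
  ranks <- apply(mat, 2, rank, ties.method = "first")
  out <- apply(ranks, 2, function(r) target[r])    # map ranks back to the reference
  dimnames(out) <- dimnames(mat)
  out
}

# Small 3 x 4 matrix: 3 genes (rows) on 4 arrays (columns)
x <- matrix(c(1100,110,80, 900,95,65, 425,85,55, 550,110,80), ncol = 4)
qn <- quantile_normalize(x)
qn
apply(qn, 2, sort)   # every column now holds the same set of values
```

Within each array the ordering of the genes is unchanged; only the values attached to each rank are replaced.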

Example for Normalization
          Group 1            Group 2
          Array 1  Array 2   Array 3  Array 4
Gene 1       1100      900       425      550
Gene 2        110       95        85      110
Gene 3         80       65        55       80

> normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80),ncol=4)
> normex
     [,1] [,2] [,3] [,4]
[1,] 1100  900  425  550
[2,]  110   95   85  110
[3,]   80   65   55   80
> group <- as.factor(c(1,1,2,2))
> anova(lm(normex[1,] ~ group))
Analysis of Variance Table
Response: normex[1, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group                                     *
Residuals
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

> anova(lm(normex[2,] ~ group))
Analysis of Variance Table
Response: normex[2, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group
Residuals
> anova(lm(normex[3,] ~ group))
Response: normex[3, ]
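The three one-way ANOVAs on the unnormalized data can be run in a single pass. This is a small sketch using the same normex and group objects; the helper name gene_p is hypothetical.

```r
normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80), ncol = 4)
group <- as.factor(c(1,1,2,2))

# Hypothetical helper: p-value of the group effect for one gene (one row)
gene_p <- function(y, group) anova(lm(y ~ group))[["Pr(>F)"]][1]

# Apply the test to every row of the matrix
pvals <- apply(normex, 1, gene_p, group = group)
round(pvals, 4)
```

On the raw data only gene 1 falls (just) under the 0.05 level, while genes 2 and 3 are far from significant, which matches the single star in the output above.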

Additive Normalization by Means
          Group 1            Group 2
          Array 1  Array 2   Array 3  Array 4
Gene 1        975      851       541      608
Gene 2        -15       46       201      168
Gene 3        -45       16       171      138

> cmn <- apply(normex,2,mean)
> cmn
[1] 430.0000 353.3333 188.3333 246.6667
> mn <- mean(cmn)
> normex - rbind(cmn,cmn,cmn)+mn
    [,1] [,2] [,3] [,4]
cmn
cmn
cmn
> normex.1 <- normex - rbind(cmn,cmn,cmn)+mn
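The same additive normalization can be written with sweep(), which avoids building the matrix of repeated column means by hand. This is an equivalent sketch rather than the code used on the slide.

```r
normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80), ncol = 4)

cmn <- colMeans(normex)   # mean of each array (column)
mn <- mean(cmn)           # grand mean of the column means

# Subtract each column's own mean, then add back the grand mean
normex.1 <- sweep(normex, 2, cmn, "-") + mn

round(normex.1)           # compare with the table of additively normalized values
colMeans(normex.1)        # every array now has the same mean
```

Because rbind(cmn,cmn,cmn) only works for a matrix with exactly three rows, the sweep() form is easier to reuse on data with a different number of genes.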

> anova(lm(normex.1[1,] ~ group))
Analysis of Variance Table
Response: normex.1[1, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group                                     *
Residuals
> anova(lm(normex.1[2,] ~ group))
Response: normex.1[2, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group                                     *
Residuals
> anova(lm(normex.1[3,] ~ group))
Response: normex.1[3, ]

Multiplicative Normalization by Means
          Group 1            Group 2
          Array 1  Array 2   Array 3  Array 4
Gene 1        779      776       687      679
Gene 2         78       82       137      136
Gene 3         57       56        89       99

> normex*mn/rbind(cmn,cmn,cmn)
    [,1] [,2] [,3] [,4]
cmn
cmn
cmn
> normex.2 <- normex*mn/rbind(cmn,cmn,cmn)
> anova(lm(normex.2[1,] ~ group))
Response: normex.2[1, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group                                     **
Residuals
> anova(lm(normex.2[2,] ~ group))
Response: normex.2[2, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group                                     **
Residuals
> anova(lm(normex.2[3,] ~ group))
Response: normex.2[3, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group                                     *
Residuals

Multiplicative Normalization by Medians
          Group 1            Group 2
          Array 1  Array 2   Array 3  Array 4
Gene 1       1000      947       500      500
Gene 2        100      100       100      100
Gene 3         73       68        65       73
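Multiplicative normalization by means and by medians differ only in the summary statistic, so both can be sketched with one small helper. The name scale_columns is hypothetical. The scaling target is assumed to be the mean of the column statistics, which is the choice that yields the value 100 for Gene 2 in the median table.

```r
normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80), ncol = 4)

# Hypothetical helper: rescale each column so that its summary statistic
# (mean, median, ...) equals the mean of that statistic across columns.
scale_columns <- function(mat, stat = median) {
  cstat <- apply(mat, 2, stat)              # statistic for each array
  sweep(mat, 2, mean(cstat) / cstat, "*")   # multiply each column by its factor
}

normex.3 <- scale_columns(normex, median)
round(normex.3)    # multiplicative normalization by medians

normex.2 <- scale_columns(normex, mean)
round(normex.2)    # multiplicative normalization by means
```

With only three genes, the median of every array is its Gene 2 value, so median scaling forces Gene 2 to the same value on all four arrays and leaves no variation to test for that gene.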

> cmd <- apply(normex,2,median)
> cmd
[1] 110  95  85 110
> md <- mean(cmd)
> normex.3 <- normex*md/rbind(cmd,cmd,cmd)
> normex.3
    [,1] [,2] [,3] [,4]
cmd
cmd
cmd
> anova(lm(normex.3[1,] ~ group))
Response: normex.3[1, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group                                     **
Residuals
> anova(lm(normex.3[2,] ~ group))
Response: normex.3[2, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group
Residuals
> anova(lm(normex.3[3,] ~ group))
Response: normex.3[3, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group
Residuals

Intensity-based normalization
Normalize by means, medians, etc., but do so only within groups of genes with similar expression levels.
lowess is a procedure that produces a running estimate of the middle of the data, like a robustified running mean.
If we subtract each array's lowess curve and add back the average of the lowess curves, we get the lowess normalization.

# Additive normalization: shift each column (array) so that all column means are equal
norm <- function(mat1) {
  mat2 <- as.matrix(mat1)
  p <- dim(mat2)[1]                      # number of genes (rows)
  n <- dim(mat2)[2]                      # number of arrays (columns)
  cmean <- apply(mat2,2,mean)            # column means
  cmean <- cmean - mean(cmean)           # deviation of each column mean from the grand mean
  mnmat <- matrix(rep(cmean,p),byrow=TRUE,ncol=n)
  return(mat2-mnmat)
}

# Lowess normalization: remove each array's intensity-dependent trend
# and replace it with the average trend across arrays
lnorm <- function(mat1,span=.1) {
  mat2 <- as.matrix(mat1)
  p <- dim(mat2)[1]
  n <- dim(mat2)[2]
  rmeans <- apply(mat2,1,mean)                  # mean intensity of each gene
  rranks <- rank(rmeans,ties.method="first")
  matsort <- mat2[order(rranks),]               # rows sorted by mean intensity
  r0 <- 1:p
  lcol <- function(x) {
    lowess(r0,x,f=span)$y                       # lowess fit against rank
  }
  lmeans <- apply(matsort,2,lcol)               # one lowess curve per array
  lgrand <- apply(lmeans,1,mean)                # average of the lowess curves
  lgrand <- matrix(rep(lgrand,n),byrow=FALSE,ncol=n)
  matnorm0 <- matsort-lmeans+lgrand
  matnorm1 <- matnorm0[rranks,]                 # restore the original row order
  return(matnorm1)
}
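A short usage sketch of lowess normalization on simulated data. To keep the block self-contained, lnorm is restated in compact form with the same logic as the function above. The simulated signal, the array biases, and the noise level are all made up for illustration.

```r
# Compact restatement of lnorm (same steps as the function above)
lnorm <- function(mat1, span = .1) {
  mat2 <- as.matrix(mat1)
  p <- nrow(mat2)
  rranks <- rank(rowMeans(mat2), ties.method = "first")
  matsort <- mat2[order(rranks), ]                  # sort genes by mean intensity
  lmeans <- apply(matsort, 2, function(x) lowess(1:p, x, f = span)$y)
  lgrand <- rowMeans(lmeans)                        # average lowess curve
  (matsort - lmeans + lgrand)[rranks, ]             # normalize, then unsort
}

set.seed(1)
p <- 500
signal <- runif(p, 6, 14)                           # made-up log-scale expression
# Arrays 2 and 4 have constant offsets; array 3 has an intensity-dependent bias
bias <- cbind(0, 0.5, 0.1 * (signal - 10), -0.4)
arrays <- signal + bias + matrix(rnorm(p * 4, sd = 0.2), ncol = 4)

normed <- lnorm(arrays)

# Column means before and after normalization
rbind(before = colMeans(arrays), after = colMeans(normed))

# Trend of (array 3 - array 1) against the true signal, before and after
coef(lm(arrays[, 3] - arrays[, 1] ~ signal))[2]
coef(lm(normed[, 3] - normed[, 1] ~ signal))[2]
```

A plain additive or multiplicative correction could remove the constant offsets of arrays 2 and 4 but not the intensity-dependent bias of array 3, which is the case that intensity-based normalization is meant to handle.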
