

SPH 247 Statistical Analysis of Laboratory Data

May 14, 2010

Two-Color Arrays
Two-color arrays are designed to account for variability in slides and spots by using two samples on each slide, each labeled with a different dye. If a spot is too large, for example, both signals will be too big, and the difference or ratio of the two signals will eliminate that source of variability.
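The cancellation of spot effects can be sketched in a few lines of R. The intensities and spot-size factors below are invented for illustration, and the sketch assumes that the spot effect is multiplicative and identical in both channels, which is an idealization.

```r
# Hypothetical illustration: a spot-size effect shared by both dyes
# cancels in the within-spot log ratio.
set.seed(1)
true_red   <- c(200, 400, 800, 1600, 3200)  # assumed Cy5 signal per unit spot area
true_green <- c(200, 200, 800,  800, 3200)  # assumed Cy3 signal per unit spot area
spot_size  <- runif(5, 0.5, 2)              # assumed multiplicative spot effect

red   <- true_red   * spot_size             # observed intensities vary with spot size
green <- true_green * spot_size

log_ratio <- log2(red / green)              # spot_size cancels in the ratio
expected  <- log2(true_red / true_green)    # ratio of the underlying signals

print(cbind(spot_size, red, green, log_ratio, expected))
```

The individual channel intensities change with the spot size, but the log ratio matches the ratio of the underlying signals.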

Dyes
The most common dye set is Cy3 (green) and Cy5 (red), with absorption maxima near 550 nm and 649 nm and emission maxima near 570 nm and 670 nm, respectively (for reference, green light is around 550 nm and red light around 700 nm). The dyes are excited with lasers at 532 nm (Cy3, green) and 635 nm (Cy5, red). The emissions are read through filters using a CCD device.


File Format
A slide scanned with Axon GenePix produces a file with extension .gpr that contains the results. In our example files, this contains 29 rows of headers followed by 43 columns of data. For full analysis one may also need a .gal file that describes the layout of the arrays.

Column names in the example .gpr file (43 columns):
"Block" "Column" "Row" "Name" "ID" "X" "Y" "Dia."
"F635 Median" "F635 Mean" "F635 SD" "B635 Median" "B635 Mean" "B635 SD"
"% > B635+1SD" "% > B635+2SD" "F635 % Sat."
"F532 Median" "F532 Mean" "F532 SD" "B532 Median" "B532 Mean" "B532 SD"
"% > B532+1SD" "% > B532+2SD" "F532 % Sat."
"Ratio of Medians (635/532)" "Ratio of Means (635/532)" "Median of Ratios (635/532)" "Mean of Ratios (635/532)" "Ratios SD (635/532)" "Rgn Ratio (635/532)" "Rgn R² (635/532)"
"F Pixels" "B Pixels" "Sum of Medians" "Sum of Means" "Log Ratio (635/532)"
"F635 Median - B635" "F532 Median - B532" "F635 Mean - B635" "F532 Mean - B532"
"Flags"
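A file with this layout can be read into R with base functions. The sketch below assumes the usual Axon Text File convention, in which the second line of the file gives the number of optional header records, so that the column names sit two lines below the last header record. The function name read_gpr and the tiny demonstration file are hypothetical. The number of lines to skip should be checked against a real file before relying on this.

```r
# Minimal sketch of a .gpr reader (read_gpr is a hypothetical helper name).
# Assumption: line 1 is "ATF 1.0" and line 2 holds two integers, the number
# of optional header records and the number of data columns.
read_gpr <- function(path) {
  counts   <- scan(path, what = integer(), skip = 1, nlines = 1, quiet = TRUE)
  n_header <- counts[1]
  # Skip the two ATF lines plus the header records; the next line has the column names.
  read.delim(path, skip = n_header + 2, check.names = FALSE,
             stringsAsFactors = FALSE)
}

# Tiny made-up file with the same structure, only for demonstration.
demo <- tempfile(fileext = ".gpr")
writeLines(c(
  "ATF\t1.0",
  "2\t5",
  "\"Type=GenePix Results 3\"",
  "\"Scanner=demo\"",
  "\"Block\"\t\"Name\"\t\"F635 Median\"\t\"B635 Median\"\t\"Flags\"",
  "1\t\"geneA\"\t1100\t70\t0",
  "1\t\"geneB\"\t110\t65\t0"
), demo)

gpr <- read_gpr(demo)
print(names(gpr))
print(gpr[["F635 Median"]] - gpr[["B635 Median"]])  # background-corrected red medians
```

Setting check.names = FALSE keeps column names such as "F635 Median" exactly as they appear in the file, at the cost of having to quote them when indexing.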

Analysis Choices
Mean or median foreground intensity.
Background corrected or not.
Log transform (base 2, e, or 10) or glog transform.
Log is compatible only with no background correction, because background-corrected intensities can be zero or negative.
Glog is best with background correction.
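The difference between the two transforms on background-corrected data can be sketched as follows. The glog form used here, log((y + sqrt(y^2 + lambda)) / 2), is one common parameterization. The value lambda = 100 and the intensities are assumptions chosen only for illustration, and in practice lambda would be estimated from the data.

```r
# One common form of the generalized log; lambda = 100 is an arbitrary
# illustrative value, not an estimate.
glog <- function(y, lambda = 100) {
  log((y + sqrt(y^2 + lambda)) / 2)
}

foreground <- c(1100, 110, 80, 60)     # made-up foreground medians
background <- c(70, 70, 70, 70)        # made-up background medians
corrected  <- foreground - background  # last value is negative

plain_log <- suppressWarnings(log(corrected))  # NaN where corrected < 0
glog_vals <- glog(corrected)                   # defined for all real values

print(data.frame(corrected, plain_log, glog_vals))
```

For large intensities this form of glog approaches the natural log, so the two transforms agree for bright spots and differ only near zero.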

Array normalization
Array normalization is meant to increase the precision of comparisons by adjusting for variations that cover entire arrays. Without normalization, the analysis would be valid, but possibly less sensitive. However, a poor normalization method will be worse than none at all.

Possible normalization methods
We can equalize the mean or median intensity by adding or multiplying a correction term.
We can use different normalizations at different intensity levels (intensity-based normalization), for example by lowess or quantiles.
We can normalize for other things such as print tips.
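As one illustration of the quantile option, the sketch below forces every array to share the same set of values by replacing the k-th smallest value of each array with the average of the k-th smallest values across arrays. The function name quantile_normalize is hypothetical, ties are broken by order of appearance (an assumption), and the data are the small three-gene matrix that the slides call normex.

```r
# Minimal quantile normalization sketch (quantile_normalize is a hypothetical name).
quantile_normalize <- function(mat) {
  mat    <- as.matrix(mat)
  sorted <- apply(mat, 2, sort)                         # each array sorted separately
  target <- rowMeans(sorted)                            # mean of the k-th smallest values
  ranks  <- apply(mat, 2, rank, ties.method = "first")  # rank of each gene within its array
  out    <- apply(ranks, 2, function(r) target[r])      # substitute the target quantiles
  dimnames(out) <- dimnames(mat)
  out
}

normex <- matrix(c(1100, 110, 80, 900, 95, 65, 425, 85, 55, 550, 110, 80), ncol = 4)
qn <- quantile_normalize(normex)
print(qn)
```

With only three genes that are ranked identically on every array, all four normalized arrays become identical, which shows how strongly this method constrains the data.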

Example for Normalization

              Group 1             Group 2
          Array 1  Array 2    Array 3  Array 4
Gene 1       1100      900        425      550
Gene 2        110       95         85      110
Gene 3         80       65         55       80

> normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80),ncol=4)
> normex
     [,1] [,2] [,3] [,4]
[1,] 1100  900  425  550
[2,]  110   95   85  110
[3,]   80   65   55   80
> group <- as.factor(c(1,1,2,2))
> anova(lm(normex[1,] ~ group))
Analysis of Variance Table
Response: normex[1, ]
# group effect flagged * (p < 0.05)
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

> anova(lm(normex[2,] ~ group))
Analysis of Variance Table
Response: normex[2, ]
# group effect not flagged as significant
> anova(lm(normex[3,] ~ group))
Analysis of Variance Table
Response: normex[3, ]
# group effect not flagged as significant

Additive Normalization by Means

              Group 1             Group 2
          Array 1  Array 2    Array 3  Array 4
Gene 1     974.58   851.25     541.25   607.92
Gene 2     -15.42    46.25     201.25   167.92
Gene 3     -45.42    16.25     171.25   137.92

> cmn <- apply(normex,2,mean)
> cmn
[1] 430.0000 353.3333 188.3333 246.6667
> mn <- mean(cmn)
> normex - rbind(cmn,cmn,cmn)+mn
> normex.1 <- normex - rbind(cmn,cmn,cmn)+mn

> anova(lm(normex.1[1,] ~ group))
Analysis of Variance Table
Response: normex.1[1, ]
# group effect flagged * (p < 0.05)
> anova(lm(normex.1[2,] ~ group))
Analysis of Variance Table
Response: normex.1[2, ]
# group effect flagged * (p < 0.05)
> anova(lm(normex.1[3,] ~ group))
Analysis of Variance Table
Response: normex.1[3, ]
# group effect flagged * (p < 0.05)

Multiplicative Normalization by Means

              Group 1             Group 2
          Array 1  Array 2    Array 3  Array 4
Gene 1     779.17   775.83     687.33   679.14
Gene 2      77.92    81.89     137.47   135.83
Gene 3      56.67    56.03      88.95    98.78

> normex*mn/rbind(cmn,cmn,cmn)
> normex.2 <- normex*mn/rbind(cmn,cmn,cmn)
> anova(lm(normex.2[1,] ~ group))
Response: normex.2[1, ]
# group effect flagged ** (p < 0.01)
> anova(lm(normex.2[2,] ~ group))
Response: normex.2[2, ]
# group effect flagged ** (p < 0.01)
> anova(lm(normex.2[3,] ~ group))
Response: normex.2[3, ]
# group effect flagged * (p < 0.05)

Multiplicative Normalization by Medians

              Group 1             Group 2
          Array 1  Array 2    Array 3  Array 4
Gene 1    1000.00   947.37     500.00   500.00
Gene 2     100.00   100.00     100.00   100.00
Gene 3      72.73    68.42      64.71    72.73

> cmd <- apply(normex,2,median)
> cmd
[1] 110  95  85 110
> md <- mean(cmd)   # md is not defined on the slide; the mean of the medians (100) matches the Gene 2 row of the table
> normex.3 <- normex*md/rbind(cmd,cmd,cmd)
> normex.3
> anova(lm(normex.3[1,] ~ group))
Response: normex.3[1, ]
# group effect flagged ** (p < 0.01)
> anova(lm(normex.3[2,] ~ group))
Response: normex.3[2, ]
# group effect not flagged as significant
> anova(lm(normex.3[3,] ~ group))
Response: normex.3[3, ]
# group effect not flagged as significant
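The comparisons on the preceding slides can be rerun in one pass. The sketch below rebuilds the example matrix, applies the additive, multiplicative-by-means, and multiplicative-by-medians normalizations, and collects the per-gene ANOVA p-values. The helper gene_p is a hypothetical name, scaling the medians to their mean is an assumption (the slide does not show how md is defined), and a gene whose values are all equal after normalization is reported as NA rather than tested.

```r
normex <- matrix(c(1100, 110, 80, 900, 95, 65, 425, 85, 55, 550, 110, 80), ncol = 4)
group  <- factor(c(1, 1, 2, 2))

cmn <- colMeans(normex)          # array means
cmd <- apply(normex, 2, median)  # array medians

versions <- list(
  raw           = normex,
  additive_mean = sweep(normex, 2, cmn - mean(cmn), "-"),  # shift arrays to a common mean
  mult_mean     = sweep(normex, 2, mean(cmn) / cmn, "*"),  # scale arrays to a common mean
  mult_median   = sweep(normex, 2, mean(cmd) / cmd, "*")   # scale arrays to a common median (assumed target)
)

# Per-gene one-way ANOVA p-value; NA when the gene is constant across arrays.
gene_p <- function(mat) {
  apply(mat, 1, function(y) {
    if (var(y) < 1e-12) return(NA_real_)
    anova(lm(y ~ group))[["Pr(>F)"]][1]
  })
}

pvals <- sapply(versions, gene_p)
rownames(pvals) <- paste("Gene", 1:3)
print(round(pvals, 4))
```

Printing the p-values side by side makes it easy to see how much the conclusion for each gene depends on the normalization chosen.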

Intensity-based normalization
Normalize by means, medians, etc., but do so only in groups of genes with similar expression levels. lowess is a procedure that produces a running estimate of the middle, like a robustified mean. If we subtract the lowess curve of each array and add the average of the lowess curves, we get the lowess normalization.

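    # Additive normalization: shift each array (column) so that all
    # column means equal the grand mean of the column means.
    norm <- function(mat1) {
      mat2 <- as.matrix(mat1)
      p <- dim(mat2)[1]                # number of genes (rows)
      n <- dim(mat2)[2]                # number of arrays (columns)
      cmean <- apply(mat2, 2, mean)    # mean of each array
      cmean <- cmean - mean(cmean)     # offset of each array from the grand mean
      mnmat <- matrix(rep(cmean, p), byrow = TRUE, ncol = n)  # same offsets in every row
      return(mat2 - mnmat)
    }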

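    # Lowess normalization: sort genes by mean intensity, smooth each array
    # against gene rank, then replace each array's curve by the average curve.
    lnorm <- function(mat1, span = .1) {
      mat2 <- as.matrix(mat1)
      p <- dim(mat2)[1]                               # number of genes (rows)
      n <- dim(mat2)[2]                               # number of arrays (columns)
      rmeans <- apply(mat2, 1, mean)                  # mean intensity of each gene
      rranks <- rank(rmeans, ties.method = "first")   # rank of each gene by mean intensity
      matsort <- mat2[order(rranks), ]                # genes sorted by mean intensity
      r0 <- 1:p
      lcol <- function(x) {
        lowess(r0, x, f = span)$y                     # lowess curve of one array
      }
      lmeans <- apply(matsort, 2, lcol)               # lowess curve of each array
      lgrand <- apply(lmeans, 1, mean)                # average of the lowess curves
      lgrand <- matrix(rep(lgrand, n), byrow = FALSE, ncol = n)
      matnorm0 <- matsort - lmeans + lgrand           # subtract own curve, add the average
      matnorm1 <- matnorm0[rranks, ]                  # restore the original gene order
      return(matnorm1)
    }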
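The effect of this kind of normalization can be checked on simulated data. The sketch below is a compact restatement of the same rank-then-lowess idea under the hypothetical name lowess_normalize, not the lecture's function itself. The simulated arrays, with one intensity-dependent bias and one constant shift, are assumptions made only for the demonstration.

```r
set.seed(42)
p    <- 500
base <- rnorm(p, mean = 8, sd = 1.5)               # assumed log-scale expression levels
arrays <- cbind(
  base + rnorm(p, sd = 0.1),                       # reference array
  base + 0.3 * (base - 8) + rnorm(p, sd = 0.1),    # intensity-dependent bias
  base - 0.5 + rnorm(p, sd = 0.1)                  # constant shift
)

# Compact sketch of lowess normalization across arrays (hypothetical name).
lowess_normalize <- function(mat, span = 0.1) {
  mat    <- as.matrix(mat)
  ord    <- order(rowMeans(mat))                   # genes ordered by mean intensity
  sorted <- mat[ord, , drop = FALSE]
  idx    <- seq_len(nrow(sorted))
  fits   <- apply(sorted, 2, function(x) lowess(idx, x, f = span)$y)
  adjusted <- sorted - fits + rowMeans(fits)       # subtract own curve, add the average curve
  adjusted[order(ord), , drop = FALSE]             # back to the original gene order
}

normed <- lowess_normalize(arrays)
before <- mean(apply(arrays, 1, sd))               # typical between-array spread per gene
after  <- mean(apply(normed, 1, sd))
print(c(before = before, after = after))
```

A smaller span follows local structure more closely but is noisier, so the value 0.1 should be treated as a starting point rather than a recommendation.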
