Normalization of Microarray Data - how to do it! Henrik Bengtsson Terry Speed

Outline
- The X Data Set
- (R,G) → (M,A) Transformation
- Background correction or not?
- Within slide normalization
- Across slide normalization
- Identifying differentially expressed genes
- The X2 Data Set

The X Data Set
All slides are replicates, and each contains 5184 spots/genes. Three identical RNA preparations were done; (a) was hybridized to slides 1-3, (b) to slides 4-6, and (c) to slides 7-9. All data were collected with the GenePix™ scanner and software. The following analysis was done using [R] and the sma library by the Terry Speed group.

Slide  Title                          Name
1      Mutant (a) vs. Reference (a)   dUDG558
2      Mutant (a) vs. Reference (a)   dUDG409
3      Mutant (a) vs. Reference (a)   dUDG405
4      Mutant (b) vs. Reference (b)   dUDG411
5      Mutant (b) vs. Reference (b)   dUDG412
6      Mutant (b) vs. Reference (b)   dUDG414
7      Mutant (c) vs. Reference (c)   dUDG413
8      Mutant (c) vs. Reference (c)   dUDG415
9      Mutant (c) vs. Reference (c)   dUDG813

(R,G) → (M,A) Transformation
"Observed" data {(R,G)}: R = red channel signal, G = green channel signal (background corrected or not).
Transformed data {(M,A)}: $M = \log_2(R/G)$ (ratio), $A = \log_2 (R \cdot G)^{1/2} = \tfrac{1}{2}\log_2(R \cdot G)$ (intensity).
Equivalently, $R = (2^{2A+M})^{1/2}$ and $G = (2^{2A-M})^{1/2}$.
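The transformation and its inverse can be sketched in a few lines of Python. This is only an illustration of the formulas on this slide; the function names `rg_to_ma` and `ma_to_rg` are hypothetical and are not part of the sma library.

```python
import math

def rg_to_ma(r, g):
    """Map one (R, G) intensity pair to (M, A)."""
    m = math.log2(r / g)          # M: log-ratio of red to green
    a = 0.5 * math.log2(r * g)    # A: average log-intensity
    return m, a

def ma_to_rg(m, a):
    """Inverse map: recover (R, G) from (M, A)."""
    r = math.sqrt(2.0 ** (2.0 * a + m))
    g = math.sqrt(2.0 ** (2.0 * a - m))
    return r, g

# Round trip on one made-up spot: R = 1024, G = 256.
m, a = rg_to_ma(1024.0, 256.0)
r, g = ma_to_rg(m, a)
```

Working on the (M, A) scale separates the quantity of interest (the log-ratio M) from overall spot brightness (A), which is what lets the later slides model bias as a function of A.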

Background correction or not? Decision 1: No background correction

Within Slide Normalization
Question: What kind of normalization should be applied?
1. No normalization, or
2. Global (lowess) normalization, or
3. Print-tip normalization, or
4. Scaled print-tip normalization

No Normalization Non-normalized data {(M,A)}: $M = \log_2(R/G)$

Global (lowess) Normalization Global normalized data {(M,A)}: $M_{\mathrm{norm}} = M - c(A)$, where $c(A)$ is an intensity-dependent function.
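The slide names lowess but does not show how c(A) is estimated. As a rough sketch, the code below fits c(A) with a single pass of tricube-weighted local linear regression. This is a simplified stand-in for lowess (no robustness iterations), not the routine used in the original analysis; the function names and the `frac` parameter are assumptions made for illustration.

```python
def local_linear_fit(a_values, m_values, frac=0.4):
    """Estimate c(A) at each observed A with tricube-weighted local lines.

    Simplified stand-in for lowess: one pass, no robustness iterations.
    """
    n = len(a_values)
    k = max(2, min(n, round(frac * n)))  # neighbours used per local fit
    fitted = []
    for a0 in a_values:
        # Indices of the k points closest to a0 on the A axis.
        nearest = sorted(range(n), key=lambda i: abs(a_values[i] - a0))[:k]
        h = max(abs(a_values[i] - a0) for i in nearest) or 1.0  # bandwidth
        sw = swx = swy = swxx = swxy = 0.0
        for i in nearest:
            x = a_values[i] - a0                 # centre the local fit at a0
            w = (1.0 - (abs(x) / h) ** 3) ** 3   # tricube weight in [0, 1]
            sw += w
            swx += w * x
            swy += w * m_values[i]
            swxx += w * x * x
            swxy += w * x * m_values[i]
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)              # degenerate case: weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            fitted.append((swy - slope * swx) / sw)  # fitted value at a0
    return fitted

def global_normalize(m_values, a_values, frac=0.4):
    """Return M_norm = M - c(A) for every spot on one slide."""
    c = local_linear_fit(a_values, m_values, frac)
    return [m - ci for m, ci in zip(m_values, c)]

# Made-up demo: M carries a purely intensity-dependent bias, so M_norm is ~0.
a_demo = [6.0 + 0.5 * i for i in range(20)]
m_demo = [0.5 * a - 3.0 for a in a_demo]
m_norm = global_normalize(m_demo, a_demo)
```

The sketch is quadratic in the number of spots, which is acceptable for a demonstration; a production analysis would use an established lowess implementation.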

Print-tip Normalization Print-tip normalized data {(M,A)}: $M_{p,\mathrm{norm}} = M_p - c_p(A)$, $p$ = print tip (1-16), where $c_p(A)$ is an intensity-dependent function for print tip $p$. [Figure: print-tip layout]
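Print-tip normalization applies the same subtraction separately within each print-tip group. The sketch below shows only that grouping step: `normalize_per_tip` and `constant_median_curve` are hypothetical names, and the per-tip median is a deliberately crude stand-in for an intensity-dependent curve c_p(A) such as a lowess fit.

```python
from collections import defaultdict
from statistics import median

def normalize_per_tip(m_values, a_values, tips, fit_curve):
    """Subtract a separately fitted curve c_p(A) within each print-tip group."""
    groups = defaultdict(list)
    for idx, p in enumerate(tips):
        groups[p].append(idx)
    normalized = [0.0] * len(m_values)
    for p, idx_list in groups.items():
        a_p = [a_values[i] for i in idx_list]
        m_p = [m_values[i] for i in idx_list]
        c_p = fit_curve(a_p, m_p)          # one fitted value per spot in group p
        for i, c in zip(idx_list, c_p):
            normalized[i] = m_values[i] - c
    return normalized

def constant_median_curve(a_p, m_p):
    """Crude stand-in for c_p(A): the group's median M, ignoring A."""
    med = median(m_p)
    return [med] * len(m_p)

# Made-up demo with two print tips that have different offsets.
tips_demo = [1, 1, 1, 2, 2, 2]
a_demo = [8.0, 9.0, 10.0, 8.0, 9.0, 10.0]
m_demo = [0.4, 0.5, 0.6, -0.3, -0.2, -0.1]
m_norm = normalize_per_tip(m_demo, a_demo, tips_demo, constant_median_curve)
```

Passing the curve estimator as an argument keeps the grouping logic independent of the smoother, so any function that returns one fitted value per spot can be swapped in.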

Scaled Print-tip Normalization Scaled print-tip normalized data {(M,A)}: $M_{p,\mathrm{norm}} = s_p \cdot (M_p - c_p(A))$, $p$ = print tip (1-16), where $s_p$ is a scale factor for print tip $p$ (Median Absolute Deviation, MAD). [Figure panels: after print-tip normalization; after scaled print-tip normalization]
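The slide says only that the scale factor is based on the Median Absolute Deviation. One common convention, assumed here rather than taken from the slides, is to set s_p to the geometric mean of all print-tip MADs divided by the MAD of tip p, so that every group ends up with the same spread. The sketch expects M values that have already had c_p(A) subtracted; all names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median

def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    med = median(values)
    return median([abs(v - med) for v in values])

def scale_per_tip(m_values, tips):
    """Rescale location-normalized M values so all print tips share one MAD.

    Assumes every print-tip group has a non-zero MAD.
    """
    groups = defaultdict(list)
    for m, p in zip(m_values, tips):
        groups[p].append(m)
    mads = {p: mad(vals) for p, vals in groups.items()}
    # Geometric mean of the per-tip MADs is the common target spread.
    target = math.exp(sum(math.log(v) for v in mads.values()) / len(mads))
    scale = {p: target / mads[p] for p in mads}      # s_p (assumed convention)
    scaled = [scale[p] * m for m, p in zip(m_values, tips)]
    return scaled, scale

# Made-up demo: tip 2 has twice the spread of tip 1.
tips_demo = [1] * 5 + [2] * 5
m_demo = [-0.2, -0.1, 0.0, 0.1, 0.2, -0.4, -0.2, 0.0, 0.2, 0.4]
scaled, scale = scale_per_tip(m_demo, tips_demo)
```

With this convention the scale factors multiply to one, so the rescaling changes the relative spread of the groups without changing the overall scale of M.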

Spatial Effects [Figure panels: no normalization; global normalization; print-tip normalization; scaled print-tip normalization]

Another Quick Example Scaled print-tip normalization:

Within Slide Normalization Summary
Question: What kind of normalization should be applied?
1. No normalization, or
2. Global (lowess) normalization, or
3. Print-tip normalization, or
4. Scaled print-tip normalization
Decision 2: Scaled print-tip normalization.

Across Slides Normalization
1. Scaled print-tip normalization
2. Median Absolute Deviation (MAD) scaling
3. Averaging

Average Over All Slides The “average” slide:

Cutoff by M values Top 5% of the absolute M values (|M| > 0.56):

Cutoff by T values Top 5% of the absolute T values (|T|>8.6) s.t. SE(M) > 0.03:
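The slides do not define T or SE(M). The sketch below assumes the usual one-sample form: for each gene, SE(M) is the standard deviation of its replicate M values divided by the square root of the number of slides, and T is the average M divided by SE(M). The function names and demo values are made up; the cutoffs 8.6 and 0.03 are the ones quoted on this slide.

```python
import math
from statistics import mean, stdev

def replicate_stats(reps):
    """Return (average M, SE of the average) for one gene's replicate M values."""
    avg = mean(reps)
    se = stdev(reps) / math.sqrt(len(reps))
    return avg, se

def passes_t_cutoff(reps, t_cut=8.6, se_min=0.03):
    """Apply the slide's rule: |T| > t_cut, subject to SE(M) > se_min."""
    avg, se = replicate_stats(reps)
    if se <= se_min:
        # A very small SE can inflate T, so such genes are excluded first.
        return False
    return abs(avg / se) > t_cut

# Made-up replicate M values for three genes across three slides.
demo = {
    "g1": [1.0, 1.1, 0.9],   # large, consistent log-ratio
    "g2": [0.5, 0.5, 0.5],   # zero spread, so SE fails the 0.03 floor
    "g3": [0.3, 0.5, 0.1],   # moderate log-ratio, noisy
}
selected = [gene for gene, reps in demo.items() if passes_t_cutoff(reps)]
```

Checking the SE floor before dividing also avoids a division by zero for genes whose replicates are identical.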

SE Cutoff Level In this data set, the number of genes found is insensitive to the SE cutoff level: about 1000 of the genes with the smallest SE can be cut off before the final results are affected.

103 Differentially Expressed Genes Top 5% of the absolute T values (|T|>8.6) s.t. SE(M) > 0.03, and top 5% of the absolute M values (|M|>0.56):
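Combining the two criteria amounts to intersecting two ranked lists. The sketch below is one way to express that, under the assumption that the T values passed in already exclude genes failing the SE(M) floor; `top_fraction` and `select_genes` are hypothetical names and the demo numbers are invented.

```python
import math

def top_fraction(scores, fraction):
    """Return the keys whose absolute score is in the top `fraction`."""
    n_keep = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(scores, key=lambda k: abs(scores[k]), reverse=True)
    return set(ranked[:n_keep])

def select_genes(avg_m, t_stat, fraction=0.05):
    """Genes in the top fraction by |M| and also in the top fraction by |T|."""
    return top_fraction(avg_m, fraction) & top_fraction(t_stat, fraction)

# Made-up per-gene summaries; fraction=0.5 keeps two genes per criterion.
avg_m = {"g1": 1.2, "g2": -0.9, "g3": 0.1, "g4": 0.05}
t_stat = {"g1": 12.0, "g2": 1.5, "g3": 9.0, "g4": -0.5}
chosen = select_genes(avg_m, t_stat, fraction=0.5)
```

Requiring both conditions removes genes with a large but inconsistent log-ratio (high |M|, low |T|) as well as genes with a tiny but very stable one (high |T|, low |M|).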

Location of Differentially Expressed Genes Locations of the genes on the microarray, which is printed as a 4x4 grid.

25 Differentially Expressed Genes Top 2% of the absolute T values (|T|>11) s.t. SE(M) > 0.03 and top 2% of the absolute M values (|M|>0.9). [Table columns: Gene, M avg, A avg, T, SE]

The X2 Data Set
All slides are replicates, and each contains 5184 spots/genes. Three identical RNA preparations were done; (a) was hybridized to slides 1 & 2, (b) to slides 3 & 4, and (c) to slides 5 & 6.

Slide  Title                          Name
1      Mutant (a) vs. Reference (a)   dUDG816
2      Mutant (a) vs. Reference (a)   dUDG817
3      Mutant (b) vs. Reference (b)   dUDG818
4      Mutant (b) vs. Reference (b)   dUDG820
5      Mutant (c) vs. Reference (c)   dUDG821
6      Mutant (c) vs. Reference (c)   dUDG822

93 Differentially Expressed Genes Top 5% of the absolute T values (|T|>5.6) s.t. SE(M) > 0.03, and top 5% of the absolute M values (|M|>0.38):

25 Differentially Expressed Genes Top 2% of the absolute T values (|T|>7.1) s.t. SE(M) > 0.03 and top 2% of the absolute M values (|M|>0.53). [Table columns: Gene, M avg, A avg, T, SE]

Acknowledgement
Thanks to: Jean Yee Hwa Yang
[R] software (free):
The Statistical Microarray Analysis (sma) library (free):