Gene Expression Data Analyses (2)


Gene Expression Data Analyses (2) Trupti Joshi Computer Science Department 317 Engineering Building North E-mail: joshitr@missouri.edu 573-884-3528(O)

Recap (Lecture 1). RNA is first isolated from different tissues, developmental stages, disease states or samples subjected to appropriate treatments. RNA is then labeled and hybridized to the arrays using an experimental strategy that allows expression to be assayed and compared between appropriate sample pairs. Two strategies are common: a single label with an independent array for each sample, or a single array with distinguishable fluorescent dye labels for the individual RNAs. Regardless of the approach chosen, the arrays are scanned after hybridization, and independent grayscale images, typically 16-bit TIFF images, are generated for each pair of samples to be compared. The images are then analyzed to identify the arrayed spots and to measure the relative fluorescence intensities for each element.

Lecture Outline. Image analysis. Data representation. Data normalization. Normalization within slides: scaled normalization; linear regression normalization; Lowess normalization; global vs. local normalization; variance regularization; replicate filtering. Normalization between slides.

Spotted Array Cy5 Cy3

Quality of Images. Common problems: the spot is irregular (e.g. not round, donut-shaped); hybridization is uneven (e.g. only half is good); hybridization shows fog; hybridization is too weak or saturated.

Image Processing. Gridding: identifying spot locations. Segmentation: identifying foreground and background. Processing techniques: manual vs. semiautomatic gridding; a variety of segmentation techniques.

Segmentation

Data Quality (1). Problems to check for: irregular size or shape, irregular placement, low intensity, saturation, spot variance, background variance. Example spots shown: misalignment, artifact, indistinguishable, saturated, bad print.

Data Quality (2). Calculate numeric characteristics of each spot. Throw out spots that do not meet the minimum requirement for each characteristic. Throw out spots that do not reach a minimum overall combined quality.
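The two filtering rules above can be sketched in Python. The characteristics (area, signal-to-noise ratio, saturated fraction), every threshold, and the combined score are hypothetical choices made only for illustration; they are not values from the lecture or from any particular image analysis package.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    gene: str
    area: float            # spot area in pixels (hypothetical characteristic)
    snr: float             # signal-to-noise ratio
    saturated_frac: float  # fraction of saturated pixels in the spot


# Hypothetical minimum requirements, chosen only for this sketch.
MIN_AREA = 30.0
MIN_SNR = 2.0
MAX_SATURATED_FRAC = 0.1
MIN_COMBINED_SCORE = 0.5


def combined_score(spot):
    """Illustrative overall quality in [0, 1]: the mean of three capped sub-scores."""
    area_score = min(spot.area / 100.0, 1.0)
    snr_score = min(spot.snr / 10.0, 1.0)
    saturation_score = 1.0 - min(spot.saturated_frac, 1.0)
    return (area_score + snr_score + saturation_score) / 3.0


def passes(spot):
    """A spot is kept only if it meets every single requirement and the combined one."""
    meets_each = (
        spot.area >= MIN_AREA
        and spot.snr >= MIN_SNR
        and spot.saturated_frac <= MAX_SATURATED_FRAC
    )
    return meets_each and combined_score(spot) >= MIN_COMBINED_SCORE


def filter_spots(spots):
    return [s for s in spots if passes(s)]
```

A spot can meet each individual minimum and still be discarded because its combined quality is too low, which is the point of the second rule.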

Tips for Image Scan. Image format: 16-bit TIFF (65,536 intensity levels, 0 to 65,535). Color: rainbow palette data display for easy viewing. Adjust the scanning resolution: 5, 10, 20 or 50 µm. Adjust the scan settings so that few spots are saturated (not many red spots).

Signal Extraction. Many software packages are available (Imagene, GPC VisualGrid, TIGR SpotFinder, etc.). Most of them are effective.

Tips for Signal Extraction. Signal/noise ratio > 1.96. Background area selection. Spot-finding automation. Batch processing ability might not be good. Bad spots should be removed.

Expression Ratio. Consider an array that has $N_{array}$ distinct elements, and compare a query sample (R) and a reference sample (G), named for the red and green colors commonly used to represent array data. The ratio for the $i$th gene, where $i$ is an index running over all the arrayed genes from 1 to $N_{array}$, is $T_i = R_i / G_i$. The value $\log_2(T_i)$ is usually used, since it reflects both up-regulated and down-regulated genes.

Log Transformations. The logarithm base 2 transformation has the advantage of producing a continuous spectrum of values and treating up-regulated and down-regulated genes in a similar fashion. The logarithms of the expression ratios are treated symmetrically: a gene up-regulated by a factor of 2 has a log2(ratio) of 1, a gene down-regulated by a factor of 2 has a log2(ratio) of −1, and a gene expressed at a constant level (ratio of 1) has a log2(ratio) of zero.

Example
Gene:     1    2    3    4    5
R (Cy5):  0.1  0.6  0.3  0.3  0.5
G (Cy3):  0.2  0.3  0.6  0.2  0.5
Thus Gene 1: log2(0.1/0.2) = -1; Gene 2: log2(0.6/0.3) = 1; ...; Gene 4: log2(0.3/0.2) = 0.58; ...
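The example can be reproduced with a few lines of Python. This is only a sketch: the function name `log2_ratios` is made up here, and the two lists are the R and G intensities from the example.

```python
import math

red = [0.1, 0.6, 0.3, 0.3, 0.5]    # query channel R for genes 1..5
green = [0.2, 0.3, 0.6, 0.2, 0.5]  # reference channel G for genes 1..5


def log2_ratios(r, g):
    """Return log2(R_i / G_i) for each array element."""
    return [math.log2(ri / gi) for ri, gi in zip(r, g)]


ratios = log2_ratios(red, green)
for gene, value in enumerate(ratios, start=1):
    print(f"Gene {gene}: log2(R/G) = {value:+.2f}")
```

Genes 1 and 3 come out at -1, gene 2 at +1, gene 4 at about +0.58 and gene 5 at 0, so a two-fold change in either direction has the same magnitude.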

Data Normalization. Uncalibrated: red light is under-detected. Calibrated: red and green are equally detected.

Rationale for Data Normalization. Unequal quantities of starting RNA. Differences in labeling. Differences in detection efficiency between the fluorescent dyes. Scanning saturation. Systematic biases in the measured expression levels.

Two Kinds of Normalization. Normalization within slides. Normalization between slides.

Normalization Benefits. Can control for many of the experimental sources of variability (systematic, not random or gene-specific). Brings each image to the same average brightness.

Assumptions for Data Normalization. The average mass of each RNA molecule is approximately the same, so equal quantities of RNA contain approximately the same number of molecules in each sample. The arrayed elements represent a random sampling of the genes in the organism. The numbers of molecules from each sample that hybridize to the array are similar, so the total intensity for each sample should be the same.

Data Normalization Methods. Scaled normalization: by total intensity; by mean; by median; by a group of genes. Linear regression analysis. Lowess normalization. Log centering. Rank invariant methods. Chen’s ratio statistics.

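Scaled Normalization by Total Intensity. The normalization factor is $N_{total} = \sum_{i=1}^{N_{array}} R_i / \sum_{i=1}^{N_{array}} G_i$, where $G_i$ and $R_i$ are the measured intensities for the $i$th array element. One channel is rescaled, $G_i' = N_{total} G_i$ and $R_i' = R_i$, so that $\log_2(T_i') = \log_2(T_i) - \log_2(N_{total})$ is the normalized value.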

Example
Gene:     1    2    3    4    5
R (Cy5):  0.1  0.2  0.3  0.3  0.5
G (Cy3):  0.2  0.5  0.6  0.2  0.5
Ntotal = (0.1+0.2+0.3+0.3+0.5)/(0.2+0.5+0.6+0.2+0.5) = 1.4/2 = 0.7
Thus Gene 1: log2(0.5) - log2(0.7) ≈ -0.49; ...
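A minimal Python sketch of this calculation follows. It assumes, as in the example, that the factor is the sum of the R intensities divided by the sum of the G intensities, and that the normalized value is log2(R/G) minus log2 of that factor; the function name is made up for illustration.

```python
import math

red = [0.1, 0.2, 0.3, 0.3, 0.5]    # R intensities for genes 1..5
green = [0.2, 0.5, 0.6, 0.2, 0.5]  # G intensities for genes 1..5


def total_intensity_normalize(r, g):
    """Return (N_total, normalized log2 ratios) for one slide."""
    n_total = sum(r) / sum(g)
    normalized = [
        math.log2(ri / gi) - math.log2(n_total) for ri, gi in zip(r, g)
    ]
    return n_total, normalized


n_total, normalized = total_intensity_normalize(red, green)
print(f"N_total = {n_total:.2f}")
print(f"Gene 1 normalized log2 ratio = {normalized[0]:.2f}")
```

Subtracting log2(N_total) from every log ratio is the same as multiplying every G intensity by N_total, which makes the two channel totals equal.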

Other Scaled Normalization. Replace Ntotal with Nmean or Nmedian. To normalize by a subset of genes, compute the factor from that subset instead of from all genes, then apply it in the transformation.
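The variants can share one function, sketched below. Two things here are assumptions rather than statements from the slides: the factor is taken as the statistic of the R intensities divided by the same statistic of the G intensities, and a gene subset is passed as a list of zero-based indices. The function names are illustrative only.

```python
import math
from statistics import mean, median


def scale_factor(r, g, stat=median, subset=None):
    """Ratio of a summary statistic of R to the same statistic of G.

    subset: optional list of indices of the genes used to compute the factor.
    """
    idx = range(len(r)) if subset is None else subset
    return stat([r[i] for i in idx]) / stat([g[i] for i in idx])


def scaled_log_ratios(r, g, stat=median, subset=None):
    """Normalized log2 ratios; the factor is applied to all genes."""
    n = scale_factor(r, g, stat, subset)
    return [math.log2(ri / gi) - math.log2(n) for ri, gi in zip(r, g)]


red = [0.1, 0.2, 0.3, 0.3, 0.5]
green = [0.2, 0.5, 0.6, 0.2, 0.5]
print(scale_factor(red, green, stat=median))
print(scale_factor(red, green, stat=mean))
print(scale_factor(red, green, stat=mean, subset=[0, 1]))
```

With the same number of elements in both channels, the mean-based factor equals the total-intensity factor; the median is less sensitive to a few very bright spots.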

Regression Normalization. Fit a linear regression model relating the log intensities of the two channels. Assumption: all the genes on the array have the same variance (homogeneity). Test the significance of the intercept, and refit the regression without the intercept if it is not significant. Transform the treatment data using the fitted coefficients. Problems: the homogeneity assumption may not hold, and the trend may be nonlinear (the third replicate of the RL95 data has a slight quadratic trend).
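The following Python sketch shows one way such a fit could be used. The exact model and transformation are not given in this transcript, so the choices here are assumptions: the treatment log intensity is regressed on the control log intensity with `numpy.polyfit`, and the treatment values are transformed so that the fitted line becomes the identity line. The significance test on the intercept is omitted.

```python
import numpy as np


def regression_normalize(log_control, log_treatment):
    """Fit y = intercept + slope * x and map the fitted line onto y = x.

    Assumed transformation (not taken from the slides):
        y_normalized = (y - intercept) / slope
    """
    x = np.asarray(log_control, dtype=float)
    y = np.asarray(log_treatment, dtype=float)
    # polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(x, y, deg=1)
    normalized = (y - intercept) / slope
    return normalized, slope, intercept


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 0.5 + 2.0 * x  # treatment channel with a purely linear bias
normalized, slope, intercept = regression_normalize(x, y)
print(slope, intercept)
```

When the bias really is linear, the normalized treatment values coincide with the control values; a curved trend, like the quadratic one mentioned above, would remain after this step.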

Scatter Plot of Log Intensity before vs. after Regression Normalization

Problems with the Above Normalization Methods. They only balance the overall intensities between the two channels. They do not take into account systematic biases that may appear within the data. The log2(ratio) values can have a systematic dependence on intensity, most commonly a deviation from zero for low-intensity spots.

Systematic Intensity-dependent Effects of log2(ratio)
Examples: Under-expressed genes appear up-regulated in the red channel. Moderately expressed genes appear up-regulated in the green channel.
Explanation: Chemical dyes do not fluoresce equally at different levels because of different levels of quenching (a phenomenon where dye molecules in close proximity re-absorb light from each other, diminishing the signal).
Solution: The easiest way to visualize intensity-dependent effects is to plot the measured log2(Ri/Gi) for each element on the array as a function of the log2(Ri*Gi) product intensity. Such an 'R-I' (ratio-intensity) plot can reveal intensity-specific artifacts in the log2(ratio) measurements.
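The coordinates of an R-I plot can be computed as in this small sketch. The function name is hypothetical, and dropping elements with a non-positive intensity is an added assumption, made because the logarithm is undefined for them.

```python
import numpy as np


def ri_coordinates(r, g):
    """Return (log2(R*G), log2(R/G)) for every element with positive intensities."""
    r = np.asarray(r, dtype=float)
    g = np.asarray(g, dtype=float)
    keep = (r > 0) & (g > 0)  # the logarithm needs positive values
    log_product = np.log2(r[keep] * g[keep])
    log_ratio = np.log2(r[keep] / g[keep])
    return log_product, log_ratio


intensity_axis, ratio_axis = ri_coordinates([4.0, 1.0, 0.0], [1.0, 4.0, 2.0])
print(intensity_axis, ratio_axis)
```

Plotting `ratio_axis` against `intensity_axis` as a scatter plot gives the R-I plot; with no intensity-dependent bias the points scatter around a horizontal line at zero.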

R-I Plots

Lowess Normalization. Lowess (locally weighted linear regression) analysis can remove the intensity-dependent effects in the log2(ratio) values.

How to Do Lowess Normalization. Normalize the values point by point. A fraction of the data must generally be chosen to define each local area (e.g. 20%). Lowess normalization requires a ratio, so it applies to two-dye experiments only.
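A simplified version of the procedure can be sketched in Python with NumPy alone. This is a single-pass local linear fit with tricube weights and no robustness iterations, so it is a teaching sketch rather than a full lowess implementation. The function names, the default `frac=0.2`, and the use of log2(R*G) as the intensity axis are choices made for this sketch.

```python
import numpy as np


def lowess_fit(x, y, frac=0.2):
    """Fit a weighted straight line around each point and evaluate it there."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(3, int(np.ceil(frac * n))))  # points in each local area
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = max(np.sort(dist)[k - 1], 1e-12)  # local bandwidth
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, 1.0) ** 3  # tricube weights
        xm = np.average(x, weights=w)
        ym = np.average(y, weights=w)
        sxx = np.sum(w * (x - xm) ** 2)
        slope = np.sum(w * (x - xm) * (y - ym)) / sxx if sxx > 0 else 0.0
        fitted[i] = ym + slope * (x[i] - xm)
    return fitted


def lowess_normalize(r, g, frac=0.2):
    """Subtract the intensity-dependent trend from the log2 ratios."""
    r = np.asarray(r, dtype=float)
    g = np.asarray(g, dtype=float)
    log_ratio = np.log2(r / g)
    log_product = np.log2(r * g)
    return log_ratio - lowess_fit(log_product, log_ratio, frac)
```

Each point is corrected by the trend estimated from its own neighborhood on the intensity axis, which is what "point by point" means here; `frac` plays the role of the percentage that defines the local area.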

Effects of Lowess Normalization on R-I Plot

Global vs. Local Normalization. The print pins may introduce bias: for example, one region may have larger spots. Problem: this may cause the variance in one region to differ from that in another region.

Variance Regularization. Assume that each subgrid has $M$ elements, with the mean of the log2(ratio) values in each subgrid already adjusted to zero. The variance in the $n$th subgrid is
$\sigma_n^2 = \sum_{j=1}^{M} \left[\log_2(T_{nj})\right]^2$
If the number of subgrids in the array is $N_{grids}$, then the appropriate scaling factor for the elements of the $k$th subgrid is
$a_k = \sigma_k^2 \Big/ \left[\prod_{n=1}^{N_{grids}} \sigma_n^2\right]^{1/N_{grids}}$
All of the elements within the $k$th subgrid are scaled by dividing by the same value $a_k$ computed for that subgrid: $\log_2(T_i') = \log_2(T_i) / a_k$.
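The scaling can be sketched in Python as follows. The formulas in the code are assumptions taken from the Quackenbush (2002) reading: the subgrid variance is the sum of squared log2 ratios, and each factor is that variance divided by the geometric mean of all subgrid variances. The function name and the use of integer subgrid labels are illustrative.

```python
import numpy as np


def variance_regularize(log_ratios, subgrid_ids):
    """Divide each subgrid's log2 ratios by its scaling factor a_k.

    Assumes the log2 ratios in every subgrid are already centered at zero
    and that every subgrid has a nonzero variance.
    """
    log_ratios = np.asarray(log_ratios, dtype=float)
    subgrid_ids = np.asarray(subgrid_ids)
    grids = np.unique(subgrid_ids)
    # Variance of each subgrid: sum of squared log2 ratios.
    variances = np.array(
        [np.sum(log_ratios[subgrid_ids == gid] ** 2) for gid in grids]
    )
    # Geometric mean of the subgrid variances.
    geometric_mean = np.exp(np.mean(np.log(variances)))
    factors = variances / geometric_mean
    scaled = log_ratios.copy()
    for gid, a_k in zip(grids, factors):
        scaled[subgrid_ids == gid] /= a_k
    return scaled, dict(zip(grids.tolist(), factors.tolist()))


scaled, factors = variance_regularize([1.0, -1.0, 2.0, -2.0], [0, 0, 1, 1])
print(factors)
```

Because the factors are ratios to a geometric mean, their product over all subgrids is 1, so the scaling redistributes spread between subgrids rather than shrinking the whole array.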

Replicate Filtering. Technical replication in two-color spotted array analysis (dye-reversal or flip-dye analysis) consists of duplicating the labeling and hybridization while swapping the fluorescent dyes used for each RNA sample. This may help to compensate for biases that occur during labeling or hybridization, for example if some genes preferentially label with the red or the green dye.

Replicate Filtering Outliers excluded
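One way to exclude outliers from a flip-dye pair is sketched below. The criterion is an assumption based on the approach described in the Quackenbush (2002) reading: in a dye swap the two log2 ratios of a consistent element should cancel, so elements whose sum lies more than a chosen number of standard deviations from the mean are dropped. The function name and the default cutoff of 2 standard deviations are illustrative.

```python
import numpy as np


def flip_dye_filter(log_ratio_1, log_ratio_2, n_sd=2.0):
    """Return a boolean mask of elements that are consistent across a dye swap.

    log_ratio_1, log_ratio_2: log2(R/G) from the original and the dye-swapped
    hybridization; for a consistent element their sum is close to zero.
    """
    a = np.asarray(log_ratio_1, dtype=float)
    b = np.asarray(log_ratio_2, dtype=float)
    consistency = a + b  # equals log2(T1 * T2)
    center = consistency.mean()
    spread = consistency.std()
    return np.abs(consistency - center) <= n_sd * spread


first = [1.0, -1.0, 0.5, 0.0, 2.0, -0.5, 1.5, -2.0, 0.2, 3.0]
swapped = [-1.0, 1.0, -0.5, 0.0, -2.0, 0.5, -1.5, 2.0, -0.2, 3.0]
keep = flip_dye_filter(first, swapped)
print(keep)
```

In this toy data only the last element fails: it appears up-regulated in both hybridizations, which a real expression change would not do after the dyes are swapped.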

Normalization between Slides. Use scaled normalization. The median is generally the preferred statistic for scaling.
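A minimal sketch of median-based scaling between slides is shown below. It assumes that scaling is done on the log2 ratios by subtracting each slide's median, so that every slide ends up with a median log2 ratio of zero; the function name is made up for illustration.

```python
import numpy as np


def median_center_slides(log_ratios_by_slide):
    """Shift each slide's log2 ratios so that its median is zero."""
    return [
        np.asarray(slide, dtype=float) - np.median(slide)
        for slide in log_ratios_by_slide
    ]


slides = [[0.5, 1.0, 1.5], [-1.0, 0.0, 3.0]]
centered = median_center_slides(slides)
print([np.median(s) for s in centered])
```

Subtracting a constant from the log2 ratios corresponds to dividing the raw ratios on that slide by a single scale factor, which is why this counts as a scaled normalization.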

Reading Assignments. Suggested reading:
Quackenbush J. 2002. Microarray data normalization and transformation. Nature Genetics 32: 496-501.
Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP. 2002. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 30: e15.