
The following slides have been adapted from to be presented at the Follow-up course on Microarray Data Analysis (Nov , PICB Shanghai) by Peter Serocka

MIcroarray Data Analysis System (MIDAS), version 2.19, Wei Liang, October 2004

Figure: Microarray data flow. Stages: printer and scanner, .tiff image file, image analysis, raw gene expression data, normalization/filtering, normalized data with gene annotation, expression analysis, and interpretation of analysis results, supported by data entry/management and gene annotation databases (MAD, AGED, others).

MIDAS is a normalization and filtering tool for microarray data analysis. It serves as a data pre-processor for clustering analysis in MeV.

Why Normalization and Filtering?
Figure: Sample 1 and Sample 2 mRNA are reverse-transcribed (RT) into Cy3- and Cy5-labeled cDNA and hybridized to a cDNA array; scanning yields .tiff image files, from which a raw data file of Cy3 and Cy5 intensities is produced. Sources of systematic experimental error shown: wavelength-dependent and intensity-dependent effects, uneven hybridization, print-tip variations, background variations, and dependence on the image-processing algorithm.

Why Normalization and Filtering?
The hypothesis underlying microarray analysis is that the measured intensities for each arrayed gene represent its relative expression level. We use these intensities to identify biologically relevant patterns of expression by comparing measured levels between states on a gene-by-gene basis. Before the levels can be compared appropriately, however, one generally applies a number of transformations to the data: eliminating questionable or low-quality data, adjusting the measured intensities to facilitate comparisons, and selecting the genes that are significantly differentially expressed.

MIDAS data analysis methods
MIDAS provides 8 normalization/transformation methods, 10 quality-control filtering methods, and 3 significant-gene identification methods, including:
- Total intensity normalization
- Invalid-intensity checking
- LOWESS (Locfit) normalization
- Iterative linear regression normalization
- Iterative log mean centering normalization
- Ratio statistics normalization
- Low intensity filter
- Standard deviation regularization
- Slice analysis (non-statistical)
- In-slide replicates analysis
- Flip-dye consistency checking
- Ratio statistics confidence interval checking
- Signal/noise checking
- Cross-file trim
- Spot QC flag checking
- MA-ANOVA
- Cross-slide replicates t-test (statistical)
- Cross-slide one-class SAM (statistical)

Graphical scripting language

Read input files; define the analysis pipeline and set the parameters for each analysis module; write output files.
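The scripting workflow (read inputs, chain modules with parameters, write outputs) can be pictured as a list of (module, parameters) steps applied in order. The following is a hypothetical Python sketch, not MIDAS's implementation: the module names `low_intensity_filter` and `total_intensity_normalization`, the `(Cy3, Cy5)` tuple format, and the `min_intensity` parameter are assumptions, and file reading/writing is left out.

```python
from typing import Callable, Dict, List, Tuple

Spot = Tuple[float, float]                      # (Cy3, Cy5) intensities of one spot
Params = Dict[str, float]
Module = Callable[[List[Spot], Params], List[Spot]]

def low_intensity_filter(spots: List[Spot], params: Params) -> List[Spot]:
    """Hypothetical module: drop spots with either channel below min_intensity."""
    cutoff = params["min_intensity"]
    return [(g, r) for g, r in spots if g >= cutoff and r >= cutoff]

def total_intensity_normalization(spots: List[Spot], params: Params) -> List[Spot]:
    """Hypothetical module: rescale Cy5 so both channels have the same total."""
    factor = sum(g for g, _ in spots) / sum(r for _, r in spots)
    return [(g, r * factor) for g, r in spots]

def run_pipeline(spots: List[Spot], pipeline: List[Tuple[Module, Params]]) -> List[Spot]:
    """Apply each (module, parameters) step in order, like a scripted analysis."""
    for module, params in pipeline:
        spots = module(spots, params)
    return spots

if __name__ == "__main__":
    raw = [(100.0, 300.0), (5.0, 8.0), (200.0, 400.0)]   # made-up intensities
    steps = [(low_intensity_filter, {"min_intensity": 10.0}),
             (total_intensity_normalization, {})]
    print(run_pipeline(raw, steps))
```

Keeping every module to the same signature is what lets a script reorder or swap steps freely.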


Sample data
Pair  1st file name      2nd file name
1     NFE005d0001.mev    NFE005d00020.mev
2     NFE005d0002.mev    NFE005d00021.mev
3     NFE005d0003.mev    NFE005d00022.mev
4     NFE005d0004.mev    NFE005d00023.mev
5     NFE005d0005.mev    NFE005d00024.mev
6     NFE005d0006.mev    NFE005d00025.mev
7     NFE005d0007.mev    NFE005d00026.mev
9     NFE005d0008.mev    NFE005d00027.mev
10    NFE005d0009.mev    NFE005d00028.mev
11    NFE005d00010.mev   NFE005d00029.mev
12    NFE005d00011.mev   NFE005d00030.mev
13    NFE005d00012.mev   NFE005d00031.mev
14    NFE005d00013.mev   NFE005d00032.mev
15    NFE005d00014.mev   NFE005d00033.mev
16    NFE005d00015.mev   NFE005d00034.mev
17    NFE005d00016.mev   NFE005d00035.mev
18    NFE005d00017.mev   NFE005d00036.mev
19    NFE005d00018.mev   NFE005d00037.mev
20    NFE005d00019.mev   NFE005d00038.mev

LOWESS (Locfit) normalization
Observations on the R-I plot (logRatio vs. logIntensityProduct) of the raw data:
1. Tilted tails at the low-intensity and high-intensity ends.
2. The mean is not centered at 0, and the offset is intensity dependent.

LOWESS (Locfit) normalization
If a gene is equally expressed in the Cy3 and Cy5 channels, log2(Cy5/Cy3) = 0. Two factors contribute to the apparent up-regulation of a gene X:
1. Biological factors, which we are interested in.
2. Experimental factors, e.g. different sensitivity to the red and green lasers, which we are not interested in and want to remove.

LOWESS (Locfit) normalization
We need a way to extract the experimental factor of gene X. Approach: assume that similar experimental factors apply to genes that lie close to each other in the logProd-logRatio plot, and predict the experimental factor from a group of locally neighboring data points. This turns the task into a curve-fitting problem.

Local linear regression model with a tri-cube weight function, fitted by least squares. The fitted curve gives estimated values of log2(Cy5/Cy3) as a function of log10(Cy3*Cy5). ASD = 0.346
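A local linear fit with tri-cube weights can be sketched in plain Python. This is a generic single-pass LOWESS, not the Locfit code MIDAS uses: the neighbourhood rule (the `ceil(frac * n)` points nearest in x), the default `frac`, and the absence of robustness iterations are all assumptions. Here x is log10(Cy3*Cy5) and y is log2(Cy5/Cy3).

```python
import math
from typing import List, Sequence

def tricube(u: float) -> float:
    """Tri-cube weight: (1 - |u|^3)^3 for |u| < 1, else 0."""
    a = abs(u)
    return (1.0 - a ** 3) ** 3 if a < 1.0 else 0.0

def lowess_fit(x: Sequence[float], y: Sequence[float], frac: float = 0.4) -> List[float]:
    """Fitted value at each x[i] from a tri-cube weighted local linear fit.

    Assumption: each neighbourhood is the ceil(frac * n) points nearest in x;
    one pass only, no robustness reweighting.
    """
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))
    fitted = []
    for x0 in x:
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12  # bandwidth
        w = [tricube((xi - x0) / h) for xi in x]
        # Weighted least squares for y ~ a + b * x around x0.
        sw = sum(w)
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
        b = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ym + b * (x0 - xm))
    return fitted

if __name__ == "__main__":
    cy3 = [120.0, 340.0, 560.0, 900.0, 1500.0, 2600.0]   # made-up intensities
    cy5 = [150.0, 380.0, 640.0, 980.0, 1500.0, 2300.0]
    x = [math.log10(g * r) for g, r in zip(cy3, cy5)]    # log10(Cy3*Cy5)
    y = [math.log2(r / g) for g, r in zip(cy3, cy5)]     # log2(Cy5/Cy3)
    print(lowess_fit(x, y, frac=0.5))
```

A larger `frac` gives a smoother curve; a smaller one follows local bends in the tails more closely.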

LOWESS (Locfit) normalization
Use the estimated curve y(x_i), which represents the experimental factor, to correct the raw data so that only the biological factor remains:
$$\log_2(R_i'/G_i') = \log_2(R_i/G_i) - y(x_i) = \log_2(R_i/G_i) - \log_2 2^{y(x_i)} = \log_2\left(\frac{R_i}{G_i}\cdot\frac{1}{2^{y(x_i)}}\right)$$
so that $R_i' = R_i$ and $G_i' = G_i \cdot 2^{y(x_i)}$.
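Given fitted values y(x_i) from any LOWESS fit, the correction R'_i = R_i, G'_i = G_i · 2^{y(x_i)} is a single list comprehension. A minimal Python sketch; the function name `lowess_correct` and the sample numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def lowess_correct(red: Sequence[float], green: Sequence[float],
                   fitted: Sequence[float]) -> List[Tuple[float, float]]:
    """Return corrected (R', G') pairs: R'_i = R_i, G'_i = G_i * 2**y(x_i).

    `fitted` holds y(x_i), the LOWESS estimate of log2(R_i/G_i) at each spot.
    """
    return [(r, g * 2.0 ** y) for r, g, y in zip(red, green, fitted)]

if __name__ == "__main__":
    red, green = [400.0, 900.0], [100.0, 300.0]   # made-up intensities
    fitted = [1.0, 0.5]                           # made-up y(x_i) values
    for r, g, y, (r2, g2) in zip(red, green, fitted, lowess_correct(red, green, fitted)):
        # The corrected log ratio equals the raw log ratio minus y(x_i).
        print(math.log2(r / g) - y, math.log2(r2 / g2))
```

Only the green channel is rescaled, so the red intensities keep their original units.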

LOWESS (Locfit) normalization
Figure: LOWESS-corrected R-I plot.

Standard deviation regularization
Assumption: within each block and each slide, spots should have the same spread of log2(Cy5/Cy3) values. SD regularization scales the (Cy3, Cy5) intensity pair of each spot so that the set of spots in each block (or slide) has the same standard deviation as the other blocks (or slides).

Standard deviation regularization
Let $a_{ij}$ be the raw log ratio for the $j$th spot in the $i$th block (or slide) and $a'_{ij}$ the scaled log ratio for that spot, where $N_i$ denotes the number of genes in the $i$th block (or slide), $M$ the number of blocks (or slides), and $\bar{a}_i$ the mean log ratio of the $i$th block (or slide).
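One way to equalize the spread is to rescale each block's log ratios by a common target SD divided by that block's own SD. The following Python sketch assumes the target is the geometric mean of the M block SDs, in the spirit of scale normalization; the exact MIDAS formula is not reproduced here, and converting the scaled ratios back to (Cy3, Cy5) intensities is not shown. The function name `sd_regularize` is hypothetical.

```python
import math
import statistics
from typing import List, Sequence

def sd_regularize(blocks: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale log ratios a_ij so that every block (or slide) has the same SD.

    Assumption: the common target SD is the geometric mean of the M block SDs,
    and each a'_ij = a_ij * target / SD_i.
    """
    sds = [statistics.stdev(block) for block in blocks]  # each block needs N_i >= 2
    target = math.exp(sum(math.log(s) for s in sds) / len(sds))
    return [[a * target / s for a in block] for block, s in zip(blocks, sds)]

if __name__ == "__main__":
    blocks = [[1.0, -1.0, 2.0, -2.0], [0.5, -0.5, 1.0, -1.0]]  # made-up log ratios
    print([statistics.stdev(b) for b in sd_regularize(blocks)])
```

Using the geometric mean keeps the product of the block SDs unchanged, so no block is singled out as the reference.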

Standard deviation regularization

Flip-dye replicates consistency filter
In a flip-dye file pair the dye assignments are swapped, so the intensities are flipped: R1/G1 ≈ G2/R2, or more specifically R1 ≈ G2 and G1 ≈ R2. Figure: intensities G1, R1 and G2, R2 for Gene1 to Gene8. Flip-dye experiments help reduce random error.

Flip-dye replicates consistency filter
1. Calculate expression levels for all genes in the flip-dye pair.
2. Filter out genes whose expression levels are inconsistent between the flip-dye replicates.
3. For the genes that pass the consistency check, take the geometric mean of the corresponding intensities from the replicated pairs.
How is consistency between replicates measured?

Flip-dye replicates consistency filter
Figure: intensities G1, R1 (file 1) and G2, R2 (file 2) for a gene, illustrating 100% consistency.

Flip-dye replicates consistency filter: SD cut vs. threshold cut
- SD cut: regardless of the dataset, the same percentage of genes is always cut for the same SD cutoff.
- Threshold cut: the percentage cut depends on the specified log-ratio consistency range, e.g. −1 to 1 on the log2 scale, which corresponds to 1/2 to 2 on the linear scale.
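The filter and the two cut styles can be sketched in Python. The consistency measure used here, log2(R1/G1) − log2(G2/R2), is an assumption (it is 0 when R1/G1 = G2/R2 exactly), as are the function names and the default cutoffs. Genes that pass are combined by taking geometric means of the matching channels, G1 with R2 and R1 with G2.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Pair = Tuple[float, float, float, float]  # (G1, R1, G2, R2) for one gene

def consistency(g1: float, r1: float, g2: float, r2: float) -> float:
    """Hypothetical measure: log2(R1/G1) - log2(G2/R2); 0 means fully consistent."""
    return math.log2(r1 / g1) - math.log2(g2 / r2)

def sd_cut(pairs: Sequence[Pair], k: float = 2.0) -> List[Pair]:
    """Keep genes whose measure lies within k standard deviations of the mean."""
    d = [consistency(*p) for p in pairs]
    m, s = statistics.mean(d), statistics.stdev(d)
    return [p for p, di in zip(pairs, d) if abs(di - m) <= k * s]

def threshold_cut(pairs: Sequence[Pair], lo: float = -1.0, hi: float = 1.0) -> List[Pair]:
    """Keep genes whose measure lies inside a fixed log2 range, e.g. (-1, 1)."""
    return [p for p in pairs if lo < consistency(*p) < hi]

def combine(pairs: Sequence[Pair]) -> List[Tuple[float, float]]:
    """Geometric mean of matching channels: G' = sqrt(G1*R2), R' = sqrt(R1*G2)."""
    return [(math.sqrt(g1 * r2), math.sqrt(r1 * g2)) for g1, r1, g2, r2 in pairs]

if __name__ == "__main__":
    good = (100.0, 200.0, 200.0, 100.0)   # made-up consistent flip-dye gene
    bad = (100.0, 200.0, 100.0, 200.0)    # made-up inconsistent gene
    print(combine(threshold_cut([good, bad])))
```

If the measure is roughly normal, an SD cut at k = 2 removes about 5% of genes whatever the spread of the data, whereas a fixed threshold removes more genes from a noisy dataset than from a clean one.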


Slice Analysis filter
Remove genes whose z-scores fall outside the range of interest.


Slice Analysis filter
1. Define a slice window.
2. Slide the window along the log(IntensityProduct) axis.
3. Calculate logRatioMean and logRatioSD of the data points within each slice window.
4. Calculate the z-score of each data point: Z-score = (logRatio − logRatioMean) / logRatioSD.
5. Trim data points whose z-scores fall outside the range of interest.
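The sliding-window z-score can be sketched in Python. How MIDAS defines a slice is not stated here, so the window rule is an assumption: each slice holds `window` spots that are consecutive in log(IntensityProduct) order, centred on the spot where possible. The function names and defaults are hypothetical; following the slide, spots inside [z_min, z_max] are kept and the rest are trimmed.

```python
import statistics
from typing import List, Sequence

def slice_zscores(log_intensity: Sequence[float], log_ratio: Sequence[float],
                  window: int = 5) -> List[float]:
    """Z-score of each spot's log ratio relative to its slice window.

    Assumption: a slice is `window` spots consecutive in log intensity order.
    """
    n = len(log_ratio)
    order = sorted(range(n), key=lambda i: log_intensity[i])
    z = [0.0] * n
    for rank, i in enumerate(order):
        start = min(max(0, rank - window // 2), max(0, n - window))
        members = [log_ratio[j] for j in order[start:start + window]]
        mean, sd = statistics.mean(members), statistics.stdev(members)
        z[i] = (log_ratio[i] - mean) / sd if sd > 0 else 0.0
    return z

def slice_filter(log_intensity: Sequence[float], log_ratio: Sequence[float],
                 z_min: float = -2.0, z_max: float = 2.0, window: int = 5) -> List[int]:
    """Indices of spots whose z-scores fall inside [z_min, z_max]; others are trimmed."""
    z = slice_zscores(log_intensity, log_ratio, window)
    return [i for i, zi in enumerate(z) if z_min <= zi <= z_max]
```

Because each spot is part of its own slice, a window of w spots can never give |z| larger than (w − 1)/√w (about 1.79 for w = 5), so the window should be wide enough for the chosen z range.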

Slice Analysis filter

Analysis packaging: myAnalysis.prj

MIDAS graphing

Plot types (file extension): R-I plot (.prc), box plot (.box), flip-dye diagnostic plot (.rrc), intensity plot (.ity, .lty), z-score distribution plot (.his), SAM plot (.sam).

MIDAS data viewer

Statistical methods for identifying significant genes
Two methods are implemented in this release of MIDAS: the cross-slide replicates one-class t-test and cross-slide replicates one-class SAM.
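For the one-class t-test, each gene's log ratios across replicate slides are compared with a hypothesized mean (0 for no change). A minimal Python sketch of the t statistic; the function names and the dictionary input format are hypothetical, and p-values are not computed.

```python
import math
import statistics
from typing import Dict, Sequence

def one_class_t(log_ratios: Sequence[float], mu0: float = 0.0) -> float:
    """One-sample t statistic (mean - mu0) / (s / sqrt(n)); needs n >= 2 slides."""
    n = len(log_ratios)
    s = statistics.stdev(log_ratios)
    return (statistics.mean(log_ratios) - mu0) / (s / math.sqrt(n))

def t_scores(genes: Dict[str, Sequence[float]], mu0: float = 0.0) -> Dict[str, float]:
    """t statistic for every gene across its cross-slide replicate log ratios."""
    return {name: one_class_t(values, mu0) for name, values in genes.items()}

if __name__ == "__main__":
    print(t_scores({"geneA": [1.0, 2.0, 3.0], "geneB": [-1.0, 0.0, 1.0]}))  # made-up data
```

A p-value comes from Student's t distribution with n − 1 degrees of freedom (for example via `scipy.stats.ttest_1samp`). With thousands of genes tested at once, some form of multiple-testing control is needed, which is what SAM's FDR estimate provides.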

SAM (Significance Analysis of Microarrays)
A statistical technique for finding significant genes in a set of microarray experiments.
Reference: Tusher, V.G., R. Tibshirani and G. Chu. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences USA 98.
Designs: two-class unpaired, two-class paired, multi-class unpaired, censored survival, and one-class (available in this release).

SAM (Significance Analysis of Microarrays)
One-class SAM identifies genes whose mean expression across experiments differs from a user-specified mean.
- Each gene is assigned a score (d) based on its change in expression relative to the standard deviation of repeated measurements for that gene.
- Genes whose scores exceed their expected values under permutation by more than a threshold (Δ) are deemed potentially significant.
- For these genes, the proportion likely to have been identified wrongly by chance, the False Discovery Rate (FDR), is estimated.
- The goal is to pick a set of differentially expressed genes with an FDR acceptable to the user.
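A simplified one-class SAM can be sketched in Python. Several details are assumptions rather than MIDAS's code. The score is d_i = (mean_i - mu0) / (s_i + s0), where s_i is the standard error of gene i's mean and the fudge constant s0 is fixed at the median s_i. The null distribution comes from randomly flipping the sign of each experiment's centred column. On each side of zero, the least extreme gene whose observed score beats its expected order statistic by more than Δ sets the cutoff. The FDR is the median number of permuted scores beyond the cutoffs divided by the number of called genes.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

def d_scores(rows: Sequence[Sequence[float]], s0: float) -> List[float]:
    """d_i = mean_i / (s_i + s0), where s_i is the standard error of gene i's mean."""
    out = []
    for row in rows:
        se = statistics.stdev(row) / math.sqrt(len(row))
        out.append(statistics.mean(row) / (se + s0))
    return out

def sam_one_class(data: Sequence[Sequence[float]], delta: float, mu0: float = 0.0,
                  n_perm: int = 100, seed: int = 0) -> Tuple[List[int], float]:
    """Return (indices of significant genes, estimated FDR); simplified sketch."""
    rows = [[x - mu0 for x in row] for row in data]      # test mean == mu0
    n = len(rows[0])
    s0 = statistics.median([statistics.stdev(r) / math.sqrt(n) for r in rows])
    d_obs = d_scores(rows, s0)
    rng = random.Random(seed)
    perms = []
    for _ in range(n_perm):
        signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # one sign per experiment
        perms.append(d_scores([[x * sg for x, sg in zip(r, signs)] for r in rows], s0))
    # Expected order statistics: mean of the sorted permuted scores at each rank.
    d_exp = [statistics.mean(c) for c in zip(*(sorted(p) for p in perms))]
    order = sorted(range(len(d_obs)), key=lambda i: d_obs[i])
    up = [i for k, i in enumerate(order) if d_obs[i] > 0 and d_obs[i] - d_exp[k] > delta]
    down = [i for k, i in enumerate(order) if d_obs[i] < 0 and d_exp[k] - d_obs[i] > delta]
    hi = min((d_obs[i] for i in up), default=math.inf)     # upper cutoff
    lo = max((d_obs[i] for i in down), default=-math.inf)  # lower cutoff
    called = [i for i, d in enumerate(d_obs) if d >= hi or d <= lo]
    if not called:
        return [], 0.0
    # FDR: median count of permuted scores beyond the cutoffs / number called.
    false_counts = [sum(1 for d in p if d >= hi or d <= lo) for p in perms]
    return called, statistics.median(false_counts) / len(called)
```

Raising Δ moves the cutoffs outward: fewer genes are called and the estimated FDR usually drops, which is the trade-off the user tunes.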

SAM (Significance Analysis of Microarrays)
Figure: SAM plot showing Δ adjustment, the resulting FDR, and the positively significant genes.

Automated report generation

TM4 MIDAS web page