The following slides have been adapted from http://www.tm4.org/ to be presented at the Follow-up course on Microarray Data Analysis.

1 The following slides have been adapted from http://www.tm4.org/ to be presented at the Follow-up course on Microarray Data Analysis (Nov 20-24 2006, PICB Shanghai) by Peter Serocka

2 MIcroarray Data Analysis System (version 2.19). Wei Liang, October 2004

3 Microarray Data Flow
[Diagram labels]
Processing steps: Data Entry / Management; Image Analysis; Normalization / Filtering; Expression Analysis
Instruments: Printer; Scanner
Data: Gene Annotation; .tiff Image File; Raw Gene Expression Data; Normalized Data with Gene Annotation; Interpretation of Analysis Results
Databases: MAD; AGED; others

4 MIDAS is a Normalization and Filtering tool for microarray data analysis!

5 Serves as a data pre-processor for clustering analysis (MeV).

6 Why Normalization and Filtering?
[Diagram labels: Sample1 mRNA; Sample2 mRNA; RT; Cy3-cDNA; Cy5-cDNA; cDNA array; Cy3 intensity; Cy5 intensity; .tiff Image Files; Raw Data File]
Systematic experimental error:
- Wavelength dependent
- Intensity dependent
- Uneven hybridization
- gel print-tip variations
- Background variations
- Image processing algorithm-dependent

7 Why Normalization and Filtering?
The hypothesis underlying microarray analysis is that the measured intensities for each arrayed gene represent its relative expression level. We use these intensities to identify biologically relevant patterns of expression by comparing measured levels between states on a gene-by-gene basis. However, before the levels can be appropriately compared, one generally performs a number of transformations on the data to eliminate questionable or low-quality data, to adjust the measured intensities to facilitate comparisons, and to select those genes that are significantly differentially expressed.

8 MIDAS data analysis methods
Categories named on the slide:
- 8 normalization/transformation methods
- 10 quality control filtering methods
- 3 significant genes identification methods
Methods named on the slide:
- Total Intensity normalization
- Invalid-intensity checking
- LOWESS (Locfit) normalization
- Iterative linear regression normalization
- Iterative log mean centering normalization
- Ratio Statistics normalization
- Low intensity filter
- Standard deviation regularization
- Slice analysis (non-statistical)
- In-slide replicates analysis
- Flip-dye consistency checking
- Ratio Statistics confidence interval checking
- Signal/Noise checking
- Cross-file-trim
- Spot QC flag checking
- MA-ANOVA
- Cross-slide replicates t-test (statistical)
- Cross-slide one-class SAM (statistical)

9 Graphical scripting language

10 Graphical scripting workflow:
- Read input files
- Define analysis pipeline and set parameters for each analysis module
- Write output files

12 Sample data

Pair #   1st file name      2nd file name
1        NFE005d0001.mev    NFE005d00020.mev
2        NFE005d0002.mev    NFE005d00021.mev
3        NFE005d0003.mev    NFE005d00022.mev
4        NFE005d0004.mev    NFE005d00023.mev
5        NFE005d0005.mev    NFE005d00024.mev
6        NFE005d0006.mev    NFE005d00025.mev
7        NFE005d0007.mev    NFE005d00026.mev
9        NFE005d0008.mev    NFE005d00027.mev
10       NFE005d0009.mev    NFE005d00028.mev
11       NFE005d00010.mev   NFE005d00029.mev
12       NFE005d00011.mev   NFE005d00030.mev
13       NFE005d00012.mev   NFE005d00031.mev
14       NFE005d00013.mev   NFE005d00032.mev
15       NFE005d00014.mev   NFE005d00033.mev
16       NFE005d00015.mev   NFE005d00034.mev
17       NFE005d00016.mev   NFE005d00035.mev
18       NFE005d00017.mev   NFE005d00036.mev
19       NFE005d00018.mev   NFE005d00037.mev
20       NFE005d00019.mev   NFE005d00038.mev

13 LOWESS (Locfit) normalization
R-I plot: logRatio vs. logIntensityProduct (SD = 0.346)
Observations:
1. Tilted tails at the low-intensity and high-intensity ends
2. Mean not centered at 0 (intensity dependent)

14 LOWESS (Locfit) normalization
If Cy3 and Cy5 are equally expressed, log2(Cy5/Cy3) = 0.
Two factors contribute to the apparent up-regulation of gene X:
1. Biological factors (the ones we are interested in)
2. Experimental factors, e.g. different sensitivity to the red and green lasers (not of interest; we want to remove them)
[Figure annotations: SD = 0.346; Gene X; Exp factor; Bio factor]

15 LOWESS (Locfit) normalization
We need to find a way to extract the experimental factors.
Approach:
- Assume that similar experimental factors apply to genes that lie close to each other in the logProd-logRatio plot.
- Predict the Exp factor from a group of locally neighboring data points, which is equivalent to a curve-fitting problem.
[Figure annotations: SD = 0.346; Gene X; Exp factor; Bio factor]

16 Local linear regression model
- Tri-cube weight function
- Least squares
- Estimated values of log2(Cy5/Cy3) as a function of log10(Cy3*Cy5)
[Figure annotation: SD = 0.346]

17 LOWESS (Locfit) normalization
Use the estimated curve y(x_i) to correct the raw data, where y(x_i) is the Exp factor:
\[ \log_2(R_i'/G_i') = \log_2(R_i/G_i) - y(x_i) \]
\[ \log_2(R_i'/G_i') = \log_2(R_i/G_i) - \log_2 2^{y(x_i)} \]
\[ \log_2(R_i'/G_i') = \log_2\left(\frac{R_i}{G_i} \cdot \frac{1}{2^{y(x_i)}}\right) \]
\[ R_i' = R_i, \qquad G_i' = G_i \cdot 2^{y(x_i)} \]
[Figure annotations: SD = 0.346; Gene X; Exp factor; Bio factor]
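The correction on slides 15 to 17 can be sketched in a few lines of Python. This is a minimal illustration and not the MIDAS or Locfit implementation: the function names, the `frac` neighborhood parameter and the nearest-neighbor bandwidth rule are assumptions made for the example. It fits a tri-cube weighted local line to log2(R/G) as a function of log10(R*G), then leaves R unchanged and multiplies G by 2^y(x_i), as in the slide.

```python
import math

def tricube(u):
    """Tri-cube weight: (1 - |u|^3)^3 for |u| < 1, otherwise 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def lowess_fit(x, y, frac=0.33):
    """Fit a weighted local line around each x[i]; return the fitted y(x[i])."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))   # neighbors per local fit (assumed rule)
    fitted = []
    for xi in x:
        dist = [abs(xj - xi) for xj in x]
        h = sorted(dist)[k - 1]               # bandwidth: k-th smallest distance
        if h > 0:
            w = [tricube(d / h) for d in dist]
        else:
            w = [1.0 if d == 0 else 0.0 for d in dist]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0  # weighted least-squares slope
        fitted.append(my + slope * (xi - mx))
    return fitted

def lowess_normalize(red, green, frac=0.33):
    """Return (R', G', y) with R' = R and G' = G * 2**y(x)."""
    x = [math.log10(r * g) for r, g in zip(red, green)]
    m = [math.log2(r / g) for r, g in zip(red, green)]
    y = lowess_fit(x, m, frac)
    green_adj = [g * 2.0 ** yi for g, yi in zip(green, y)]
    return list(red), green_adj, y

# Synthetic demo: the red channel carries a purely intensity-dependent bias.
green = [10 ** (2 + 2 * i / 199) for i in range(200)]
red = [g * 2 ** (0.5 * (math.log10(g) - 3)) for g in green]
red_n, green_n, curve = lowess_normalize(red, green)
corrected = [math.log2(r / g) for r, g in zip(red_n, green_n)]
```

In the demo the bias is the only source of non-zero log ratios, so the corrected log ratios collapse to zero. With real data the residual after subtracting y(x_i) is what the slides call the Bio factor.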

18 LOWESS (Locfit) normalization
[Figure: two R-I plots. Panel A: SD = 0.346. Panel B, LOWESS-corrected R-I plot: SD = 0.338]

19 Standard deviation regularization
Assumption: within each block and each slide, spots should have the same spread of log2(Cy5/Cy3) values.
SD-Reg scales the (Cy3, Cy5) intensity pair of each spot so that the spots within each block or slide have the same standard deviation as those in the other blocks or slides.

20 Standard deviation regularization
Let a_ij be the raw log ratio for the j-th spot in the i-th block (or slide), and let a'_ij be the scaled log ratio for the same spot. N_i denotes the number of genes in the i-th block (or slide), M denotes the number of blocks (or slides), and ā_i denotes the log-ratio mean of the i-th block (or slide).

21 Standard deviation regularization
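The scaling equation itself did not survive in this transcript, so the sketch below is only one plausible reading of the description above. It assumes that each block's log ratios are divided by the block SD and multiplied by the geometric mean of all block SDs, which is a common way to give every block the same spread. The function name and the use of the population SD are choices made for this example.

```python
import math
import statistics

def sd_regularize(log_ratios_by_block):
    """Rescale each block's log ratios so that all blocks share one SD.

    Assumption: the common target spread is the geometric mean of the
    per-block SDs. Every block must have a non-zero SD.
    """
    sds = [statistics.pstdev(block) for block in log_ratios_by_block]
    target = math.exp(sum(math.log(s) for s in sds) / len(sds))
    return [
        [a * target / s for a in block]
        for block, s in zip(log_ratios_by_block, sds)
    ]

# Two blocks (or slides) with different spreads of log2(Cy5/Cy3).
blocks = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0]]
scaled = sd_regularize(blocks)
scaled_sds = [statistics.pstdev(block) for block in scaled]
```

After scaling, both blocks have the same standard deviation, which is the property the slide's assumption asks for.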

22 Flip dye replicates consistency filter
The intensities in the file pair are flipped, i.e. R1/G1 ~ G2/R2, or R1 ~ G2 and G1 ~ R2.
Flip-dye experiments help reduce random error.
[Diagram labels: G1, R1, G2, R2; Gene1 through Gene8]

23 Flip dye replicates consistency filter
- Calculate expression levels for all genes in the flip-dye pair.
- Filter out genes with inconsistent expression levels between flip-dye replicates.
- For genes that pass the consistency check, take the geometric mean of the corresponding intensities from the replicate pair.
How is consistency measured between replicates?

24 Flip dye replicates consistency filter
[Table columns: Gene; File 1 (G1, R1); File 2 (G2, R2)]
100% consistency:

25 Flip dye replicates consistency filter: SD cut vs. threshold cut
- SD cut: regardless of the dataset, always cuts the same percentage of genes for the same SD cut value.
- Threshold cut: the percentage cut depends on the specified log-ratio consistency range, for example a log2 value between -1 and 1, which corresponds to a ratio between 1/2 and 2.

26 Flip dye replicates consistency filter
- Calculate expression levels for all genes in the flip-dye pair.
- Filter out genes with inconsistent expression levels between flip-dye replicates.
- For genes that pass the consistency check, take the geometric mean of the corresponding intensities from the replicate pair.
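A threshold-cut version of the flip-dye filter can be sketched as follows. The consistency measure used here, log2((R1*R2)/(G1*G2)), is an assumption derived from the relation R1/G1 ~ G2/R2 stated on slide 22: it equals 0 for a perfectly consistent pair. The function name, the dictionary layout and the default range of -1 to 1 are illustrative only and not taken from MIDAS.

```python
import math

def flip_dye_filter(pairs, max_abs_log2=1.0):
    """Keep genes whose flip-dye replicates agree, and merge the replicates.

    pairs maps gene -> (R1, G1, R2, G2). Returns gene -> (A, B), where
    A = sqrt(R1*G2) and B = sqrt(G1*R2) are geometric means of the
    channels that measured the same sample in the two files.
    """
    merged = {}
    for gene, (r1, g1, r2, g2) in pairs.items():
        # For a perfect dye swap R1/G1 == G2/R2, so this value is 0.
        consistency = math.log2((r1 * r2) / (g1 * g2))
        if abs(consistency) < max_abs_log2:
            merged[gene] = (math.sqrt(r1 * g2), math.sqrt(g1 * r2))
    return merged

pairs = {
    "geneA": (400.0, 100.0, 110.0, 420.0),  # ratio flips between files: consistent
    "geneB": (400.0, 100.0, 400.0, 100.0),  # ratio does not flip: inconsistent
}
merged = flip_dye_filter(pairs)
```

An SD cut would instead compute the standard deviation of the consistency values across all genes and keep the genes within a chosen number of SDs of the mean, which removes a fixed fraction of genes regardless of the dataset.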

27 Slice Analysis filter
Remove genes whose z-scores fall outside the range of interest.

29 Slice Analysis filter
1. Define a slice window.
2. Slide the window along the log(IntensityProduct) axis.
3. Calculate logRatioMean and logRatioSD of the data points within each slice window.
4. Calculate the Z-score of each data point: Z-score = (logRatio - logRatioMean) / logRatioSD
5. Trim data whose Z-scores fall outside the range of interest.
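The slice-analysis steps can be sketched as below. The slides do not say how the window is sized, so this example assumes a window of a fixed number of neighboring spots, ordered by log(IntensityProduct) and centered on the spot being scored. The function names and the default range of -2 to 2 are hypothetical.

```python
import statistics

def slice_zscores(log_prod, log_ratio, window=50):
    """Z-score of each spot's log ratio relative to its local slice."""
    order = sorted(range(len(log_prod)), key=lambda i: log_prod[i])
    z = [0.0] * len(order)
    half = window // 2
    for rank, i in enumerate(order):
        # Slice = neighboring spots along the log(IntensityProduct) axis.
        lo = max(0, rank - half)
        hi = min(len(order), rank + half + 1)
        local = [log_ratio[j] for j in order[lo:hi]]
        mean = statistics.fmean(local)
        sd = statistics.pstdev(local)
        z[i] = (log_ratio[i] - mean) / sd if sd > 0 else 0.0
    return z

def slice_filter(log_prod, log_ratio, z_range=(-2.0, 2.0), window=50):
    """Return the indices of spots whose Z-score lies inside z_range."""
    z = slice_zscores(log_prod, log_ratio, window)
    return [i for i, zi in enumerate(z) if z_range[0] <= zi <= z_range[1]]

# Demo: small alternating log ratios with one strong outlier at index 20.
log_prod = [4 + 0.1 * i for i in range(41)]
log_ratio = [0.1 if i % 2 == 0 else -0.1 for i in range(41)]
log_ratio[20] = 3.0
kept = slice_filter(log_prod, log_ratio, z_range=(-2.0, 2.0), window=11)
```

Because the mean and SD are computed locally, a spot is judged against spots of similar intensity rather than against the whole slide.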

30 Slice Analysis filter

31 Analysis packaging myAnalysis.prj

32 MIDAS graphing

33 MIDAS plot types and file extensions:
- R-I plot (.prc)
- Box plot (.box)
- FlipDye Diagnostic plot (.rrc)
- Intensity plot (.ity, .lty)
- Z-score Distribution plot (.his)
- SAM plot (.sam)

34 MIDAS data viewer

35 Statistically significant gene identification methods
Two methods are implemented in this release of MIDAS:
- Cross-slide replicates one-class t-test
- Cross-slide replicates one-class SAM

36 SAM (Significance Analysis of Microarrays)
A statistical technique for finding significant genes in a set of microarray experiments.
Reference: Tusher, V.G., R. Tibshirani and G. Chu. 2001. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences USA 98: 5116-5121.
Designs:
- two-class unpaired
- two-class paired
- multi-class unpaired
- censored survival
- one-class (available in this release)

37 SAM (Significance Analysis of Microarrays)
One-class SAM: identify genes whose mean expression across experiments is different from a user-specified mean.
- Assign a score (d) to each gene based on its change in expression relative to the standard deviation of repeated measurements for that gene.
- Genes with scores greater than a threshold (Δ) are deemed potentially significant.
- For these potentially significant genes, estimate the proportion likely to have been identified by chance, the False Discovery Rate (FDR).
- The goal is to pick a set of differentially expressed genes with an FDR that the user finds acceptable.
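The one-class procedure above can be illustrated with a simplified sketch. It is not the full SAM algorithm of Tusher et al. and not the MIDAS code. The assumptions are: the score is d = (mean - mu0) / (standard error + s0) with a fixed, user-supplied s0; genes are called when |d| exceeds the threshold; and the null distribution comes from enumerating every sign flip of the replicates around mu0. All function and parameter names are hypothetical.

```python
import itertools
import math
import statistics

def d_scores(rows, mu0=0.0, s0=0.1):
    """SAM-style score per gene: (mean - mu0) / (standard error + s0)."""
    scores = []
    for values in rows:
        se = statistics.stdev(values) / math.sqrt(len(values))
        scores.append((statistics.fmean(values) - mu0) / (se + s0))
    return scores

def estimate_fdr(rows, delta, mu0=0.0, s0=0.1):
    """Call genes with |d| > delta and estimate the FDR by sign flipping."""
    observed = d_scores(rows, mu0, s0)
    called = [i for i, d in enumerate(observed) if abs(d) > delta]
    n = len(rows[0])
    false_counts = []
    # Enumerate all 2**n sign patterns; feasible only for few replicates.
    for signs in itertools.product((1, -1), repeat=n):
        flipped = [
            [mu0 + s * (v - mu0) for s, v in zip(signs, values)]
            for values in rows
        ]
        null = d_scores(flipped, mu0, s0)
        false_counts.append(sum(abs(d) > delta for d in null))
    fdr = statistics.median(false_counts) / len(called) if called else 0.0
    return called, min(1.0, fdr)

# Three genes, four replicate log ratios each; only the first is shifted.
rows = [
    [2.0, 2.2, 1.8, 2.1],
    [0.1, -0.2, 0.05, -0.1],
    [-0.3, 0.2, 0.1, -0.05],
]
called, fdr = estimate_fdr(rows, delta=3.0)
```

Raising or lowering `delta` trades the number of called genes against the estimated FDR, which corresponds to the Δ adjustment the user performs on the SAM plot.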

38 SAM (Significance Analysis of Microarrays)
[Figure labels: Δ adjustment; FDR; positively significant genes]

39 Automated report generation

41 TM4 MIDAS web page http://www.tigr.org/software/tm4/midas.html http://www.tm4.org/midas.html

