ViaLogy Lien Chung Jim Breaux, Ph.D. SoCalBSI 2004 “Improvements to Microarray Analytical Methods and Development of Differential Expression Toolkit”

1 ViaLogy Lien Chung Jim Breaux, Ph.D. SoCalBSI 2004 “Improvements to Microarray Analytical Methods and Development of Differential Expression Toolkit” Funded by the National Science Foundation and the National Institutes of Health

2 Outline of Talk
Background
- Affymetrix GeneChips
- ViaLogy and Microarray Analysis
Accelerating Low Level Analysis Algorithms
- Quantile Normalization
- Median Polish
Differential Expression Toolkit
- Significance Analysis of Microarrays (SAM)
Future Direction

3 Affymetrix GeneChip® Microarrays
A useful tool for measuring the mRNA expression level of thousands of genes in a biological sample
Low Level Analysis
- Signal detection: convert fluorescence to signal
- Normalization: reduce unwanted variation across chips
- Summarization: reduce the 11-20 probe intensities of each gene to a single value

4 Internet Resources
BioConductor
- An open source and open development software project for the analysis and comprehension of genomic data
- A collection of analysis packages implemented in the R language
- Packages used: affy, siggenes
R Project
- Open source language and environment for statistical computing and graphics
- Pros: built-in mathematical functions, supports graphics
- Cons: computationally slow

5 ViaLogy’s Low Level Analysis (Part 1)
Signal Detection via “Active Signal Processing”
Microarray image (pixel intensity) -> VMAxS -> CEL Report (feature level signal)

6 ViaLogy’s Low Level Analysis (Part 2)
CEL Report -> NORMALIZATION (Quantile Normalization) -> SUMMARIZATION (Median Polish)
Robust Multi-array Average (RMA), Irizarry, R. et al. (2003)
- Written in R and C (affy package)
- Specific to Affymetrix input files
- Has no special handling for zero values
- Slow run time in R
Project 1: Recode RMA as a C interface from R
- Specific to ViaLogy’s input files
- Introduce a way to deal with zero values
- Break up the process into individual functions

7 Quantile Normalization
- The distribution of intensity values varies significantly across arrays
- Quantile normalization transforms the distribution of probe intensities to be the same across arrays
- The final distribution is the average of each quantile across chips
Bolstad et al. (2003)
[Figure axes: Density vs. Log Intensities]

8 Quantile Normalization cont’d
1. Sort each column of the original matrix
2. Take the average across each row of the sorted matrix
3. Set each value to its corresponding row average
4. Unsort the columns of the matrix back to the original order
Bolstad et al. (2003)
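The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the C code described in the talk: the function name `quantile_normalize` and the list-of-rows layout (rows are probes, columns are arrays) are assumptions, and ties are simply broken by original row order rather than by whatever tie handling the talk's implementation uses.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a matrix given as a list of rows.

    Rows are probes, columns are arrays (an assumed layout).
    Ties within a column are broken by original row order.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])

    # Step 1: sort each column, remembering each value's original row.
    order = []
    sorted_cols = []
    for j in range(n_cols):
        col = [matrix[i][j] for i in range(n_rows)]
        idx = sorted(range(n_rows), key=lambda i: col[i])
        order.append(idx)
        sorted_cols.append([col[i] for i in idx])

    # Step 2: average across each row of the sorted matrix.
    row_means = [
        sum(sorted_cols[j][r] for j in range(n_cols)) / n_cols
        for r in range(n_rows)
    ]

    # Steps 3 and 4: assign each rank its row average, then put the
    # values back in the original row order.
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for rank, i in enumerate(order[j]):
            result[i][j] = row_means[rank]
    return result
```

After normalization every column contains the same set of values, so the intensity distributions are identical across arrays; only the ranking of probes within each array is preserved.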

9 Median Polish
- Summarization step used in RMA
- Fits a linear model to the data for each probe set across all microarrays
- Greatly reduces variability for genes expressed at lower levels
- Reduces the 11-20 features per gene to 1 expression value per gene
Tukey, J. (1977); Irizarry, R. et al. (2003)

10 Quantile Normalization and Median Polish in C
Steps Involved...
- Read literature on Quantile Normalization and Median Polish
- Use R and C code as foundation for my code
- Add functionality to deal with ties and zeroes
- Test code for accuracy of the algorithm
Results... (for ~20,000 genes, 30 arrays)
                          R code            C code
QUANTILE NORMALIZATION    11 min 53 secs    10 secs
MEDIAN POLISH             4 min 43 secs     20 secs

11 To Recap...
CEL file -> NORMALIZATION (Quantile Normalization) -> SUMMARIZATION (Median Polish) -> Differential Expression Toolkit (Project 2)

12 Significance Analysis of Microarrays (SAM)
1. Calculate a statistic (d-score) for each gene.
2. Order the d-scores.
3. Create B sets of random permutations of the group labels.
4. For each permutation, calculate d-scores for all genes and order them.
5. From the B sets of ordered statistics, find the expected order statistics by averaging each rank across permutations.
6. Plot observed d-scores vs. expected d-scores and identify significant genes based on a user-defined threshold (Δ).
Tusher et al. (2001)
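A minimal Python sketch of the two-class SAM procedure outlined above, under stated assumptions. The function names, the default `s0 = 0.1` (an arbitrary placeholder for the small constant added to the denominator in Tusher et al.), the sign convention (group 1 minus group 2), and the simplified per-rank thresholding in `significant_genes` are all illustrative choices. It is not the siggenes code or the C implementation from the talk.

```python
import random
from statistics import mean


def d_scores(data, labels, s0=0.1):
    """Two-class d-score per gene: (mean1 - mean2) / (s + s0).

    data   : list of genes, each a list of expression values (one per array)
    labels : list of 1s and 2s giving the group of each array
    s0     : small positive constant; 0.1 is an arbitrary placeholder
    """
    scores = []
    for row in data:
        g1 = [v for v, lab in zip(row, labels) if lab == 1]
        g2 = [v for v, lab in zip(row, labels) if lab == 2]
        m1, m2 = mean(g1), mean(g2)
        # Pooled standard error of the difference in group means.
        ss = sum((v - m1) ** 2 for v in g1) + sum((v - m2) ** 2 for v in g2)
        s = ((1 / len(g1) + 1 / len(g2)) * ss / (len(g1) + len(g2) - 2)) ** 0.5
        scores.append((m1 - m2) / (s + s0))
    return scores


def expected_d_scores(data, labels, n_perm=200, s0=0.1, seed=0):
    """Average the ordered d-scores over n_perm random label permutations."""
    rng = random.Random(seed)
    sums = [0.0] * len(data)
    for _ in range(n_perm):
        perm = list(labels)
        rng.shuffle(perm)
        for rank, d in enumerate(sorted(d_scores(data, perm, s0))):
            sums[rank] += d
    return [total / n_perm for total in sums]


def significant_genes(data, labels, delta, n_perm=200, s0=0.1, seed=0):
    """Simplified call: flag genes whose observed ordered d-score differs
    from the expected d-score at the same rank by more than delta."""
    observed = d_scores(data, labels, s0)
    order = sorted(range(len(data)), key=lambda i: observed[i])
    expected = expected_d_scores(data, labels, n_perm, s0, seed)
    return [gene for rank, gene in enumerate(order)
            if abs(observed[gene] - expected[rank]) > delta]
```

Calling `significant_genes(data, labels, delta)` returns the row indices of flagged genes. A larger Δ flags fewer genes, which is the trade-off the observed vs. expected plot is meant to expose.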

13 SAM Example
Observed d-scores
          Group 1              Group 2
          1      2      3      4      5      6      d-score   ordered d-score
Gene 1    1.1    0.3    0.4    2.1    1.6    1.3    -1.5      -1.5
Gene 2    0.1    1.2    0.5    1.5    -0.3           0.3      -0.2
Gene 3    0.7    -0.2   1.3    -0.3   -0.5   1.5     0.4       0.3
Gene 4    -0.9   1.4    0.6    -0.6   1.0    1.3    -0.2       0.4
Gene 5    1.5    0.8    1.0    -0.7   0.3    -0.8    1.6       1.6

14 SAM Example (cont’d)
Permutation # i
          Group 1              Group 2
          5      2      4      1      6      3      d-score   ordered d-score
Gene 1    1.1    0.3    0.4    2.1    1.6    1.3     0.3      -1.2
Gene 2    0.1    1.2    0.5    1.5    -0.3           0.9      -0.2
Gene 3    0.7    -0.2   1.3    -0.3   -0.5   1.5    -0.2       0.3
Gene 4    -0.9   1.4    0.6    -0.6   1.0    1.3     0.5       0.5
Gene 5    1.5    0.8    1.0    -0.7   0.3    -0.8   -1.2       0.9
Expected d-scores
Ordered d-scores
Permutation #1   Permutation #2   …   Permutation #B   Avg d-scores
-1.2             -0.6                 -0.2             0.5
-0.2             -0.3                 -0.1             0.8
 0.3              0.1                  1.0             1.3
 0.5              0.2                  1.2
 0.9              1.6                  1.3             0.6

15 SAM Example (cont’d)

16 SAM Implementation
Siggenes (BioConductor)
- R language (slow)
- Too many options
C interface from R
- Faster run time
- Specific to ViaLogy’s input files and functionalities
Steps Involved...
- Read SAM literature and understand the algorithm
- Go through the Siggenes source code
- Write C code, removing unnecessary steps and adding functionality
Results... (for a data set of ~7000 genes, 8 arrays)
SAM in R: ~60 seconds
C interface from R: ~5 seconds

17 Input to SAM

18 Results in R

19 Future Direction
1. SAM
- Implementation for other study types such as “paired” and “one-class”
- Procedures for dealing with zeros
2. Differential Expression Toolkit
- Evaluate other more accurate and efficient methods

20 References
Journals
- Irizarry, R. et al. (2003). “Exploration, normalization, and summaries of high density oligonucleotide array probe level data,” Biostatistics.
- Bolstad, B. et al. (2003). “A comparison of normalization methods for high density oligonucleotide array data based on variance and bias,” Bioinformatics.
- Tukey, J. (1977). “Exploratory Data Analysis.”
- Tusher et al. (2001). “Significance analysis of microarrays applied to the ionizing radiation response,” PNAS.
Websites
- www.bioconductor.org
- www.r-project.org
- www-stat.stanford.edu/~tibs/SAM/

21 Acknowledgements
SoCalBSI Members: Prof. Jamil Momand, Prof. Sandra Sharp, Prof. Wendie Johnston, Prof. Nancy Warter-Perez, Jacqueline Heras, Fellow Interns
- Jim Breaux, Ph.D.
- Sandeep Gulati, Ph.D.
- Robin Hill
- Juan Guitterez
- Vijay Daggumati
- Other Employees
National Science Foundation & National Institutes of Health

23 Median Polish Cont’d
...and so on, until the sum of the “residuals” of the matrix is small.
The probe set summary for each gene is computed from the row effect and column effect determined by Median Polish.
Tukey, J. (1977)
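The iteration described above can be sketched in Python as follows. This is an illustrative sketch under assumptions, not the talk's C code: rows are taken to be the probes of one probe set and columns the arrays, values are assumed to be log-scale intensities, and the stopping rule (stop when the sum of absolute residuals changes by less than a fraction `tol`, or after `max_iter` sweeps) and the function names are hypothetical choices. The per-array summary is taken as the overall effect plus the column effect, which is an assumption about how the row and column effects are combined.

```python
from statistics import median


def median_polish(matrix, max_iter=10, tol=0.01):
    """Tukey-style median polish of a list-of-rows matrix.

    Returns (overall, row_effects, col_effects, residuals) such that
    matrix[i][j] == overall + row_effects[i] + col_effects[j] + residuals[i][j].
    """
    resid = [list(row) for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    prev_total = None

    for _ in range(max_iter):
        # Sweep the median out of every row.
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]

        # Sweep the median out of every column.
        for j in range(n_cols):
            m = median([resid[i][j] for i in range(n_rows)])
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]

        # Stop once the sum of absolute residuals has settled.
        total = sum(abs(v) for row in resid for v in row)
        if total == 0 or (prev_total is not None
                          and abs(prev_total - total) <= tol * total):
            break
        prev_total = total

    return overall, row_eff, col_eff, resid


def probe_set_summary(matrix):
    """One expression value per array: overall effect + column effect."""
    overall, _, col_eff, _ = median_polish(matrix)
    return [overall + c for c in col_eff]
```

Because medians rather than means are swept out, a single outlying probe on one array ends up in the residuals instead of distorting that array's summary value.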

