Microarray Analysis with a Small Number of Replicates By Kung-Hua Chang & Dhondup Pemba Mentors: Cecilie Boysen, Ph.D.

1 Microarray Analysis with a Small Number of Replicates
By Kung-Hua Chang & Dhondup Pemba
Mentors: Cecilie Boysen, Ph.D. & Jim Breaux, Ph.D.
Southern California Bioinformatics Institute, Summer 2005
Funded by NSF/NIH

2 Outline
Background
• Affymetrix GeneChip® Microarrays
• VMAxS
• Steps in Microarray Data Analysis
Our Task
• Statistical Analysis with a Small Number of Replicates
• Functional Analysis
• Additional Projects

3 Affymetrix GeneChip® Microarrays
• 22 probes define one gene.
• Signal detection: fluorescence detection of hybridization between RNA target and oligonucleotide probe.
FOR MORE INFO... http://www.affymetrix.com

4 Each gene on an Affy chip is represented by a probe set
• Perfect Match (PM) probe represents a short segment of the gene of interest.
• Mismatch (MM) probe measures background signal.
• Data for a probe set is summarized into a single number (“gene-level” data).
FOR MORE INFO... “Processing Affy chip Data: GCOS/MAS 5.0, RMA, and gcRMA” (Roger Bumgarner, University of Washington)
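
As a rough illustration of the last bullet, the sketch below collapses one probe set into a single gene-level value by averaging the PM - MM differences. This average-difference summary is an assumption chosen for brevity; the slide does not say which estimator was used, and MAS 5.0, RMA, and gcRMA each rely on more robust schemes. The function name and the intensity values are hypothetical.

```python
from statistics import mean


def summarize_probe_set(pm, mm):
    """Collapse one probe set into a single gene-level value.

    Assumption: a plain average of PM - MM differences, used only to
    illustrate the idea of gene-level summarization.
    """
    if not pm or len(pm) != len(mm):
        raise ValueError("pm and mm must be non-empty and the same length")
    return mean(p - m for p, m in zip(pm, mm))


# Hypothetical intensities for three probe pairs of one gene.
pm = [120.0, 135.0, 110.0]
mm = [40.0, 45.0, 50.0]
gene_level = summarize_probe_set(pm, mm)
```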

5 VMAxS
• ViaLogy’s data analysis service for DNA microarray chip data
• Employs Quantum Resonance Interferometry technology to detect signals below background noise
FOR MORE INFO... Visit Vialogy.com.

6 Steps in Microarray Data Analysis
1. Raw data image
2. Image analysis (extract cell-level data): VMAxS
3. Gene-level summarization
4. Normalization (remove non-biological variation)
5. Statistical analysis (select differentially expressed genes)
6. Functional analysis (identify affected processes and pathways)

7 Statistical Analysis with a Small Number of Replicates: Overview
• Overall objective: perform end-to-end analysis on a client’s microarray data set (from raw image to pathway analysis).
• Problem: the data set contained a small number of replicates.

8 Problem with small number of replicates
• A small number of replicates yields unreliable estimates of gene variances.
• With seven replicates, we are more confident that gene 1 is upregulated.
FOR MORE INFO... Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays (Jain et al.)
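
The point about unreliable variances can be checked with a small simulation. The sketch below draws replicates from a normal distribution with a known variance and measures how much the sample variance itself fluctuates at 3 versus 7 replicates. The normal model, the trial count, and the function name are assumptions made for illustration, not part of the original analysis.

```python
import random
import statistics


def variance_estimate_spread(n_replicates, n_trials=2000, true_sd=1.0, seed=0):
    """Standard deviation of the sample variance across simulated genes.

    Each simulated gene has n_replicates measurements drawn from
    N(0, true_sd**2). A larger spread means a less reliable estimate.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        sample = [rng.gauss(0.0, true_sd) for _ in range(n_replicates)]
        estimates.append(statistics.variance(sample))
    return statistics.stdev(estimates)


spread_3 = variance_estimate_spread(3)  # three replicates
spread_7 = variance_estimate_spread(7)  # seven replicates
```

Under this model the spread with three replicates is noticeably larger than with seven, which is why a per-gene test statistic built on three replicates is so unstable.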

9 Approach to dealing with a small number of replicates
• Analyze a larger data set that has a good number of replicates (n = 8x8).
  - Assume this is the “truth”.
• Analyze a randomly selected subset of this data set (n = 3x3) using three different algorithms.
• Compare output from the 8x8 analysis to the 3x3 analysis.
  - Decide how to analyze the client’s data set based on the results.
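
The subsetting step might look like the sketch below. It assumes that "8x8" means eight arrays in each of two conditions and "3x3" means three drawn at random from each; the array identifiers and the function name are hypothetical.

```python
import random


def pick_subset(control_ids, treated_ids, k=3, seed=None):
    """Randomly choose k arrays per condition, without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(control_ids, k)), sorted(rng.sample(treated_ids, k))


# Hypothetical array identifiers for the 8x8 data set.
control_arrays = [f"control_{i}" for i in range(1, 9)]
treated_arrays = [f"treated_{i}" for i in range(1, 9)]

control_3, treated_3 = pick_subset(control_arrays, treated_arrays, k=3, seed=42)
```

Fixing the seed keeps the 3x3 subset reproducible, so all three algorithms are run on exactly the same arrays.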

10 Statistical Analysis Algorithms
• SAM: Significance Analysis of Microarrays (Tusher, Tibshirani & Chu)
• J-Score (Jim Breaux)
• Cyber-T (Baldi & Long)

11 SAM
• Each gene receives a score based on the difference in average gene expression relative to the standard deviation of the repeated measurements.
• Genes with scores greater than a threshold are considered significant.
• This threshold is determined by the false discovery rate the user desires.
FOR MORE INFO... Significance analysis of microarrays applied to the ionizing radiation response (Tusher et al.)
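
A minimal sketch of the per-gene score described in the first bullet, following the form d = (mean difference) / (s + s0) from Tusher et al. Here s is the pooled standard error of the two groups and s0 is a small positive constant that damps scores for genes with very low variance. In SAM itself s0 is chosen from the data and the threshold comes from permutations; in this sketch s0 is simply passed in by the caller, and the function name is hypothetical.

```python
import math
from statistics import mean


def sam_score(control, treated, s0=0.0):
    """Relative difference d = (mean_treated - mean_control) / (s + s0)."""
    n1, n2 = len(control), len(treated)
    m1, m2 = mean(control), mean(treated)
    # Pooled sum of squared deviations across both groups.
    ss = sum((x - m1) ** 2 for x in control) + sum((x - m2) ** 2 for x in treated)
    s = math.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
    return (m2 - m1) / (s + s0)


# Hypothetical log-expression values for one gene, three replicates each.
d_plain = sam_score([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
d_damped = sam_score([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], s0=1.0)
```

With s0 = 0 the score reduces to an ordinary two-sample t statistic, and it is undefined when both groups have zero variance.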

12 J-Score
• Each gene receives a score based on average fold-change in gene expression relative to the standard deviation of the repeated measurements.
• Cut-off for selection of “significant” genes is arbitrary.

13 Cyber-T (Baldi & Long): ‘Regularized t-test’
• “Assumes genes of similar expression levels have similar measurement errors.
• The variance of any single gene can be estimated from the variance from a number of genes of similar expression level.
• The variance of any gene within any given treatment can be estimated by the weighted average of a prior estimate of variance for that gene.”
FOR MORE INFO... Improved statistical inference from DNA microarray data using analysis of variance and a Bayesian statistical framework (Long et al.)
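
The weighted average in the last bullet can be sketched as follows, using the regularized variance from Baldi & Long: (v0 * prior_var + (n - 1) * sample_var) / (v0 + n - 2), where prior_var is the background variance taken from genes of similar expression level and v0 is the weight given to that prior. How prior_var is obtained is not shown; here it is passed in directly. The function names and the default weight of 10 are illustrative assumptions, not Cyber-T's own interface.

```python
import math
from statistics import mean, variance


def regularized_variance(sample_var, n, prior_var, prior_weight):
    """Blend a gene's own variance with a prior (background) variance."""
    return (prior_weight * prior_var + (n - 1) * sample_var) / (prior_weight + n - 2)


def regularized_t(control, treated, prior_var_control, prior_var_treated, prior_weight=10):
    """t-like statistic that uses regularized variances in the denominator."""
    n1, n2 = len(control), len(treated)
    v1 = regularized_variance(variance(control), n1, prior_var_control, prior_weight)
    v2 = regularized_variance(variance(treated), n2, prior_var_treated, prior_weight)
    return (mean(treated) - mean(control)) / math.sqrt(v1 / n1 + v2 / n2)


# Hypothetical gene with three replicates per condition.
t_reg = regularized_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], 1.0, 1.0)
```

With only three replicates the prior term dominates, which is the intended effect: a gene's noisy variance estimate is pulled toward the more stable estimate from its neighbors.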

14 Results: Comparison between SAM 8x8 and 3x3 methods
• At 1% False Discovery Rate (FDR), SAM 8x8 picked up 762 significant genes (estimated number of false significant genes = 8).
• Agreement between SAM 8x8 and the top 1000 genes from the 3x3 methods:

15 Results: Comparison between 3x3 methods
• Venn diagram of the three 3x3 methods
• Union of all three methods = 433 unique genes

16 Results: Comparison between 3x3 methods
• Agreement between any two methods:
• These findings are consistent with a previous study by a group at NIH (Hosack et al.):
  - Found that agreement between various methods tested ranged from 7% to 60%.
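
Pairwise agreement between gene lists can be computed with plain set operations, as in the sketch below. The slides do not define "agreement" precisely, so this sketch assumes it is the number of shared genes divided by the size of the smaller list. The gene identifiers are hypothetical.

```python
from itertools import combinations


def agreement(list_a, list_b):
    """Shared genes as a fraction of the smaller list (assumed definition)."""
    a, b = set(list_a), set(list_b)
    if not a or not b:
        raise ValueError("both gene lists must be non-empty")
    return len(a & b) / min(len(a), len(b))


def pairwise_agreement(gene_lists):
    """Agreement for every pair of methods, keyed by (method, method)."""
    return {
        (m1, m2): agreement(gene_lists[m1], gene_lists[m2])
        for m1, m2 in combinations(sorted(gene_lists), 2)
    }


# Hypothetical top-gene lists from the three 3x3 methods.
gene_lists = {
    "SAM": ["g1", "g2", "g3", "g4"],
    "J-Score": ["g2", "g3", "g4", "g5"],
    "Cyber-T": ["g3", "g4", "g5", "g6"],
}
table = pairwise_agreement(gene_lists)
```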

17 Possible Approaches for Final Analysis
• Method 1: Final set of significant genes is derived from the method that had the most overlap with SAM 8x8 (J-Score).
• Final result:
  - 1000 total significant genes
  - At most 356 true positives
  - At most 652 false positives
• Pro:
  - Decent number of true positives
• Con:
  - Large number of false positives
  - Might be missing important genes found by the other two methods

18 Possible Approaches for Final Analysis
• Method 2: Final set of significant genes is the intersection of the three methods.
• Final result:
  - 174 total significant genes
  - At most 174 true positives
  - At most 8 false positives
• Pro:
  - Lowest number of false positives
• Con:
  - Lowest number of true positives

19 Possible Approaches for Final Analysis
• Method 3: Final set of significant genes is the union of the three methods.
• Final result:
  - 1631 total significant genes
  - At most 433 true positives
  - At most 1206 false positives
• Pro:
  - Highest number of true positives
• Con:
  - Highest number of false positives
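
The union and intersection strategies, together with their "at most" bounds, can be sketched with set operations. The bound arithmetic is an inference rather than something the slides state: it treats the SAM 8x8 list as the reference, counts genes in both lists as possible true positives, and counts the remainder plus the estimated false genes in the reference itself as possible false positives. That reading reproduces the reported figures, for example 1631 - 433 + 8 = 1206. Gene identifiers and function names are hypothetical.

```python
def combine(method_lists):
    """Return (union, intersection) of the gene lists from several methods."""
    sets = [set(genes) for genes in method_lists]
    return set.union(*sets), set.intersection(*sets)


def bounds(selected, reference, reference_false):
    """Upper bounds on true and false positives relative to a reference list."""
    overlap = len(selected & reference)
    max_true = overlap
    max_false = len(selected) - overlap + reference_false
    return max_true, max_false


# Toy example with hypothetical gene identifiers.
union, intersection = combine([
    ["g1", "g2", "g3"],
    ["g2", "g3", "g4"],
    ["g3", "g4", "g5"],
])
reference = {"g2", "g3", "g9"}
union_bounds = bounds(union, reference, reference_false=1)
intersection_bounds = bounds(intersection, reference, reference_false=1)
```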

20 Final Approach
• Return the largest number of true positives to the client (Method 3).
• To deal with the large number of potential false positives in the results, we rank each gene based on its ranking from the Cyber-T, J-Score, and SAM methods.
  - For example, if “Gene 02” is ranked number 2 in Cyber-T, number 3 in J-Score, and number 4 in SAM, then the overall ranking is (2 + 3 + 4) / 3 = 3.
  - Higher ranking (a smaller average rank number) = more likely to be a true positive
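
The rank averaging can be sketched as below, reusing the "Gene 02" example from the slide. The sketch assumes every method supplies a rank for every gene being combined; genes missing from any method are left out. The data layout and the function name are hypothetical.

```python
from statistics import mean


def overall_ranking(rank_tables):
    """Average each gene's rank across methods and sort best-first.

    rank_tables maps method name -> {gene: rank}, where rank 1 is best.
    Ties in the average are broken alphabetically by gene name.
    """
    genes = set.intersection(*(set(table) for table in rank_tables.values()))
    averaged = {g: mean(table[g] for table in rank_tables.values()) for g in genes}
    return sorted(averaged.items(), key=lambda item: (item[1], item[0]))


# Hypothetical ranks for four genes under the three methods.
ranks = {
    "Cyber-T": {"Gene 01": 1, "Gene 02": 2, "Gene 03": 3, "Gene 04": 4},
    "J-Score": {"Gene 01": 1, "Gene 02": 3, "Gene 03": 2, "Gene 04": 4},
    "SAM": {"Gene 01": 1, "Gene 02": 4, "Gene 03": 2, "Gene 04": 3},
}
ranking = overall_ranking(ranks)
```

Genes near the top of the returned list have the smallest average rank and would be reported to the client first.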

21 Example Output of Our Approach

22 Functional Analysis
• Mapping to biological processes
  - EASE, the Expression Analysis Systematic Explorer, from the National Institute of Allergy and Infectious Diseases at the National Institutes of Health.
• Mapping to pathways
  - PathwayAssist software from Ariadne Genomics.
FOR MORE INFO... http://apps1.niaid.nih.gov/david/ and http://www.ariadnegenomics.com/products/pathway.html

23 Mapping to biological processes
• The lists of up- and down-regulated genes were entered into EASE.
• The lower the EASE score, the more highly ranked the process.
• Example of the top 14 processes, locations, and functions found from our significant genes.

24 Mapping to pathways
• Genes 1, 2, and 3 are significantly up- or down-regulated according to our combination method.
• Investigation of gene 1 reveals that genes 2 and 3 are involved in gene 1’s pathway.

25 Conclusion
• Three algorithms for selecting differentially expressed genes produced different lists of genes with ~60% to 70% agreement.
• Taking the union of the results from the three algorithms yielded the most true positives for our client.
• Biological processes and pathways found through functional analysis correspond to what we expected based on the samples studied.
  - Helps to make microarray results more believable.

26 Additional Projects: Chris’s GUI
• Automation of the previously discussed analyses with a GUI.

27 Chris’ GUI project

28 Chris’ GUI project screen 2

29 Additional Projects: Dhonam’s GUI
• ViaLogy has individual scripts that are used to test the quality of VMAxS output.
• Current implementation requires working knowledge of R scripting.
• Project: implement a user-friendly GUI program to execute multiple QC tests.

30 Dhonam’s GUI Project Screen 1

31 Dhonam’s GUI Screen 2

32 Dhonam’s GUI Screen 3 Optional window pops up if default parameters are not desired

33 Acknowledgements
SoCalBSI
• Dr. Sandra Sharp
• Dr. Wendie Johnston
• Dr. Jamil Momand
• Dr. Nancy Warter-Perez
• Other SoCalBSI Staff and Faculty
• SoCalBSI 2005 Participants
• Lien Chung (SoCalBSI Participant 2004)
ViaLogy
• Dr. Cecilie Boysen
• Dr. Jim Breaux
• Other ViaLogy Employees

34 References
• Hosack DA, Dennis GJ, Sherman BT, Lane HC, Lempicki RA: Identifying biological themes within lists of genes with EASE. Genome Biol 2003, 4:R70.
• Leslie M. Cope, Irizarry RA, Jaffee HA, Wu J, Speed TP. A benchmark for Affymetrix GeneChip expression measures. Bioinformatics 2004; 20:323–331.
• Long, A.D., Mangalam, H.J., Chann, B.Y.P., Tolleri, L., Hatfield, G.W., and Baldi, P. (2001) Improved statistical inference from DNA microarray data using analysis of variance and a Bayesian statistical framework. The Journal of Biological Chemistry 276(23):19937-19944.
• Nitin Jain, Jayant Thatte, Thomas Braciale, Klaus Ley, Michael O'Connell, Jae K. Lee: Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays. Bioinformatics 19(15): 1945-1951 (2003).
• Processing Affy chip Data: GCOS/MAS 5.0, RMA, and gcRMA (Roger Bumgarner).
• Saviozzi S, Calogero RA. 2003. Microarray probe expression measures, data normalization and statistical validation. Comp Funct Genom 2003; 4: 442–446. Conference review.
• Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.
• http://www.tau.ac.il/lifesci/bioinfo/teaching/2002-2003/Differential_Genes_Dec03.ppt
• http://www.kochi-u.ac.jp/~tatataa/RA/RA-targets.html
• http://www.biostat.jhsph.edu/~ririzarr/Teaching/688/04-preproc-norm.pdf/
• http://nibn.bgu.ac.il/core_units/microarray_facility/microarray_technique.htm
• http://www.Vialogy.com