Bioinformatics Tools for Microarray Analysis Connie Wu Dr. Jim Breaux Dr. Sandeep Gulati ViaLogy Southern California Bioinformatics Institute Summer 2004.


1 Bioinformatics Tools for Microarray Analysis
Connie Wu, Dr. Jim Breaux, Dr. Sandeep Gulati
ViaLogy
Southern California Bioinformatics Institute, Summer 2004
Funded by the National Science Foundation and National Institutes of Health

2 Company Overview
Discovered and developed software implementation of Active Signal Processing (called Quantum Resonance Interferometry, QRI)
Applying QRI to analysis of DNA microarrays enhances performance:
  Increased detection sensitivity and dynamic range
  Increased specificity
  Increased reproducibility

3 Company Overview
VMAxS: web-based service for analyzing microarrays using QRI.
Diagram labels: Microarray image, VMAxS, Active Signal Processing, Signal Values, Cel Report, Cel Report File Reader, Further Analysis in R

4 Project 1: Development of a more efficient file reader
VMAxS generates a Cel Report with gene- and feature-level signal for a single microarray.
Cel Report:
  ~22000 genes
  ≤ 69 features per gene
  ≤ 7 statistical values for each gene and feature

5 Project 1: Development of a more efficient file reader
Goals:
  Read through the entire file in the shortest amount of time
  Store the data in an R data structure for further analysis
  Extract the statistic of interest with all labels attached (gene names, gene feature names, etc.)
R version of the Cel Report reader: average time for one execution is over 30 sec.

6 Feature-level results: The Cel Report
Figure labels: Header, First gene, Rest of the file

7 Cel Report Example
Array_1/1007_s_at (figure labels: Filename, Probeset ID)

8 Cel Report Example
Figure labels: Values per gene, Features per gene, Gene Results

9 Things to consider…
Reading a file when no header information is disclosed
Reading a file as efficiently as possible = "open, read, close" in one step
Use a more efficient language: C
Interface C with R
Transferring the C data structure to an R data structure
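The "open, read, close in one step" idea can be illustrated with a minimal sketch. This is not the reader described in the slides (which was written in C and interfaced with R); it is a Python illustration of the same pattern. The tab-delimited layout, the function names, and the sample identifiers are assumptions, since the slides do not disclose the Cel Report format.

```python
from pathlib import Path


def read_report_lines(path):
    """Read the whole file with one call, then split it in memory."""
    # Path.read_text opens the file, reads all of it, and closes it again,
    # so the file is touched exactly once.
    text = Path(path).read_text()
    return text.splitlines()


def parse_records(lines, sep="\t"):
    """Split each non-empty line into fields.

    The tab separator is a hypothetical choice; the real Cel Report
    layout is not given in the slides.
    """
    return [line.split(sep) for line in lines if line.strip()]
```

All parsing then happens on the in-memory list of lines, so the cost of disk access is paid only once regardless of how many passes the parser makes.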

10 C Data Structure
1. Gene Feature ID
2. Gene Feature
3. Gene ID
4. Number of features per Gene
5. Gene Results
R Data Structure
1. Feature Data
2. Number of Features
3. Gene Results
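One way to picture the transfer from the five C-side items to the three R-side items is as a flattening step. The sketch below is only an illustration in Python: the class name, field names, and the exact grouping of items are assumptions, because the slides list the items but not their types or layout.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    """Hypothetical per-gene record mirroring the C-side items."""
    gene_id: str
    feature_ids: list      # Gene Feature ID, one per feature
    feature_values: list   # Gene Feature, one list of statistics per feature
    gene_results: list     # Gene Results (gene-level statistics)

    @property
    def n_features(self):
        # Number of features per Gene
        return len(self.feature_ids)


def to_flat(records):
    """Flatten per-gene records into three containers (the R-side items)."""
    feature_data = []   # rows of (gene_id, feature_id, values)
    n_features = {}     # gene_id -> number of features
    gene_results = {}   # gene_id -> gene-level statistics
    for rec in records:
        for fid, vals in zip(rec.feature_ids, rec.feature_values):
            feature_data.append((rec.gene_id, fid, vals))
        n_features[rec.gene_id] = rec.n_features
        gene_results[rec.gene_id] = rec.gene_results
    return feature_data, n_features, gene_results
```

Keeping the gene ID on every feature row is what lets a statistic be extracted later "with all labels attached".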

11 Output
Figure labels: Feature Data, Number of Features, Gene Result

12 Corresponding Values from the Cel Report
Figure labels: Feature Data, Number of Features, Gene Result

13 Advantages…
All vectors in C are dynamically allocated.
Both time and memory efficient:
1. File is only read once
2. Only the appropriate amount of memory is allocated for each data set

14 Runtime Comparison
Cel Reports (~22000 genes each)   R version       C version
16                                9 min 25 sec    28 sec
42                                37 min 57 sec   1 min 12 sec
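The reported timings can be turned into speedup factors with a little arithmetic. The sketch below uses only the four timings from the runtime comparison; the helper names are illustrative.

```python
def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) timing to seconds."""
    return 60 * minutes + seconds


def speedup(r_seconds, c_seconds):
    """How many times faster the C version ran than the R version."""
    return r_seconds / c_seconds


# Number of Cel Reports -> (R version time, C version time), in seconds.
runs = {
    16: (to_seconds(9, 25), to_seconds(0, 28)),
    42: (to_seconds(37, 57), to_seconds(1, 12)),
}

for n_reports, (r_time, c_time) in runs.items():
    print(
        f"{n_reports} Cel Reports: R {r_time} s, C {c_time} s, "
        f"speedup {speedup(r_time, c_time):.1f}x, "
        f"per report R {r_time / n_reports:.1f} s vs C {c_time / n_reports:.2f} s"
    )
```

The ratios work out to roughly 20x for 16 reports (565 s vs 28 s) and roughly 32x for 42 reports (2277 s vs 72 s). Per report, the R version takes about 35 s and 54 s respectively, consistent with the "over 30 sec" figure quoted earlier, while the C version stays under 2 s.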

15 Project 2: Development of an automated comparative performance report
Compare the performance of ViaLogy’s analytical process to that of the current standard approach (e.g., GCOS from Affymetrix)
Write an R script to automatically generate the following plots for the performance report:
1. Sensitivity Bar Plots
2. CV Plots
3. ECDF Plots

16 Sensitivity Bar Plots
Compares the sensitivity of VMAxS to GCOS
1. Genes called Present in GCOS
2. Genes called Present in VMAxS
3. Genes called Present in GCOS, Absent in VMAxS
4. Genes called Present in VMAxS, Absent in GCOS
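The four bar heights can be computed with set operations. This is a minimal Python sketch rather than the R script from the project; it assumes each method's Present calls are available as a collection of gene IDs, and that any gene not called Present by a method counts as Absent for that method (other call types are ignored here).

```python
def sensitivity_counts(gcos_present, vmaxs_present):
    """Count genes in the four categories shown in the bar plot."""
    gcos = set(gcos_present)
    vmaxs = set(vmaxs_present)
    return {
        "Present in GCOS": len(gcos),
        "Present in VMAxS": len(vmaxs),
        # Set difference: called Present by one method but not the other.
        "Present in GCOS, Absent in VMAxS": len(gcos - vmaxs),
        "Present in VMAxS, Absent in GCOS": len(vmaxs - gcos),
    }
```

For example, with made-up IDs, `sensitivity_counts({"g1", "g2"}, {"g2", "g3", "g4"})` gives counts of 2, 3, 1, and 2 for the four categories.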

17

18 CV Plots
Purpose: Compare reproducibility
Displays scatter plots of CV (coefficient of variation) values for each gene.
$CV_i = \text{std.dev} / \text{mean}$ of the replicate signal values for gene $i$
For each group of replicates, plot $CV_{i,\text{GCOS}}$ vs. $CV_{i,\text{VMAxS}}$
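The CV calculation can be sketched as follows. This is an illustration in Python, not the project's R script. It assumes the sample standard deviation (n - 1 denominator, which is what R's `sd()` computes); the slides do not say which form was used. The input layout (a dict from gene ID to replicate signal values) is also an assumption.

```python
import statistics


def cv(values):
    """Coefficient of variation: standard deviation divided by mean."""
    mean = statistics.mean(values)
    if mean == 0:
        return float("nan")  # CV is undefined when the mean is zero
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    return statistics.stdev(values) / mean


def cv_pairs(gcos_signals, vmaxs_signals):
    """Return (CV under GCOS, CV under VMAxS) per gene, ready for a scatter plot.

    Both arguments map gene ID -> list of replicate signal values.
    Only genes present in both inputs are paired.
    """
    shared = sorted(set(gcos_signals) & set(vmaxs_signals))
    return {g: (cv(gcos_signals[g]), cv(vmaxs_signals[g])) for g in shared}
```

A point lying below the diagonal of such a scatter plot (with GCOS on the x-axis) would mean the gene's replicate values vary less under VMAxS than under GCOS.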

19

20 ECDF Plot
Displays the empirical cumulative distribution function (ECDF) of the CV values for each analytical method
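For reference, the ECDF of a sample at a point x is the fraction of sample values less than or equal to x. A minimal Python sketch of that definition is below; the function names are illustrative and plotting is left out.

```python
def ecdf(values):
    """Return sorted values and the ECDF evaluated at each of them."""
    xs = sorted(values)
    n = len(xs)
    # After sorting, the i-th value (0-based) has (i + 1) values at or below it
    # when there are no ties; with ties, the last tied point carries the jump.
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys


def ecdf_at(values, x):
    """Fraction of values less than or equal to x."""
    return sum(v <= x for v in values) / len(values)
```

When the curves for two methods are drawn on the same axes, the method whose curve rises faster has a larger share of genes with small CV, that is, better reproducibility.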

21

22 Subgroup Analysis
For a given set of replicates, break the data down into smaller groups and compare reproducibility within each group.
One way to break it down: consider PRESENT/ABSENT calls.
Divide the genes into groups based on the number of PRESENT (P) calls received under each analytical method, e.g.:
  6 P in VMAxS, 0 P in GCOS
  6 P in VMAxS, 1 P in GCOS
  6 P in VMAxS, 2 P in GCOS
  …
  0 P in VMAxS, 6 P in GCOS
With 6 replicates, each method can give a gene 0 to 6 P calls (7 possible counts), for a total of 49 (7 × 7) groups.

23 PCount Table Displays the total number of genes in each group
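The grouping by number of PRESENT calls can be sketched as a small cross-tabulation. This is a Python illustration under assumed inputs: each method's calls are given as a dict from gene ID to a sequence of per-replicate calls, with "P" marking PRESENT. The call encoding and function name are hypothetical.

```python
def pcount_table(vmaxs_calls, gcos_calls, n_replicates):
    """Count genes by (P calls in VMAxS, P calls in GCOS).

    Returns a square table where table[v][g] is the number of genes
    with v P calls in VMAxS and g P calls in GCOS.
    """
    size = n_replicates + 1  # counts range from 0 to n_replicates
    table = [[0] * size for _ in range(size)]
    for gene, calls in vmaxs_calls.items():
        p_vmaxs = list(calls).count("P")
        p_gcos = list(gcos_calls[gene]).count("P")
        table[p_vmaxs][p_gcos] += 1
    return table
```

For 6 replicates the table has 7 rows and 7 columns, matching the 49 groups, and the cell values sum to the total number of genes.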

24 CV Plots

25 ECDF Plot

26 Acknowledgements
Dr. Jim Breaux
Dr. Sandeep Gulati
The rest of the ViaLogy staff
Professors and staff members of SoCalBSI
Fellow interns, especially Lien Chung
NSF & NIH

