Bioinformatics Tools for Microarray Analysis Connie Wu Dr. Jim Breaux Dr. Sandeep Gulati ViaLogy Southern California Bioinformatics Institute Summer 2004.

Presentation transcript:

Bioinformatics Tools for Microarray Analysis Connie Wu Dr. Jim Breaux Dr. Sandeep Gulati ViaLogy Southern California Bioinformatics Institute Summer 2004 Funded by the National Science Foundation and National Institutes of Health

Company Overview
Discovered and developed a software implementation of Active Signal Processing (called Quantum Resonance Interferometry, QRI).
Applying QRI to the analysis of DNA microarrays enhances performance:
Increased detection sensitivity and dynamic range
Increased specificity
Increased reproducibility

Company Overview
VMAxS: web-based service for analyzing microarrays using QRI.
[Diagram labels: Microarray image; VMAxS (Active Signal Processing); Signal Values; Cel Report; Cel Report File Reader; Further Analysis in R]

Project 1: Development of a more efficient file reader
VMAxS generates a Cel Report with gene-level and feature-level signal for a single microarray:
~22000 genes
≤ 69 features per gene
≤ 7 statistical values for each gene and feature

Project 1: Development of a more efficient file reader
Starting point: the R version of the Cel Report reader takes over 30 sec on average for one execution.
Goals:
Read through the entire file in the shortest possible time
Store the data in an R data structure for further analysis
Extract the statistic of interest with all labels attached (i.e., gene names, gene feature names, etc.)

Feature-level results: The Cel Report
[Figure labels: Header; First gene; Rest of the file]

Cel Report Example
[Figure labels: Filename; Probeset ID. Example entry: Array_1/1007_s_at]

Cel Report Example
[Figure labels: Values per gene; Features per gene; Gene Results]

Things to consider…
Reading a file when no header information is disclosed
Reading a file as efficiently as possible: "open, read, close" in one step
Using a more efficient language: C
Interfacing C with R
Transferring the C data structure to an R data structure
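The "open, read, close in one step" idea can be sketched as a single C function that pulls the whole file into memory with one read, so later parsing never touches the disk again. This is only an illustrative sketch, not the reader built in the project: the function name `read_whole_file` is hypothetical, and it assumes a regular file on a platform where seeking to the end of a binary stream reports the file size.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: open, read and close a file in one call.
 * Returns a NUL-terminated heap buffer holding the whole file (caller frees)
 * and stores the byte count in *out_len. Returns NULL on any failure.
 * Assumes a regular file whose size can be found by seeking to its end. */
char *read_whole_file(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* Find the file size so the buffer is allocated exactly once. */
    if (fseek(fp, 0L, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return NULL; }
    rewind(fp);

    char *buf = malloc((size_t)size + 1);
    if (buf == NULL) { fclose(fp); return NULL; }

    /* One read call for the entire file, then close immediately. */
    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    if (got != (size_t)size) { free(buf); return NULL; }

    buf[got] = '\0';
    if (out_len != NULL)
        *out_len = got;
    return buf;
}
```

A caller would invoke `read_whole_file(path, &len)`, parse the returned buffer in memory, and `free` it when done.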

C Data Structure
1. Gene Feature ID
2. Gene Feature
3. Gene ID
4. Number of features per Gene
5. Gene Results

R Data Structure
1. Feature Data
2. Number of Features
3. Gene Results

Output
[Figure labels: Feature Data; Number of Features; Gene Result]

Corresponding Values from the Cel Report
[Figure labels: Feature Data; Number of Features; Gene Result]

Advantages…
All vectors in C are dynamically allocated.
Both time and memory efficient:
1. The file is read only once
2. Only the appropriate amount of memory is allocated for each data set
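One common way to get dynamically allocated vectors that end up holding only the memory they need is to grow a buffer geometrically while reading and trim it once the data set is complete. The slides do not say how the project did this, so the sketch below is an assumption; the type `DVec` and the functions `dvec_push`, `dvec_shrink`, and `dvec_free` are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical growable vector of doubles (not the project's actual type). */
typedef struct {
    double *data;
    size_t  len;  /* elements in use */
    size_t  cap;  /* elements allocated */
} DVec;

/* Append one value, doubling the capacity when the buffer is full.
 * Returns 0 on success, -1 if memory could not be allocated. */
int dvec_push(DVec *v, double x)
{
    if (v->len == v->cap) {
        size_t new_cap = (v->cap == 0) ? 16 : v->cap * 2;
        double *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = x;
    return 0;
}

/* Give back unused capacity once all values have been read. */
void dvec_shrink(DVec *v)
{
    if (v->len == 0) {
        free(v->data);
        v->data = NULL;
        v->cap = 0;
        return;
    }
    double *p = realloc(v->data, v->len * sizeof *p);
    if (p != NULL) {
        v->data = p;
        v->cap = v->len;
    }
}

/* Release the vector's storage. */
void dvec_free(DVec *v)
{
    free(v->data);
    v->data = NULL;
    v->len = 0;
    v->cap = 0;
}
```

Starting from `DVec v = {NULL, 0, 0};`, the reader would call `dvec_push` for each value as it is parsed and `dvec_shrink` once per data set, so the file is still read only once.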

Runtime Comparison
Data set (each report ~22000 genes)    R Version        C Version
16 Cel Reports                         9 min 25 sec     28 sec
42 Cel Reports                         37 min 57 sec    1 min 12 sec

Project 2: Development of an automated comparative performance report
Compare the performance of ViaLogy's analytical process to that of the current standard approach (e.g., GCOS from Affymetrix).
Write an R script to automatically generate the following plots for the performance report:
1. Sensitivity Bar Plots
2. CV Plots
3. ECDF Plots

Sensitivity Bar Plots
Compare the sensitivity of VMAxS to that of GCOS:
1. Genes called Present in GCOS
2. Genes called Present in VMAxS
3. Genes called Present in GCOS, Absent in VMAxS
4. Genes called Present in VMAxS, Absent in GCOS
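The four bar heights listed above are simple tallies over per-gene detection calls. The sketch below shows one way to compute them, assuming each method supplies one call per gene encoded as the character 'P' (Present) or 'A' (Absent); that encoding, the `CallCounts` type, and the `count_calls` function are illustrative assumptions rather than part of the original R script.

```c
#include <stddef.h>

/* Hypothetical container for the four tallies shown in the bar plot. */
typedef struct {
    size_t present_gcos;   /* 1. Present in GCOS */
    size_t present_vmaxs;  /* 2. Present in VMAxS */
    size_t gcos_only;      /* 3. Present in GCOS, Absent in VMAxS */
    size_t vmaxs_only;     /* 4. Present in VMAxS, Absent in GCOS */
} CallCounts;

/* gcos[i] and vmaxs[i] hold the call for gene i: 'P' or 'A' (assumed encoding).
 * Any other character is counted as neither Present nor Absent. */
CallCounts count_calls(const char *gcos, const char *vmaxs, size_t n_genes)
{
    CallCounts c = {0, 0, 0, 0};
    for (size_t i = 0; i < n_genes; i++) {
        if (gcos[i] == 'P')
            c.present_gcos++;
        if (vmaxs[i] == 'P')
            c.present_vmaxs++;
        if (gcos[i] == 'P' && vmaxs[i] == 'A')
            c.gcos_only++;
        if (vmaxs[i] == 'P' && gcos[i] == 'A')
            c.vmaxs_only++;
    }
    return c;
}
```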

CV Plots
Purpose: compare reproducibility.
Displays scatter plots of the CV (coefficient of variation) values for each gene.
CV_i = (standard deviation) / (mean) of the replicate signal values for gene i
For each group of replicates, plot CV_i,GCOS vs. CV_i,VMAxS
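Written out for gene i with n replicate signal values x_i1, ..., x_in, the CV definition reads as follows. The slide does not say which standard deviation is meant; the form below assumes the sample standard deviation with an n - 1 denominator, which is what R's `sd()` computes.

```latex
\[
\bar{x}_i = \frac{1}{n}\sum_{k=1}^{n} x_{ik}, \qquad
s_i = \sqrt{\frac{1}{n-1}\sum_{k=1}^{n}\left(x_{ik}-\bar{x}_i\right)^2}, \qquad
\mathrm{CV}_i = \frac{s_i}{\bar{x}_i}
\]
```

For example, replicate signals of 90, 100, and 110 give a mean of 100, a sample standard deviation of 10, and a CV of 0.10. A method with better reproducibility produces smaller CV values for the same gene.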

ECDF Plot Displays empirical cumulative distribution function (ECDF) of the CV values for each analytical method
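The ECDF at a value x is the fraction of CV values less than or equal to x, so a curve that rises faster indicates more genes with low CV. The performance report itself was generated in R; the C sketch below only illustrates the computation, and the function names `ecdf_prepare` and `ecdf_eval` are hypothetical.

```c
#include <stdlib.h>

/* Ascending comparison for qsort. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sort the CV values in place so the ECDF can be evaluated by binary search. */
void ecdf_prepare(double *values, size_t n)
{
    qsort(values, n, sizeof *values, cmp_double);
}

/* ECDF at x: (number of values <= x) / n.
 * 'sorted' must be in ascending order; returns 0.0 for an empty sample. */
double ecdf_eval(const double *sorted, size_t n, double x)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {                 /* find the first element > x */
        size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] <= x)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (n == 0) ? 0.0 : (double)lo / (double)n;
}
```

Calling `ecdf_prepare` once per analytical method and then `ecdf_eval` on a grid of CV thresholds yields the points of one curve per method.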

Subgroup Analysis
For a given set of replicates, break the data down into smaller groups and compare reproducibility within each group.
One way to break the data down is by PRESENT/ABSENT calls: divide the genes into groups based on the number of PRESENT (P) calls each gene received from each analytical method, e.g.:
6 P in VMAxS, 0 P in GCOS
6 P in VMAxS, 1 P in GCOS
6 P in VMAxS, 2 P in GCOS
…
0 P in VMAxS, 6 P in GCOS
With 6 replicates, each method can give a gene anywhere from 0 to 6 P calls, so there are 7 x 7 = 49 groups in total.
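The grouping can be sketched as a 7 x 7 count table indexed by the number of P calls from each method. The sketch assumes calls are stored as characters 'P' or 'A', laid out gene by gene with 6 replicate calls per gene; that layout and the names `N_REPS`, `count_present`, and `pcount_table` are hypothetical.

```c
#include <stddef.h>

#define N_REPS 6  /* replicates per group, as in the example above */

/* Count the 'P' calls among one gene's N_REPS replicate calls. */
static size_t count_present(const char *calls)
{
    size_t n = 0;
    for (size_t r = 0; r < N_REPS; r++)
        if (calls[r] == 'P')
            n++;
    return n;
}

/* Fill table[v][g] with the number of genes that received v P calls from
 * VMAxS and g P calls from GCOS. The calls for gene i are assumed to sit at
 * positions i * N_REPS through i * N_REPS + N_REPS - 1 of each array. */
void pcount_table(const char *vmaxs, const char *gcos, size_t n_genes,
                  size_t table[N_REPS + 1][N_REPS + 1])
{
    for (size_t v = 0; v <= N_REPS; v++)
        for (size_t g = 0; g <= N_REPS; g++)
            table[v][g] = 0;

    for (size_t i = 0; i < n_genes; i++) {
        size_t v = count_present(vmaxs + i * N_REPS);
        size_t g = count_present(gcos + i * N_REPS);
        table[v][g]++;
    }
}
```

Each cell of the resulting table identifies one subgroup of genes, and the CV comparison can then be repeated within any cell that contains enough genes.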

PCount Table Displays the total number of genes in each group

CV Plots

ECDF Plot

Acknowledgements
Dr. Jim Breaux
Dr. Sandeep Gulati
The rest of the ViaLogy staff
Professors and staff members of SoCalBSI
Fellow interns, especially Lien Chung
NSF & NIH