ViaLogy Lien Chung Jim Breaux, Ph.D. SoCalBSI 2004 “Improvements to Microarray Analytical Methods and Development of Differential Expression Toolkit”

Presentation transcript:

ViaLogy Lien Chung Jim Breaux, Ph.D. SoCalBSI 2004 “Improvements to Microarray Analytical Methods and Development of Differential Expression Toolkit” Funded by the National Science Foundation and National Institutes of Health

Outline of Talk
Background
• Affymetrix GeneChips
• ViaLogy and Microarray Analysis
Accelerating Low Level Analysis Algorithms
• Quantile Normalization
• Median Polish
Differential Expression Toolkit
• Significance Analysis of Microarrays (SAM)
Future Direction

Affymetrix GeneChip® Microarrays
A useful tool for measuring the mRNA expression level of thousands of genes in a biological sample
Low Level Analysis:
• Signal detection: convert fluorescence to signal
• Normalization: reduce unwanted variation across chips
• Summarization: reduce the probe intensities of each gene to a single value

Internet Resources
BioConductor
• An open source and open development software project for the analysis and comprehension of genomic data
• A collection of analysis packages implemented in the R language
• Packages used: affy, siggenes
R Project
• Open source language and environment for statistical computing and graphics
• Pros: built-in mathematical functions, supports graphics
• Cons: computationally slow

ViaLogy’s Low Level Analysis (Part 1)
Signal detection via “Active Signal Processing”
[Diagram: microarray image (pixel intensity) → VMAxS → CEL report (feature-level signal)]

ViaLogy’s Low Level Analysis (Part 2)
[Diagram: CEL report → normalization (Quantile Normalization) → summarization (Median Polish)]
Robust Multi-array Average (RMA), Irizarry, R. et al. (2003)
• Written in R and C (affy package)
• Specific to Affymetrix input files
• Has no special way of dealing with zero values
• Slow run time in the R language
Project 1: Recode RMA as a C interface from R
• Specific to ViaLogy’s input files
• Introduce a way to deal with zero values
• Break up the process into individual functions

Quantile Normalization
• The distribution of intensity values varies significantly across arrays
• Transforms the distribution of probe intensities to be the same across arrays
• The final distribution is the average of each quantile across chips
Bolstad et al. (2003)
[Figure: density vs. log intensities]

Quantile Normalization cont’d
1. Sort each column of the original matrix
2. Take the average across rows
3. Set each value to the corresponding row average
4. Unsort the columns of the matrix to their original order
Bolstad et al. (2003)
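The four steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the C code described in this talk: the function name `quantile_normalize` and the layout (rows are probes, columns are arrays) are assumptions, and ties are simply broken by original row order rather than handled specially.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a probes-by-arrays matrix given as a list of rows.

    Hypothetical helper for illustration; ties are broken by row order.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Pull out each column (one column per array).
    columns = [[matrix[r][c] for r in range(n_rows)] for c in range(n_cols)]
    # Step 1: sort each column, remembering which row each value came from.
    order = [sorted(range(n_rows), key=col.__getitem__) for col in columns]
    # Step 2: average across the rows of the sorted matrix.
    row_means = [
        sum(columns[c][order[c][k]] for c in range(n_cols)) / n_cols
        for k in range(n_rows)
    ]
    # Steps 3 and 4: replace each value by the mean for its rank and
    # write it back at its original row position.
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        for rank, r in enumerate(order[c]):
            result[r][c] = row_means[rank]
    return result


example = quantile_normalize([
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 6.0, 6.0],
    [4.0, 2.0, 8.0],
])
```

After normalization every column holds the same set of values (the row averages), only arranged according to that array's original ranking.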

Median Polish
• Summarization step used in RMA
• Fits a linear model to the data for each probe set across all microarrays
• Greatly reduces variability for genes expressed at lower levels
Tukey, J. (1977); Irizarry, R. et al. (2003)
[Diagram: features per gene → 1 expression value per gene]

Quantile Normalization and Median Polish in C
Steps Involved...
• Read literature on Quantile Normalization and Median Polish
• Use R and C code as foundation for my code
• Add functionality to deal with ties and zeroes
• Test the code for accuracy of the algorithm
Results... (for ~20,000 genes, 30 arrays)
• R code: 11 min 53 secs (Quantile Normalization), 4 min 43 secs (Median Polish)
• C code: 10 secs, 20 secs

To Recap...
CEL file → Normalization (Quantile Normalization) → Summarization (Median Polish) → Differential Expression Toolkit (Project 2)

Significance Analysis of Microarrays (SAM)
1. Calculate a statistic (d-score) for each gene.
2. Order the d-scores.
3. Create B sets of random permutations of the group labels.
4. For each permutation, calculate d-scores for all genes and order them.
5. From the B sets of ordered statistics, find the expected order statistics.
6. Plot observed d-scores vs. expected d-scores and evaluate significant genes based on a user-defined threshold (Δ).
Tusher et al. (2001)
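The SAM steps can be sketched in Python for the two-class unpaired case. This is a simplified illustration, not the siggenes code or the C implementation from this talk. The d-score follows Tusher et al. (2001), d = (mean of group 2 − mean of group 1) / (s + s0). The function names, the default `s0=0.1`, and the per-gene rule "flag a gene when |observed − expected| > Δ" are assumptions made for brevity; the published method chooses s0 from the data and applies Δ as a cutoff along the ordered scores.

```python
import math
import random


def d_scores(data, labels, s0=0.1):
    """d-score per gene; data is genes x arrays, labels holds 0/1 per array.

    s0 is a small positive constant; its default here is an arbitrary placeholder.
    """
    scores = []
    for values in data:
        g1 = [v for v, lab in zip(values, labels) if lab == 0]
        g2 = [v for v, lab in zip(values, labels) if lab == 1]
        n1, n2 = len(g1), len(g2)
        m1, m2 = sum(g1) / n1, sum(g2) / n2
        ss = sum((v - m1) ** 2 for v in g1) + sum((v - m2) ** 2 for v in g2)
        # Pooled gene-specific scatter s(i).
        s = math.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
        scores.append((m2 - m1) / (s + s0))
    return scores


def sam(data, labels, delta, n_perm=100, s0=0.1, seed=0):
    """Return (ordered observed d, expected d, indices of flagged genes)."""
    rng = random.Random(seed)
    raw = d_scores(data, labels, s0)
    order = sorted(range(len(raw)), key=raw.__getitem__)
    observed = [raw[i] for i in order]
    expected = [0.0] * len(raw)
    for _ in range(n_perm):
        perm = list(labels)
        rng.shuffle(perm)  # random permutation of the group labels
        for k, d in enumerate(sorted(d_scores(data, perm, s0))):
            expected[k] += d / n_perm  # running average of k-th order statistic
    # Simplified call: compare each ordered observed score with its expectation.
    flagged = [order[k] for k in range(len(raw))
               if abs(observed[k] - expected[k]) > delta]
    return observed, expected, flagged
```

Plotting `observed` against `expected` gives the SAM plot; genes far from the diagonal are the candidates for differential expression.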

SAM Example
[Table: Group 1 and Group 2 values for five genes, with the ordered observed d-scores]

SAM Example (cont’d)
[Table: for permutation #i, Group 1 and Group 2 values for five genes with their ordered d-scores; the ordered d-scores from permutation #1 through permutation #B are averaged to give the expected d-scores]

SAM Example (cont’d)

SAM Implementation
Siggenes (BioConductor)
• R language (slow)
• Too many options
C interface from R
• Faster run time
• Specific to ViaLogy’s input files and functionalities
Steps
• Read the SAM literature and understand the algorithm
• Go through the Siggenes source code
• Write C code, taking out unnecessary steps and adding additional functionalities
Results (for a data set of ~7,000 genes, 8 arrays)
• SAM in R: ~60 seconds
• C interface from R: ~5 seconds

Input to SAM

Results in R

Future Direction
1. SAM
• Implementation for other study types such as “paired” and “one-class”
• Procedures for dealing with zeros
2. Differential Expression Toolkit
• Evaluate other more accurate and efficient methods

References
Journals
Irizarry, R. et al. (2003). “Exploration, normalization, and summaries of high density oligonucleotide array probe level data.” Biostatistics.
Bolstad et al. (2003). “A comparison of normalization methods for high density oligonucleotide array data based on bias and variance.” Bioinformatics.
Tukey, J. (1977). “Exploratory Data Analysis.”
Tusher et al. (2001). “Significance analysis of microarrays applied to the ionizing radiation response.” PNAS.
Websites
www-stat.stanford.edu/~tibs/SAM/

Acknowledgements
SoCalBSI Members: Prof. Jamil Momand, Prof. Sandra Sharp, Prof. Wendie Johnston, Prof. Nancy Warter-Perez, Jacqueline Heras, Fellow Interns
• Jim Breaux, Ph.D.
• Sandeep Gulati, Ph.D.
• Robin Hill
• Juan Guitterez
• Vijay Daggumati
• Other Employees
National Science Foundation & National Institutes of Health

Median Polish Cont’d
• ...and so on, until the sum of the “residuals” of the matrix is small
• The probe set summary for each gene is computed from the row effect and column effect determined by Median Polish
Tukey, J. (1977)
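A minimal Python sketch of Tukey's median polish may help make the iteration concrete. It is not the C code from this talk. The assumptions are: rows are probes and columns are arrays (log-scale intensities), the function names are hypothetical, and the loop stops when the sum of absolute residuals stops changing by more than a relative tolerance `eps`, which is one common stopping rule rather than necessarily the one used by the author.

```python
from statistics import median


def median_polish(x, max_iter=10, eps=0.01):
    """Fit x[i][j] ~ overall + row_eff[i] + col_eff[j] by sweeping out medians."""
    n_rows, n_cols = len(x), len(x[0])
    resid = [list(row) for row in x]  # working copy; ends up as the residuals
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep row medians out of the residuals into the row effects.
        for i in range(n_rows):
            m = median(resid[i])
            resid[i] = [v - m for v in resid[i]]
            row_eff[i] += m
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift
        # Sweep column medians out of the residuals into the column effects.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            for i in range(n_rows):
                resid[i][j] -= m
            col_eff[j] += m
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift
        # Stop once the residuals are zero or have essentially stopped shrinking.
        new_sum = sum(abs(v) for row in resid for v in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, row_eff, col_eff, resid


def probeset_summary(x):
    """One expression value per array: overall effect plus that array's column effect."""
    overall, _, col_eff, _ = median_polish(x)
    return [overall + c for c in col_eff]
```

With this layout, the row effects absorb probe-specific differences, so the per-array summary depends only on the overall level and the column (array) effect.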