Data analytical issues with high-density oligonucleotide arrays: A model for gene expression analysis and data quality assessment



Outline
- Description of high-density oligonucleotide expression array data
- Derivation of a model for gene expression estimation
- Application of the model for data quality assessment

Gene Expression Analysis
- Central dogma: DNA -> mRNA -> protein.
- By comparing the abundance of mRNA in different cells, we can infer which genes are associated with a given cell condition.
- Oligonucleotide arrays enable quantitative, highly parallel measurements of gene expression.

Probe Selection
- Probes are 25-mers selected from the target sequence.
- 5-20K target fragments are each interrogated by a set of probes (a probe set).

Data preparation
- RNA samples are prepared, labeled, and hybridized with arrays.
- Arrays are scanned, and the resulting image is analyzed to produce an intensity value for each probe cell, indicating how much hybridization occurred.
- The goal is to combine the probe intensities for a given gene into an index of expression: an indicator of mRNA abundance.

Oligonucleotide Arrays
Figure: GeneChip probe array (1.28 cm) and image of a hybridized probe array. Each 18 µm feature (probe cell) holds copies of a specific oligonucleotide probe; the array carries >450,000 different probes, which hybridize to single-stranded, labeled RNA target. Compliments of D. Gerhold.

Outline
- Description of high-density oligonucleotide expression array data
- Derivation of a model for gene expression analysis
- Application of the model for data quality assessment

Figure: probe intensity vs. concentration (example 1)

The probe intensity model
On a probe-set-by-probe-set basis, the log probe intensities $Y_{jk}$ (probe $j$, chip $k$) are modelled as the sum of a probe effect, a chip effect, and an error term:
$Y_{jk} = \phi_j + \alpha_k + \varepsilon_{jk}$
To make this model identifiable, we constrain the probe effects to sum to zero: $\sum_j \phi_j = 0$. The $\phi_j$ can be interpreted as relative non-specific binding effects for the probes. The parameters $\alpha_k$ provide an index of expression for each chip.
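As a minimal sketch of how the additive model above could be fitted, the following uses ordinary least squares on a complete probes-by-chips matrix of log intensities. The function name `fit_additive` and the toy numbers are illustrative assumptions, not part of the original slides, and the slides go on to use a robust fit rather than plain least squares.

```python
import numpy as np

def fit_additive(log_intensity):
    """Least-squares fit of y[j, k] = probe[j] + chip[k] + error, with sum(probe) == 0.

    log_intensity: array of shape (n_probes, n_chips) for ONE probe set.
    """
    y = np.asarray(log_intensity, dtype=float)
    chip = y.mean(axis=0)              # chip effects: one expression index per chip
    probe = y.mean(axis=1) - y.mean()  # probe effects, centred so they sum to zero
    resid = y - probe[:, None] - chip[None, :]
    return probe, chip, resid

# Toy probe set (hypothetical values): 3 probes on 2 chips, no noise.
probe_true = np.array([-1.0, 0.0, 1.0])
chip_true = np.array([8.0, 9.0])
y = probe_true[:, None] + chip_true[None, :]

probe, chip, resid = fit_additive(y)
```

With the sum-to-zero constraint, the chip effect reduces to the column mean of the log intensities, which is why no iterative solver is needed in the balanced, non-robust case.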

Example - detecting differential expression
Fit the model to 24 chips hybridized with a common source of RNA plus 12 spiked-in RNAs, whose pM concentrations differ 2-fold between the two groups of 12 chips.

MVA A vs B

Figure: expression index vs. concentration

Robust procedures
Robust procedures perform well under a range of possible models and greatly facilitate the detection of anomalous data points.
Why robust?
- Image artifacts
- Bad probes
- Bad chips
- Quality assessment

Robust fit example A

Robust fit example B

Residuals from fit

Outline
- Description of high-density oligonucleotide expression array data
- Derivation of a model for gene expression estimation
- Application of the model to data quality assessment

Chip manufacturer QA protocols
- Starting RNA QA: look at gel patterns and RNA quantification.
- Post-hybridization QA: image examination, chip intensity parameters, expression values for control genes of various sorts, housekeeping genes, percent present calls.

Goal: measuring expression data quality
- Manufacturer QA guidelines emphasize maintaining data comparability across the chips in an analysis set.
- We seek assessments that measure data quality as it pertains to expression values.
- In particular, we would like to provide quantitative measures that help in making decisions: accept, reject, or adjust.

Model components – role in QA
- Probe effects
  - can only be compared across fitting sets.
- Chip effects (expression indices)
  - can examine the distribution of relative expressions across arrays.
- Residuals (more than 200K per chip)
  - view as a chip image, summarize spatial patterns.
  - summarize in batches by chip.
  - combine to estimate the SE of the expression indices; these are then pooled and summarized by chip.
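One way to examine the distribution of relative expressions across arrays, sketched under stated assumptions: each probe set's expression index is compared with its median over all chips, and the resulting differences are summarized per chip. The function name `relative_expression_summary`, the choice of median and interquartile range as summaries, and the toy numbers are all illustrative rather than taken from the slides.

```python
import numpy as np

def relative_expression_summary(expr):
    """expr[g, k]: expression index (log scale) of probe set g on chip k.

    Returns the per-chip median and interquartile range of expression
    relative to each probe set's median across chips.
    """
    expr = np.asarray(expr, dtype=float)
    rel = expr - np.median(expr, axis=1, keepdims=True)
    q25, q50, q75 = np.percentile(rel, [25, 50, 75], axis=0)
    return {"median": q50, "iqr": q75 - q25}

# Toy data (hypothetical): 3 probe sets on 3 chips; chip 3 reads one unit high.
expr = np.array([[5.0, 5.0, 6.0],
                 [7.0, 7.0, 8.0],
                 [9.0, 9.0, 10.0]])
summary = relative_expression_summary(expr)
```

A chip whose median relative expression sits far from zero, or whose spread is much wider than that of its neighbours, would be a candidate for closer inspection.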

Robust fit by IRLS for each probe set
Starting with a robust fit, at each iteration:
- $S = \mathrm{mad}(r_{jk})$: robust estimate of scale, or $\sigma$
- $u_{jk} = r_{jk}/S$: standardized residuals
- $w_{jk} = \psi(|u_{jk}|)$: weights to reduce the effect of deviant points on the next fit
The SE of the final expression index is given by $\mathrm{SE}(a_k) = S / \sqrt{\sum_j w_{jk}}$.
Unscaled: $\mathrm{SE}(a_k) = 1 / \sqrt{\sum_j w_{jk}}$.
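The iteration above can be sketched as follows for a single probe set. Several details are assumptions, because the slides do not specify them: Huber weights with tuning constant 1.345 stand in for the weight function, the MAD is scaled by 1.4826, the starting fit uses medians, and the weighted fit alternates between chip and probe updates. All function names and the toy data are hypothetical.

```python
import numpy as np

def mad(r):
    """Median absolute deviation, scaled by 1.4826 (an assumed convention)."""
    return 1.4826 * np.median(np.abs(r - np.median(r)))

def huber_weight(u, c=1.345):
    """Huber weights: 1 where |u| <= c, c/|u| beyond. The choice is an assumption."""
    return c / np.maximum(np.abs(u), c)

def weighted_fit(y, w, n_sweeps=50):
    """Weighted least squares for y[j, k] = probe[j] + chip[k], by alternating updates."""
    probe = np.zeros(y.shape[0])
    for _ in range(n_sweeps):
        chip = np.sum(w * (y - probe[:, None]), axis=0) / np.sum(w, axis=0)
        probe = np.sum(w * (y - chip[None, :]), axis=1) / np.sum(w, axis=1)
    shift = probe.mean()  # move the mean into the chip effects: sum(probe) == 0
    return probe - shift, chip + shift

def irls_probe_set(y, n_iter=20):
    y = np.asarray(y, dtype=float)
    # Robust starting fit: column medians, then row medians of the remainder.
    chip = np.median(y, axis=0)
    probe = np.median(y - chip[None, :], axis=1)
    probe, chip = probe - probe.mean(), chip + probe.mean()
    w = np.ones_like(y)
    for _ in range(n_iter):
        r = y - probe[:, None] - chip[None, :]   # residuals r[j, k]
        s = mad(r)                               # robust scale S
        if s == 0:                               # perfect fit, nothing to down-weight
            break
        w = huber_weight(r / s)                  # weights from standardized residuals
        probe, chip = weighted_fit(y, w)
    se = s / np.sqrt(w.sum(axis=0))              # SE of each chip's expression index
    return chip, probe, w, se

# Toy probe set (hypothetical): 5 probes, 4 chips, small noise, one corrupted cell.
rng = np.random.default_rng(0)
probe_true = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
chip_true = np.array([7.0, 8.0, 9.0, 10.0])
y = probe_true[:, None] + chip_true[None, :] + rng.normal(0.0, 0.1, size=(5, 4))
y[0, 0] += 5.0  # simulated image artifact

chip, probe, w, se = irls_probe_set(y)
```

In this toy run the corrupted cell should receive the smallest weight, which is what makes the weights and residuals usable as quality diagnostics later on.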

Figure: the ψ function

Images of weights
For 24 chips from Affymetrix, look at patterns of weights on chip real estate.

Images of weights

Images of sign of residuals
For 24 chips from Affymetrix, look at patterns of the sign of the residuals on chip real estate.
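A minimal sketch of how such an image could be assembled, assuming each probe cell's column and row position on the chip is known. The coordinate arrays `x` and `y`, the function name `chip_image`, and the toy values are hypothetical; the same helper would work for weights instead of residual signs.

```python
import numpy as np

def chip_image(values, x, y, shape):
    """Place one value per probe cell at (row y, column x); unused cells stay NaN."""
    img = np.full(shape, np.nan)
    img[np.asarray(y), np.asarray(x)] = values
    return img

# Toy example (hypothetical): four probe cells on a 2 x 3 grid.
resid = np.array([0.4, -0.2, 0.0, -1.3])
x = np.array([0, 1, 0, 1])  # column of each probe cell
y = np.array([0, 0, 1, 1])  # row of each probe cell
sign_img = chip_image(np.sign(resid), x, y, shape=(2, 3))
```

Plotting the resulting array with any image viewer makes spatially clustered positive or negative residuals stand out as patches.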

Images of sign of residuals

Residual summaries

MVA exp index

Future developments
- Develop quality assessment measures for routine use in a high-throughput environment.
- Assess relationships among the various QA measures.
- Develop diagnostics to assign causes to departures from quality standards.
- Other applications: identify non-performing or cross-hybridizing probes, qualify probe sets.


Background correction
- Background correction: to correct for differential background due to experimental processing effects and to put the estimated differential expression on a proper scale.
- Normalization: to correct for systematic differences in the distribution of probe intensities.
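The slide does not say which normalization method is meant. As one illustration of correcting for systematic differences in the distribution of probe intensities, the sketch below uses quantile normalization, which forces every chip onto a common reference distribution. The choice of method, the function name, and the toy values are assumptions, and ties are broken arbitrarily here.

```python
import numpy as np

def quantile_normalize(x):
    """x[i, k]: intensity of probe i on chip k. Returns an array of the same shape
    in which every chip (column) has the same set of values."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                # sorting order within each chip
    ranks = np.argsort(order, axis=0)            # rank of each probe within its chip
    reference = np.sort(x, axis=0).mean(axis=1)  # mean of the k-th smallest values
    return reference[ranks]

# Toy data (hypothetical): 3 probes on 2 chips.
x = np.array([[5.0, 4.0],
              [2.0, 1.0],
              [3.0, 6.0]])
normalized = quantile_normalize(x)
```

After this step the chips differ only in which probe holds which rank, so remaining differences reflect relative ordering rather than chip-wide shifts in intensity.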