Summarization of Oligonucleotide Expression Arrays BIOS 691-803 Winter 2010.



What is Summarization?
Some expression arrays (Affymetrix, NimbleGen) use multiple probes to target a single transcript: a ‘probe set’
Typically, probes have different fold changes between any two samples
How to effectively summarize the information in a probe set?

Many Probes for One Gene
[Figure: gene sequence, 5´ to 3´, with multiple oligo probes; Perfect Match and Mismatch probes]
How to combine signals from multiple probes into a single gene abundance estimate?

Probe Variation
Individual probes don’t agree on fold changes
Probes vary by two orders of magnitude on each chip
–GC content is the most important factor in signal strength
[Figure: signal from 16 probes along one gene on one chip]

Probe Measure Variation
Typical probes differ by two orders of magnitude!
GC content is the most important factor
RNA target folding also affects hybridization
[Figure: probe intensities; axis from 0 to $3 \times 10^4$]

Bioinformatics Issues
Probes may not map accurately
SNPs in probes
Affymetrix places most probes in the 3’ UTR of genes
–Alternate poly-A sites mean that some probe targets may really be less common than others

Probe Mapping
Early builds of the genome often confused regions or genes with their complements
[Figure: probe sets for an rRNA gene and its complement]

Alternate Polyadenylation Sites
Poly-A marks the mRNA ‘tail’
Many genes have alternative poly-A sites
3’ UTR may be longer or shorter

Alternate Polyadenylation of MID1

Many Approaches to Summarization
Affymetrix Microarray Suite; PLIER
dChip - Li and Wong, HSPH
Bioconductor:
–RMA - Bolstad, Irizarry, Speed, et al.
–affyPLM - Bolstad
–gcRMA - Wu
Physical chemistry models - Zhang et al.
Factor model
Probe weighting

Critique of Averaging (MAS5)
Not clear what an average of different probes should mean
Tukey biweight can be unstable when data cluster at either end, which is frequently the case here
No ‘learning’ based on cross-chip performance of individual probes
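As a rough illustration of the single-chip averaging critiqued above, here is a minimal sketch of a one-step Tukey biweight location estimate. The tuning constant `c=5.0`, the small `eps`, and the probe values are assumptions chosen for illustration; this is not the full MAS5 pipeline, which also handles background and PM/MM adjustment.

```python
import numpy as np

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight location estimate for one probe set on one chip.

    c and eps are illustrative defaults (assumptions), not taken from the slides.
    """
    x = np.asarray(values, dtype=float)
    center = np.median(x)
    mad = np.median(np.abs(x - center))      # robust spread around the median
    u = (x - center) / (c * mad + eps)       # scaled distance from the median
    w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)  # far points get weight 0
    return float(np.sum(w * x) / np.sum(w))

# Made-up log2 probe values with one aberrant probe.
probe_values = [8.1, 8.3, 7.9, 8.0, 8.2, 12.5]
estimate = tukey_biweight(probe_values)
plain_mean = float(np.mean(probe_values))
print(estimate, plain_mean)
```

The estimate uses only the values on a single chip, so it cannot learn that a given probe is consistently unreliable across chips, which is the last point of the critique.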

Motivation for multi-chip models:
Probe-level data from spike-in study (log scale)
Note the parallel trend of all probes
Courtesy of Terry Speed

Model for Probe Signal
Each probe signal is proportional to
–i) the amount of target sample: $a$
–ii) the affinity of the specific probe sequence to the target: $f$
NB: High affinity is not the same as specificity
–A probe can give high signal to its intended target and also to other transcripts
[Figure: probes with affinities $f_1$, $f_2$, $f_3$ on chip 1 and chip 2, with target amounts $a_1$ and $a_2$]

Multiplicative Model
For each gene, a set of probes $p_1, \dots, p_k$
Each probe $p_j$ binds the gene with efficiency $f_j$
In each sample $i$ there is an amount $a_i$
Probe intensity should be proportional to $f_j \times a_i$
Always some noise!
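A small simulation may make the multiplicative model concrete. The sketch below assumes multiplicative log-normal noise and made-up amounts and affinities (all hypothetical). It shows that taking logs turns the product $f_j \times a_i$ into a sum of a chip term and a probe term, which can then be fitted by ordinary least squares (row and column means in a balanced layout).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values: 3 samples (chips) and 4 probes for one gene.
amounts = np.array([100.0, 400.0, 1600.0])    # a_i, amount of target per sample
affinities = np.array([0.2, 1.0, 5.0, 0.5])   # f_j, binding efficiency per probe

# Intensity proportional to f_j * a_i, with multiplicative noise (an assumption).
noise = rng.lognormal(mean=0.0, sigma=0.1, size=(3, 4))
intensity = np.outer(amounts, affinities) * noise   # rows = chips, cols = probes

# On the log scale: log2 a_i + log2 f_j + error.
log_int = np.log2(intensity)

# Least-squares fit of the two-way additive model in a balanced layout.
chip_effect = log_int.mean(axis=1)                    # includes the overall level
probe_effect = log_int.mean(axis=0) - log_int.mean()  # centered probe offsets
fitted = chip_effect[:, None] + probe_effect[None, :]
residuals = log_int - fitted

print(np.round(np.diff(chip_effect), 2))  # should be close to log2(4) = 2 per step
```

The probes differ widely in raw intensity, yet the chip-to-chip differences are shared by all of them on the log scale, which is the "parallel trend" that motivates multi-chip models.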

Robust Linear Models
Criterion of fit
–Least median of squares
–Sum of weighted squares
–Least squares and throw out outliers
Method for finding fit
–High-dimensional search
–Iteratively re-weighted least squares
–Median polish
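To illustrate one of the fitting methods listed above, here is a minimal sketch of iteratively re-weighted least squares on a straight-line fit with a single outlier. The Huber weight function, the tuning constant `k=1.345`, the MAD-based scale, and the toy data are all assumptions made for the example; the slides do not specify which weighting scheme is used.

```python
import numpy as np

def irls_line(x, y, k=1.345, n_iter=50):
    """Fit y = b0 + b1*x by IRLS with Huber weights (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)
    beta = np.zeros(2)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        # Weighted least squares: scale rows by sqrt(weight).
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        r = y - X @ beta
        # Robust scale from the median absolute deviation of the residuals.
        scale = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        u = np.abs(r) / scale
        # Huber weights: 1 for small residuals, k/u for large ones.
        w = k / np.maximum(u, k)
    return beta

# Toy data: an exact line y = 1 + 2x with one gross outlier.
x = np.arange(10.0)
y = 1.0 + 2.0 * x
y[9] += 30.0

ols_slope, ols_intercept = np.polyfit(x, y, 1)
robust_intercept, robust_slope = irls_line(x, y)
print(ols_slope, robust_slope)
```

Ordinary least squares is pulled toward the outlier, while the re-weighted fit progressively discounts it. The same idea carries over to the chip-plus-probe additive model, with a larger design matrix.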

For each probe set, take the log of $PM_{ij} = a_i f_j$, then fit the model:
$\log \widehat{PM}_{ij} = \log a_i + \log f_j + \varepsilon_{ij}$
where the caret represents “after pre-processing”
Fit this additive model by iteratively re-weighted least squares or median polish
Bolstad, Irizarry, Speed (RMA)
Critique: Model assumes probe noise is constant (homoscedastic) on the log scale
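The median polish fit mentioned above can be sketched as follows. This is a generic implementation of Tukey's median polish on a chips-by-probes matrix of log intensities, not the code of any particular package; the function name, iteration limit, tolerance, and toy data are assumptions.

```python
import numpy as np

def median_polish(log_pm, n_iter=10, tol=1e-6):
    """Decompose a chips x probes matrix as overall + chip + probe + residual."""
    resid = np.array(log_pm, dtype=float)
    overall = 0.0
    row = np.zeros(resid.shape[0])   # chip effects
    col = np.zeros(resid.shape[1])   # probe effects
    for _ in range(n_iter):
        # Sweep out row (chip) medians.
        rmed = np.median(resid, axis=1)
        resid -= rmed[:, None]
        row += rmed
        delta = np.median(col)
        col -= delta
        overall += delta
        # Sweep out column (probe) medians.
        cmed = np.median(resid, axis=0)
        resid -= cmed[None, :]
        col += cmed
        delta = np.median(row)
        row -= delta
        overall += delta
        if np.max(np.abs(rmed)) < tol and np.max(np.abs(cmed)) < tol:
            break
    return overall, row, col, resid

# Toy probe set: 3 chips x 5 probes, exactly additive except one aberrant value.
chip_levels = np.array([7.0, 9.0, 11.0])
probe_offsets = np.array([-1.0, 0.0, 1.0, 0.5, -0.5])
data = chip_levels[:, None] + probe_offsets[None, :]
data[0, 0] += 5.0   # one outlying probe measurement

overall, row, col, resid = median_polish(data)
expression = overall + row   # one summary value per chip
print(expression)
```

Because medians are used in place of means, the aberrant measurement is left in the residual matrix and does not distort the per-chip summaries.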

Comparing Measures
20 replicate arrays: variance should be small
[Figure: standard deviations of expression estimates on arrays, arranged in four groups of genes by increasing mean expression level. Green: MAS5.0; Black: Li-Wong; Blue, Red: RMA]
Courtesy of Terry Speed

Background
25-mers are prone to cross-hybridization
MM > PM for about 1/3 of all probes
Cross-hybridization varies with GC content
Signal intensity varies with cross-hybridization

The gcRMA Approach
Estimate non-specific binding using either:
–True null assay (non-homologous RNA)
–Estimates from MM
Subtract background before normalization and fitting model

Evaluating gcRMA
On AffyComp data sets, gcRMA wins
–Replicates with 14 spike-ins done by Affy
Many investigators get crappy results (and don’t write it up)
gcRMA does very well on highly expressed genes, not nearly so well on less expressed genes
Gharaibeh et al. BMC Bioinformatics :452

Factor Model
Assume relation between p observations x and true value z: $x = z + \varepsilon$, where the $\varepsilon_i$ are independent
Use factor analytic methods to estimate
–Depends on assuming z ~ Normal
–Differs from RMA in relaxing the assumption of IID errors: some probes can have more random error than others

Weighting Probes
It is clear that some probes are more reliable than others
How to assess this in a simple fashion?
If a gene really changes across arrays, then a responsive probe will change more than a noisy probe
Weight by relative ranges
Best performance on AffyComp!
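The range-weighting idea can be sketched in a few lines. This is only an illustration of "weight by relative ranges" under simple assumptions (weights proportional to each probe's range across arrays, applied to probe-centered log intensities). It is not the published DFW algorithm, and the function name and data are hypothetical.

```python
import numpy as np

def range_weighted_summary(log_int):
    """Summarize a probe set with weights proportional to each probe's range.

    log_int: arrays x probes matrix of log intensities (illustrative input).
    Returns per-array summaries (relative to each probe's mean) and the weights.
    """
    x = np.asarray(log_int, dtype=float)
    centered = x - x.mean(axis=0)             # remove probe-specific offsets
    ranges = x.max(axis=0) - x.min(axis=0)    # how much each probe moves across arrays
    total = ranges.sum()
    if total > 0:
        weights = ranges / total
    else:
        weights = np.full(x.shape[1], 1.0 / x.shape[1])  # fall back to equal weights
    return centered @ weights, weights

# Made-up data: 4 arrays x 3 probes; the gene is up in arrays 3 and 4.
data = np.array([
    [6.0, 7.0, 5.0],
    [6.0, 7.0, 5.1],
    [9.0, 8.0, 5.0],
    [9.0, 8.0, 5.1],
])
summary, weights = range_weighted_summary(data)
print(np.round(weights, 3), np.round(summary, 3))
```

The most responsive probe dominates the summary and the nearly flat probe contributes almost nothing. A range is itself sensitive to single outliers, so a practical method would need some safeguard that this sketch omits.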

Summary and Evaluation
No one best solution for all situations
gcRMA and DFW seem to do very well on AffyComp data
–May need weights for DFW by tissue
Leading methods seem to rely on probe weighting