NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.



NASC What I am not going to talk about: general microarray topics, biology

NASC The introduction

NASC Affymetrix workflow: Biological sample of some sort -> Extract mRNA -> Amplify -> Label and fragment -> Hybridise to a chip -> Scan chip -> Find features in scan -> Analyse down to one number per gene

NASC What do we want to find out? We want to find out how much mRNA of each type was in the original sample

NASC Biological sample of some sort -> Extract mRNA -> Amplify -> Label and fragment -> Hybridise to a chip -> Scan chip -> Find features in scan -> Analyse down to one number per gene. Each of these steps needs to be proportional.

NASC Biological sample of some sort -> Extract mRNA -> Amplify -> Label and fragment -> Hybridise to a chip -> Scan chip -> Find features in scan -> Analyse down to one number per gene. This talk is about this bit: the final step, analysing down to one number per gene.

NASC Affymetrix Chips: On an Affymetrix chip each oligo takes up a square. The RNA extracted from the plant is first amplified, then labelled so that the scanner can see it. The RNA is then hybridised to the array. Matching RNA for that square sticks to the square, and can be seen by the scanner. By observing the intensity of a square, the amount of RNA bound to that oligo can be calculated.

NASC Design of the oligos: A series of oligos is designed for one gene. Each oligo comes in two versions…

NASC Match and mismatch: The exact match is a section of the mRNA sequence you wish to probe for. The mismatch is identical to its exact match counterpart except for a one-base difference, and is used to calculate a background. There are typically 11 probe pairs scattered around the chip, together called a probe set. By combining the expression values for a probe set, a value for the expression of the mRNA can be found.

NASC EXP, DAT, CEL, CHP files: EXP file: the experiment file. DAT file: the picture, like a TIFF. CEL file: an unnormalised number for each probe. CHP file: one number for each probe set.

NASC What do you think of it so far? So far… What we want to find out is the amount of each mRNA in the starting sample. The mRNA hybridises to a series of probes. We can get a number for each probe from the CEL file.

NASC The rest of this talk: We are going to go through four distinct ways of determining Signal values from CEL file data: MAS 4, MAS 5, MBEI (dChip) and RMA.

NASC Mismatch probes in detail

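NASC All about mismatch probes: the slide shows two 25-base sequences that differ only at the central (13th) base, ATGCTGTACAATCGCTTGATACTGG and ATGCTGTACAATAGCTTGATACTGG, with the labels Target sequence, Perfect match probe and Mismatch probe.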

NASC Why do we have mismatch probes? Mismatch probes (MM) are trying to detect background. The mismatch probes are supposed to detect things that are close but not an exact match. It is assumed that these things also bind to the perfect match (PM), erroneously.

NASC Yes folks, it's Expression Method No 1! The original method that was used by MAS 4.

NASC MAS 4 Algorithm: For a probe set, the average difference is AvgDiff = (1/|A|) × Σ (PM_j − MM_j), summed over the probe pairs j in A, where A is the set of probe pairs you haven't thrown away due to being outliers and j runs over the probe pairs in the probe set. In English, the formula is very simple: throw away the outliers, then simply average the differences between PM and MM of the probes you've got left.
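
The averaging step described above can be sketched in a few lines of Python. This is a minimal illustration, not the MAS 4 implementation: the function name and the outlier rule (drop the smallest and largest difference, then keep pairs within three standard deviations of the remaining mean) are assumptions made for the example.

```python
from statistics import mean, stdev

def mas4_average_difference(pm, mm, n_sd=3.0):
    """Average PM - MM over the probe pairs kept after outlier removal."""
    diffs = [p - m for p, m in zip(pm, mm)]
    # Assumed outlier rule: estimate centre and spread without the single
    # smallest and largest difference, then keep pairs within n_sd spreads.
    trimmed = sorted(diffs)[1:-1]
    centre, spread = mean(trimmed), stdev(trimmed)
    kept = [d for d in diffs if abs(d - centre) <= n_sd * spread]
    return mean(kept)

# One probe set with six probe pairs; the last pair is an obvious outlier.
pm = [110, 120, 130, 120, 120, 820]
mm = [20, 20, 20, 20, 20, 20]
print(mas4_average_difference(pm, mm))
```

Note that the result is on the raw intensity scale and can go negative whenever MM exceeds PM for most pairs.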

NASC Problems with the MAS4 algorithm: Better fit with log(PM) preferred

NASC Expression Method No 2! The MAS 5 method. Still used by GCOS, the current Affymetrix-supplied method.

NASC Normalisation Procedure: Before any work is done with the CEL data, the CEL file is normalised. This corrects for intra-chip differences.

NASC Normalisation Procedure: Divide the chip into K zones (by default, 16 zones). Select the lowest 2% of probes (of any description) in each zone. Assume these are switched off.

NASC Normalisation Procedure: Calculate the mean and SD of these switched-off probes for each zone. These are used as the background estimate. Each point's local background is a weighted combination of the zone backgrounds, weighted by the point's distance from each zone. Subtract the background from each probe.
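
The zone-based background steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the chip is a square grid held as a list of lists, zones are equal squares, only the mean of the lowest 2% is used (the SD is ignored), and the weight `1 / (distance² + smooth)` with `smooth = 100` is an assumed form of the distance weighting. All function names are hypothetical.

```python
def zone_backgrounds(grid, zones_per_side=4, fraction=0.02):
    """Return (centre_row, centre_col, background) for each square zone."""
    zh = len(grid) // zones_per_side
    zw = len(grid[0]) // zones_per_side
    zones = []
    for zr in range(zones_per_side):
        for zc in range(zones_per_side):
            cells = sorted(
                grid[r][c]
                for r in range(zr * zh, (zr + 1) * zh)
                for c in range(zc * zw, (zc + 1) * zw)
            )
            # The lowest 2% of cells are assumed to be switched off.
            n_low = max(1, int(len(cells) * fraction))
            background = sum(cells[:n_low]) / n_low
            zones.append((zr * zh + zh / 2, zc * zw + zw / 2, background))
    return zones

def local_background(row, col, zones, smooth=100.0):
    """Distance-weighted average of the zone backgrounds at one cell."""
    weights = [1.0 / ((row - zr) ** 2 + (col - zc) ** 2 + smooth)
               for zr, zc, _ in zones]
    total = sum(w * bg for w, (_, _, bg) in zip(weights, zones))
    return total / sum(weights)

def subtract_background(grid, zones_per_side=4):
    """Subtract each cell's local background from its intensity."""
    zones = zone_backgrounds(grid, zones_per_side)
    return [[value - local_background(r, c, zones)
             for c, value in enumerate(row)]
            for r, row in enumerate(grid)]
```

With the default `zones_per_side=4` the grid is split into the 16 zones mentioned on the slide. A real implementation would also stop corrected values from going to zero or below, which this sketch does not attempt.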

NASC MAS 5 Algorithm: For a probe set, the log-scale signal is Tukey's Biweight of log2(PM_j − IM_j) over the probe pairs j. Tukey's Biweight is an average that minimises the effect of outliers. IM is the ideal mismatch. This is the same as the MM intensity, except in the case where the MM is greater than the PM, in which case a new MM value is calculated based on other probes nearby.
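
To make "an average that minimises the effect of outliers" concrete, here is a minimal sketch of a one-step Tukey's Biweight. The tuning constants `c = 5` and `epsilon = 0.0001` are assumptions for the example, and `mas5_log_signal` is a hypothetical helper that takes the ideal mismatch values as given rather than computing them.

```python
import math
from statistics import median

def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight: a weighted mean that down-weights outliers."""
    centre = median(values)
    # Median absolute deviation: a robust measure of spread.
    mad = median([abs(v - centre) for v in values])
    weights = []
    for v in values:
        u = (v - centre) / (c * mad + epsilon)
        # Points far from the median (|u| >= 1) get zero weight.
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def mas5_log_signal(pm, im):
    """Log2 signal for one probe set, given PM and ideal mismatch values."""
    # Assumes PM > IM for every pair, which the ideal mismatch is meant to ensure.
    return tukey_biweight([math.log2(p - i) for p, i in zip(pm, im)])

print(tukey_biweight([1, 2, 3, 4, 100]))  # the value 100 is given zero weight
```

Because the value 100 receives zero weight, the result stays close to the bulk of the data, whereas the plain mean of the same five numbers is 22.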

NASC MAS4 to MAS5 comparison

NASC Signal Normalisation: To try to eliminate chip-to-chip variability. Sort the signal values and remove the top and bottom 2%. Calculate a scaling factor to adjust the mean of this middle 96% to 100 (configurable, and variable). Multiply all signal values by the scaling factor. Affymetrix state that scaling factors should be similar for arrays to be comparable.
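
The scaling steps above translate almost directly into code. The sketch below is illustrative only; the function name and the exact rounding of the 2% trim are assumptions.

```python
def scale_signals(signals, target=100.0, trim=0.02):
    """Scale one chip so the mean of its middle 96% of signals equals target."""
    ordered = sorted(signals)
    n_trim = int(len(ordered) * trim)
    # Drop the top and bottom 2% before taking the mean.
    middle = ordered[n_trim:len(ordered) - n_trim]
    scaling_factor = target / (sum(middle) / len(middle))
    # The factor is applied to every signal, including the trimmed ones.
    return scaling_factor, [s * scaling_factor for s in signals]

factor, scaled = scale_signals([50.0] * 200)
print(factor)
```

Comparing the returned scaling factors across chips is one way to act on the advice that they should be similar for arrays to be comparable.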

NASC Expression Method No 3! The MBEI method of Li and Wong. Found in dChip, so often known as the dChip method.

NASC Observation

NASC Observation: The probes are vastly variable in effectiveness. Li and Wong point out that the difference between probes is much greater than the difference between arrays! They contend that any proper model should take this into account.

NASC MBEI model

NASC MBEI model, term by term: a baseline response due to noise; the expression value (the thing we are interested in); the rate of increase of the PM probe as signal increases (separate for each probe); the rate of increase of the MM probe as signal increases (really? See later); and an error term.
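
The slide's equation did not survive in this transcript. As a hedged reconstruction, the model as usually written by Li and Wong for array $i$ and probe pair $j$ is given below. The symbol names are assumptions here: $\nu_j$ is the baseline response, $\theta_i$ the expression value, $\alpha_j$ the rate of increase of the MM probe, $\phi_j$ the additional rate of increase of the PM probe, and $\varepsilon$ the error term.

```latex
\begin{align}
  MM_{ij} &= \nu_j + \theta_i \alpha_j + \varepsilon \\
  PM_{ij} &= \nu_j + \theta_i \alpha_j + \theta_i \phi_j + \varepsilon \\
  PM_{ij} - MM_{ij} &= \theta_i \phi_j + \varepsilon_{ij},
  \qquad \sum_{j=1}^{J} \phi_j^2 = J
\end{align}
```

The third line is the form that is actually fitted: subtracting MM from PM removes the baseline and the shared $\theta_i \alpha_j$ term, and the constraint on $\phi_j$ makes the scale of $\theta_i$ identifiable.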

NASC Model is fitted over all chips: Processes an entire experiment at once. The model is fitted by minimising the residual sum of squares. In their paper on the subject they talk a lot about how you can use this model to detect outliers, scratches on the array, etc. I'm not going to talk about that.
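
A least-squares fit of this kind can be sketched with alternating updates. The example assumes the fitted form is PM − MM = θ_i × φ_j + error for array i and probe j, with the φ values constrained so that their squares sum to the number of probes. The function name is hypothetical, and the outlier handling from the paper is left out entirely.

```python
def fit_mbei(diffs, n_iter=50):
    """Fit diffs[i][j] ~ theta[i] * phi[j] by alternating least squares.

    diffs[i][j] is PM - MM for array i and probe j.
    Returns (theta, phi) with sum(phi**2) equal to the number of probes.
    """
    n_arrays, n_probes = len(diffs), len(diffs[0])
    phi = [1.0] * n_probes

    def update_theta(phi):
        # Given phi, the least-squares theta for each array.
        phi_sq = sum(p * p for p in phi)
        return [sum(d * p for d, p in zip(row, phi)) / phi_sq for row in diffs]

    for _ in range(n_iter):
        theta = update_theta(phi)
        # Given theta, the least-squares phi for each probe.
        theta_sq = sum(t * t for t in theta)
        phi = [sum(diffs[i][j] * theta[i] for i in range(n_arrays)) / theta_sq
               for j in range(n_probes)]
        # Rescale so the identifiability constraint holds.
        scale = (n_probes / sum(p * p for p in phi)) ** 0.5
        phi = [p * scale for p in phi]
    return update_theta(phi), phi

# Three arrays, three probes, generated exactly from theta * phi.
true_theta, true_phi = [100.0, 200.0, 400.0], [0.5, 1.0, 1.5]
data = [[t * p for p in true_phi] for t in true_theta]
theta, phi = fit_mbei(data)
```

Because every array shares the same φ values, the whole experiment has to be processed together, which is why the fit runs over all chips at once.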

NASC RMA paper observations

NASC A spike-in experiment from the RMA paper: It would be useful if we had an experiment where we knew the answer. Run a series of experiments with a fixed background, but spike in some artificial RNA for a series of probes, at different concentrations.

NASC Mismatch probes Mismatch probes are supposed to calculate what similar things hybridise to probes, to detect background for PM probes. The background should be at a relatively low level most of the time…

NASC Yikes! Actually MM>PM between 33% and 40% of the time!

NASC Mismatch probes: Mismatch probes are supposed to calculate what similar things hybridise to probes, to detect background for PM probes. The amount of this stuff shouldn't depend on how much interesting RNA there is about…

NASC Man the lifeboats!

NASC Some observations from the RMA paper … perfect match probes appear to be additive (in the log scale)

NASC The amount of signal does affect mismatch probes. Clearly some of the useful mRNA is hybridising to the MM probes. This kind of shock has led to some people abandoning the use of MM probes altogether!

NASC What's going on?

NASC Perfect match probes … in RMA, in the log scale, they assume that probe effects are effectively additive

NASC How RMA (roughly) works

NASC RMA process: Normalise array, then fit model

NASC Normalisation procedure involves adjusting distributions
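
The slide does not name the procedure, but RMA is generally described as using quantile normalisation, which forces every array to share the same intensity distribution. A minimal sketch, assuming one equal-length list of intensities per array and arbitrary handling of ties:

```python
def quantile_normalise(arrays):
    """Give every array the same distribution: the mean of the sorted arrays."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: average of the k-th smallest value across arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays)
                 for k in range(n)]
    normalised = []
    for a in arrays:
        # Indices of this array's values, from smallest to largest.
        order = sorted(range(n), key=lambda idx: a[idx])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalised.append(out)
    return normalised

print(quantile_normalise([[5, 2, 3], [4, 1, 9]]))
```

Each probe keeps its rank within its own array; only the value attached to that rank changes.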

NASC RMA process: Normalise array, then fit model

NASC Fit model: Correct the background using an estimate from all mismatch probes for each array. Fit a model with three terms: the log scale expression value; an additive probe affinity effect for this probe over all slides; and the background-corrected PM value.
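
Read as an equation, the terms above say that the log2 of the background-corrected PM value for array i and probe j is the expression value for array i, plus the affinity effect for probe j, plus error. RMA is usually described as fitting this robustly with median polish. The sketch below is a plain median polish under that assumption, with hypothetical names. Background correction and normalisation are taken as already done.

```python
from statistics import median

def median_polish(log_pm, n_iter=10):
    """Fit log_pm[i][j] ~ expression[i] + probe_effect[j] by median polish.

    log_pm[i][j] is log2 of the background-corrected PM value for
    array i and probe j. Returns (expression, probe_effect).
    """
    resid = [list(row) for row in log_pm]
    n_arrays, n_probes = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_arrays
    col_eff = [0.0] * n_probes
    for _ in range(n_iter):
        # Sweep row (array) medians out of the residuals.
        for i in range(n_arrays):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep column (probe) medians out of the residuals.
        for j in range(n_probes):
            m = median([resid[i][j] for i in range(n_arrays)])
            col_eff[j] += m
            for i in range(n_arrays):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    expression = [overall + r for r in row_eff]
    return expression, col_eff

# Three arrays, three probes, built exactly from expression + probe effect.
data = [[e + a for a in (-1, 0, 1)] for e in (8, 10, 12)]
expression, probe_effect = median_polish(data)
```

Using medians rather than means keeps a single badly behaved probe from dragging the expression estimate for the whole probe set.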

NASC In summary then… There are various ways you can get from a CEL file to expression estimates. These models are derived by considering the behaviour of PM and MM probes. Both dChip and RMA show better results than the standard Affy algorithm. MM probes in particular behave contrary to how you would expect.

NASC Enough theory: how do you actually do these things? The MAS5 algorithm can be performed using (erm) MAS5! dChip is a piece of software that will be making an appearance later this afternoon, and can do the MBEI algorithm. The RMA authors have a piece of software called RMAExpress, which does RMA for Windows. All of these algorithms can be done using Bioconductor in R.
