Differential Gene Expression with the limma package

Differential Gene Expression with the limma package 20 March 2012 Functional Genomics

Linear regression
Fit a straight line through a set of points so that the points lie as close to the line as possible. The intercept and slope of the line are chosen to minimize the sum of the squared vertical distances of the points from the line. The line represents the model; the vertical distances between the points and the line are the residuals. Simple linear regression minimizes the sum of the squared residuals: this is the method of least squares.
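A minimal sketch in R of the least-squares fit described above, using small hypothetical data (not from the lecture). It computes the slope and intercept from the closed-form formulas, checks them against lm(), and shows that a different line has a larger sum of squared residuals.

```r
# Hypothetical paired data (not from the lecture)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Closed-form least-squares slope and intercept
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# lm() fits the same line by least squares
fit <- lm(y ~ x)
coef(fit)            # intercept 2.2, slope 0.6

# Residuals are the vertical distances from the points to the fitted line
ssr <- sum(residuals(fit)^2)
ssr                  # 2.4, the minimized sum of squared residuals

# Any other line, e.g. a slightly steeper one, has a larger sum of squares
ssr_other <- sum((y - (b0 + (b1 + 0.1) * x))^2)
```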

$Y = Y_0 + \beta Z$
Assume you have a data set of gene expression in tumor vs. normal tissue. This is a simple mathematical expression of what is calculated for a linear model.
$Y$ is the expression of gene X
$Y_0$ is the mean expression in normal tissue
$\beta$ is the difference in expression of tumor compared to normal tissue
$Z$ is the group variable (0 for normal; 1 for tumor)
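A minimal R sketch of this model for a single gene, with hypothetical expression values. With $Z$ coded 0/1, the intercept estimated by lm() is the normal-tissue mean ($Y_0$) and the coefficient on Z is the tumor mean minus the normal mean ($\beta$).

```r
# Hypothetical log2 expression of one gene: 3 normal then 3 tumor samples
y <- c(5, 6, 7, 8, 9, 10)
Z <- c(0, 0, 0, 1, 1, 1)          # group variable: 0 = normal, 1 = tumor

fit <- lm(y ~ Z)                  # fits Y = Y0 + beta * Z
coef(fit)                         # Y0 = 6 (normal mean), beta = 3

# beta equals the tumor mean minus the normal mean
mean(y[Z == 1]) - mean(y[Z == 0])
```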

Multiple linear regression
$Y = Y_0 + \beta Z + \Upsilon W$
Suppose you have another variable, such as age: you can add it directly to the model.
$Y$ is the expression of gene X
$Y_0$ is the mean expression in normal tissue (at $W = 0$)
$\beta$ is the difference in expression of tumor compared to normal tissue
$Z$ is the group variable (0 for normal; 1 for tumor)
$\Upsilon$ is the age effect
$W$ is the age group

Multiple linear regression
$Y = Y_0 + \beta Z + \Upsilon W + \delta Z W$
You can ask for differences in gene expression due to tissue, due to age, and due to an age-by-tissue interaction.
$Y$ is the expression of gene X
$Y_0$ is the mean expression in normal tissue (at $W = 0$)
$\beta$ is the difference in expression of tumor compared to normal tissue (at $W = 0$)
$Z$ is the group variable (0 for normal; 1 for tumor)
$\Upsilon$ is the age effect (in normal tissue)
$W$ is the age group
$\delta Z W$ is the added component that looks for age-by-tissue interaction effects
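A minimal R sketch of the additive model ($Y_0 + \beta Z + \Upsilon W$) and the interaction model ($Y_0 + \beta Z + \Upsilon W + \delta Z W$) for one gene. The data are hypothetical, and coding the age group W as 0/1 (younger/older) is an assumption made for illustration. In an R formula, Z * W expands to Z + W + Z:W, where Z:W is the $\delta Z W$ term.

```r
# Hypothetical data for one gene: 8 samples, two per tissue/age combination
Z <- c(0, 0, 0, 0, 1, 1, 1, 1)   # 0 = normal, 1 = tumor
W <- c(0, 0, 1, 1, 0, 0, 1, 1)   # 0 = younger, 1 = older (assumed coding)
y <- c(5, 7, 6, 8, 8, 10, 11, 13)

# Additive model: Y = Y0 + beta*Z + Upsilon*W
additive <- lm(y ~ Z + W)
coef(additive)   # Y0 = 5.5, beta = 4, Upsilon = 2

# Interaction model: Y = Y0 + beta*Z + Upsilon*W + delta*Z*W
fit_int <- lm(y ~ Z * W)
coef(fit_int)    # Y0 = 6, beta = 3, Upsilon = 1, delta = 2
```

With the interaction included, the four coefficients reproduce the four group means exactly, so $\delta$ measures how much the tumor-vs-normal difference changes between age groups (here from 3 to 5).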

limma
R package for differential gene expression that fits a linear model to each gene in your data set.
For Affy data, the expression values are log2 intensities (for example, the output of RMA).
Works with data prepared with the affy package.

Information on limma
http://www.statsci.org/smyth/pubs/limma-biocbook-reprint.pdf
http://www.bioconductor.org/packages/2.3/bioc/vignettes/limma/inst/doc/usersguide.pdf

limma checklist
Assumes you've done an experiment and have CEL files (if you've done single-color Affy arrays).
Assumes you have data/information about the arrays (targets).
Assumes you have normalized your data and have an ExpressionSet object (exprSet in older Bioconductor releases).

Name  FileName           Target
MT1   MTP1_Ackerman.CEL  MT
MT2   MTP2_Ackerman.CEL  MT
WT1   WTP1_Ackerman.CEL  WT
WT2   WTP2_Ackerman.CEL  WT
WT3   WTP3_Ackerman.CEL  WT
This is my targets file for limma using the Ackerman data. Note that I renamed the CEL files from their original names in my home directory.
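A minimal base-R sketch of reading a targets file like the one above and turning the Target column into a group factor. The file contents are embedded as a string so the sketch runs on its own; with a real tab-delimited file on disk, limma's readTargets() or read.delim() on the file path would play the same role.

```r
# Targets file contents as listed on the slide (tab-separated)
targets_txt <- "Name\tFileName\tTarget
MT1\tMTP1_Ackerman.CEL\tMT
MT2\tMTP2_Ackerman.CEL\tMT
WT1\tWTP1_Ackerman.CEL\tWT
WT2\tWTP2_Ackerman.CEL\tWT
WT3\tWTP3_Ackerman.CEL\tWT"
targets <- read.delim(text = targets_txt, stringsAsFactors = FALSE)

# Group factor for the design matrix (levels sorted alphabetically: MT, WT)
strain <- factor(targets$Target)
table(strain)   # MT: 2 arrays, WT: 3 arrays
```

Deriving the group labels from the targets file, rather than typing them by hand, keeps them in the same order as the listed CEL files.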

ExpressionSet object
slotNames()
The slot descriptions below are for exprSet, the predecessor of ExpressionSet in Biobase. Current Bioconductor releases use ExpressionSet; the exprs() and pData() accessors are used the same way.
new('exprSet',
    exprs       = ...., # Object of class matrix
    se.exprs    = ...., # Object of class matrix
    phenoData   = ...., # Object of class phenoData
    annotation  = ...., # Object of class character
    description = ...., # Object of class MIAME
    notes       = ....  # Object of class character
)
Slots
exprs: Object of class "matrix". The observed expression levels: a matrix with columns representing patients or cases and rows representing genes.
se.exprs: Object of class "matrix". A matrix of the same dimensions as exprs containing standard error estimates for the estimated expression levels.
phenoData: Object of class "phenoData". An instance of class phenoData containing the patient- (or case-) level data. The columns of the pData slot of this object represent variables and the rows represent patients or cases.
annotation: A character string identifying the annotation that may be used for the exprSet instance.
description: Object of class "MIAME". For compatibility with previous versions of this class, description can also be a "character"; the class characterOrMIAME has been defined for this purpose.
notes: Object of class "character". Vector of explanatory text.
http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/Biobase/html/exprSet-class.html

Running limma
Need to create an ExpressionSet object using the affy package, or by some other method, depending on the array platform.
Need a design matrix: a representation of the different RNA targets that have been hybridized to the arrays.
Can have a contrast matrix: uses the information in the design matrix to make the comparisons of interest.
A contrast matrix isn't always needed.
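A minimal base-R sketch of a design matrix and a contrast matrix for a two-group experiment, using hypothetical labels and expression values for one gene. It contrasts two common design choices: a treatment-contrast design (intercept plus difference) and a cell-means design (one column per group) combined with a contrast. In limma, the corresponding steps would be lmFit() with the design and contrasts.fit() with the contrast matrix (which makeContrasts() can build); here lm.fit() stands in so the sketch runs with base R only.

```r
# Hypothetical labels for six arrays: three mutant (MT), three wild type (WT)
strain <- factor(c("MT", "MT", "MT", "WT", "WT", "WT"))

# Treatment-contrast design: intercept = MT mean, strainWT = WT minus MT
design_trt <- model.matrix(~ strain)

# Cell-means design: one column per group, no intercept
design_means <- model.matrix(~ 0 + strain)
colnames(design_means) <- levels(strain)

# Contrast matrix: rows match design columns, one column per comparison
contrast <- matrix(c(-1, 1), ncol = 1,
                   dimnames = list(colnames(design_means), "WTvsMT"))

# Hypothetical log2 expression values for one gene on the six arrays
y <- c(7.0, 7.2, 6.8, 8.1, 7.9, 8.0)

b_trt   <- lm.fit(design_trt, y)$coefficients    # 7.0 and 1.0
b_means <- lm.fit(design_means, y)$coefficients  # MT = 7.0, WT = 8.0

# Applying the contrast to the cell-means coefficients gives WT minus MT again
wt_vs_mt <- drop(t(contrast) %*% b_means)
```

Both routes give the same WT vs. MT difference; the contrast matrix becomes useful when there are more groups and several comparisons of interest.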

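library(affy)        # ReadAffy(), rma()
library(limma)       # lmFit(), eBayes(), topTable()
library(makecdfenv)  # make.cdf.env()
# Build a CDF environment from the .cdf file and tell the AffyBatch to use it
Array.CDF = make.cdf.env("MoGene-1_0-st-v1.cdf")
CELData = ReadAffy()            # reads the CEL files in the working directory
CELData@cdfName = "Array.CDF"
slotNames(CELData)
pData(CELData)
# RMA: background correction, quantile normalization, summarization (log2 scale)
eset = rma(CELData)
pData(eset)
# One label per array, in the order ReadAffy() read the CEL files
strain = c("MT","MT","MT","WT","WT","WT")
design = model.matrix(~factor(strain))
# Column 1 is the intercept (MT mean); column 2 is the WT minus MT difference
colnames(design) = c("MT","WTvsMT")
fit = lmFit(eset, design)       # one linear model per probe set
fit = eBayes(fit)               # moderated t-statistics
options(digits=2)
# Top 40 probe sets for WT vs MT, with Benjamini-Hochberg adjusted p-values
topTable(fit, coef=2, number=40, adjust.method="BH")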
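A minimal base-R sketch of two ideas behind the limma code above, using a small hypothetical expression matrix. First, "a linear model for each gene" means solving the same least-squares problem once per row; lmFit() fits these per-gene coefficients, and eBayes() then moderates the gene-wise variances (not shown here). Second, adjust.method="BH" in topTable() applies the Benjamini-Hochberg correction, which base R provides as p.adjust(); the p-values below are made up for illustration.

```r
# Hypothetical log2 expression matrix: 3 genes (rows) x 6 arrays (columns)
expr <- rbind(
  gene1 = c(7.0, 7.2, 6.8, 8.1, 7.9, 8.0),
  gene2 = c(5.0, 5.1, 4.9, 5.0, 5.2, 4.8),
  gene3 = c(9.0, 9.3, 8.7, 7.1, 6.9, 7.0)
)
strain <- factor(c("MT", "MT", "MT", "WT", "WT", "WT"))
design <- model.matrix(~ strain)

# Least-squares coefficients for all genes at once: B = Y X (X'X)^-1
coefs <- expr %*% design %*% solve(crossprod(design))
# Column 1: MT mean for each gene; column 2: WT minus MT for each gene

# The row for gene1 matches an ordinary lm() fit of that gene alone
gene1_lm <- coef(lm(expr["gene1", ] ~ strain))

# Benjamini-Hochberg adjustment of some hypothetical p-values
p <- c(0.01, 0.02, 0.03, 0.04, 0.5)
p_bh <- p.adjust(p, method = "BH")   # 0.05 0.05 0.05 0.05 0.50
```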

Time Series

Standard differential gene expression methods don't work well for time series: they assume the observations are independent, which doesn't hold for measurements taken over time.
BETR takes the correlations/dependencies between time points into account to detect changes in gene expression that are sustained over time.
http://bioc.ism.ac.jp/2.5/bioc/html/betr.html
http://bioc.ism.ac.jp/2.5/bioc/vignettes/betr/inst/doc/betr.pdf

Running BETR
Need a data frame that describes the arrays.
Need to specify the conditions/contrasts.

betr() function usage and arguments

The file describes a three-time-point time series of diaphragm development. This annotation file lists the CEL files, associates each one with a time point, and indicates which arrays are replicates (must be an even number). In this example, the file is called "samples3.txt". These data ARE available in GEO under accession GSE35243.
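A minimal base-R sketch of the kind of table such an annotation file holds: one row per CEL file, with a time point and a replicate label. The column names time and rep match the ones the lecture's betr() call reads from pData(); the file names, values and two-replicates-per-time-point layout are hypothetical. The lecture code reads the real file with read.AnnotatedDataFrame(); read.delim() is used here only so the sketch runs without Bioconductor.

```r
# Hypothetical annotation table: one row per CEL file with its time point
# and replicate label. File names and values are illustrative only.
samples_txt <- "file\ttime\trep
T1_r1.CEL\t1\t1
T1_r2.CEL\t1\t2
T2_r1.CEL\t2\t1
T2_r2.CEL\t2\t2
T3_r1.CEL\t3\t1
T3_r2.CEL\t3\t2"
samples <- read.delim(text = samples_txt, row.names = 1)

# The same conversions the betr() call applies to pData(norm.data)
timepoint <- as.numeric(samples$time)
reps <- as.character(samples$rep)
table(timepoint)   # two arrays at each of three time points
```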

library(betr)
library(affy)
library(Biobase)
# Sample annotation (CEL file, time point, replicate) from the tab-delimited file
test = read.AnnotatedDataFrame("samples3.txt", sep="\t", quote="")
test.data = ReadAffy(phenoData=test)
norm.data = rma(test.data)
# Run BETR using the time point and replicate columns of the annotation
prob.data = betr(eset=norm.data, twoColor=FALSE, twoCondition=NULL,
                 timepoint=as.numeric(pData(norm.data)$time),
                 replicate=as.character(pData(norm.data)$rep), alpha=0.05)
write.table(prob.data, file="betr_results.txt", sep="\t")

Next time
pbx1 assignment: find the location of the probes in another one of the probesets for zebrafish.
Read the limma documentation.
Run limma on your data set.
Be sure you have your Galaxy account set up.