Analyzing Factorially designed microarray experiments Scholtens, D. et al. Journal of Multivariate Analysis, to appear Presented by M. Carme Ruíz de Villa.


Analyzing factorially designed microarray experiments. Scholtens, D. et al., Journal of Multivariate Analysis, to appear. Presented by M. Carme Ruíz de Villa & Alex Sánchez.

Introduction

Complexity of genomic data
The functioning of cells is a complex and highly structured process. Tools are being developed that allow us to explore it in a multitude of ways. Many of these tools rely on the results of microarray expression experiments.

Genes interact …
Treatments are applied to living, dynamic cells. mRNA abundance is affected by transcription factors, protein complexes, methylation, etc.
[Figure: network of five genes (Gene 1 to Gene 5) showing DNA, proteins, active and inactive transcription factors, a protein kinase, and a protein phosphatase.]

The holy grail
The holy grail of functional genomics is the reconstruction of genetic networks (Wagner 2001). We claim that factorial experiments are simple to perform and can help to reach this goal, provided that they are properly designed and analyzed.

Factorially designed experiments for microarrays
We can obtain expression data from the balanced application of the factors, under the four conditions defined by the presence or absence of each of two factors.

Many studies aim to pinpoint how combinations of factors perturb genetic networks. For practical reasons, genes of interest are often selected from multiple pairwise fold-change values, without exploiting replicates or modeling to assess statistical significance.

Biologically interpretable and statistically reasonable models are necessary to make the most of the experiment and to make the questions of interest answerable.

The experiment

Targets
A target of a factor is a gene whose expression ([mRNA]) is altered by the presence of the factor.
A primary target is a target that is directly affected by the factor.
A secondary target is a target whose expression is altered only via the effects of some other gene (it can be traced back to one or more primary targets).

Experimental questions
An experiment is performed on cells from an estrogen receptor positive human breast cancer cell line (MCF-7).
Questions of interest:
Which genes are targets of estrogen?
Can we differentiate between primary and secondary targets?

Experimental design
MCF-7 cells: ER+ breast cancer cell line.
Biologically independent replicates of each treatment condition in a 2x2 factorial experiment (8 samples, i.e. two per condition).
Factor 1: estrogen (ES). Upon binding to ES, ER acts as a transcription factor for certain genes.
Factor 2: cycloheximide (CX). Universal translation inhibitor, i.e., mRNA can be transcribed, but it is not translated into protein.
mRNA abundance was measured using Affymetrix HGU95Av2 microarrays.

Answering the questions …
We identify as targets all genes whose mRNA expression is affected by the application of ES.
A target can be either primary or secondary:
primary if ES directly affects the expression of its mRNA;
secondary if its mRNA production is affected by some other gene (it can be traced back to a primary target).

Different scenarios
The presence of ES and/or CX can affect different targets in different ways. Several simplified scenarios covering some of the possibilities are shown below.

Scenario 1

Scenario 3

Statistical models

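The linear model
Assume the following linear model for the observed expression value (possibly on transformed data):
$$y_{ig} = \mu_g + \beta_{ES,g}\, x_{1i} + \beta_{CX,g}\, x_{2i} + \beta_{ES:CX,g}\, x_{1i} x_{2i} + \varepsilon_{ig}$$
where $i$ indexes chips and $g$ indexes genes, $x_{1i}$ indicates the presence of ES on chip $i$, and $x_{2i}$ indicates the presence of CX on chip $i$.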

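The meaning of the model
None: $y_{ig} = \mu_g + \varepsilon_{ig}$
CX only: $y_{ig} = \mu_g + \beta_{CX,g} + \varepsilon_{ig}$
ES only: $y_{ig} = \mu_g + \beta_{ES,g} + \varepsilon_{ig}$
ES and CX: $y_{ig} = \mu_g + \beta_{CX,g} + \beta_{ES,g} + \beta_{ES:CX,g} + \varepsilon_{ig}$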
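As a rough illustration of the four conditions above, the following Python sketch maps each treatment condition to its indicator pair (x1, x2) and evaluates the expected expression under the model. The function names and the parameter values are made up for the example and do not come from the paper.

```python
# Indicator coding for the 2x2 design:
# x1 = 1 if ES is present, x2 = 1 if CX is present.
CONDITIONS = {
    "none": (0, 0),
    "CX only": (0, 1),
    "ES only": (1, 0),
    "ES and CX": (1, 1),
}


def expected_expression(mu, beta_es, beta_cx, beta_es_cx, x1, x2):
    """Mean of y under the linear model with an ES:CX interaction term."""
    return mu + beta_es * x1 + beta_cx * x2 + beta_es_cx * x1 * x2


def cell_means(mu, beta_es, beta_cx, beta_es_cx):
    """Expected expression for each of the four treatment conditions."""
    return {
        name: expected_expression(mu, beta_es, beta_cx, beta_es_cx, x1, x2)
        for name, (x1, x2) in CONDITIONS.items()
    }


# Hypothetical parameter values for a single gene (illustration only).
means = cell_means(mu=8.0, beta_es=1.5, beta_cx=-0.5, beta_es_cx=-1.5)
for name, value in means.items():
    print(f"{name}: {value}")
```

With these made-up values the ES effect is cancelled when CX is present, which is the pattern one would expect for a gene that responds to ES only through a translated intermediate.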

Inference
Assuming normality (which is often reasonable after log-transformation), linear models theory can be applied to:
obtain unbiased and efficient estimates of $\beta_{ES}$, $\beta_{CX}$ and $\beta_{ES:CX}$;
obtain measures of precision for the estimates;
perform hypothesis testing.
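To make the estimation step concrete, here is a minimal sketch for a single gene. It relies on the fact that, with indicator coding and an interaction term, the least-squares estimates of a 2x2 model are simple contrasts of the four condition means. The standard-error formulas assume the same number of replicates in every condition; the function name, condition labels and data are hypothetical.

```python
from statistics import mean


def fit_two_by_two(y):
    """Least-squares fit of the 2x2 model for one gene.

    y maps each condition ("none", "ES", "CX", "ES+CX") to its replicate values.
    """
    m = {cond: mean(vals) for cond, vals in y.items()}
    n = len(y["none"])  # replicates per condition (assumed equal everywhere)
    estimates = {
        "mu": m["none"],
        "ES": m["ES"] - m["none"],
        "CX": m["CX"] - m["none"],
        "ES:CX": m["ES+CX"] - m["ES"] - m["CX"] + m["none"],
    }
    # Pooled within-condition variance; residual df = 4 * (n - 1).
    rss = sum((v - m[cond]) ** 2 for cond, vals in y.items() for v in vals)
    df = 4 * (n - 1)
    s2 = rss / df
    std_errors = {
        "mu": (s2 / n) ** 0.5,
        "ES": (2 * s2 / n) ** 0.5,
        "CX": (2 * s2 / n) ** 0.5,
        "ES:CX": (4 * s2 / n) ** 0.5,
    }
    return estimates, std_errors, s2, df


# Made-up log-scale expression values, two replicates per condition.
example = {
    "none": [7.9, 8.1],
    "ES": [9.4, 9.6],
    "CX": [7.4, 7.6],
    "ES+CX": [7.5, 7.7],
}
estimates, std_errors, s2, df = fit_two_by_two(example)
print(estimates, std_errors, s2, df)
```

With two replicates per condition the residual variance rests on only 4 degrees of freedom, which is one reason outliers matter so much in this kind of design.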

Parameters interpretation
$\beta_{ES}$ is interpreted as the effect of ES. Genes for which $\beta_{ES}$ is different from zero are potential targets, but not all targets will have $\beta_{ES}$ different from zero.
$\beta_{CX}$ is interpreted as the effect due to CX. If $\beta_{CX}$ is different from zero, production of the mRNA is translationally regulated.
$\beta_{ES:CX}$ is interpreted as "what is left" after considering each main effect separately.
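The role of the interaction term becomes clearer when the ES effect is written out separately for each level of CX. Assuming the four condition means implied by the linear model, and writing $\mu_g$ for the baseline mean, a short derivation is:

```latex
\begin{align*}
\text{ES effect without CX:}\quad
  & E[y \mid \text{ES only}] - E[y \mid \text{none}]
    = (\mu_g + \beta_{ES,g}) - \mu_g
    = \beta_{ES,g}, \\
\text{ES effect with CX:}\quad
  & E[y \mid \text{ES and CX}] - E[y \mid \text{CX only}]
    = (\mu_g + \beta_{CX,g} + \beta_{ES,g} + \beta_{ES:CX,g}) - (\mu_g + \beta_{CX,g})
    = \beta_{ES,g} + \beta_{ES:CX,g}.
\end{align*}
```

So the interaction measures how much the ES effect changes when translation is blocked, and the sum $\beta_{ES,g} + \beta_{ES:CX,g}$ is the ES effect that remains in the presence of CX.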

Parameter values for scenario 1
$\beta_{CX}$: = 0
$\beta_{ES}$: > 0
$\beta_{ES:CX}$: = 0 (mRNA A), < 0 (mRNA B)

Parameter values for scenario 3
$\beta_{CX}$: < 0 (mRNA A), > 0 (mRNA B)
$\beta_{ES}$: < 0 (mRNA A), > 0 (mRNA B)
$\beta_{ES:CX}$: < 0

ES target identification
A gene is identified as an ES target if $\beta_{ES} \neq 0$ or $\beta_{ES:CX} \neq 0$, that is, if the hypothesis $H_0: \beta_{ES} = \beta_{ES:CX} = 0$ is rejected.
If a gene is an ES target, then it is:
a primary ES target if $\beta_{ES} + \beta_{ES:CX} \neq 0$, or
a secondary ES target if $\beta_{ES} + \beta_{ES:CX} = 0$.
This is decided by rejecting or failing to reject the hypothesis $H_0: \beta_{ES} + \beta_{ES:CX} = 0$.
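The two hypotheses can be tested for a single gene along the following lines. This is only a sketch under stated assumptions: exactly two replicates per condition (so 4 residual degrees of freedom), normal errors, and closed-form tail probabilities that hold only for F(2, 4) and Student's t with 4 degrees of freedom. The function name and the data are hypothetical, and the slides do not say how the authors computed their p-values.

```python
from statistics import mean


def es_target_tests(y):
    """Test statistics and p-values for the two ES hypotheses, for one gene.

    y maps "none", "ES", "CX", "ES+CX" to lists of exactly two replicates.
    """
    if any(len(vals) != 2 for vals in y.values()):
        raise ValueError("this sketch assumes two replicates per condition")
    m = {cond: mean(vals) for cond, vals in y.items()}
    rss = sum((v - m[cond]) ** 2 for cond, vals in y.items() for v in vals)
    s2 = rss / 4.0  # pooled residual variance on 4 df

    d_without_cx = m["ES"] - m["none"]  # estimates beta_ES
    d_with_cx = m["ES+CX"] - m["CX"]    # estimates beta_ES + beta_ES:CX

    # H0: beta_ES = beta_ES:CX = 0 (no ES effect, with or without CX).
    # Each difference has estimated variance s2 (2 * s2 / n with n = 2),
    # and the two differences use disjoint samples, so F has (2, 4) df.
    f_stat = (d_without_cx ** 2 + d_with_cx ** 2) / (2.0 * s2)
    # Closed-form upper tail of F(2, 4): P(F > f) = (1 + f / 2) ** -2.
    p_target = (1.0 + f_stat / 2.0) ** -2

    # H0: beta_ES + beta_ES:CX = 0 (no ES effect once translation is blocked).
    t_stat = d_with_cx / s2 ** 0.5
    # Closed-form two-sided tail of Student's t with 4 df.
    a = abs(t_stat)
    s = 1.0 + a * a / 4.0
    p_primary = 1.0 - 0.75 * (a / s ** 0.5) * (1.0 - a * a / (12.0 * s))
    return f_stat, p_target, t_stat, max(0.0, p_primary)


# Made-up log-scale expression values, two replicates per condition.
example = {
    "none": [7.9, 8.1],
    "ES": [9.4, 9.6],
    "CX": [7.4, 7.6],
    "ES+CX": [7.5, 7.7],
}
f_stat, p_target, t_stat, p_primary = es_target_tests(example)
print(f_stat, p_target, t_stat, p_primary)
```

For this made-up gene the first test gives a small p-value while the second does not, which is the pattern the slides associate with a secondary ES target.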

Multiple testing (1)
The hypothesis $H_0: \beta_{ES} = \beta_{ES:CX} = 0$ is tested individually for thousands of genes, so a multiple testing adjustment is required. Control of the false discovery rate (FDR) seems more appropriate for microarray data than other procedures.

Multiple Testing (2)

             # not rejected   # rejected    totals
# true H     U                V (False +)   m0
# false H    T (False -)      S             m1
totals       m - R            R             m

Per-comparison error rate = E(V)/m
Family-wise error rate = P(V ≥ 1)
Per-family error rate = E(V)
False discovery rate = E(V/R)

Multiple testing (3)
The method applied controls the FDR, so that it is guaranteed not to exceed a given threshold. The method is conservative and tends to give longer lists of genes.
A rejected hypothesis indicates an ES target, so we can interpret the FDR as the expected proportion of falsely identified ES targets among the genes selected.
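The slides do not say which FDR-controlling procedure was used. As a stand-in, the sketch below implements the widely used Benjamini-Hochberg step-up procedure, to show what "controlling the FDR at a given threshold" looks like in practice. The function name, the p-values and the level q are illustrative assumptions.

```python
def benjamini_hochberg(p_values, q):
    """Return a list of booleans: True where H0 is rejected at FDR level q."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Made-up per-gene p-values for the ES target test.
p_values = [0.001, 0.045, 0.012, 0.6, 0.025]
rejected = benjamini_hochberg(p_values, q=0.05)
print(rejected)
```

Each gene flagged True would be called an ES target. Note that a p-value of 0.045 is not rejected here even though it is below 0.05, because the per-rank thresholds are stricter than an unadjusted cutoff.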

Outlier detection
Outlier detection is usually complicated in factorial experiments. The residuals from the fit of the linear model must satisfy a number of constraints and hence are not suitable for outlier detection. However, outlier detection is important, since the presence of outliers inflates the estimated variance and hence decreases our ability to detect significant effects.

Outliers

Outlier Detection (1)
The replicate structure of the experimental design is used to locate single outliers in the data set. The algorithm is based on differences between replicate expression values that are larger than expected. Assuming normality, a test statistic that follows an F distribution is derived.

Outlier Detection (2)
This method only identifies pairs with large differences, not the single outlier itself. Once pairs are identified, a single outlier is flagged if one of the tagged replicates falls outside the range $(\mathrm{med}_e - 4\,\mathrm{mad}_e,\ \mathrm{med}_e + 4\,\mathrm{mad}_e)$.
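The range rule above can be sketched as follows. The slides do not spell out what the subscript e refers to, so this example simply assumes a list of reference values (for instance residual-like quantities) from which the median and the median absolute deviation (MAD) are computed. The unscaled MAD, the function names and the data are all assumptions made for illustration.

```python
from statistics import median


def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    center = median(values)
    return median([abs(v - center) for v in values])


def outside_range(value, reference, k=4.0):
    """True if value lies outside (med - k * mad, med + k * mad) of reference."""
    center = median(reference)
    spread = mad(reference)
    return value < center - k * spread or value > center + k * spread


# Made-up reference values: median 0.0 and MAD 0.1, so the range is (-0.4, 0.4).
reference = [-0.3, -0.1, 0.0, 0.1, 0.2, 0.1, -0.2]
flag_large = outside_range(2.5, reference)
flag_small = outside_range(0.2, reference)
print(flag_large, flag_small)
```

Median and MAD are used instead of mean and standard deviation because they are not themselves distorted by the outliers being searched for.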

Gene selection algorithm (1)
1. Average the replicate observations and exclude any genes with a maximum average less than 100 (using the PM-only model for gene expression in dChip). Remove all Affymetrix control sequences.
2. Apply any necessary transformations to satisfy normality, then test for single outliers. If outliers are identified, remove them from the data set.
3. Fit the linear model.
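Step 1 of the algorithm above might be sketched like this. The data layout (a dictionary from probe-set ID to per-condition replicate values), the function name and the example IDs are hypothetical. Identifying control sequences by an "AFFX" prefix is an assumption of this sketch, not something stated in the slides.

```python
from statistics import mean


def filter_genes(expression, min_max_average=100.0, control_prefix="AFFX"):
    """Keep probe sets that pass the expression filter of step 1.

    expression maps probe-set ID -> {condition: [replicate values]}.
    """
    kept = {}
    for probe_id, by_condition in expression.items():
        if probe_id.startswith(control_prefix):
            continue  # drop control sequences (prefix is an assumption)
        # Average the replicates within each condition.
        averages = [mean(vals) for vals in by_condition.values()]
        # Exclude genes whose largest condition average is below the cutoff.
        if max(averages) < min_max_average:
            continue
        kept[probe_id] = by_condition
    return kept


# Made-up expression values on the original (untransformed) scale.
expression = {
    "gene_a": {"none": [80, 90], "ES": [400, 420], "CX": [70, 95], "ES+CX": [380, 410]},
    "gene_b": {"none": [20, 30], "ES": [40, 35], "CX": [25, 15], "ES+CX": [30, 45]},
    "AFFX-ctrl-1": {"none": [900, 950], "ES": [910, 940], "CX": [905, 930], "ES+CX": [920, 915]},
}
kept = filter_genes(expression)
print(sorted(kept))
```

The filter is applied before any transformation, which is why the cutoff of 100 is compared against averages on the original scale.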

Gene selection algorithm (2)
4. Test $H_{0Est}: \beta_{ES} = \beta_{ES:CX} = 0$ for each gene.
5. Reject $H_{0Est}$ for the genes with the lowest resultant p-values using an FDR of …; call these genes ES targets.
6. For the ES targets, test $H_{0pt}: \beta_{ES} + \beta_{ES:CX} = 0$.
   1. Call genes with p-values < 0.01 for the test of $H_{0pt}$ primary ES targets.
   2. Call the remaining ES target genes secondary ES targets.
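Steps 5 and 6 amount to a simple labelling rule once the per-gene test results are available. The sketch below assumes that the FDR-controlled decision for the first hypothesis and the p-value for the second have already been computed for each gene. The function name, the dictionary layout and the example values are hypothetical; only the 0.01 cutoff comes from the slides.

```python
def classify_genes(is_target, p_primary, alpha=0.01):
    """Label each gene as primary target, secondary target, or non-target.

    is_target: gene -> True if H0Est was rejected under FDR control.
    p_primary: gene -> p-value for the test of H0pt.
    """
    labels = {}
    for gene, target in is_target.items():
        if not target:
            labels[gene] = "not an ES target"
        elif p_primary[gene] < alpha:
            labels[gene] = "primary ES target"
        else:
            labels[gene] = "secondary ES target"
    return labels


# Made-up test results for three genes.
is_target = {"gene_a": True, "gene_b": True, "gene_c": False}
p_primary = {"gene_a": 0.002, "gene_b": 0.52, "gene_c": 0.30}
labels = classify_genes(is_target, p_primary)
print(labels)
```

Note that "secondary" here means the test of the second hypothesis was not rejected, so a primary target with a weak signal could end up labelled secondary.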

Results (1)
Primary targets: $\beta_{ES} \neq 0$ or $\beta_{ES:CX} \neq 0$, and $\beta_{ES} + \beta_{ES:CX} \neq 0$.

Results (2)
Secondary targets: $\beta_{ES} \neq 0$ or $\beta_{ES:CX} \neq 0$, and $\beta_{ES} + \beta_{ES:CX} = 0$.

Conclusions
For gene selection using data from factorially designed microarray studies, linear models offer a natural paradigm for analysis, so long as careful consideration is given to the interpretation of the model parameters. The use of CX in this experiment is one example of a treatment that allows for the identification of primary and secondary ES targets.

Conclusions (2)
For experiments with more treatments of interest, fractional factorial designs may be applicable. The genes selected using linear models would serve as good candidates for network reconstruction algorithms.

Acknowledgments
Special thanks to Denise Scholtens and Robert Gentleman, Biostatistics, Harvard U., for making their materials available.

Disclaimer
The goal of this presentation is to discuss the contents of the paper indicated in the title. Copyrighted images have been taken from the corresponding journals or from slide shows found on the internet, with the sole goal of facilitating the discussion. All credit for them belongs to the authors of the papers or the slide shows, and we wish to thank them for making them available.