Analyzing Factorially designed microarray experiments Scholtens, D. et al. Journal of Multivariate Analysis, to appear Presented by M. Carme Ruíz de Villa.


1 Analyzing Factorially designed microarray experiments Scholtens, D. et al. Journal of Multivariate Analysis, to appear Presented by M. Carme Ruíz de Villa & Alex Sánchez

2 Introduction

3 Complexity of genomic data
The functioning of cells is a complex and highly structured process.
Tools are being developed that allow us to explore it in a multitude of ways.
Many of these tools rely on the results of microarray expression experiments.

4 Genes interact …
Treatments are applied to living, dynamic cells.
mRNA abundance is affected by transcription factors, protein complexes, methylation, etc.
[Figure: network linking Gene 1 to Gene 5; labels: DNA, protein, P, active and inactive transcription factor, protein kinase, protein phosphatase.]

5 The holy grail
The holy grail of functional genomics is the reconstruction of genetic networks (Wagner 2001).
(We claim that) factorial experiments are simple to perform and can help reach this goal if they are properly designed and analyzed.

6 Factorially designed experiments for microarrays
With two factors, each either present or absent, a balanced design yields expression data under all four conditions.

7 Many studies are meant to pinpoint the perturbation of genetic networks by combinations of factors.
In practice, genes of interest are often selected from multiple pairwise fold-change values, without using the replicates or a model to assess statistical significance.

8 Biologically interpretable and statistically reasonable models are needed to make the most of the experiment and to make the questions of interest answerable.

9 The experiment

10 Targets
A target of a factor is a gene whose expression ([mRNA]) is altered by the presence of the factor.
A primary target is a target that is directly affected by the factor.
A secondary target is a target whose expression is altered only via the effects of some other gene (it can be traced back to one or more primary targets).

11 Experimental questions
The experiment is performed on cells from an estrogen receptor positive (ER+) human breast cancer cell line (MCF-7).
Questions of interest:
Which genes are targets of estrogen?
Can we differentiate between primary and secondary targets?

12 Experimental design
MCF-7 cells: ER+ breast cancer cell line.
Biologically independent replicates of each treatment condition in a 2x2 factorial experiment (8 samples).
Factor 1: estrogen (ES). Upon binding to ES, ER acts as a transcription factor for certain genes.
Factor 2: cycloheximide (CX). A universal translation inhibitor, i.e., mRNA can be transcribed, but it is not translated into protein.
mRNA abundance was measured using Affymetrix HGU95Av2 microarrays.

13 Answering the questions …
We identify as targets all genes whose mRNA expression is affected by the application of ES.
A target can be either primary or secondary:
primary if ES directly affects expression of the mRNA;
secondary if mRNA production is affected by some other gene (it can be traced back to a primary target).

14 Different scenarios
The presence of ES and/or CX can affect different targets in different ways.
Several simplified scenarios covering some of the possibilities are shown below.

15 Scenario 1

16 Scenario 3

17 Statistical models

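18 The linear model
Assume the following linear model for the observed expression value (possibly on transformed data):
$y_{ig} = \mu_g + \beta_{ES,g}\, x_{1i} + \beta_{CX,g}\, x_{2i} + \beta_{ES:CX,g}\, x_{1i} x_{2i} + \epsilon_{ig}$
Here i indexes chips and g indexes genes; $x_{1i}$ indicates the presence of ES and $x_{2i}$ indicates the presence of CX on chip i.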

19 The meaning of the model
None: $y_{ig} = \mu_g + \epsilon_{ig}$
CX only: $y_{ig} = \mu_g + \beta_{CX,g} + \epsilon_{ig}$
ES only: $y_{ig} = \mu_g + \beta_{ES,g} + \epsilon_{ig}$
ES and CX: $y_{ig} = \mu_g + \beta_{CX,g} + \beta_{ES,g} + \beta_{ES:CX,g} + \epsilon_{ig}$
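The four cases above follow from a 0/1 coding of the two factors. A minimal sketch, assuming that coding; the names `CONDITIONS`, `design_row` and `expected_value` are hypothetical and chosen only for illustration:

```python
# Hypothetical helper names. The 0/1 coding follows the slides:
# x1 indicates the presence of ES, x2 the presence of CX.
CONDITIONS = {
    "none": (0, 0),
    "ES only": (1, 0),
    "CX only": (0, 1),
    "ES and CX": (1, 1),
}


def design_row(x_es, x_cx):
    """Design-matrix row [intercept, ES, CX, ES:CX] for one chip."""
    return [1, x_es, x_cx, x_es * x_cx]


def expected_value(mu, b_es, b_cx, b_int, x_es, x_cx):
    """Expected expression under the model, with the error term left out."""
    coefficients = (mu, b_es, b_cx, b_int)
    return sum(c * x for c, x in zip(coefficients, design_row(x_es, x_cx)))


if __name__ == "__main__":
    for name, (x_es, x_cx) in CONDITIONS.items():
        print(name, design_row(x_es, x_cx))
```

The interaction column is the product of the two indicators, so it is switched on only when both ES and CX are present.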

20 Inference
Assuming normality (often a reasonable approximation after a log transformation), linear model theory can be applied to:
Obtain unbiased and efficient estimates of $\beta_{ES}$, $\beta_{CX}$ and $\beta_{ES:CX}$.
Obtain measures of precision for the estimates.
Perform hypothesis tests.
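Because the 2x2 model has one parameter per condition, its least-squares estimates reduce to differences of condition means. The sketch below illustrates this for a single gene; the function name `fit_gene` and the dictionary keys are hypothetical, and the data in the usage example are made up.

```python
from math import sqrt
from statistics import mean


def fit_gene(cells):
    """Least-squares fit of the 2x2 model for one gene.

    `cells` maps the (hypothetical) keys 'none', 'ES', 'CX', 'ES+CX' to lists
    of replicate expression values, already transformed if needed.
    Returns (estimates, standard_errors, residual_df).
    """
    m = {k: mean(v) for k, v in cells.items()}
    est = {
        "mu": m["none"],
        "ES": m["ES"] - m["none"],
        "CX": m["CX"] - m["none"],
        "ES:CX": m["ES+CX"] - m["ES"] - m["CX"] + m["none"],
    }
    # Pooled within-condition variance: residuals are deviations from cell means.
    rss = sum((y - m[k]) ** 2 for k, v in cells.items() for y in v)
    df = sum(len(v) for v in cells.values()) - 4
    s2 = rss / df
    inv_n = {k: 1.0 / len(v) for k, v in cells.items()}
    # Each estimate is a signed sum of independent cell means.
    se = {
        "mu": sqrt(s2 * inv_n["none"]),
        "ES": sqrt(s2 * (inv_n["ES"] + inv_n["none"])),
        "CX": sqrt(s2 * (inv_n["CX"] + inv_n["none"])),
        "ES:CX": sqrt(s2 * sum(inv_n.values())),
    }
    return est, se, df


if __name__ == "__main__":
    # Made-up log-scale values, two replicates per condition.
    example = {"none": [7.0, 7.2], "ES": [8.1, 7.9],
               "CX": [7.1, 6.9], "ES+CX": [8.0, 8.2]}
    print(fit_gene(example))
```

With two replicates per condition, only four residual degrees of freedom remain per gene, which is why removing variance-inflating outliers matters later in the analysis.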

21 Parameter interpretation
$\beta_{ES}$ is interpreted as the effect of ES: genes for which $\beta_{ES}$ is different from zero are potential targets, but not all targets will have $\beta_{ES}$ different from zero.
$\beta_{CX}$ is interpreted as the effect due to CX: if $\beta_{CX}$ is different from zero, production of the mRNA is translationally regulated.
$\beta_{ES:CX}$ is interpreted as "what is left" after considering each main effect separately.

22 Parameter values for scenario 1 (table columns: mRNA A, mRNA B)
$\beta_{CX}$: = 0
$\beta_{ES}$: > 0
$\beta_{ES:CX}$: = 0 (mRNA A), < 0 (mRNA B)

23 Parameter values for scenario 3 (table columns: mRNA A, mRNA B)
$\beta_{CX}$: < 0 (mRNA A), > 0 (mRNA B)
$\beta_{ES}$: < 0 (mRNA A), > 0 (mRNA B)
$\beta_{ES:CX}$: < 0

24 ES target identification
A gene is identified as an ES target if $\beta_{ES} \neq 0$ or $\beta_{ES:CX} \neq 0$, that is, if the hypothesis $H_0: \beta_{ES} = \beta_{ES:CX} = 0$ is rejected.
If a gene is an ES target, then it is:
a primary ES target if $\beta_{ES} + \beta_{ES:CX} \neq 0$, or
a secondary ES target if $\beta_{ES} + \beta_{ES:CX} = 0$.
This is decided by rejecting or failing to reject the hypothesis $H_0: \beta_{ES} + \beta_{ES:CX} = 0$.
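One way to compute the two test statistics for a single gene is sketched below. It is a textbook nested-model F-test and contrast t-statistic, not necessarily the authors' exact implementation; the function name `es_target_tests` is hypothetical. Under the joint null only CX status changes the mean, so the reduced model pools the ES and no-ES chips. The sum $\beta_{ES} + \beta_{ES:CX}$ is the effect of ES in the presence of CX, i.e. mean(ES and CX) minus mean(CX only).

```python
from math import sqrt
from statistics import mean


def _rss(groups):
    """Residual sum of squares around each group's own mean."""
    return sum((y - mean(g)) ** 2 for g in groups for y in g)


def es_target_tests(none, es, cx, both):
    """Return (f_stat, p_joint, contrast, t_stat) for one gene.

    Arguments are lists of replicate values per condition. Assumes the
    within-condition variation is not exactly zero.
    """
    none, es, cx, both = list(none), list(es), list(cx), list(both)
    df2 = len(none) + len(es) + len(cx) + len(both) - 4
    rss_full = _rss([none, es, cx, both])
    # Reduced model under H0: beta_ES = beta_ES:CX = 0 (means depend on CX only).
    rss_null = _rss([none + es, cx + both])
    f_stat = ((rss_null - rss_full) / 2) / (rss_full / df2)
    # Upper tail of F(2, df2) has a closed form when the numerator df is 2.
    p_joint = (df2 / (df2 + 2 * f_stat)) ** (df2 / 2)
    # Contrast beta_ES + beta_ES:CX = mean(ES and CX) - mean(CX only).
    s2 = rss_full / df2
    contrast = mean(both) - mean(cx)
    t_stat = contrast / sqrt(s2 * (1 / len(both) + 1 / len(cx)))
    return f_stat, p_joint, contrast, t_stat
```

The t-statistic has `df2` degrees of freedom; turning it into a p-value needs a t-distribution routine from a statistics library, which is left out here to keep the sketch dependency-free.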

25 Multiple testing (1)
The hypothesis $H_0: \beta_{ES} = \beta_{ES:CX} = 0$ is tested individually on thousands of genes, so a multiple testing adjustment is required.
Control of the false discovery rate (FDR) seems more appropriate for microarray data than other procedures.

26 Multiple testing (2)
             # not rejected   # rejected     totals
# true H     U                V (false +)    m0
# false H    T (false -)      S              m1
totals       m - R            R              m
Per-comparison error rate = E(V)/m
Family-wise error rate = P(V ≥ 1)
Per-family error rate = E(V)
False discovery rate = E(V/R)

27 Multiple testing (3)
The method applied controls the FDR, so that it is guaranteed not to exceed a given threshold.
Compared with family-wise error control, FDR control is less conservative and tends to give longer lists of genes.
A rejected hypothesis indicates an ES target, so we can interpret the FDR as the expected proportion of falsely identified ES targets among the genes called targets.
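The slides do not name the FDR procedure used. As an illustration only, the sketch below implements the Benjamini-Hochberg step-up rule, one standard way to control the FDR; treating it as the method meant here is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, fdr):
    """Benjamini-Hochberg step-up rule (assumed here; the slides name no method).

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose ordered p-value is <= (k / m) * fdr.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max, even if an
    # intermediate p-value missed its own threshold (the "step-up" part).
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.9], fdr=0.05))
```

Each `True` in the result corresponds to a gene called an ES target at the chosen FDR level.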

28 Outlier detection
Outlier detection is usually complicated in factorial experiments.
The residuals from the fit of the linear model must satisfy a number of constraints and hence are not suitable for outlier detection.
However, outlier detection is important, since outliers inflate the estimated variance and hence decrease our ability to detect significant effects.

29 Outliers

30 Outlier detection (1)
The replicate structure of the experimental design is used to locate single outliers in the data set.
The algorithm is based on differences between the replicate expression values that are larger than expected.
Assuming normality, a test statistic that follows an F distribution is derived.

31 Outlier detection (2)
This method only identifies pairs with large differences, not the single outlier itself.
Once pairs are identified, a single outlier is identified if one of the tagged replicates falls outside the range $(\mathrm{med}_e - 4\,\mathrm{mad}_e,\ \mathrm{med}_e + 4\,\mathrm{mad}_e)$.
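The range check can be sketched as follows. The slide does not define `e`, so the code assumes `med_e` and `mad_e` are the median and the unscaled median absolute deviation of some reference set of values for the gene; both that reading and the names `mad` and `single_outliers` are assumptions.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation from the median."""
    values = list(values)
    med = median(values)
    return median([abs(v - med) for v in values])


def single_outliers(tagged_pair, reference, k=4.0):
    """Return the members of a tagged replicate pair outside med +/- k * mad.

    `reference` is the set of values that defines med_e and mad_e; what
    belongs in it is an assumption, since the slide does not say.
    """
    reference = list(reference)
    med = median(reference)
    spread = mad(reference)
    low, high = med - k * spread, med + k * spread
    return [v for v in tagged_pair if v < low or v > high]


if __name__ == "__main__":
    reference = [7.0, 7.1, 6.9, 7.2, 6.8, 7.0, 7.1]
    print(single_outliers((7.05, 9.3), reference))
```

If both members of a pair fall inside the range, the list is empty and no single outlier is declared for that pair.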

32 Gene selection algorithm (1)
1. Average the replicate observations and exclude any genes with a maximum average less than 100 (using the PM-only model for gene expression in dChip). Remove all Affymetrix control sequences.
2. Apply any necessary transformations to satisfy normality, then test for single outliers. If outliers are identified, remove them from the data set.
3. Fit the linear model.

33 Gene selection algorithm (2)
4. Test $H_{0,Est}: \beta_{ES} = \beta_{ES:CX} = 0$ for each gene.
5. Reject $H_{0,Est}$ for the genes with the lowest resulting p-values, using an FDR of 0.01. Call these genes ES targets.
6. For the ES targets, test $H_{0,pt}: \beta_{ES} + \beta_{ES:CX} = 0$.
   1. Call genes with p-values < 0.01 for the test of $H_{0,pt}$ primary ES targets.
   2. Call the remaining ES target genes secondary ES targets.
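The first and last steps of the algorithm are simple decision rules and can be sketched directly. The function names and dictionary keys below are hypothetical; the p-values and the FDR decision are taken as inputs rather than computed here.

```python
from statistics import mean


def passes_intensity_filter(cells, minimum=100.0):
    """Step 1: keep a gene unless its largest per-condition average is below `minimum`.

    `cells` maps condition names to lists of replicate expression values
    on the original (untransformed) scale.
    """
    return max(mean(values) for values in cells.values()) >= minimum


def classify_gene(is_es_target, p_primary, alpha=0.01):
    """Steps 5 and 6: label one gene.

    `is_es_target` is the outcome of the FDR-controlled test in step 5;
    `p_primary` is the p-value for H0: beta_ES + beta_ES:CX = 0.
    """
    if not is_es_target:
        return "not an ES target"
    return "primary ES target" if p_primary < alpha else "secondary ES target"


if __name__ == "__main__":
    gene = {"none": [50, 60], "ES": [120, 110], "CX": [40, 45], "ES+CX": [90, 95]}
    print(passes_intensity_filter(gene), classify_gene(True, 0.002))
```

Note that a gene is labelled secondary when the second hypothesis is not rejected, so with few replicates some primary targets with modest effects may end up in the secondary list.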

34 Results (1)
Primary targets: $\beta_{ES} \neq 0$ or $\beta_{ES:CX} \neq 0$, and $\beta_{ES} + \beta_{ES:CX} \neq 0$

35 Results (2)
Secondary targets: $\beta_{ES} \neq 0$ or $\beta_{ES:CX} \neq 0$, and $\beta_{ES} + \beta_{ES:CX} = 0$

36 Conclusions
For gene selection using data from factorially designed microarray studies, linear models offer a natural paradigm for analysis, so long as careful consideration is given to the interpretation of the model parameters.
The use of CX in this experiment is one example of a treatment that allows for the identification of primary and secondary ES targets.

37 Conclusions (2)
For experiments with more treatments of interest, fractional factorial designs may be applicable.
The genes selected using linear models would serve as good candidates for network reconstruction algorithms.

38 Acknowledgments Special thanks to Denise Scholtens and Robert Gentleman, Biostatistics, Harvard U. for making their materials available

39 Disclaimer
The goal of this presentation is to discuss the contents of the paper indicated in the title.
Copyrighted images have been taken from the corresponding journals or from slide shows found on the internet, with the sole goal of facilitating the discussion.
All merit for them must be attributed to the authors of the papers or the slide shows, and we wish to thank them for making them available.

