
Identifying Differentially Expressed Genes (1-19-06)


1 Identifying Differentially Expressed Genes
Suppose we have T genes measured under two experimental conditions (Ctl and Nic) in n replicated experiments.
$t_i^*$ and $p_i$ are the t-statistic and the corresponding p-value for the i-th gene, $i = 1, \dots, T$.
The p-value is the probability, under the "null distribution" (i.e. the distribution assuming that $\mu_i^{Ctl} = \mu_i^{Nic}$), of observing a value of the t-statistic as extreme as or more extreme than the one calculated from the data ($t^*$).
The i-th gene is "differentially expressed" if we can reject the i-th null hypothesis $\mu_i^{Ctl} = \mu_i^{Nic}$ and conclude that $\mu_i^{Ctl} \neq \mu_i^{Nic}$ at significance level $\alpha$ (i.e. if $p_i < \alpha$).
A Type I error is committed when a true null hypothesis is rejected.
A Type II error is committed when a false null hypothesis is not rejected.
An experiment-wise Type I error is committed if any of a set of T null hypotheses is falsely rejected.
If the significance level is chosen before conducting the experiment, then by following the hypothesis-testing procedure, the probability of falsely concluding that any one gene is differentially expressed (i.e. falsely rejecting its null hypothesis) is equal to $\alpha$.
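The per-gene testing described on this slide can be sketched in R. This is a minimal illustration on simulated data, not the lecture's data: the matrix sizes, the seed, and all variable names are assumptions. Every gene is generated under the null hypothesis, so each gene with $p_i < \alpha$ is a Type I error.

```r
# Simulated data: every null hypothesis is true (no gene is differentially expressed).
set.seed(1)
n_genes <- 1000   # T in the slides
n <- 3            # replicates per condition
alpha <- 0.05
ctl <- matrix(rnorm(n_genes * n), nrow = n_genes)
nic <- matrix(rnorm(n_genes * n), nrow = n_genes)

# Two-sample t-test (equal variances) for each gene.
tests <- lapply(seq_len(n_genes),
                function(i) t.test(nic[i, ], ctl[i, ], var.equal = TRUE))
t_star <- vapply(tests, function(x) unname(x$statistic), numeric(1))
p <- vapply(tests, function(x) x$p.value, numeric(1))

# The two-sided p-value is the tail probability of the t distribution with 2n - 2 df.
p_by_hand <- 2 * pt(abs(t_star), df = 2 * n - 2, lower.tail = FALSE)

# Genes called "differentially expressed" at level alpha (all false positives here).
n_false_pos <- sum(p < alpha)
n_false_pos
```

With all nulls true, the count of false positives should be in the neighborhood of $\alpha T$, which is what motivates the adjustments on the following slides.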

2 Experiment-wise error rate
Assuming that the individual tests are independent and all null hypotheses are true:
$p(\text{not committing the experiment-wise error}) = p(\text{not rejecting } H_0^1 \text{ AND not rejecting } H_0^2 \text{ AND } \dots \text{ AND not rejecting } H_0^T) = (1-\alpha)(1-\alpha)\cdots(1-\alpha) = (1-\alpha)^T$
$p(\text{committing the experiment-wise error}) = 1 - (1-\alpha)^T$
If we want to keep the family-wise error rate (FWER) at level $\alpha$:
Sidak's adjustment: $\alpha_a = 1 - (1-\alpha)^{1/T}$
Bonferroni adjustment: $\alpha_a = \alpha / T$
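The formulas on this slide are easy to evaluate directly. The sketch below uses illustrative values for $\alpha$ and T (they are assumptions, not values from the lecture) and shows how fast the experiment-wise error rate grows and how small the adjusted per-test levels become.

```r
alpha <- 0.05
n_tests <- c(1, 10, 100, 1000)   # T in the slides

# Probability of at least one false rejection among T independent true nulls.
fwer <- 1 - (1 - alpha)^n_tests

# Per-test significance levels that keep the FWER at alpha.
alpha_sidak <- 1 - (1 - alpha)^(1 / n_tests)
alpha_bonf  <- alpha / n_tests

data.frame(T = n_tests, FWER = fwer,
           Sidak = alpha_sidak, Bonferroni = alpha_bonf)
```

The Bonferroni level is never larger than the Sidak level, so it is slightly more conservative; the two are nearly identical when $\alpha$ is small.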

3 Adjusting p-value
Individual hypotheses: $H_0^i: \mu_i^{Ctl} = \mu_i^{Nic}$, with p-value $p_i = p(t_{df} > t_i^*)$, $i = 1, \dots, T$
"Composite" hypothesis: $H_0: \{\mu_i^{Ctl} = \mu_i^{Nic},\ i = 1, \dots, T\}$, with p-value $p = \min\{p_i,\ i = 1, \dots, T\}$
The composite null hypothesis is rejected if even a single individual hypothesis is rejected.
Consequently, the p-value for the composite hypothesis is equal to the minimum of the individual p-values.
If all tests have the same reference distribution, this is equivalent to $p = p(t_{df} > t^*_{max})$.
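The equivalence between the minimum p-value and the p-value of the largest statistic can be checked numerically. The t-statistics and degrees of freedom below are made-up values for illustration; the one-sided tail probability follows the slide's expression $p(t_{df} > t_i^*)$.

```r
# Hypothetical t-statistics for five genes, all with the same degrees of freedom.
t_star <- c(0.4, 2.9, 1.1, 5.2, 0.7)
df <- 4

# Individual p-values p_i = p(t_df > t_i*).
p <- pt(t_star, df = df, lower.tail = FALSE)

# Composite p-value: the smallest p_i is the p-value of the largest statistic.
p_composite <- min(p)
p_from_tmax <- pt(max(t_star), df = df, lower.tail = FALSE)
all.equal(p_composite, p_from_tmax)
```

The shortcut works only because every test shares one reference distribution; with gene-specific degrees of freedom the largest statistic need not give the smallest p-value.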

4 Null distribution of the composite p-value
Assuming independence: $p(\min\{p_i,\ i = 1, \dots, T\} < \alpha) = 1 - [1-\alpha]^T$
Instead of adjusting the significance level, we can adjust all p-values:
$p_i^a = 1 - [1 - p_i]^T$ (corresponding to Sidak's adjustment)
$p_i^b = p_i T$ (corresponding to the Bonferroni adjustment)
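A short sketch of both adjustments, followed by a simulation check of the null distribution of the minimum p-value. The p-values, seed, and number of simulations are illustrative assumptions. The cap at 1 on the Bonferroni version is not on the slide; it is added here so the result stays a probability, and it matches what base R's `p.adjust()` does.

```r
# Hypothetical unadjusted p-values for T = 5 genes.
p <- c(0.0004, 0.012, 0.03, 0.2, 0.75)
n_tests <- length(p)

p_sidak <- 1 - (1 - p)^n_tests     # p_i^a on the slide
p_bonf  <- pmin(1, p * n_tests)    # p_i^b on the slide, capped at 1

# Base R implements the Bonferroni version with the same cap.
all.equal(p_bonf, p.adjust(p, method = "bonferroni"))

# Under the null, p-values from continuous test statistics are Uniform(0, 1),
# so min(p_i) can be simulated directly for independent tests.
set.seed(2)
alpha <- 0.05
min_p <- replicate(20000, min(runif(n_tests)))
simulated <- mean(min_p < alpha)
theory <- 1 - (1 - alpha)^n_tests
c(simulated = simulated, theory = theory)
```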

5 This is not good enough
Traditional statistical approaches to multiple comparison adjustment, which strictly control the experiment-wise error rate, are not optimal: they hold the probability of rejecting even a single true null hypothesis at $\alpha$.
We need a balance between the false positive and false negative rates.
Instead of controlling the probability of generating a single false positive, we control the proportion of false positives.

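6 False Discovery Rate
Notation (the usual one for this setting): m hypotheses are tested, $m_0$ of them have a true null, R hypotheses are rejected, and V of those rejections are false positives.
Using $\alpha$ to test each hypothesis: the expected number of false positives is $m_0 \alpha$, which is large if most null hypotheses are true, and the probability of committing a family-wise error (FWE) is approximately $1 - (1-\alpha)^{m_0}$.
When using the adjusted $\alpha$, the probability of committing a FWE is $\alpha$.
The False Discovery Rate (FDR) is equal to the expected value of V/R.
We don't know $m_0$, so multiple comparison adjustments are usually made assuming $m_0 = m$.
It turns out that we can control the FDR.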
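To make V, R, and $m_0$ concrete, here is a small simulation. Everything in it is an assumption chosen for illustration: the numbers of hypotheses, the seed, and especially the Beta distribution used to generate small p-values for the genes with a real effect.

```r
alpha <- 0.05
m  <- 10000    # hypothetical number of genes tested
m0 <- 9000     # hypothetical number of genes with a true null

expected_false_pos <- m0 * alpha          # about 450 at the unadjusted level
fwer_unadjusted <- 1 - (1 - alpha)^m0     # essentially 1

set.seed(3)
is_null <- c(rep(TRUE, m0), rep(FALSE, m - m0))
# Null p-values are uniform; p-values for real effects are skewed toward 0
# (the Beta(0.1, 1) choice is arbitrary).
p <- c(runif(m0), rbeta(m - m0, 0.1, 1))

R <- sum(p < alpha)              # all rejections
V <- sum(p < alpha & is_null)    # false rejections
c(V = V, R = R, proportion_false = V / R)
```

In a real experiment only R is observable; V is known here only because the data were simulated.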

7 False Discovery Rate
The following decision-making procedure will keep the FDR below $q^*$: [procedure not preserved in this transcript]
Alternatively, adjust p-values as: [formula not preserved in this transcript]
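The transcript does not preserve the procedure or the adjustment formula from this slide. A common procedure with this guarantee for independent tests is the Benjamini-Hochberg step-up procedure, and the sketch below assumes that is what was meant. The helper name `bh_adjust` and the p-values are hypothetical.

```r
# Benjamini-Hochberg adjustment (assumed; the slide's own formula was not preserved).
bh_adjust <- function(p) {
  m <- length(p)
  o <- order(p)                        # indices that sort p in ascending order
  scaled <- p[o] * m / seq_len(m)      # p_(i) * m / i
  # Running minimum from the largest p-value down keeps the result monotone.
  adj_sorted <- pmin(1, rev(cummin(rev(scaled))))
  adj <- numeric(m)
  adj[o] <- adj_sorted                 # restore the original order
  adj
}

p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216)
q_star <- 0.05

p_adj <- bh_adjust(p)
# Same decision as rejecting H_(1), ..., H_(k) with k = max{i : p_(i) <= i * q_star / m}.
rejected <- p_adj <= q_star
sum(rejected)

# Base R provides the same adjustment.
all.equal(p_adj, p.adjust(p, method = "BH"))
```

In practice `p.adjust(p, method = "BH")` is the simpler choice; the hand-written version is shown only to expose the steps.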

8 Randomization Issue
The Problem: Identify genes whose expression in a target organ (Lung) of a model organism (Rat) is affected by an environmental toxicant (W).
Population: All model organisms of this type (Rats).
Sample: 12 randomly selected rats from the population of all rats. ("Randomly" means that all rats in the population have an equal chance of being selected.)
Randomization: Randomly select 6 rats to be treated with the toxicant. "Randomly" is the key word here: it is what allows us to ascribe observed changes to the treatment alone.
Prepare samples and extract RNA from all 12 rats.
Randomly assign labeled RNA to different microarrays.
Process microarrays in a random order.

9 Randomization Issue
The Problem: Identify genes whose expression in a target organ (Lung) of a model organism (Mouse) is affected by an environmental toxicant (Nickel).
Population: All model organisms of this type (Mice).
Sample: 6 randomly selected mice from the population of all mice. ("Randomly" means that all mice in the population have an equal chance of being selected.)
Randomization: Randomly select 3 mice to be treated with the toxicant. "Randomly" is the key word here: it is what allows us to ascribe observed changes to the treatment alone.
Prepare samples and extract RNA from all 6 mice.
Randomly assign labeled RNA to different microarrays.
Process microarrays in a random order.
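The three randomization steps can be sketched with `sample()`. The mouse identifiers, column names, and seed are hypothetical, and the sketch assumes one sample per microarray.

```r
set.seed(4)                        # only so the example is reproducible
mice <- paste0("mouse", 1:6)       # hypothetical IDs for the 6 sampled mice

# Randomly choose 3 of the 6 mice for nickel treatment; the rest are controls.
treated <- sample(mice, 3)
design <- data.frame(mouse = mice,
                     treatment = ifelse(mice %in% treated, "Nic", "Ctl"))

# Randomly assign each labeled RNA sample to a microarray,
# then randomize the order in which the arrays are processed.
design$array <- sample(seq_len(nrow(design)))
design$process_order <- sample(seq_len(nrow(design)))

design[order(design$process_order), ]
```

Fixing the seed makes the assignment reproducible for record keeping; the assignment itself is still a random permutation.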

10 limma
... is a package for the analysis of microarray data, especially the use of linear models for analyzing designed experiments and the assessment of differential expression.
Specially constructed data objects represent various aspects of microarray data.
Specially constructed "object methods" handle importing, normalizing, displaying and analyzing microarray data.
All objects and methods are transparent.
All objects can be accessed and modified outside of limma.
Unique in its implementation of the empirical Bayes procedure for identifying differentially expressed genes by "borrowing" information from different genes (everything so far has been gene by gene).

11 Measurement Error Model With Additive Background
W = Nickel; C = Ctl
[Figure: foreground (F) and background (B) signal under the old model and the new model]
There are other models for accounting for the background signal.
Simple subtraction of the background intensities often introduces additional variability in the observed signal.
The problem is that we use a single-observation estimate of the background parameter (the one with subscript B on the slide).
With this in mind, various strategies have been proposed to pool background information from more than one spot when estimating it.

12 Single Channel Microarrays: Each Sample Assigned to a Different Microarray
6 microarrays, 6 samples (C1,...,C3,W1,...,W3)
Randomly assign samples to different microarrays.
Example assignment: W3, W1, W2, C1, C2, C3
In terms of a single gene, there are 6 different "spots".
Proceed with a two-sample t-test as we have done so far.

13 Two-Channel Microarrays: One C and One W Sample Assigned to Each Microarray
3 microarrays, 6 samples (C1,...,C3,W1,...,W3)
Randomly select pairs and assign them to different microarrays.
Example assignment as shown on the slide: (W3, C1), (W2, C3), (W3, C2)
In terms of a single gene, there are 3 different "spots".
Individual samples are no longer "free" to be assigned to any microarray: this is a restriction on the randomization process.
Measurements are "blocked" within a microarray (terminology).
We could still randomly assign samples and not have the treatment and the control on each microarray, but this would be unreasonable (arguments to come).
We need to use a paired t-test.

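14 Paired t-test
For a specific gene, $r_i = x_{iw} - x_{ic}$ is the i-th difference, $i = 1, \dots, 3$.
Differential expression means that the expected difference is not 0.
Steps shown on the slide: statistical model of the observed data, estimating the parameters, calculating the t-statistic.
[Figure: null distribution with $-t^*$ and $t^*$ marked]
The "null distribution" is the t-distribution with n-1 degrees of freedom.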
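The slide's formulas for the model, the estimates, and the statistic were not preserved in the transcript. For reference, these are the standard paired t-test quantities written in terms of the differences $r_i$; the symbols $\bar{r}$ and $s_r$ are introduced here and may not match the slide's notation, and the two-sided p-value is an assumption suggested by the figure marking both $-t^*$ and $t^*$.

```latex
\bar{r} = \frac{1}{n}\sum_{i=1}^{n} r_i, \qquad
s_r^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(r_i - \bar{r}\right)^2, \qquad
t^* = \frac{\bar{r}}{s_r/\sqrt{n}}, \qquad
p = P\left(|t_{n-1}| > |t^*|\right)
```

Here n = 3, so the reference distribution has 2 degrees of freedom.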

15 Two-sample t-test vs paired t-test
                         Two-sample t-test   Paired t-test
Denominator              1.51                0.04
p-value                  0.870               0.002
Reference distribution   t_{2n-2}            t_{n-1}
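The contrast in this table can be reproduced in spirit with a few lines of R. The intensities below are invented for illustration (they are not the lecture's data, so the numbers will differ from the table): array-to-array variation is large, while the within-array difference is small but consistent.

```r
# Hypothetical log-intensities for one gene on 3 two-channel arrays.
x_c <- c(8.1, 10.4, 12.0)    # control channel
x_w <- c(8.6, 10.8, 12.5)    # treated channel
n <- length(x_c)

two_sample <- t.test(x_w, x_c, var.equal = TRUE)   # reference distribution t_{2n-2}
paired     <- t.test(x_w, x_c, paired = TRUE)      # reference distribution t_{n-1}

# Denominators of the two t-statistics, computed by hand.
pooled_var <- ((n - 1) * var(x_w) + (n - 1) * var(x_c)) / (2 * n - 2)
se_two_sample <- sqrt(pooled_var * 2 / n)
se_paired <- sd(x_w - x_c) / sqrt(n)

data.frame(test = c("two-sample", "paired"),
           denominator = c(se_two_sample, se_paired),
           df = unname(c(two_sample$parameter, paired$parameter)),
           p.value = c(two_sample$p.value, paired$p.value))
```

Pairing removes the array-to-array variation from the denominator, which is why the paired test can detect a difference that the two-sample test misses even though it has fewer degrees of freedom.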

16 limma
Data to import:
http://eh3.uc.edu/teaching/cfg/2006/data/07-21-03_MO-S-06-23-03N17-72SV2-3-vs-CSV2-5-B.gpr
http://eh3.uc.edu/teaching/cfg/2006/data/07-21-03_MO-S-06-23-03N73-CSV3-3-vs-72SV3-5-A.gpr
http://eh3.uc.edu/teaching/cfg/2006/data/07-18-03_MO-S-06-23-03N98-CSV1-3-vs-72SV1-5-B.gpr
File descriptions:
http://eh3.uc.edu/teaching/cfg/2006/data/NTargets.txt
Spot descriptions:
http://eh3.uc.edu/teaching/cfg/2006/data/SpotTypes.txt
Importing data:
source("http://eh3.uc.edu/teaching/cfg/2006/R/LimmaDataImport.R")

17 limma
> library(limma)
> data.directory<-"http://eh3.uc.edu/teaching/cfg/2006/data/"
> targets<-readTargets("http://eh3.uc.edu/teaching/cfg/2006/data/NTargets.txt")
> targets
  array                                        experiment          cy3
1  17.0 07-21-03_MO-S-06-23-03N17-72SV2-3-vs-CSV2-5-B.gpr Nic-WT72hr_2
2  73.0 07-21-03_MO-S-06-23-03N73-CSV3-3-vs-72SV3-5-A.gpr Ctl-WT00hr_3
3  98.1 07-18-03_MO-S-06-23-03N98-CSV1-3-vs-72SV1-5-B.gpr Ctl-WT00hr_1
           cy5      date
1 Ctl-WT00hr_2 7/21/2003
2 Nic-WT72hr_3 7/21/2003
3 Nic-WT72hr_1 7/18/2003

18 limma
> spottypes<-readSpotTypes("http://eh3.uc.edu/teaching/cfg/2006/data/SpotTypes.txt")
> spottypes
  SpotType      ID      Name Color
1     cDNA       *         * black
2    Blank *Blank*         *  blue
3  Control       * *control*   red
4    Empty *Empty*         *  blue
5    empty *empty*         *  blue
>

19 RGList class
> LimmaDataNickel<-read.maimages(files=targets$experiment,source="genepix", path = data.directory,
+ columns=list(Gf = "F532 Median",Gb ="B532 Median", Rf = "F635 Median", Rb = "B635 Median"),
+ annotation=c("Name","ID","Block","Row","Column"),wt.fun=wtflags(0))
Read http://eh3.uc.edu/teaching/cfg/2006/data//07-21-03_MO-S-06-23-03N17-72SV2-3-vs-CSV2-5-B.gpr
Read http://eh3.uc.edu/teaching/cfg/2006/data//07-21-03_MO-S-06-23-03N73-CSV3-3-vs-72SV3-5-A.gpr
Read http://eh3.uc.edu/teaching/cfg/2006/data//07-18-03_MO-S-06-23-03N98-CSV1-3-vs-72SV1-5-B.gpr
> attributes(LimmaDataNickel)
$names
[1] "R"       "G"       "Rb"      "Gb"      "weights" "targets" "genes"

$class
[1] "RGList"
attr(,"package")
[1] "limma"

20 RGList class
> LimmaDataNickel$R[1:3,]
     07-21-03_MO-S-06-23-03N17-72SV2-3-vs-CSV2-5-B 07-21-03_MO-S-06-23-03N73-CSV3-3-vs-72SV3-5-A
[1,]                                          2264                                          3642
[2,]                                           303                                           734
[3,]                                           140                                           164
     07-18-03_MO-S-06-23-03N98-CSV1-3-vs-72SV1-5-B
[1,]                                           726
[2,]                                           248
[3,]                                           120
> LimmaDataNickel$Rb[1:3,]
     07-21-03_MO-S-06-23-03N17-72SV2-3-vs-CSV2-5-B 07-21-03_MO-S-06-23-03N73-CSV3-3-vs-72SV3-5-A
[1,]                                           126                                           154
[2,]                                           126                                           150
[3,]                                           128                                           155
     07-18-03_MO-S-06-23-03N98-CSV1-3-vs-72SV1-5-B
[1,]                                           129
[2,]                                           127
[3,]                                           127
>
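The foreground (R) and background (Rb) values printed on this slide are enough to illustrate the earlier remark that simple background subtraction can be problematic. The sketch below retypes those nine values by hand into plain matrices (it does not use limma or download the data), and the variable names are assumptions.

```r
# First three spots of the red channel, copied from the output above.
# Columns are the three arrays, in the order printed.
R_fg <- matrix(c(2264, 3642, 726,
                  303,  734, 248,
                  140,  164, 120), nrow = 3, byrow = TRUE)
R_bg <- matrix(c(126, 154, 129,
                 126, 150, 127,
                 128, 155, 127), nrow = 3, byrow = TRUE)

# Simple background subtraction.
R_corrected <- R_fg - R_bg
R_corrected

# The third spot on the third array becomes negative,
# so its log2 intensity is undefined (NaN).
R_log2 <- suppressWarnings(log2(R_corrected))
R_log2
```

For weak spots the foreground is close to the background, so the subtracted value is dominated by noise; this is the situation the pooled background estimates are meant to address.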

