Is a Specific Gene Differentially Expressed?


1 1-11-2005 Is a Specific Gene Differentially Expressed?
For a specific gene, $x_{ij}$ = $i$th measurement under condition $j$, $i = 1, \dots, 6$; $j = 1, 2$
Differential expression: $\mu_1 \neq \mu_2$
- Statistical model of the observed data
- Estimate the model parameters based on the data
- Calculate the t-statistic $t^*$
- Calculate the p-value based on the "null distribution" of the t-statistic, assuming $\mu_1 = \mu_2$
[Figure: null distribution of the t-statistic with $-t^*$ and $t^*$ marked]
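For a single gene, the steps on this slide can be sketched in R roughly as follows. The twelve values are made up for illustration and are not course data. The equal-variance (pooled) two-sample t-test is an assumption, chosen because it is the form used in the genome-wide calculation in this transcript.

```r
# Hypothetical log2 intensities for one gene: six measurements per condition.
x1 <- c(6.4, 6.5, 6.1, 6.3, 6.6, 6.2)   # condition j = 1
x2 <- c(5.8, 6.1, 5.9, 6.0, 5.7, 6.2)   # condition j = 2

# Estimate the model parameters (means and pooled variance) from the data.
m1 <- mean(x1); m2 <- mean(x2)
n1 <- length(x1); n2 <- length(x2)
pooled_var <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)

# t-statistic and its two-sided p-value under the null hypothesis mu1 = mu2.
t_star <- (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
df <- n1 + n2 - 2
p_value <- 2 * pt(abs(t_star), df, lower.tail = FALSE)

# Cross-check against R's built-in equal-variance t-test.
builtin <- t.test(x1, x2, var.equal = TRUE)
print(c(t = t_star, df = df, p = p_value))
```

The manual calculation and `t.test(..., var.equal = TRUE)` should agree up to floating-point error.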

2 Genome-wide analysis
- How do we perform t-tests for 30,000 genes at once?
- How do we handle results and present data and results?
- What is significant?
- How do we compare different approaches to normalization of the data and to the statistical analysis of results?
Ideally, we would like to maximize our ability to identify truly differentially expressed genes and minimize the number of falsely implicated genes.
- Doing it by hand (in R) first
- Using Bioconductor

3 Calculating t-tests for 30,000 genes at a time
Data import: source("http://eh3.uc.edu/ImportSimpleData.R")
> SimpleData<-read.table(file="http://eh3.uc.edu/SimpleData.txt",
+ header=TRUE,quote="",sep="\t",comment.char="")
> SimpleData[1,]
     Name         ID W1 C1 W2 C2 W3  C3 C4 W4 C5  W5  C6  W6
1 no name Rn30000100 85 57 91 71 67 111 72 86 88 108 124 171
> W<-c(3,5,7,9,11,13)
> C<-c(4,6,8,10,12,14)

4 Calculating t-tests for 30,000 genes at a time
Transforming data: source("http://eh3.uc.edu/TransformSimpleData.R")
> NoZerosData<-SimpleData[,3:14]
> NoZerosData[33525,]
      W1 C1 W2 C2 W3 C3 C4 W4 C5 W5 C6 W6
33525 94 51 75 56 53  0 79 84 87 73 86  0
> NoZerosData[NoZerosData==0]<-NA
> NoZerosData[33525,]
      W1 C1 W2 C2 W3 C3 C4 W4 C5 W5 C6 W6
33525 94 51 75 56 53 NA 79 84 87 73 86 NA
Why zeros are replaced by NA before taking logs:
log(0) = -Inf
log(-1) = NaN
function(-Inf) = -Inf or Inf or NaN
Missing values (NA) can be skipped with na.rm=TRUE
> LSimpleData<-SimpleData
> LSimpleData[,3:14]<-log(NoZerosData,base=2)
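The effect of the zero-to-NA replacement can be checked on a single row. The sketch below reuses the twelve values of row 33525 shown on this slide. The variable names (`raw`, `cleaned`, `logged`) are illustrative and do not come from the course scripts.

```r
# Values of row 33525 as printed on the slide; two spots were recorded as 0.
raw <- c(94, 51, 75, 56, 53, 0, 79, 84, 87, 73, 86, 0)

# Taking logs directly turns each zero into -Inf, which poisons the mean.
naive <- log(raw, base = 2)
mean_naive <- mean(naive)

# Replace zeros by NA first, as on the slide, then take logs.
cleaned <- raw
cleaned[cleaned == 0] <- NA
logged <- log(cleaned, base = 2)

mean_with_na <- mean(logged)               # NA: missing values propagate
mean_ok <- mean(logged, na.rm = TRUE)      # finite: NAs are skipped
n_obs <- sum(!is.na(logged))               # number of usable measurements

print(c(naive = mean_naive, with_na = mean_with_na, ok = mean_ok, n = n_obs))
```

Counting the non-missing values matters because the sample sizes in the t-test must reflect only the measurements actually used.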

5 Calculating t-tests for 30,000 genes at a time
Calculating t-tests: source("http://eh3.uc.edu/MultipleTTest.R")
# Per-gene means and variances for the W and C column sets, ignoring NAs
MW<-apply(t(LSimpleData[,W]),2,mean,na.rm=TRUE)
VW<-apply(t(LSimpleData[,W]),2,var,na.rm=TRUE)
MC<-apply(t(LSimpleData[,C]),2,mean,na.rm=TRUE)
VC<-apply(t(LSimpleData[,C]),2,var,na.rm=TRUE)
# Per-gene number of non-missing measurements in each column set
NW<-apply(t(!is.na(LSimpleData[,W])),2,sum,na.rm=TRUE)
NC<-apply(t(!is.na(LSimpleData[,C])),2,sum,na.rm=TRUE)
# Pooled variance and degrees of freedom
VWC<-(((NW-1)*VW)+((NC-1)*VC))/(NC+NW-2)
DF<-NW+NC-2
# Absolute t-statistic and two-sided p-value
TStat<-abs(MW-MC)/((VWC*((1/NW)+(1/NC)))^0.5)
TPvalue<-2*pt(TStat,DF,lower.tail=FALSE)
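One way to gain confidence in the vectorized formulas is to compare them with `t.test` applied gene by gene on a small data set. The sketch below uses simulated data (the matrix `X`, its size, and the `Reference` vector are hypothetical) and swaps `apply(t(...), 2, ...)` for the equivalent `rowMeans`, `rowSums`, and `apply(..., 1, ...)`.

```r
set.seed(1)
# Hypothetical data: 50 genes x 12 arrays of log2 intensities with a few NAs.
X <- matrix(rnorm(50 * 12, mean = 7), nrow = 50)
X[sample(length(X), 10)] <- NA
W <- c(1, 3, 5, 7, 9, 11)    # column indices of the first group
C <- c(2, 4, 6, 8, 10, 12)   # column indices of the second group

# Vectorized pooled-variance t-test, one result per gene (row).
MW <- rowMeans(X[, W], na.rm = TRUE)
MC <- rowMeans(X[, C], na.rm = TRUE)
VW <- apply(X[, W], 1, var, na.rm = TRUE)
VC <- apply(X[, C], 1, var, na.rm = TRUE)
NW <- rowSums(!is.na(X[, W]))
NC <- rowSums(!is.na(X[, C]))
VWC <- ((NW - 1) * VW + (NC - 1) * VC) / (NW + NC - 2)
DF <- NW + NC - 2
TStat <- abs(MW - MC) / sqrt(VWC * (1 / NW + 1 / NC))
TPvalue <- 2 * pt(TStat, DF, lower.tail = FALSE)

# Reference: R's built-in equal-variance t-test, one gene at a time.
Reference <- apply(X, 1, function(row) {
  t.test(row[W], row[C], var.equal = TRUE)$p.value
})
max_diff <- max(abs(TPvalue - Reference))
print(max_diff)
```

The row-by-row loop is fine for 50 genes but slow for 30,000, which is the motivation for the vectorized form.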

6 Displaying results: Scatter Plots
source("http://eh3.uc.edu/TTestScatterPlots.R")

7 Displaying results: Histograms
source("http://eh3.uc.edu/TTestHistograms.R")

8 Expression Data on Individual Microarrays
source("http://eh3.uc.edu/MicroarrayScatterPlots.R")

9 Microarray-Specific Normalization of Expression Data
- Normalization is the process of removing systematic biases prior to statistical analysis.
- Systematic intensity-dependent trends are considered a systematic bias, since it is extremely unlikely that they are a consequence of some underlying biological mechanism of interest.
- This particular bias is effectively removed by estimating the intensity-dependent "trend" using local regression and subtracting it from the observed ratios.
- We will generally consider that normalization procedures do not affect the independence of experimental replicates, because they are performed separately for each microarray.
- Some biases cannot be factored out without introducing a certain level of correlation between replicates. Such biases will be factored out within the statistical model, which will then account for the introduced correlation (through a multi-way Analysis of Variance model).
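The idea of estimating an intensity-dependent trend by local regression and subtracting it can be sketched in R with the base `loess` function. Everything below is an assumption made for illustration: the data are simulated, `A` stands for the average log2 intensity of a spot, `M` for its log2 ratio, and the `span` and `degree` settings are arbitrary choices rather than the values used in the course script.

```r
set.seed(2)
n <- 2000
# Simulated single microarray: average log2 intensity A and log2 ratio M.
A <- runif(n, 6, 14)
trend <- 0.5 - 0.08 * (A - 10)^2          # artificial intensity-dependent bias
M <- trend + rnorm(n, sd = 0.3)           # observed log2 ratios

# Estimate the trend with local regression (span and degree are assumptions).
fit <- loess(M ~ A, span = 0.4, degree = 1)

# Normalized ratios: observed ratio minus the fitted trend.
Mnorm <- M - fitted(fit)

# The dependence on the bias should largely disappear after normalization.
print(c(before = cor(M, trend), after = cor(Mnorm, trend)))
```

Because the fit uses only the spots of one array, repeating it array by array keeps the replicates independent, as the slide notes.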

10 Local Regression Normalization
source("http://eh3.uc.edu/LoessNormalization.R")

11 Normalized Data
source("http://eh3.uc.edu/NormalizedDataScatterPlots.R")

12 Normalized Data: Displaying results, Scatter Plots
source("http://eh3.uc.edu/NormalizedTTests.R")

13 Comparing Normalized and Raw Data Results
source("http://eh3.uc.edu/ComparingTTests.R")
[Figure: box plots; legend: median, 25th percentile, 75th percentile, 1.5×IQR]
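The quantities named in the box plot legend can be read off in R with `boxplot.stats`. The vector `x` below is a made-up example, not course data. Note that R draws the box at Tukey's hinges, which are close to but not always identical to the 25th and 75th percentiles, and that the whiskers extend to the most extreme points lying within 1.5 times the box height.

```r
# Hypothetical values with one obvious outlier.
x <- c(1:9, 50)

bs <- boxplot.stats(x, coef = 1.5)

lower_whisker <- bs$stats[1]
lower_hinge   <- bs$stats[2]   # approximately the 25th percentile
med           <- bs$stats[3]   # median
upper_hinge   <- bs$stats[4]   # approximately the 75th percentile
upper_whisker <- bs$stats[5]
outliers      <- bs$out        # points beyond 1.5 x (upper_hinge - lower_hinge)

print(bs$stats)
print(outliers)
```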

14 Comparing Normalized and Raw Data Results
source("http://eh3.uc.edu/ComparingTTests.R")

