1-11-2005. Is a Specific Gene Differentially Expressed: differential expression for a specific gene, where x_ij is the i-th measurement under condition j (i = 1,…,6; j = 1,2).


Is a Specific Gene Differentially Expressed
For a specific gene, x_ij = i-th measurement under condition j, i = 1,…,6; j = 1,2
Differential expression: μ1 ≠ μ2
Statistical model of the observed data
Estimate the model parameters based on the data
Calculate the t-statistic t*
Calculate the p-value based on the "null distribution" of the t-statistic, assuming μ1 = μ2
(Figure: null distribution of the t-statistic with -t* and t* marked)
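The steps on this slide can be sketched for a single gene as follows. This is a minimal illustration, not the lecturer's code: the twelve expression values are made up, and the equal-variance (pooled) form of the t-statistic is assumed because it matches the genome-wide calculation used later in the deck.

```r
# Hypothetical log2 expression values for one gene, 6 measurements per condition
x1 <- c(8.1, 7.9, 8.4, 8.0, 8.3, 7.8)   # condition j = 1
x2 <- c(8.9, 9.2, 8.7, 9.0, 9.4, 8.8)   # condition j = 2

n1 <- length(x1)
n2 <- length(x2)

# Estimate the model parameters: two means and a pooled variance
m1  <- mean(x1)
m2  <- mean(x2)
sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)

# t-statistic and its degrees of freedom
tstar <- (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
df    <- n1 + n2 - 2

# Two-sided p-value from the null distribution (t with df degrees of freedom)
pval <- 2 * pt(abs(tstar), df, lower.tail = FALSE)

# Cross-check against R's built-in equal-variance t-test
ref <- t.test(x1, x2, var.equal = TRUE)
```

The hand calculation and `t.test(..., var.equal = TRUE)` should agree, which is a convenient sanity check before scaling the same arithmetic up to all genes.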

Genome-wide analysis
How do we perform a t-test for 30,000 genes at once?
How do we handle the results and present the data and results?
What is significant?
How do we compare different approaches to normalizing the data and to the statistical analysis of results?
Ideally, we would like to maximize our ability to identify truly differentially expressed genes and minimize the number of falsely implicated genes.
Doing it by hand (in R) first
Using Bioconductor

Calculating t-test for 30,000 genes at a time
Data import: source("
> SimpleData<-read.table(file="
+ header=TRUE,quote="",sep="\t",comment.char="")
> SimpleData[1,]
Name ID W1 C1 W2 C2 W3 C3 C4 W4 C5 W5 C6 W6
1 no name Rn
> W<-c(3,5,7,9,11,13)
> C<-c(4,6,8,10,12,14)
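The import step can be tried without the original file. The sketch below builds a small tab-delimited table in memory with the same column header as on the slide (the three rows of values and the gene IDs are made up) and reads it with the same `read.table` options. It also shows a hypothetical alternative to hard-coded column positions: selecting the W and C columns by name with `grep`.

```r
# Column header as printed on the slide; the data rows below are invented
header <- c("Name", "ID", "W1", "C1", "W2", "C2", "W3", "C3",
            "C4", "W4", "C5", "W5", "C6", "W6")

set.seed(1)
rows <- sapply(1:3, function(i) {
  paste(c("no name", paste0("G", i), round(runif(12, 100, 1000))),
        collapse = "\t")
})
txt <- paste(c(paste(header, collapse = "\t"), rows), collapse = "\n")

# Same options as on the slide, reading from text instead of a file
ToyData <- read.table(text = txt, header = TRUE, quote = "", sep = "\t",
                      comment.char = "", stringsAsFactors = FALSE)

# Select columns by label rather than by fixed position
W_by_name <- grep("^W[0-9]+$", names(ToyData))
C_by_name <- grep("^C[0-9]+$", names(ToyData))
```

With this header, name-based selection returns positions 3, 5, 7, 10, 12, 14 for W and 4, 6, 8, 9, 11, 13 for C, because the W/C order switches after the third pair of columns. Whether positions or labels are the right key depends on how the arrays were laid out, so it is worth checking the index vectors against the header before computing group statistics.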

Calculating t-test for 30,000 genes at a time
Transforming data: source("
> NoZerosData<-SimpleData[,3:14]
> NoZerosData[33525,]
W1 C1 W2 C2 W3 C3 C4 W4 C5 W5 C6 W6
> NoZerosData[NoZerosData==0]<-NA
> NoZerosData[33525,]
W1 C1 W2 C2 W3 C3 C4 W4 C5 W5 C6 W6
NA NA
Zeros are replaced by NA because log(0) = -Inf and log(-1) = NaN, and function(-Inf) = -Inf or Inf or NaN, whereas NA values can be skipped with na.rm=TRUE.
> LSimpleData<-SimpleData
> LSimpleData[,3:14]<-log(NoZerosData,base=2)
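A small example makes the reason for the zero-to-NA step concrete. The intensities below are invented; the point is only how `-Inf` propagates through a summary while `NA` can be dropped with `na.rm=TRUE`.

```r
# Hypothetical raw intensities for three genes on two channels
raw <- data.frame(W1 = c(120, 0, 64), C1 = c(240, 32, 0))

# What the log does to problematic values
log_zero <- log2(0)                       # -Inf
log_neg  <- suppressWarnings(log2(-1))    # NaN (R also issues a warning)

# A single -Inf poisons a mean
poisoned_mean <- mean(c(log_zero, 5))     # -Inf

# Replace zeros by NA before taking logs, as on the slide
clean <- raw
clean[clean == 0] <- NA
logged <- log(clean, base = 2)

# NA values can be skipped explicitly
gene_means <- rowMeans(logged, na.rm = TRUE)
```

Here genes 2 and 3 each keep one usable measurement, so their means are simply the log2 of that remaining value.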

Calculating t-test for 30,000 genes at a time
Calculating t-tests: source("
MW<-apply(t(LSimpleData[,W]),2,mean,na.rm=TRUE)
VW<-apply(t(LSimpleData[,W]),2,var,na.rm=TRUE)
MC<-apply(t(LSimpleData[,C]),2,mean,na.rm=TRUE)
VC<-apply(t(LSimpleData[,C]),2,var,na.rm=TRUE)
NW<-apply(t(!is.na(LSimpleData[,W])),2,sum,na.rm=TRUE)
NC<-apply(t(!is.na(LSimpleData[,C])),2,sum,na.rm=TRUE)
VWC<-(((NW-1)*VW)+((NC-1)*VC))/(NC+NW-2)
DF<-NW+NC-2
TStat<-abs(MW-MC)/((VWC*((1/NW)+(1/NC)))^0.5)
TPvalue<-2*pt(TStat,DF,lower.tail=FALSE)
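The same calculation can be checked on a small simulated matrix. In this sketch the data are random (5 genes, 12 arrays, with the first six columns arbitrarily treated as W and the last six as C), one value is set to NA to exercise the missing-data handling, and `rowMeans`/`rowSums` are used as a shorter equivalent of the `apply(t(...), 2, ...)` calls. The result is compared with R's `t.test` gene by gene.

```r
set.seed(42)

# Hypothetical log2 data: 5 genes x 12 arrays
X <- matrix(rnorm(60, mean = 10, sd = 1), nrow = 5)
X[2, 3] <- NA                 # one missing measurement
W <- 1:6                      # assumed column positions for this toy matrix
C <- 7:12

# Per-gene means, variances, and counts of non-missing values
MW <- rowMeans(X[, W], na.rm = TRUE)
MC <- rowMeans(X[, C], na.rm = TRUE)
VW <- apply(X[, W], 1, var, na.rm = TRUE)
VC <- apply(X[, C], 1, var, na.rm = TRUE)
NW <- rowSums(!is.na(X[, W]))
NC <- rowSums(!is.na(X[, C]))

# Pooled variance, degrees of freedom, t-statistic, two-sided p-value
DF      <- NW + NC - 2
VWC     <- ((NW - 1) * VW + (NC - 1) * VC) / DF
TStat   <- abs(MW - MC) / sqrt(VWC * (1 / NW + 1 / NC))
TPvalue <- 2 * pt(TStat, DF, lower.tail = FALSE)

# Cross-check: one equal-variance t-test per gene
ref <- sapply(seq_len(nrow(X)), function(g) {
  t.test(X[g, W], X[g, C], var.equal = TRUE)$p.value
})
```

The vectorized version avoids calling `t.test` tens of thousands of times, which is the practical reason for writing out the formula when the number of genes is large.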

Displaying results – Scatter Plots: source("

Displaying results - Histograms: source("

Expression Data on Individual Microarrays: source("

Microarray-Specific Normalization of Expression Data
Normalization is the process of removing systematic biases prior to statistical analysis.
Systematic intensity-dependent trends are considered a systematic bias, since it is extremely unlikely that they are a consequence of some underlying biological mechanism of interest.
This particular bias is effectively removed by estimating the intensity-dependent "trend" using local regression and subtracting it from the observed ratios.
We will generally assume that normalization procedures do not affect the independence of experimental replicates, since they are performed separately for each microarray.
Some biases cannot be factored out without introducing a certain level of correlation between replicates. Such biases will be factored out within the statistical model, which will then account for the correlation this introduces (through a multi-way Analysis of Variance model).
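The local-regression step described above can be sketched on simulated data. Everything here is an assumption made for illustration: the names `A` (average log intensity) and `M` (log ratio), the shape of the artificial trend, the `span` value, and the helper `normalize_array` are not taken from the slides. The sketch uses `loess` from base R to estimate the trend of M as a function of A on one array and subtracts it.

```r
set.seed(7)

# Simulated single array: 2000 spots with an intensity-dependent bias
n     <- 2000
A     <- runif(n, 6, 14)                            # average log2 intensity
trend <- 0.3 * (A - 10) - 0.05 * (A - 10)^2         # artificial systematic bias
noise <- rnorm(n, sd = 0.3)                         # what we want to keep
M     <- trend + noise                              # observed log2 ratio

# Hypothetical helper: estimate the trend with local regression and remove it.
# It is applied to one array at a time, so replicates stay independent.
normalize_array <- function(M, A, span = 0.4) {
  fit <- loess(M ~ A, span = span, degree = 1)
  M - predict(fit, data.frame(A = A))
}

Mnorm <- normalize_array(M, A)

# Residual dependence of the normalized ratios on intensity
slope_after <- unname(coef(lm(Mnorm ~ A))[2])
```

After normalization the log ratios should show essentially no remaining dependence on intensity, and their spread should shrink toward that of the noise alone.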

Local Regression Normalization: source("

Normalized Data: source("

Normalized Data, Displaying results – Scatter Plots: source(

Comparing Normalized and Raw Data Results: source("
(Box plot legend: median, 25th percentile, 75th percentile, 1.5×IQR)
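The quantities named in the box plot legend can be computed directly. This is a small sketch on a made-up vector; the helper `box_summary` is hypothetical and uses `quantile` with R's default settings. R's own `boxplot` draws the box from hinges, which can differ slightly from these quantiles for small samples.

```r
# Hypothetical helper: the numbers a box plot displays
box_summary <- function(x) {
  x   <- x[!is.na(x)]
  q   <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  lower_fence <- q[1] - 1.5 * iqr
  upper_fence <- q[3] + 1.5 * iqr
  list(q25 = q[1], median = q[2], q75 = q[3],
       lower_fence = lower_fence, upper_fence = upper_fence,
       outside = x[x < lower_fence | x > upper_fence])
}

# Toy values with one clearly extreme point
s <- box_summary(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30))
```

Applying the same summary to raw and normalized values for each array gives a numeric counterpart to the side-by-side box plots.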

Comparing Normalized and Raw Data Results (continued): source("