From the homework: Distribution of DNA fragments generated by Micrococcal nuclease digestion: mean(nucs) = 113.5 bp, median(nucs) = 110 bp, sd(nucs) = 17.3 bp.

Link to Jean-Yves Sgro’s R & Bioconductor Manual

[Figure: three plots, each labeled with an R^2 value; the values did not survive extraction] Why did the correlation go up?

4 Expression difference: Gene X expression under condition 1 vs. Gene X expression under condition 2
Select differentially expressed genes to focus on.
Methods of gene selection:
-- arbitrary fold-expression-change cutoff (example: genes that change >3X in expression between samples)
-- statistically significant change in expression (requires replicates)
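A minimal sketch of the fold-change cutoff described above. The function name, the gene names, and the expression values are all hypothetical; the only idea taken from the slide is the >3X threshold, applied here in both directions on a log2 scale.

```python
import math

def select_by_fold_change(expr_cond1, expr_cond2, fold=3.0):
    """Return genes whose expression changes by more than `fold` (up or down)."""
    cutoff = math.log2(fold)
    selected = []
    for gene, x1 in expr_cond1.items():
        x2 = expr_cond2[gene]
        log_ratio = math.log2(x2 / x1)  # positive = induced, negative = repressed
        if abs(log_ratio) > cutoff:
            selected.append(gene)
    return selected

# Hypothetical expression values for three genes under two conditions
cond1 = {"geneA": 100.0, "geneB": 50.0, "geneC": 200.0}
cond2 = {"geneA": 400.0, "geneB": 55.0, "geneC": 40.0}
print(select_by_fold_change(cond1, cond2))
```

A cutoff like this needs no replicates, which is why it cannot say whether a change is statistically significant.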

5 Test if the means of 2 (or more) groups are the same or statistically different
The ‘null hypothesis’ $H_0$ says that the two groups are statistically the same -- you will either reject $H_0$ or fail to reject it (often loosely called ‘accepting’ it)
Choosing the right test:
-- parametric test if your data are normally distributed with equal variance
-- nonparametric test if either of the above is not true
Why do the data need to be normally distributed?

6 Test if the means of 2 groups are the same or statistically different
The ‘null hypothesis’ $H_0$ says that the two groups are statistically the same -- you will either reject $H_0$ or fail to reject it (often loosely called ‘accepting’ it)
If your two samples are normally distributed with equal variance, use the t-test:
$T = \frac{\bar{X}_1 - \bar{X}_2}{\mathrm{SED}}$, i.e. (difference in the means) / (standard error of the difference in the means)
If $|T| > T_c$, where $T_c$ is the critical value for the degrees of freedom & confidence level, then reject $H_0$
Notice that if the data aren’t normally distributed, the mean and standard deviation may not be meaningful summaries of the data.
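The t statistic above can be sketched with the Python standard library alone. This assumes the equal-variance (pooled) form of the standard error; the function name and the expression values are hypothetical, and the critical value is hard-coded from a standard t table rather than computed.

```python
import math
import statistics

def two_sample_t(group1, group2):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    # Pooled variance, valid under the equal-variance assumption
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    # Standard error of the difference in the means (SED)
    sed = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / sed, n1 + n2 - 2

# Hypothetical log-expression replicates for one gene under two conditions
cond1 = [8.1, 7.9, 8.4, 8.0]
cond2 = [9.2, 9.0, 9.5, 9.1]
t_stat, df = two_sample_t(cond1, cond2)

T_CRIT = 2.447  # two-sided critical value from a t table, alpha = 0.05, df = 6
reject_h0 = abs(t_stat) > T_CRIT
print(f"T = {t_stat:.2f}, df = {df}, reject H0: {reject_h0}")
```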

7 The paired t-test for gene expression ratios
If your two samples are normally distributed with equal variance AND your data were paired before collection, use the paired t-test:
$T = \frac{\bar{D}}{\mathrm{SEM}}$, i.e. (average difference in expression) / (standard error of the mean difference)
Example: Tumor sample before and after treatment
Gene expression differences expressed as ratios, e.g. mutant vs. wt as $\log_2[\text{ratio}]$
If $|T| > T_c$, where $T_c$ is the critical value for the degrees of freedom (n-1) & confidence level, then reject $H_0$
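As a rough illustration of the paired test, the sketch below computes T from log2 ratios of paired measurements. The before/after values are made up, the function name is hypothetical, and the critical value comes from a standard t table.

```python
import math
import statistics

def paired_t(log_ratios):
    """Paired t statistic: mean log ratio divided by its standard error."""
    n = len(log_ratios)
    mean_diff = statistics.mean(log_ratios)
    sem = statistics.stdev(log_ratios) / math.sqrt(n)
    return mean_diff / sem, n - 1

# Hypothetical expression of one gene in 5 tumors, before and after treatment
before = [120.0, 95.0, 210.0, 150.0, 80.0]
after = [260.0, 180.0, 400.0, 330.0, 150.0]
log_ratios = [math.log2(a / b) for a, b in zip(after, before)]

t_stat, df = paired_t(log_ratios)
T_CRIT = 2.776  # two-sided critical value from a t table, alpha = 0.05, df = 4
print(f"T = {t_stat:.2f}, df = {df}, reject H0: {abs(t_stat) > T_CRIT}")
```

Working on the log2 scale makes a 2X increase and a 2X decrease symmetric around zero, which is why ratios are usually log-transformed before testing.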

8 Test if the means of 2 (or more) groups are the same or statistically different
The ‘null hypothesis’ $H_0$ says that the groups are statistically the same -- you will either reject $H_0$ or fail to reject it (often loosely called ‘accepting’ it)
ANOVA (ANalysis Of Variance): for comparing 2 or more means
$F = \frac{\text{variation between samples}}{\text{variation within samples}}$
If $F > F_c$, where $F_c$ is the critical value for the degrees of freedom (k-1 between groups and N-k within groups, for k groups and N total observations) & confidence level, then reject $H_0$
ANOVA only tells you that at least one of your samples is different … may need to identify which is different for >2 sample comparisons
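The F ratio can be computed directly from its definition. This is a sketch of one-way ANOVA with hypothetical data; the degrees of freedom used are k-1 between groups and N-k within groups (k groups, N observations in total), and the critical value is taken from a standard F table.

```python
import statistics

def one_way_anova_f(groups):
    """One-way ANOVA F statistic with between- and within-group degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation between samples: group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation within samples: observations around their own group mean
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical expression of one gene in three conditions, three replicates each
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)

F_CRIT = 5.14  # critical value from an F table, alpha = 0.05, df = (2, 6)
print(f"F = {f_stat:.2f}, df = ({df_b}, {df_w}), reject H0: {f_stat > F_CRIT}")
```

A significant F here says only that the three means are not all equal; a follow-up comparison is still needed to say which group differs.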

9 Assessing & minimizing error in calls
Type I error = false positives
Type II error = false negatives
FDR = False Discovery Rate
Balance between minimizing false positives vs. false negatives
Assessing false positives vs. false negatives: sensitivity vs. specificity
Sensitivity (how well did you find what you want): (# of true positives) / (# of actual positives = # true positives + # false negatives)
Specificity (how well did you discriminate): (# of true negatives) / (# of actual negatives = # true negatives + # false positives)
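A small worked example of the two ratios, using made-up counts for a set of genes whose true status is assumed known. The function names are illustrative only.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly changed genes that were called significant."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of truly unchanged genes that were correctly not called."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts: 100 truly changed genes, 1000 truly unchanged genes
tp, fn = 80, 20
tn, fp = 900, 100
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```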

Assessing accuracy based on known truth: Receiver Operating Characteristic (ROC) curves
Plot the true positive rate (TPR) vs. the false positive rate (FPR) at each significance threshold
Known Truth is either a set of positive controls … or can come from simulated data
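One way to build such a curve, sketched under the assumption that each gene has a p-value and a known true/false label. The p-values and labels are invented, the function names are hypothetical, and ties between p-values are not handled specially.

```python
def roc_points(p_values, truth):
    """(FPR, TPR) after each successive call, lowering the threshold gene by gene."""
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Most significant gene first; each step adds one more call
    for _, is_positive in sorted(zip(p_values, truth)):
        if is_positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical p-values with known truth (True = genuinely differentially expressed)
p_values = [0.001, 0.01, 0.02, 0.2, 0.5, 0.8]
truth = [True, True, False, True, False, False]
points = roc_points(p_values, truth)
auc = area_under_curve(points)
print(points[-1], round(auc, 3))
```

An area near 1 means the true positives rank ahead of the true negatives; an area near 0.5 means the ranking is no better than chance.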

11 When working with many genes must correct for multiple testing …
p < 0.01 means that, if $H_0$ is true, there is less than a 1 in 100 chance of seeing a result at least this extreme
But if you have 30,000 genes, each with a 0.01 chance of a false call, then you expect about 300 false positives!
Adjust the p-value cutoff such that there is a 1 in 100 chance of even one false identification across all genes:
p = 0.01 / 30,000 ‘trials’
p < 3.3 x 10^-7 is significant
(this is also known as Bonferroni correction)
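The arithmetic above, spelled out. The 30,000 genes and the 0.01 cutoff come from the slide; the helper function and the example p-values are hypothetical.

```python
n_genes = 30_000
alpha = 0.01

# Expected false positives if every gene were truly null and tested at alpha
expected_false_positives = n_genes * alpha

# Bonferroni: divide the cutoff by the number of tests
bonferroni_cutoff = alpha / n_genes

def bonferroni_significant(p_values, n_tests, alpha=0.01):
    """Flag p-values that pass the Bonferroni-adjusted cutoff."""
    cutoff = alpha / n_tests
    return [p < cutoff for p in p_values]

# Hypothetical p-values for three of the 30,000 genes
calls = bonferroni_significant([1e-9, 5e-5, 0.003], n_tests=n_genes)
print(round(expected_false_positives), f"{bonferroni_cutoff:.1e}", calls)
```

The second gene would pass an uncorrected 0.01 cutoff easily but fails after correction, which illustrates how conservative Bonferroni becomes with tens of thousands of tests.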

Newer, better way of dealing with this is FDR correction
FDR: false discovery rate. How many of the called positives are false?
-- 5% FDR means 5% of calls are false positives
This is different from the false positive rate: the rate at which true negatives are called significant
-- a 5% false positive rate means 5% of true negatives are incorrectly called significant
“The p-value cutoff [and false positive rate] says little about the content of the features actually called significant” (Storey and Tibshirani 2003)
Storey and Tibshirani 2003: q-value to represent FDR
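A quick numeric illustration of why the two rates differ. All numbers here are invented for the example (10,000 genes, 9,000 of them truly null, 80% of the real changes detected); the function name is hypothetical.

```python
def false_discovery_rate(n_true_null, n_true_changed, false_positive_rate, sensitivity):
    """Expected false positives, true positives, and FDR for given error rates."""
    false_pos = n_true_null * false_positive_rate
    true_pos = n_true_changed * sensitivity
    return false_pos, true_pos, false_pos / (false_pos + true_pos)

# Hypothetical experiment: 9,000 null genes and 1,000 truly changed genes
fp, tp, fdr = false_discovery_rate(9000, 1000, false_positive_rate=0.05, sensitivity=0.80)
print(f"{fp:.0f} false positives, {tp:.0f} true positives, FDR = {fdr:.0%}")
```

With these assumed numbers, a 5% false positive rate corresponds to an FDR of 36%, so more than a third of the called genes would be wrong.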

FDR = expected ratio of false positives vs. all positives (Expected [F/S])
q value: for a given region of data space, what fraction of genes in that region are false?
e.g. Gene X has a q = 0.04 … this means that if you call Gene X significant, then all the genes with better statistics must also be called significant -> 4% of all of these genes are false positives
“The q-value for a particular feature is the expected proportion of false positives incurred when calling that feature significant.”

FDR = expected ratio of false positives vs. all positives: Expected [F/S] ≈ Expected [F] / Expected [S]
-- can initially estimate S based on a simple p-value cutoff
We need to estimate $\pi_0 = m_0 / m$ = fraction of all features that are truly negative
Genes with p > 0.5 show a relatively flat density … because we expect that p-values of null genes are randomly (uniformly) distributed, we assume that most of these genes are true nulls … (The tuning parameter is the p cutoff above which nulls are assumed)
The density for genes with p > 0.5 allows us to estimate the # of true negatives and thus $\pi_0$ in this case
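The estimate of $\pi_0$ and the resulting q-values can be sketched as follows. This is a simplified version that assumes a single fixed tuning parameter (called `lam` here, set to the p > 0.5 cutoff mentioned above) instead of the smoothing over many cutoffs used by Storey and Tibshirani; the function names and p-values are hypothetical.

```python
def estimate_pi0(p_values, lam=0.5):
    """Fraction of true nulls, estimated from the flat part of the p-value density."""
    m = len(p_values)
    n_above = sum(1 for p in p_values if p > lam)
    # Nulls are uniform on [0, 1], so a fraction (1 - lam) of them lands above lam
    return min(1.0, n_above / (m * (1.0 - lam)))

def q_values(p_values, lam=0.5):
    """q-value per gene: smallest estimated FDR at which that gene is called."""
    m = len(p_values)
    pi0 = estimate_pi0(p_values, lam)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone in p
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        estimated_fdr = pi0 * m * p_values[i] / rank  # E[F] / S at this cutoff
        running_min = min(running_min, estimated_fdr)
        q[i] = running_min
    return q

# Hypothetical p-values for 10 genes
p = [0.001, 0.004, 0.01, 0.03, 0.2, 0.4, 0.6, 0.7, 0.8, 0.9]
pi0 = estimate_pi0(p)
q = q_values(p)
print(pi0, [round(v, 3) for v in q])
```

Here 4 of the 10 p-values exceed 0.5, so the estimated number of nulls is 4 / 0.5 = 8 and $\pi_0$ is 0.8; calling everything up to a given gene significant then has an estimated FDR equal to that gene's q-value.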