1-17-061 Back to basics – Probability, Conditional Probability and Independence


Back to basics – Probability, Conditional Probability and Independence

- Probability of an outcome in an experiment is the proportion of times that this particular outcome would occur in a very large ("infinite") number of replicated experiments.
- A random variable is a mapping assigning real numbers to the set of all possible experimental outcomes – often equivalent to the experimental outcome itself.
- A probability distribution describes the probability of any outcome, or of any particular value of the corresponding random variable, in an experiment.
- If we have two different experiments, the probability of any combination of outcomes is the joint probability, and the joint probability distribution describes the probabilities of observing any combination of outcomes.
- If the outcome of one experiment does not affect the probability distribution of the other, we say that the outcomes are independent.
- An event is a set of one or more possible outcomes.
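The definition of probability as a long-run proportion can be checked by simulation. A minimal sketch in Python (the die-roll experiment and the outcome "6" are hypothetical choices for illustration, not from the slides):

```python
import random

random.seed(0)
N = 100_000  # number of replicated "experiments" (die rolls)
rolls = [random.randint(1, 6) for _ in range(N)]

# Empirical probability of the outcome "6": the proportion of trials in which it occurred
p_six = rolls.count(6) / N
print(round(p_six, 3))  # close to 1/6
```

As N grows, the empirical proportion converges to the theoretical probability 1/6.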

Back to basics – Probability, Conditional Probability and Independence

Let N be the very large number of trials of an experiment, and n_i the number of times that the i-th outcome (o_i), out of the possibly infinitely many possible outcomes, has been observed. Then p_i = n_i / N is the probability of the i-th outcome.

Properties of probabilities following from this definition:
1) p_i ≥ 0
2) p_i ≤ 1
3) Σ_i p_i = 1
4) For any set of mutually exclusive events (events that don't have any outcomes in common), p(e_1 OR e_2 OR ...) = p(e_1) + p(e_2) + ...
5) p(NOT e) = 1 − p(e) for any event e
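The properties above can be verified numerically. A sketch using hypothetical outcome counts n_i (the nucleotide labels and the counts are invented for illustration):

```python
# Hypothetical counts n_i for each outcome o_i out of N trials
N = 1000
counts = {"A": 250, "C": 240, "G": 260, "T": 250}
p = {o: n / N for o, n in counts.items()}  # p_i = n_i / N

assert all(pi >= 0 for pi in p.values())   # (1) p_i >= 0
assert all(pi <= 1 for pi in p.values())   # (2) p_i <= 1
assert abs(sum(p.values()) - 1.0) < 1e-12  # (3) probabilities sum to 1

# (4) mutually exclusive events: p(A OR G) = p(A) + p(G)
purine = p["A"] + p["G"]

# (5) complement rule: p(NOT purine) = 1 - p(purine)
assert abs((1 - purine) - (p["C"] + p["T"])) < 1e-12
print(purine)
```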

Conditional Probabilities and Independence

Suppose you have a set of N DNA sequences. Let the random variable X denote the identity of the first nucleotide and the random variable Y the identity of the second nucleotide. Suppose now that you have randomly selected a DNA sequence from this set and looked at the first nucleotide but not the second.

Question: what is the probability of a particular second nucleotide y, given that you know that the first nucleotide is x*?
- The probability that a randomly selected DNA sequence from this set has the dinucleotide xy at the beginning is the joint probability P(X=x, Y=y).
- P(Y=y | X=x*) is the conditional probability of Y=y given that X=x*; it equals P(X=x*, Y=y) / P(X=x*).
- X and Y are independent if P(Y=y | X=x) = P(Y=y).
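A sketch of this computation on a small, invented set of sequences; `p_cond` implements the definition P(Y=y | X=x) = P(X=x, Y=y) / P(X=x):

```python
from collections import Counter

# Hypothetical set of DNA sequences; X = first nucleotide, Y = second
seqs = ["ATG", "ACG", "ATT", "GTA", "AAG", "ATC", "GGC", "ACT"]
N = len(seqs)

joint = Counter((s[0], s[1]) for s in seqs)  # counts for (X=x, Y=y)
first = Counter(s[0] for s in seqs)          # counts for X=x

def p_joint(x, y):
    """P(X=x, Y=y): proportion of sequences starting with the dinucleotide xy."""
    return joint[(x, y)] / N

def p_cond(y, x):
    """P(Y=y | X=x) = P(X=x, Y=y) / P(X=x)."""
    return joint[(x, y)] / first[x]

print(p_joint("A", "T"), p_cond("T", "A"))
```

Here 6 of the 8 sequences start with A and 3 of those have T second, so P(X=A, Y=T) = 3/8 while P(Y=T | X=A) = 3/6.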

Conditional Probabilities – Another Example

Measuring differences between expression levels under two different experimental conditions for two genes (1 and 2) in many replicated experiments. Outcomes of each experiment are:
- X = 1 if the difference for gene 1 is greater than 2, and 0 otherwise
- Y = 1 if the difference for gene 2 is greater than 2, and 0 otherwise

Suppose now that in one experiment we look at gene 1 and learn that X = 0. Question: what is the probability of Y = 1, knowing that X = 0?
- The joint probability of the differences for both genes being greater than 2 in any single experiment is P(X=1, Y=1).
- P(Y=1 | X=0) is the conditional probability of Y=1 given that X=0.
- X and Y are independent if P(Y=y | X=x) = P(Y=y) for any x and y.

Conditional Probabilities and Independence

If X and Y are independent, then from P(Y=y | X=x) = P(X=x, Y=y) / P(X=x) = P(Y=y) it follows that P(X=x, Y=y) = P(X=x) P(Y=y): the probability of two independent events is equal to the product of their probabilities.
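The product rule can be checked by simulating two independent binary outcomes (the probabilities 0.3 and 0.6 are arbitrary illustrative values):

```python
import random

random.seed(1)
N = 200_000

# Two independent binary outcomes with P(X=1)=0.3 and P(Y=1)=0.6
xs = [1 if random.random() < 0.3 else 0 for _ in range(N)]
ys = [1 if random.random() < 0.6 else 0 for _ in range(N)]

p_x1 = sum(xs) / N
p_y1 = sum(ys) / N
p11 = sum(x & y for x, y in zip(xs, ys)) / N  # empirical P(X=1, Y=1)

# Under independence the joint probability factorizes into the product
print(round(p11, 3), round(p_x1 * p_y1, 3))
```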

Identifying Differentially Expressed Genes

Suppose we have T genes which we measured under two experimental conditions (Ctl and Nic) in n replicated experiments. t_i* and p_i are the t-statistic and the corresponding p-value for the i-th gene, i = 1,...,T.
- The p-value is the probability of observing a value of the t-statistic as extreme as, or more extreme than, the one calculated from the data (t*) under the "null distribution" (i.e. the distribution assuming that μ_i^Ctl = μ_i^Nic).
- The i-th gene is "differentially expressed" if we can reject the i-th null hypothesis μ_i^Ctl = μ_i^Nic and conclude that μ_i^Ctl ≠ μ_i^Nic at a significance level α (i.e. if p_i < α).
- A Type I error is committed when a null hypothesis is falsely rejected.
- A Type II error is committed when a null hypothesis is not rejected although it is false.
- An experiment-wise (family-wise) Type I error is committed if any of the set of T null hypotheses is falsely rejected.
- If the significance level is chosen prior to conducting the experiment, we know that, by following the hypothesis-testing procedure, the probability of falsely concluding that any one gene is differentially expressed (i.e. falsely rejecting its null hypothesis) is equal to α.
- What is the probability of committing a family-wise Type I error? That is, assuming that all null hypotheses are true, what is the probability that we would reject at least one of them?

Experiment-wise error rate

Assuming that the individual tests of hypothesis are independent and all null hypotheses are true:

p(not committing an experiment-wise error)
= p(not rejecting H0_1 AND not rejecting H0_2 AND ... AND not rejecting H0_T)
= (1 − α)(1 − α)...(1 − α) = (1 − α)^T

p(committing an experiment-wise error) = 1 − (1 − α)^T
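A quick numerical check of the formula, comparing 1 − (1 − α)^T against a simulation in which all T null hypotheses are true (so each p-value is uniform on [0, 1]; T = 20 is an illustrative choice):

```python
import random

random.seed(2)
alpha, T, reps = 0.05, 20, 20_000

analytic = 1 - (1 - alpha) ** T  # experiment-wise error rate

hits = 0
for _ in range(reps):
    # Under the null each p-value is Uniform(0, 1); reject when p < alpha
    if any(random.random() < alpha for _ in range(T)):
        hits += 1  # at least one false rejection: an experiment-wise error
simulated = hits / reps

print(round(analytic, 3), round(simulated, 3))
```

Even with only 20 tests at α = 0.05, the chance of at least one false positive is about 64%.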

Experiment-wise error rate

If we want to keep the FWER at level α, Sidak's adjustment sets the per-test significance level to α_a = 1 − (1 − α)^(1/T). Then

FWER = 1 − (1 − α_a)^T = 1 − (1 − [1 − (1 − α)^(1/T)])^T = 1 − ((1 − α)^(1/T))^T = 1 − (1 − α) = α

For FWER = 0.05 and T tests, α_a = 1 − 0.95^(1/T).
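Sidak's adjustment in code (T = 10,000 is an illustrative number of tests, not one from the slides); plugging α_a back into the FWER formula recovers α exactly:

```python
T = 10_000
alpha = 0.05

# Sidak's adjusted per-test significance level
alpha_sidak = 1 - (1 - alpha) ** (1 / T)

# Substituting it back into the FWER formula recovers the target level
fwer = 1 - (1 - alpha_sidak) ** T
print(alpha_sidak, round(fwer, 10))
```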

Experiment-wise error rate

Another adjustment:
p(committing an experiment-wise error) = p(rejecting H0_1 OR rejecting H0_2 OR ... OR rejecting H0_T) ≤ T·α
(Homework: how does that follow from the probability properties?)

Bonferroni adjustment: α_b = α/T
- Generally α_b < α_a, so the Bonferroni adjustment is more conservative.
- Sidak's adjustment assumes independence, which is likely not to be satisfied. If the tests are not independent, Sidak's adjustment is most likely conservative, but it could be liberal.
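A sketch comparing the two per-test thresholds (the values of T are arbitrary); by Bernoulli's inequality, α/T ≤ 1 − (1 − α)^(1/T), so Bonferroni is never the less conservative of the two:

```python
alpha = 0.05
for T in (2, 10, 100, 30_000):
    bonf = alpha / T                      # Bonferroni: alpha_b
    sidak = 1 - (1 - alpha) ** (1 / T)    # Sidak: alpha_a
    assert bonf < sidak                   # alpha_b < alpha_a for every T > 1
    print(T, bonf, sidak)
```

The difference between the two thresholds is tiny for large T, which is why Bonferroni is often used despite being slightly more conservative.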

Adjusting p-values

Individual hypotheses: H0_i: μ_i^W = μ_i^C, with p_i = p(t_{n−1} > t_i*), i = 1,...,T
"Composite" hypothesis: H0: {μ_i^W = μ_i^C, i = 1,...,T}, with p = min{p_i, i = 1,...,T}
- The composite null hypothesis is rejected if even a single individual hypothesis is rejected.
- Consequently, the p-value for the composite hypothesis is equal to the minimum of the individual p-values.
- If all tests have the same reference distribution, this is equivalent to p = p(t_{n−1} > t*_max).
- We can consider a p-value to be itself the outcome of the experiment. What is the "null" probability distribution of the p-value for the individual tests of hypothesis? What is the "null" probability distribution of the composite p-value?

Null distribution of the p-value

Given that the null hypothesis is true, the probability of observing a p-value smaller than a fixed number a between 0 and 1 is

p(p_i < a) = p(|t*| > t_a) = a

where t_a is the critical value that the null distribution of t* exceeds in absolute value with probability a. In other words, under the null hypothesis the p-value is uniformly distributed on [0, 1].

[Figure: the null distribution of t* with critical values −t_a and t_a, and the corresponding uniform null distribution of p_i.]
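The uniformity of null p-values can be demonstrated by simulation. This sketch uses standard normal test statistics with two-sided p-values (a simplification of the t-statistics in the slides), computing the normal CDF from the error function:

```python
import random
import math

random.seed(3)

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

reps = 50_000
pvals = []
for _ in range(reps):
    z = random.gauss(0, 1)               # test statistic under the null
    pvals.append(2 * (1 - phi(abs(z))))  # two-sided p-value

# Check p(p_i < a) = a at an arbitrary cutoff a
a = 0.3
frac = sum(p < a for p in pvals) / reps
print(round(frac, 3))  # close to a = 0.3
```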

Null distribution of the composite p-value

p(p < a) = p(min{p_i, i=1,...,T} < a)
= 1 − p(min{p_i, i=1,...,T} > a)
= 1 − p(p_1 > a AND p_2 > a AND ... AND p_T > a)
(assuming independence between the different tests)
= 1 − [p(p_1 > a) p(p_2 > a) ... p(p_T > a)]
= 1 − [1 − p(p_1 < a)][1 − p(p_2 < a)]...[1 − p(p_T < a)]
= 1 − [1 − a]^T

Instead of adjusting the significance level, we can adjust all p-values: p_i^a = 1 − [1 − p_i]^T

Null distribution of the composite p-value

[Figure: the null distribution of the composite p-value for 1, 10 and … tests.]

Seems simple

- Applying a conservative p-value adjustment will take care of false positives. How about false negatives?
- A Type II error arises when we fail to reject H0 although it is false.
- Power = p(rejecting H0 when μ_W − μ_C ≠ 0) = p(|t*| > t_α | μ_W − μ_C ≠ 0) = p(p < α | μ_W − μ_C ≠ 0)
- Power depends on various things (α, df, σ, μ_W − μ_C).
- When μ_W − μ_C ≠ 0, the probability distribution of the t-statistic is non-central t.
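A rough power simulation for a single two-sample comparison. This is a sketch only: it uses a normal critical value (1.96) in place of the exact t quantile and non-central t distribution, and the parameter values (n = 10, σ = 7, μ_W − μ_C = 10) are illustrative, not taken from the slides:

```python
import random
import math

random.seed(5)
n, delta, sigma = 10, 10.0, 7.0   # per-group sample size, mu_W - mu_C, sd
z_crit = 1.96                     # normal approximation to the two-sided critical value

reps = 5_000
rejections = 0
for _ in range(reps):
    w = [random.gauss(delta, sigma) for _ in range(n)]  # condition W
    c = [random.gauss(0.0, sigma) for _ in range(n)]    # condition C
    mw, mc = sum(w) / n, sum(c) / n
    vw = sum((x - mw) ** 2 for x in w) / (n - 1)
    vc = sum((x - mc) ** 2 for x in c) / (n - 1)
    t = (mw - mc) / math.sqrt(vw / n + vc / n)          # two-sample t-statistic
    if abs(t) > z_crit:
        rejections += 1  # correctly rejected a false null

power = rejections / reps
print(round(power, 3))
```

Tightening the per-test level to a multiple-testing-adjusted value (e.g. α_a ≈ 0.0001 from the slide's figure) replaces z_crit with a much larger critical value and cuts the power substantially.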

Effects of multiple-comparison adjustments on power

[Figure: null and non-central t densities. t_4: green dashed line; t_9: red dashed line; t_4,nc=6.1: green solid line; t_9,nc=8.6: red solid line. T = 5000, α = 0.05, α_a = 0.0001, μ_W − μ_C = 10, σ = …]

This is not good enough

- Traditional statistical approaches to multiple-comparison adjustment, which strictly control the experiment-wise error rate, are not optimal.
- We need a balance between the false positive and false negative rates.
- Benjamini Y and Hochberg Y (1995) Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society B 57:289-300.
- Instead of controlling the probability of generating a single false positive, we control the proportion of false positives.
- A consequence is that some of the implicated genes are likely to be false positives.

False Discovery Rate

FDR = E(V/R), where V is the number of falsely rejected null hypotheses (false positives) and R is the total number of rejected null hypotheses (with V/R taken to be 0 when R = 0).
If all null hypotheses are true (the composite null), this is equivalent to the family-wise error rate.

False Discovery Rate

Alternatively, adjust the p-values: with the p-values sorted in increasing order p_(1) ≤ ... ≤ p_(T), the Benjamini-Hochberg adjusted p-values are p_(i)^FDR = min over j ≥ i of (T·p_(j) / j), capped at 1.
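A minimal implementation of the Benjamini-Hochberg step-up adjustment described above (a sketch mirroring what R's p.adjust with method="fdr" computes; the example p-values are invented):

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values: p_(i) * T / i, then
    cumulative minima taken from the largest p-value downward."""
    T = len(pvals)
    order = sorted(range(T), key=lambda i: pvals[i])
    adjusted = [0.0] * T
    running_min = 1.0
    for rank in range(T - 1, -1, -1):       # walk from largest p-value down
        i = order[rank]
        running_min = min(running_min, pvals[i] * T / (rank + 1))
        adjusted[i] = running_min           # caps at 1 via the initial running_min
    return adjusted

pv = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
adj = bh_adjust(pv)
print([round(q, 4) for q in adj])
```

Rejecting all hypotheses whose adjusted p-value falls below q controls the FDR at level q (for independent tests).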

Effects

In R, FDR- and Bonferroni-adjusted p-values can be computed with p.adjust:

> FDRpvalue <- p.adjust(TPvalue, method="fdr")
> BONFpvalue <- p.adjust(TPvalue, method="bonferroni")