Hypothesis testing and statistical decision theory


Hypothesis testing and statistical decision theory Lirong Xia March 25, 2016

Schedule
- Hypothesis testing
- Statistical decision theory
  - a more general framework for statistical inference
  - explains the reasoning behind tests
- Two applications of the minimax theorem
  - Yao's minimax principle
  - finding a minimax rule in statistical decision theory

An example
How does the average GRE quantitative score of RPI graduate students compare to the national average of 558 (standard deviation 139)? Randomly sample some GRE Q scores of RPI graduate students and make a decision based on those samples.

Simplified problem: one-sample location test
You have a random variable X:
- you know the shape of X: normal
- you know the standard deviation of X: 1
- you don't know the mean of X

The null and alternative hypotheses
Given a statistical model:
- parameter space: Θ
- sample space: S
- probabilities Pr(s|θ)
H1: the alternative hypothesis
- H1 ⊆ Θ
- the set of parameters you think contains the ground truth
H0: the null hypothesis
- H0 ⊆ Θ, with H0∩H1=∅
- the set of parameters you want to test (and ideally reject)
Output of the test:
- reject the null: if the ground truth were in H0, it would be unlikely to observe the data we actually see
- retain the null: we don't have enough evidence to reject the null

One-sample location test
- Combination 1 (one-sided, right tail): H1: mean>0, H0: mean=0 (why not mean<0?)
- Combination 2 (one-sided, left tail): H1: mean<0, H0: mean=0
- Combination 3 (two-sided): H1: mean≠0, H0: mean=0
A hypothesis test is a mapping f : S ⟶ {reject, retain}

One-sided Z-test
H1: mean>0, H0: mean=0
Parameterized by a number 0<α<1, called the level of significance.
Let xα be such that Pr(X>xα|H0)=α; xα is called the critical value.
Output reject if x>xα, or equivalently Pr(X>x|H0)<α; the quantity Pr(X>x|H0) is called the p-value.
Output retain if x≤xα, or equivalently p-value≥α.
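The test above can be sketched in a few lines of Python using the standard library's `NormalDist`. The function name and the single-observation setup are illustrative, mirroring the slide's model (X normal with standard deviation 1):

```python
from statistics import NormalDist

def one_sided_z_test(x, alpha=0.05):
    """One-sided (right-tail) Z-test of H0: mean=0 vs H1: mean>0,
    given a single observation x of X ~ Normal(mean, 1)."""
    std_normal = NormalDist(0, 1)
    x_alpha = std_normal.inv_cdf(1 - alpha)  # critical value: Pr(X > x_alpha | H0) = alpha
    p_value = 1 - std_normal.cdf(x)          # p-value: Pr(X > x | H0)
    decision = "reject" if p_value < alpha else "retain"
    return decision, x_alpha, p_value
```

For example, `one_sided_z_test(2.0, 0.05)` rejects (p-value about 0.023 < 0.05), while `one_sided_z_test(1.0, 0.05)` retains.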

Interpreting the level of significance
Popular values of α:
- 5%: xα = 1.645 standard deviations (somewhat confident)
- 1%: xα = 2.33 standard deviations (very confident)
α is the probability that, given mean=0, randomly generated data leads to "reject" (a Type I error).

Two-sided Z-test
H1: mean≠0, H0: mean=0
Parameterized by a number 0<α<1.
Let xα be such that 2Pr(X>xα|H0)=α.
Output reject if x>xα or x<-xα.
Output retain if -xα≤x≤xα.

Evaluation of hypothesis tests
What is a "correct" answer given by a test?
- when the ground truth is in H0, retain the null (≈ saying that the ground truth is in H0)
- when the ground truth is in H1, reject the null (≈ saying that the ground truth is in H1)
- only consider cases where θ∈H0∪H1
Two types of errors:
- Type I: wrongly reject H0 (false alarm)
- Type II: wrongly retain H0 (fail to raise the alarm)
Which is more serious?

Type I and Type II errors
Ground truth in H0: retain with probability 1-α (the size), reject with probability α (Type I error).
Ground truth in H1: retain with probability β (Type II error), reject with probability 1-β (the power).
Type I: the max error rate over all θ∈H0, i.e., α = sup_{θ∈H0} Pr(false alarm|θ).
Type II: the error rate given a θ∈H1.
Is it possible to design a test where α=β=0? Usually not; a tradeoff is needed.
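Both error rates can be estimated by simulation. The sketch below (function name and setup illustrative) draws data under θ=0 to estimate the Type I error of the one-sided Z-test, and under a fixed θ∈H1 to estimate the Type II error:

```python
import random
from statistics import NormalDist

def error_rates(theta1, alpha=0.05, trials=100_000, seed=0):
    """Monte Carlo estimates of the Type I error (at theta=0) and the
    Type II error (at theta=theta1) of the one-sided Z-test."""
    rng = random.Random(seed)
    x_alpha = NormalDist(0, 1).inv_cdf(1 - alpha)
    # Type I: data generated under H0 (mean 0) that still exceeds x_alpha.
    type1 = sum(rng.gauss(0, 1) > x_alpha for _ in range(trials)) / trials
    # Type II: data generated under theta1 > 0 that fails to exceed x_alpha.
    type2 = sum(rng.gauss(theta1, 1) <= x_alpha for _ in range(trials)) / trials
    return type1, type2
```

At θ=2 and α=5% the estimates come out near the exact values α=0.05 and β=Φ(1.645−2)≈0.36.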

Illustration
(figure: the one-sided Z-test at critical value xα, showing its Type I error α under θ=0 and its Type II error β under a fixed θ∈H1, compared against another test at the same α)
With the one-sided Z-test we can freely control the Type I error; to compare Type II errors, fix some θ∈H1.

Using the two-sided Z-test for a one-sided hypothesis
(figure: at the same level α, the two-sided Z-test spends half its rejection probability on the left tail, so for any θ>0 its Type II error is larger than that of the one-sided Z-test)
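The comparison in the figure can be computed exactly from normal CDFs. A sketch (function name illustrative) giving the Type II error of each test at a fixed θ>0:

```python
from statistics import NormalDist

def type2_errors(theta, alpha=0.05):
    """Exact Type II error at a fixed theta > 0 for the one-sided and
    two-sided Z-tests run at the same significance level alpha."""
    Phi = NormalDist(0, 1).cdf
    inv = NormalDist(0, 1).inv_cdf
    # One-sided: retain when X <= x_alpha, X ~ Normal(theta, 1).
    beta_one = Phi(inv(1 - alpha) - theta)
    # Two-sided: retain when -x_{alpha/2} <= X <= x_{alpha/2}.
    x_half = inv(1 - alpha / 2)
    beta_two = Phi(x_half - theta) - Phi(-x_half - theta)
    return beta_one, beta_two
```

For instance, at θ=1 and α=5% the one-sided test has β≈0.74 while the two-sided test has β≈0.83, confirming that the one-sided test is more powerful for θ>0.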

Using the one-sided Z-test for a set-valued null hypothesis
H0: mean≤0 (vs. mean=0); H1: mean>0
sup_{θ≤0} Pr(false alarm|θ) = Pr(false alarm|θ=0)
- the Type I error is the same
- the Type II error is also the same for any θ>0
Any better tests?

Optimal hypothesis tests
A hypothesis test f is uniformly most powerful (UMP) if, for any other test f′ with the same Type I error and for any θ∈H1, the Type II error of f is no larger than that of f′.
Corollary of the Karlin-Rubin theorem: the one-sided Z-test is UMP for H0: mean≤0 and H1: mean>0.
There is generally no UMP test for two-sided hypotheses.

Template of other tests
- Specify the H0 and H1 used in the test, e.g., H0: mean≤0 and H1: mean>0.
- Specify the test statistic, a function from the data to a scalar, e.g., the mean of the data.
- For any given α, specify the region of the test statistic that leads to rejection of H0, e.g., test statistic > xα.

How to run a test for your problem
Step 1: look for a type of test that fits your problem (e.g., from Wikipedia)
Step 2: choose H0 and H1
Step 3: choose the level of significance α
Step 4: run the test
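Putting the steps together on the running GRE example: a one-sample Z-test with known population standard deviation, applied to n observations. The data values below are hypothetical, and the function name is illustrative; only the national mean 558 and SD 139 come from the slides:

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sample_z_test(data, mu0, sigma, alpha=0.05):
    """One-sided one-sample Z-test of H0: mean<=mu0 vs H1: mean>mu0,
    with known population standard deviation sigma and n observations."""
    z = (mean(data) - mu0) * sqrt(len(data)) / sigma  # standardized test statistic
    p_value = 1 - NormalDist(0, 1).cdf(z)
    return ("reject" if p_value < alpha else "retain"), z, p_value

# Hypothetical sample of RPI GRE Q scores vs. the national mean 558 (SD 139).
result = one_sample_z_test([700, 720, 680, 710, 690], 558, 139)
```

With this (made-up) sample the statistic is z ≈ 2.28, so the test rejects H0 at α=5%.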

Statistical decision theory
Given:
- a statistical model: Θ, S, Pr(s|θ)
- a decision space: D
- a loss function: L(θ, d)∈ℝ
We want to make a decision based on the observed data: a decision function f : data ⟶ D.

Hypothesis testing as a decision problem
D = {reject, retain}
L(θ, reject) = 0 if θ∈H1; 1 if θ∈H0 (Type I error)
L(θ, retain) = 0 if θ∈H0; 1 if θ∈H1 (Type II error)

Bayesian expected loss
Given the data and a decision d:
ELB(data, d) = E_{θ|data} L(θ, d)
Compute a decision that minimizes ELB given the data.

Frequentist expected loss
Given the ground truth θ and a decision function f:
ELF(θ, f) = E_{data|θ} L(θ, f(data))
Compute a decision function with small ELF for every possible ground truth.
c.f. the uniformly most powerful test: for all θ∈H1, the UMP test always has the lowest expected loss (Type II error).
A minimax decision rule f is argmin_f max_θ ELF(θ, f): the most robust choice against an unknown parameter.
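Over a finite model, the frequentist expected loss and the minimax rule can be computed by direct enumeration. The toy model below (a binary parameter observed through a noisy channel) is a hypothetical example, not from the slides:

```python
def frequentist_loss(theta, rule, data_dist, loss):
    """ELF(theta, f) = E_{data|theta} L(theta, f(data)), with a discrete
    sampling distribution data_dist(theta) = {observation: probability}."""
    return sum(p * loss(theta, rule(s)) for s, p in data_dist(theta).items())

def minimax_rule(rules, thetas, data_dist, loss):
    """Among a finite set of candidate rules, the one minimizing the
    worst-case frequentist expected loss max_theta ELF(theta, f)."""
    return min(rules,
               key=lambda f: max(frequentist_loss(t, f, data_dist, loss)
                                 for t in thetas))

# Toy model: theta in {0, 1}; the observation equals theta w.p. 0.8.
noisy = lambda theta: {theta: 0.8, 1 - theta: 0.2}
zero_one = lambda theta, d: 0 if d == theta else 1
follow = lambda s: s    # decide whatever was observed
always0 = lambda s: 0   # ignore the data
best = minimax_rule([always0, follow], [0, 1], noisy, zero_one)
```

Here `follow` has worst-case loss 0.2 while `always0` has worst-case loss 1 (at θ=1), so `follow` is the minimax rule among the two candidates.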

Two interesting applications of game theory

The minimax theorem
For any simultaneous-move two-player zero-sum game:
- the value of a player's mixed strategy s is her worst-case utility against the other player: Value(s) = min_{s′} U(s, s′)
- let s1 be a mixed strategy for player 1 with maximum value, and s2 a mixed strategy for player 2 with maximum value
Theorem [von Neumann]: Value(s1) = -Value(s2), and (s1, s2) is an NE.
For any s1′ and s2′: Value(s1′) ≤ Value(s1) = -Value(s2) ≤ -Value(s2′)
So to prove that a strategy s1* is minimax, it suffices to find an s2* with Value(s1*) = -Value(s2*).
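The theorem can be checked numerically on a small game. The sketch below grid-searches player 1's maximin mixed strategy in matching pennies (payoff matrix chosen as a standard illustrative example); by symmetry player 2's maximin value is the same, and indeed Value(s1) = 0 = -Value(s2):

```python
# Matching pennies: U[i][j] is player 1's payoff (the game is zero-sum,
# so player 2's payoff is -U[i][j]).
U = [[1, -1],
     [-1, 1]]

def value(p):
    """Worst-case expected payoff for player 1 when mixing the two rows
    with probabilities (p, 1-p), minimized over player 2's pure columns."""
    return min(p * U[0][j] + (1 - p) * U[1][j] for j in range(2))

# Grid-search player 1's maximin mixed strategy.
best_p = max((i / 1000 for i in range(1001)), key=value)
```

The search recovers the uniform mix p = 0.5 with value 0, matching the analytic solution of matching pennies.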

App 1: Yao's minimax principle
Question: how do you prove that a randomized algorithm A is (asymptotically) fastest?
Step 1: analyze the running time of A.
Step 2: show that any other randomized algorithm runs slower on some input. But how do you choose a worst-case input that works against all other algorithms?
Theorem [Yao 77]: for any randomized algorithm A, the worst-case expected running time of A is at least the expected running time of the fastest deterministic algorithm against any fixed distribution over inputs.
Example: you designed an O(n²) randomized algorithm. To prove that no other randomized algorithm is asymptotically faster, find a distribution π over all inputs of size n and show that the expected running time of any deterministic algorithm on π is Ω(n²).

Proof
Two players: you and Nature.
Pure strategies: you play deterministic algorithms; Nature plays inputs.
Payoffs: you receive the negative expected running time; Nature receives the expected running time.
For any randomized algorithm A, its largest expected running time over inputs is at least the value of your best mixed strategy, which by the minimax theorem equals the value of Nature's best mixed strategy, which in turn is at least the smallest expected running time of any deterministic algorithm against that distribution over inputs.

App 2: finding a minimax rule
Guess a least favorable distribution π over the parameters, and let fπ denote its Bayesian decision rule.
Proposition: fπ minimizes the expected loss among all rules, i.e., fπ = argmin_f E_{θ∼π} ELF(θ, f).
Theorem: if ELF(θ, fπ) is the same for all θ, then fπ is minimax.

Proof
Two players: you and Nature.
Pure strategies: you play deterministic decision rules; Nature plays parameters.
Payoffs: you receive the negative frequentist loss and want to minimize the maximum frequentist loss; Nature receives the frequentist loss ELF(θ, f) = E_{data|θ} L(θ, f(data)) and wants to maximize the minimum frequentist loss.
We need to prove that fπ is minimax. It suffices to exhibit a mixed strategy π* for Nature (a distribution over Θ) such that for every rule f and every parameter θ, ELF(π*, f) ≥ ELF(θ, fπ). By the proposition and the equal-loss condition, this holds for π* = π. QED

Recap
- Problem: make a decision based on randomly generated data
- Z-test: null/alternative hypothesis, level of significance, reject/retain
- Statistical decision theory framework: Bayesian expected loss, frequentist expected loss
- Two applications of the minimax theorem