Professor William Greene, Stern School of Business, IOMS Department, Department of Economics
Statistical Inference and Regression Analysis: Stat-GB.3302.30, Stat-UB
Part 5 – Hypothesis Testing
Objectives of Statistical Analysis
Estimation: How long do hard drives last? What is the median income among the 99%ers?
Inference (hypothesis testing): Did minorities pay higher mortgage rates during the housing boom? Is there a link between environmental factors and breast cancer on eastern Long Island?
General Frameworks
Parametric tests: features of specific distributions, such as the mean of a Bernoulli or normal distribution.
Specification tests (semiparametric): Do the data arrive from a Poisson process? Are the data normally distributed?
Nonparametric tests: Are two discrete processes independent?
Hypotheses
Hypotheses are labels: state 0 of nature is the null hypothesis, H0; state 1 is the alternative hypothesis, H1.
Exclusive: Prob(H0 ∩ H1) = 0.
Exhaustive: Prob(H0) + Prob(H1) = 1.
Symmetric: Neither is intrinsically “preferred”; the objective of the study is only to support one or the other. (Rare?)
Testing Strategy
Posterior (to the Evidence) Odds
Does the New Drug Work?
Hypotheses: H0: θ = .50, H1: θ = .75
Priors: P0 = .40, P1 = .60
Clinical trial: N = 50; 31 patients “respond”, so p = .62.
Likelihoods: L0(31 | θ = .50) = Binomial(50, 31, .50) ≈ .0270 and L1(31 | θ = .75) = Binomial(50, 31, .75) ≈ .0148.
Posterior odds in favor of H0 = (.4/.6)(.0270/.0148) ≈ 1.22 > 1.
The priors favored H1 1.5 to 1, but the posterior odds favor H0 about 1.22 to 1. The evidence discredits H1 even though the ‘data’ seem more consistent with prior P1.
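A minimal Python sketch of the posterior-odds calculation above, assuming only the slide's hypotheses, priors, and trial outcome. The helper name binomial_pmf is illustrative. The posterior odds are the prior odds multiplied by the likelihood ratio.

```python
from math import comb

def binomial_pmf(n: int, k: int, theta: float) -> float:
    """Probability of k successes in n independent Bernoulli(theta) trials."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

# Hypotheses, priors, and trial outcome from the slide
theta0, theta1 = 0.50, 0.75
prior0, prior1 = 0.40, 0.60
n, k = 50, 31

lik0 = binomial_pmf(n, k, theta0)   # L0(31 | theta = .50)
lik1 = binomial_pmf(n, k, theta1)   # L1(31 | theta = .75)

# Posterior odds for H0 = prior odds x likelihood ratio
posterior_odds_H0 = (prior0 / prior1) * (lik0 / lik1)
print(f"L0 = {lik0:.4f}, L1 = {lik1:.4f}, posterior odds for H0 = {posterior_odds_H0:.2f}")
```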
Decision Strategy
Prefer the hypothesis with the higher posterior odds.
A gap in the theory: How does the investigator do the cost-benefit test?
Starting a new business venture or entering a new market: priors and market research.
FDA approving a new drug or medical device: priors and clinical trials.
Statistical decision theory adds the costs and benefits of decisions and errors.
An Alternative Strategy
Recognize the asymmetry of the null and alternative hypotheses.
Eliminate the prior odds (which are rarely formed or available).
Classical Hypothesis Testing
The scientific method applied to statistical hypothesis testing:
Hypothesis: The world works according to my hypothesis.
Testing or supporting the hypothesis: data gathering.
Rejection of the hypothesis if the data are inconsistent with it.
Retention, and exposure to further investigation, if the data are consistent with the hypothesis.
Failure to reject is not equivalent to acceptance.
Asymmetric Hypotheses
Null hypothesis: the proposed state of nature.
Alternative hypothesis: the state of nature that is believed to prevail if the null is rejected.
Hypothesis Testing Strategy
Formulate the null hypothesis.
Gather the evidence.
Question: If my null hypothesis were true, how likely is it that I would have observed this evidence?
Very unlikely: reject the hypothesis.
Not unlikely: do not reject. (Retain the hypothesis for continued scrutiny.)
Some Terms of Art
Type I error: incorrectly rejecting a true null.
Type II error: failing to reject a false null.
Power of a test: the probability that a test will correctly reject a false null.
Alpha level: the probability that a test will incorrectly reject a true null. This is sometimes called the size of the test.
Significance level: in most texts, a synonym for the alpha level. Its complement, 1 – alpha, is the probability that a test will retain a true null.
Rejection region: evidence that will lead to rejection of the null.
Test statistic: the specific sample evidence used to test the hypothesis.
Distribution of the test statistic under the null hypothesis: the probability model used to compute the probability of rejecting the null. (Crucial to the testing strategy: how does the analyst assess the evidence?)
Possible Errors in Testing
| | Hypothesis is True | Hypothesis is False |
| I Do Not Reject the Hypothesis | Correct Decision | Type II Error |
| I Reject the Hypothesis | Type I Error | Correct Decision |
A Legal Analogy: The Null Hypothesis Is INNOCENT
| Finding | Null Hypothesis: Not Guilty | Alternative Hypothesis: Guilty |
| Verdict Not Guilty | Correct Decision | Type II Error: guilty defendant goes free |
| Verdict Guilty | Type I Error: innocent defendant is convicted | Correct Decision |
The errors are not symmetric. Most thinkers consider Type I errors to be more serious than Type II errors in this setting.
(Jerzy) Neyman – (Egon) Pearson Methodology
“Statistical” testing methodology:
Formulate the “null” hypothesis.
Decide (in advance) what kinds of “evidence” (data) will lead to rejection of the null hypothesis, i.e., define the rejection region.
Gather the data.
Mechanically carry out the test.
Formulating the Null Hypothesis
Stating the hypothesis: a belief about the “state of nature”:
A parameter takes a particular value.
There is a relationship between variables.
And so on…
The null vs. the alternative:
By induction: If we wish to find evidence of something, first assume it is not true. Look for evidence that leads to rejection of the assumed hypothesis.
Evidence that rejects the null hypothesis is significant.
Example: Credit Scoring Rule
Investigation: I believe that Fair Isaac relies on home ownership in deciding whether to “accept” an application.
Null hypothesis: There is no relationship.
Alternative hypothesis: They do use homeownership data.
What decision rule should I use?
Some Evidence
Hypothesis Test
Acceptance rate for homeowners = 5030/(1100 + 5030) = 5030/6130 = .82055.
Acceptance rate for renters = p(renters).
H0: The acceptance rate for renters is not less than for owners: p(renters) ≥ .82055.
H1: p(renters) < .82055.
The Rejection Region
What is the “rejection region”? Data (evidence) that are inconsistent with my hypothesis.
Evidence is divided into two types:
Data that are inconsistent with my hypothesis (the rejection region).
Everything else.
My Testing Procedure
I will reject H0 if p(renters) < .815 (chosen arbitrarily).
The rejection region is the set of sample values of p(renters) < .815.
Distribution of the Test Statistic Under the Null Hypothesis
Test statistic: p(renters) = (1/N) Σi Accepti, where Accepti = 1 or 0.
Use the central limit theorem: assumed mean = .82055; implied standard deviation = sqrt(.82055 × .17945/7413).
By the CLT, p(renters) is approximately normally distributed (N is very large). Use z = (p(renters) − .82055)/.00459.
Alpha Level and Rejection Region
Prob(Reject H0 | H0 true) = Prob(p < .815 | H0 is true) = Prob[(p − .82055)/.00459 < (.815 − .82055)/.00459] = Prob[z < −1.209] ≈ .1133.
This is the probability of a Type I error: the alpha level for this test.
Distribution of the Test Statistic and the Rejection Region
(Figure: shaded rejection region, area = .11333.)
The Test
The observed proportion is 5469/(1845 + 5469) = 5469/7314 = .74774.
Since .74774 < .815, the null hypothesis is rejected at the α = .11333 level (by the design of the test).
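The procedure on the last few slides can be sketched in a few lines of Python. This assumes the slide values .82055 for the null mean and .00459 for the standard deviation of p(renters); the renter counts come from the contingency table later in this part. Python's standard-library NormalDist supplies the normal CDF.

```python
from statistics import NormalDist

# Values taken from the slides
p0 = 0.82055      # owners' acceptance rate = renters' rate under H0
se = 0.00459      # standard deviation of p(renters) under H0, as used on the slides
cutoff = 0.815    # reject H0 if the sample proportion falls below this

std_normal = NormalDist()                     # mean 0, standard deviation 1
alpha = std_normal.cdf((cutoff - p0) / se)    # P(reject H0 | H0 true) = Type I error rate

p_hat = 5469 / 7314                           # observed renter acceptance rate
reject = p_hat < cutoff
print(f"alpha = {alpha:.4f}, p_hat = {p_hat:.5f}, reject H0: {reject}")
```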
Power of the Test
Power Function for the Test
(Power = size when the alternative equals the null.)
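A rough sketch of the power function for the renter test, assuming (as a simplification, not taken from the slides) that the standard deviation stays at the null value .00459 whatever the true acceptance rate is. The function name power is illustrative. At the null value the power equals the size of the test, as the slide notes.

```python
from statistics import NormalDist

p0, se, cutoff = 0.82055, 0.00459, 0.815   # values from the slides

def power(p_true: float) -> float:
    """Approximate P(p_hat < cutoff) when the true renter acceptance rate is p_true.

    Simplifying assumption: keeps the null-hypothesis standard deviation (se)
    for every p_true, so values away from the null are only approximate.
    """
    return NormalDist().cdf((cutoff - p_true) / se)

for p in (0.82055, 0.815, 0.81, 0.80):
    print(f"true p = {p:.5f}  power = {power(p):.3f}")
```

Power rises toward 1 as the true rate moves further below the cutoff.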
Application: Breast Cancer on Long Island
Null hypothesis: There is no link between the high cancer rate on Long Island and the use of pesticides and toxic chemicals in dry cleaning, farming, etc.
Neyman-Pearson procedure: Examine the physical and statistical evidence. If there is convincing covariation, reject the null hypothesis. What is the rejection region?
The NCI study: Working null hypothesis: There is a link; we will find the evidence. How do you reject this hypothesis?
Formulating the Testing Procedure
Usually: What kind of data will lead me to reject the hypothesis?
Thinking scientifically: If you want to “prove” a hypothesis is true (or you want to support one), begin by assuming your hypothesis is not true, and look for evidence that contradicts the assumption.
Hypothesis About a Mean
I believe that the average income of individuals in a population is $30,000.
H0: μ = $30,000 (the null)
H1: μ ≠ $30,000 (the alternative)
I will draw the sample and examine the data. The rejection region is data for which the sample mean is far from $30,000.
How far is far? That is the test.
Application
The mean of a population takes a specific value:
Null hypothesis H0: μ = $30,000; H1: μ ≠ $30,000.
Test: Is the sample mean close to the hypothesized population mean?
Rejection region: sample means that are far from $30,000.
Deciding on the Rejection Region
If the sample mean is far from $30,000, reject the hypothesis. Choose the region, for example, sample means below $29,500 or above $30,500.
The probability that the mean falls in the rejection region even though the hypothesis is true (should not be rejected) is the probability of a Type I error. Even if the true mean really is $30,000, the sample mean could fall in the rejection region.
(Figure: axis marked at 29,500, 30,000, and 30,500, with rejection regions in both tails.)
Reduce the Probability of a Type I Error by Making the (Non)Rejection Region Wider
Reduce the probability of a Type I error by moving the boundaries of the rejection region farther out, for example from 29,500 and 30,500 to 28,500 and 31,500. The probability outside the narrower interval is large; the probability outside the wider interval is much smaller.
You can make a Type I error impossible by placing the rejection region very far from the null. Then you would never make a Type I error because you would never reject H0.
Setting the α Level
“α” is the probability of a Type I error.
Choose the width of the interval by choosing the desired probability of a Type I error, based on the t or normal distribution. (How confident do I want to be?)
Multiply the z or t value by the standard error of the mean.
Testing Procedure
The rejection region will be the range of values greater than μ0 + zσ/√N or less than μ0 − zσ/√N.
Use z = 1.96 for 1 − α = 95%.
Use z = 2.576 for 1 − α = 99%.
Use the t table if the sample is small, the variance is estimated, and the sampling is from a normal distribution.
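The two-sided procedure above can be sketched in Python, assuming a known σ so the normal (z) table applies. The function name z_test_mean and the values σ = 5,000 and N = 400 in the usage lines are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(xbar: float, mu0: float, sigma: float, n: int, alpha: float = 0.05):
    """Two-sided z test of H0: mu = mu0 with known sigma.

    Returns the lower and upper boundaries of the non-rejection interval
    and whether H0 is rejected.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)    # 1.96 for alpha = .05, 2.576 for .01
    half_width = z * sigma / sqrt(n)
    lower, upper = mu0 - half_width, mu0 + half_width
    reject = xbar < lower or xbar > upper
    return lower, upper, reject

# Hypothetical sample: sigma = 5,000, N = 400, sample mean = 30,600
lower, upper, reject = z_test_mean(30600, 30000, 5000, 400)
print(f"reject outside [{lower:,.0f}, {upper:,.0f}]: {reject}")
```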
Deciding on the Rejection Region
If the sample mean is far from $30,000, reject the hypothesis. Choose the region, say, sample means more than 1.96 standard errors from $30,000.
I am 95% certain that I will not commit a Type I error (reject the hypothesis in error). (I cannot be 100% certain.)
The Testing Procedure (For a Mean)
The Test Procedure
Choosing z = 1.96 makes the probability of a Type I error .05.
Choosing z = 2.576 would reduce the probability of a Type I error to .01.
Reducing the probability of a Type I error reduces the power of the test, because it reduces the probability that the null hypothesis will be rejected.
P Value
The probability of observing the sample evidence, or evidence even less consistent with the null, assuming the null hypothesis is true.
The null hypothesis is rejected if the P value < α.
P Value < α?
Prob[p(renter) < .74774] = Prob[z < (.74774 − .82055)/.00459] = Φ(−15.86) ≈ 0 (effectively impossible), far below α = .11333.
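A short Python sketch of the P value calculation, assuming the slide values for the null mean and standard deviation. Because H1 is p(renters) < .82055, the P value is the lower-tail normal probability.

```python
from statistics import NormalDist

p0, se = 0.82055, 0.00459       # from the slides
p_hat = 5469 / 7314             # observed renter acceptance rate, about .74774
alpha = 0.11333                 # size of the test designed on the earlier slides

z_stat = (p_hat - p0) / se            # about -15.86
p_value = NormalDist().cdf(z_stat)    # lower-tail P value for H1: p(renters) < .82055
print(f"z = {z_stat:.2f}, P value = {p_value:.3g}, reject H0: {p_value < alpha}")
```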
Confidence Intervals
For a two-sided test about a parameter, a confidence interval is the complement of the rejection region. (Proof in text, p. 338.)
Confidence Interval
If the sample mean is far from $30,000, reject the hypothesis; choose the rejection region as before.
I am 95% certain that the confidence interval contains the true mean of the distribution of incomes. (I cannot be 100% certain.)
(Figure: the confidence interval is the region between the two rejection regions.)
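The complement relationship can be checked numerically. This Python sketch assumes a known σ and uses hypothetical values (σ = 5,000, N = 400); the helper names are illustrative. Over a grid of sample means, the 95% interval contains $30,000 exactly when the two-sided 5% test fails to reject.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(xbar, sigma, n, level=0.95):
    """Two-sided confidence interval for mu with known sigma."""
    half_width = NormalDist().inv_cdf(0.5 + level / 2) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

def rejects(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided z test of H0: mu = mu0."""
    return abs(xbar - mu0) > NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)

# Duality check over a grid of hypothetical sample means
for xbar in range(29000, 31001, 50):
    lo, hi = confidence_interval(xbar, 5000, 400)
    assert (lo <= 30000 <= hi) == (not rejects(xbar, 30000, 5000, 400))
```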
One-Sided Tests
H0: μ = 0, H1: μ ≠ 0. The rejection region is sample means far from 0 in either direction.
H0: μ = 0, H1: μ > 0. Sample means less than 0 cannot be in the rejection region; the entire rejection region is above 0.
Reformulate: H0: μ ≤ 0, H1: μ > 0.
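A sketch of the one-sided version in Python, assuming a known σ. The function name and test values are hypothetical. The whole rejection region sits above μ0, so a sample mean below μ0 can never lead to rejection.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Test H0: mu <= mu0 against H1: mu > mu0 with known sigma.

    Reject only if xbar exceeds mu0 + z(1 - alpha) * sigma / sqrt(n);
    z(.95) is about 1.645.
    """
    critical = mu0 + NormalDist().inv_cdf(1 - alpha) * sigma / sqrt(n)
    return xbar > critical, critical

print(one_sided_z_test(-5.0, 0.0, 1.0, 25))   # far below 0: never rejected
print(one_sided_z_test(0.4, 0.0, 1.0, 25))    # above the critical value of about .329
```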
Likelihood Ratio Tests
Carrying Out the LR Test
In most cases, the exact distribution of the statistic is unknown.
Use −2 log(LR), where LR = L(restricted)/L(unrestricted); it is approximately chi squared, with degrees of freedom equal to the number of restrictions.
For a test about one parameter, the threshold value is 3.84 (5%) or 6.63 (1%).
Poisson Likelihood Ratio Test
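The slide's own derivation is not preserved here, so the following is a standard version rather than the author's: a Python sketch of the LR test of H0: λ = λ0 for an i.i.d. Poisson sample. The MLE of λ is the sample mean, the log(x!) terms cancel in the ratio, and −2 log(LR) is compared with the one-parameter 5% threshold of 3.84. The data in the usage line are hypothetical.

```python
from math import log

def poisson_lr_test(x, lam0, critical=3.84):
    """LR test of H0: lambda = lam0 for an i.i.d. Poisson sample x.

    Statistic: -2 log[L(lam0) / L(lam_hat)]
             = 2 [sum(x) * log(lam_hat / lam0) - N * (lam_hat - lam0)],
    compared with chi squared[1].
    """
    n = len(x)
    total = sum(x)
    lam_hat = total / n                           # MLE is the sample mean
    # log(x!) terms cancel; if every x is 0 the log term is 0 (0 * log 0 = 0)
    log_term = total * log(lam_hat / lam0) if total > 0 else 0.0
    lr = 2 * (log_term - n * (lam_hat - lam0))
    return lr, lr > critical

x = [0, 1, 2, 1, 3, 0, 2, 1, 1, 4]    # hypothetical counts, mean 1.5
print(poisson_lr_test(x, 1.0))       # statistic about 2.16: do not reject
```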
Generalities About the LR Test
Gamma Application
Specification Tests
Generally, a test about a distribution where the alternative is “some other distribution.”
The test is generally based on a feature of the distribution that is true under the null but not true under the alternative.
Poisson Specification Tests
3,820 observations on doctor visits. Poisson distribution?
Deviance Test
Poisson distribution: p(x) = exp(−λ) λ^x / x!
H0: Everyone has the same Poisson distribution.
H1: Everyone has their own Poisson distribution.
Under H0, observations will tend to be near the mean. Under H1, there will be much more variation.
Likelihood ratio statistic (Text, p. 348).
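One common way to write this likelihood ratio (deviance) statistic, sketched in Python: under H1 each λi is estimated by xi, under H0 by the sample mean, and the statistic is 2 Σ xi log(xi / x̄). Treating it as roughly chi squared with N − 1 degrees of freedom is an approximation that can be poor when counts are small; the text's exact presentation is not reproduced here.

```python
from math import log

def poisson_deviance(x):
    """Deviance (LR) statistic: H0 common Poisson mean vs. H1 one mean per observation.

    2 * sum[x_i * log(x_i / xbar) - (x_i - xbar)]; the second part sums to zero,
    and observations with x_i = 0 contribute 0 (the limit of x log x).
    Returns the statistic and the approximate degrees of freedom, N - 1.
    """
    xbar = sum(x) / len(x)
    dev = 2 * sum(xi * log(xi / xbar) for xi in x if xi > 0)
    return dev, len(x) - 1
```

Identical observations give a deviance of 0; widely spread counts give a large value.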
Deviance Test
Dispersion Test
Poisson distribution: p(x) = exp(−λ) λ^x / x!
H0: The distribution is Poisson.
H1: The distribution is something else.
Under H0, the mean will be (almost) the same as the variance.
Approximate likelihood ratio statistic (Text, p. 348) = N × Variance / Mean.
For the doctor visit data, this is 22,348.6, compared with chi squared with N − 1 = 3,819 degrees of freedom. H0 is rejected.
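A minimal Python sketch of the dispersion statistic, assuming the usual index-of-dispersion form Σ(xi − x̄)²/x̄, which is N × (variance with divisor N) / mean. Under a Poisson null it is approximately chi squared with N − 1 degrees of freedom; values far above N − 1 signal overdispersion.

```python
def dispersion_statistic(x):
    """Index of dispersion: sum((x_i - xbar)^2) / xbar, i.e. N * variance / mean.

    Approximately chi squared with N - 1 degrees of freedom under a Poisson null.
    """
    n = len(x)
    xbar = sum(x) / n
    return sum((xi - xbar) ** 2 for xi in x) / xbar
```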
Specification Test: Normality
The normal distribution is symmetric and has kurtosis = 3.
Compare the observed 3rd and 4th moments to what would be expected from a normal distribution.
Symmetric and Skewed Distributions
Kurtosis: t vs. Normal
Kurtosis of normal(0,1) = 3.
Kurtosis of t[k] = 3 + 6/(k − 4), for k > 4; for t[5], 3 + 6/(5 − 4) = 9.
Bowman and Shenton Test for Normality
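The slide's formula is not preserved, so this Python sketch uses the statistic usually attributed to Bowman and Shenton (the same form as the Jarque-Bera statistic): N[skewness²/6 + (kurtosis − 3)²/24], approximately chi squared with 2 degrees of freedom (5% critical value 5.99) under normality. Sample moments use divisor N.

```python
def bowman_shenton(x):
    """N * [skew^2 / 6 + (kurt - 3)^2 / 24]; approx. chi squared[2] under normality."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n    # central moments with divisor N
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n * (skew ** 2 / 6 + (kurt - 3) ** 2 / 24)
```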
Testing for a Distribution
H0: The data follow the assumed distribution.
H1: The assumed distribution is incorrect.
Strategy: Do the features of the sample resemble what we would observe if H0 were correct?
Continuous: Does the CDF of the data resemble the CDF of the assumed distribution?
Discrete: Do the sample cell proportions resemble the predictions from the assumed distribution?
Probability Plot for Normality
Normal (log) Income?
Random Sample from Normal
Normality Tests
Kolmogorov–Smirnov Test
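A Python sketch of the Kolmogorov–Smirnov statistic D = max |Fn(x) − F(x)| against a fully specified continuous CDF (here a standard normal by default, an assumption for illustration). For large N, a common 5% critical value is about 1.36/√N; if the distribution's parameters are estimated from the same data, those critical values no longer apply directly.

```python
from statistics import NormalDist

def ks_statistic(sample, dist=NormalDist()):
    """Kolmogorov-Smirnov D = max |F_n(x) - F(x)| for a fully specified continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # the empirical CDF jumps from i/n to (i + 1)/n at x; check both sides of the jump
        d = max(d, (i + 1) / n - f, f - i / n)
    return d
```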
Chi Squared Test for a Discrete Distribution
Outcomes: A1, A2, …, AM.
Predicted probabilities based on a theoretical distribution: E1(θ), E2(θ), …, EM(θ).
Sample cell frequencies: O1, …, OM.
Test Statistics
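The slide's test statistics are not preserved; this Python sketch shows the standard Pearson form, Σm (Om − N Em)² / (N Em), where N is the total count. A common rule takes degrees of freedom M − 1 minus the number of estimated parameters.

```python
def chi_squared_gof(observed, probs):
    """Pearson chi squared: sum over cells of (O_m - N*E_m)^2 / (N*E_m).

    observed: cell counts O_1..O_M; probs: predicted probabilities E_1..E_M (summing to 1).
    """
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))
```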
V2 Rocket Hits
Km² areas of South London in a grid (24 by 24), so 576 areas.
535 rockets were fired randomly into the grid: N = 535.
P(a rocket hits a particular grid area) = 1/576 = .001736 = θ.
Expected number of rocket hits in a particular area = 535/576 = .9288.
How many rockets will hit any particular area? 0, 1, 2, …; it could be anything up to 535.
The .9288 is the λ for a Poisson distribution.
Adapted from Richard Isaac, The Pleasures of Probability, Springer Verlag, 1995, pp
Interpreting the Process
λ = .852
Probabilities: P(X=0) = .4266, P(X=1) = .3634, P(X=2) = .1548, P(X=3) = .0440, P(X=4) = .0094, P(X>4) = .0019.
There are 169 squares. There are 144 “trials” (hits).
Expect .4266 × 169 = 72.1 squares to have 0 hits.
Expect .3634 × 169 = 61.4 squares to have 1 hit.
Etc.
Expect the average number of hits per square to be .852.
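The Poisson probabilities and expected cell counts above can be reproduced with a short Python sketch, assuming λ = 144/169 (hits per square) as the slide's numbers imply.

```python
from math import exp, factorial

n_squares = 169
lam = 144 / n_squares          # average hits per square, about .852

# P(X = k) for k = 0..4, then the remaining tail P(X > 4)
probs = [exp(-lam) * lam**k / factorial(k) for k in range(5)]
probs.append(1 - sum(probs))
expected = [n_squares * p for p in probs]

for k, (p, e) in enumerate(zip(probs, expected)):
    label = f"X = {k}" if k < 5 else "X > 4"
    print(f"{label}: P = {p:.4f}, expected squares = {e:.1f}")
```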
Does the Theory Work?
| Outcome | Theoretical: Probability | Theoretical: Number of Cells = 169 × Prob(Outcome) | Sample: Proportion | Sample: Number of Cells (Observed Frequencies) |
Chi Squared for the Bombing Run
Difference in Means of Two Populations
Two independent normal populations: common known variance; common unknown variance; different variances; one- and two-sided tests.
Paired samples: means of paired observations.
Treatments and controls: diff-in-diff (SAT).
Nonparametric: Mann–Whitney.
Two Bernoulli populations.
Comparing Two Normal Populations
Unknown Common Variance
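The slide's formulas are not preserved; this is a Python sketch of the standard pooled-variance t statistic for H0: μx = μy when both normal populations share an unknown variance. Its degrees of freedom, Nx + Ny − 2, match the t[3872] = 817 + 3057 − 2 in the income example.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """t statistic for H0: mu_x = mu_y assuming a common unknown variance.

    Returns (t, degrees of freedom = Nx + Ny - 2).
    """
    nx, ny = len(x), len(y)
    # pooled variance: weighted average of the two sample variances (divisor n - 1)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2
```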
Household Incomes, Equal Variances
t test of equal means, INCOME by MARRIED
MARRIED = 0: Nx = 817; MARRIED = 1: Ny = 3057
t[3872] = …; P value = …
Reported for each group: Mean, Std.Dev., and Std.Error of INCOME.
Unknown Different Variances
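The slide's formula is not preserved; one standard approach when the variances differ is Welch's t statistic with Welch-Satterthwaite approximate degrees of freedom, sketched here in Python.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """t statistic for H0: mu_x = mu_y without assuming equal variances.

    Returns (t, Welch-Satterthwaite approximate degrees of freedom).
    """
    vx, vy = variance(x) / len(x), variance(y) / len(y)   # squared standard errors
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df
```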
2 Proportions
Two Bernoulli populations: Xi ~ Bernoulli with Prob(xi = 1) = θx; Yi ~ Bernoulli with Prob(yi = 1) = θy.
H0: θx = θy.
The sample proportions are px = (1/Nx) Σi xi and py = (1/Ny) Σi yi.
The sample variances are px(1 − px) and py(1 − py).
Use the central limit theorem to form the test statistic.
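A Python sketch of the CLT-based z statistic described above, assuming the unpooled standard error sqrt(px(1 − px)/Nx + py(1 − py)/Ny); a pooled standard error under H0 is a common alternative. The usage line applies it to the renter and owner acceptance counts from this part's contingency table, not to the public-insurance data.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(successes_x, n_x, successes_y, n_y):
    """z test of H0: theta_x = theta_y using the unpooled CLT standard error.

    Returns (z, two-sided P value).
    """
    px, py = successes_x / n_x, successes_y / n_y
    se = sqrt(px * (1 - px) / n_x + py * (1 - py) / n_y)
    z = (px - py) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Renters (5,469 of 7,314 accepted) vs. owners (5,030 of 6,130 accepted)
z, p_value = two_proportion_z(5469, 7314, 5030, 6130)
print(f"z = {z:.2f}, P value = {p_value:.3g}")
```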
z Test for Equality of Proportions
Application: take-up of public health insurance.
t test of equal means, PUBLIC by FEMALE
FEMALE = 0: Nx = 1812; FEMALE = 1: Ny = 1565
t[3375] = …; P value = …
Reported for each group: Mean, Std.Dev., and Std.Error of PUBLIC.
Paired Sample t and z Test
Observations are pairs (Xi, Yi), i = 1, …, N.
Hypothesis: μx = μy. Both are normal distributions. They may be correlated.
Medical trials: smoking vs. nonsmoking (separate individuals, probably independent).
SAT repeat tests, before and after (definitely correlated).
The test is based on Di = Xi − Yi. It is the same as the earlier test, with H0: μD = 0.
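A Python sketch of the paired test: form Di = Xi − Yi and apply the one-sample t statistic to H0: μD = 0 with N − 1 degrees of freedom. The function name and test data are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Paired-sample t test of H0: mu_D = 0, with D_i = X_i - Y_i.

    Returns (t, degrees of freedom = N - 1).
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    return mean(d) / (stdev(d) / sqrt(n)), n - 1
```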
Treatment Effects
SAT do-overs experiment: X1, X2, …, XN = first SAT score; Y1, Y2, …, YN = second SAT score.
Treatment: T1, …, TN = whether or not the student took a Kaplan (or similar) prep course.
Hypothesis: μy > μx.
Placebo: In medical trials, N1 subjects receive a drug (treatment) and N2 receive a placebo.
Hypothesis: The effect is greater in the treatment group than in the control (placebo) group.
Measuring Treatment Effects
Treatment Effects in Clinical Trials
Does Phenogyrabluthefentanoel (Zorgrab) work? Investigate: carry out a clinical trial.
| | Placebo | Drug Treatment |
| No Effect | N00 | N0T |
| Positive Effect | N+0 | N+T |
N+0 = “the placebo effect”
N+T – N+0 = “the treatment effect”
The hypothesis is that the difference in differences has mean zero.
A Test of Independence
In the credit card example, are Own/Rent and Accept/Reject independent?
Hypothesis: Ownership and acceptance are independent.
Formal hypothesis, based only on the laws of probability: Prob(Own, Accept) = Prob(Own) × Prob(Accept) (and likewise for the other three possibilities).
Rejection region: joint frequencies that do not look like the products of the marginal frequencies.
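Contingency Table Analysis
The data: frequencies
| | Reject | Accept | Total |
| Rent | 1,845 | 5,469 | 7,314 |
| Own | 1,100 | 5,030 | 6,130 |
| Total | 2,945 | 10,499 | 13,444 |
Step 1: Convert to actual proportions (each frequency divided by 13,444).
| | Reject | Accept | Total |
| Rent | .1372 | .4068 | .5440 |
| Own | .0818 | .3741 | .4560 |
| Total | .2191 | .7809 | 1.0000 |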
Independence Test
Step 2: Expected proportions assuming independence. If the factors are independent, then the joint proportions should equal the product of the marginal proportions.
[Rent, Reject]: .5440 × .2191 = .1192
[Rent, Accept]: .5440 × .7809 = .4248
[Own, Reject]: .4560 × .2191 = .0999
[Own, Accept]: .4560 × .7809 = .3561
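The comparison of actual and expected frequencies can be sketched in Python using the standard Pearson chi squared statistic for an R × C table; the function name is illustrative. Applied to the Own/Rent by Accept/Reject counts, it gives a statistic of about 103 with (2 − 1)(2 − 1) = 1 degree of freedom, far above the 5% critical value of 3.84.

```python
def chi_squared_independence(table):
    """Pearson chi squared for independence in an R x C table of counts.

    Expected count for cell (i, j) = row_i total * column_j total / N.
    Returns (statistic, degrees of freedom = (R - 1)(C - 1)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: Rent, Own; columns: Reject, Accept (counts from the contingency table)
chi2, df = chi_squared_independence([[1845, 5469], [1100, 5030]])
print(f"chi squared = {chi2:.1f} with {df} d.f.; 5% critical value = 3.84")
```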
Comparing Actual to Expected
When Is the Chi Squared Large?
Critical values come from the chi squared table.
Degrees of freedom = (R − 1)(C − 1).
(Table: critical chi squared values by D.F.)
Analyzing Default
Do renters default more often (at a different rate) than owners?
To investigate, we study the cardholders (only).
(Table: DEFAULT (0, 1, All) by OWNRENT (0, 1, All).)
Hypothesis Test
Multiple Choices: Travel Mode
210 travelers between Sydney and Melbourne.
Four available modes: air, train, bus, car.
Among the observed variables is income. Does income help to explain mode choice?
Hypothesis: Mode choice and income are independent.
Travel Mode Choices
Travel Mode Choices and Income
| INCOME | AIR | TRAIN | BUS | CAR | Total |
| LOW | | | | | 63 |
| MEDIUM | | | | | 76 |
| HIGH | | | | | 71 |
| Total | | | | | 210 |