Econ 488 Lecture 5 – Hypothesis Testing Cameron Kaplan.


1 Econ 488 Lecture 5 – Hypothesis Testing Cameron Kaplan

2 Classical Assumptions
1. The regression is linear, correctly specified, and has an additive error term.
2. $E(\varepsilon_i) = 0$
3. The correlation between $X_{ki}$ and $\varepsilon_i$ is 0 for all $k$.
4. $\varepsilon_t$ is uncorrelated with $\varepsilon_{t+1}$ for all $t$.
5. $Var(\varepsilon_i) = \sigma^2$ [No Heteroskedasticity]
6. No perfect multicollinearity.
And sometimes:
7. $\varepsilon_i \sim N(0, \sigma^2)$

3 Sampling Distribution of $\hat{\beta}$ The estimator $\hat{\beta}$ is assumed to be normally distributed because the stochastic error is assumed to be normally distributed (assumption 7). Usually, we take a sample of size N from a population to produce a single estimate of $\beta$, which we call $\hat{\beta}$. But what if we took a different sample? We would get a different value of $\hat{\beta}$.

4 Sampling Distribution of $\hat{\beta}$

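5 In OLS, $\hat{\beta}$ is unbiased, so $E(\hat{\beta}) = \beta$. Under classical assumptions 1 to 6, OLS estimators also have the smallest variance of all linear unbiased estimators at any sample size (efficiency; by the Gauss-Markov theorem, OLS is BLUE). Finally, OLS estimators are consistent: as N increases, the variance of $\hat{\beta}$ shrinks, and as $N \to \infty$, $\hat{\beta} \to \beta$.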
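To make the idea of a sampling distribution concrete, the sketch below repeats the experiment "draw a sample of size N, estimate the slope by OLS" thousands of times. It assumes a made-up data-generating process ($\beta_0 = 1$, $\beta_1 = 0.5$, normal errors with $\sigma = 2$); the constants and function names are hypothetical and only for illustration. The mean of the estimates should stay close to the true $\beta_1$ (unbiasedness), while their variance should fall as N grows (consistency).

```python
import numpy as np

# Hypothetical data-generating process, chosen only for illustration:
# Y_i = BETA0 + BETA1 * X_i + eps_i, with eps_i ~ N(0, SIGMA^2)
BETA0, BETA1, SIGMA = 1.0, 0.5, 2.0


def draw_beta1_hat(n, rng):
    """Draw one sample of size n and return its OLS slope estimate."""
    x = rng.uniform(0, 10, size=n)
    eps = rng.normal(0.0, SIGMA, size=n)
    y = BETA0 + BETA1 * x + eps
    # OLS slope with one regressor: cross-deviations over squared deviations of x
    xd = x - x.mean()
    return np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)


def sampling_distribution(n, reps=5000, seed=0):
    """Repeat the 'take a sample, estimate beta1' experiment reps times."""
    rng = np.random.default_rng(seed)
    return np.array([draw_beta1_hat(n, rng) for _ in range(reps)])


if __name__ == "__main__":
    for n in (10, 100, 1000):
        est = sampling_distribution(n)
        # Mean stays near BETA1 (unbiasedness); variance falls as n grows (consistency)
        print(f"N={n:5d}  mean={est.mean():.4f}  variance={est.var():.6f}")
```

Plotting a histogram of `sampling_distribution(n)` for a few values of n would reproduce the kind of picture shown on the sampling-distribution and consistency slides.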

6 Consistency

7 Hypothesis Testing Most of the time, we only take one sample, so we only get one value of $\hat{\beta}$. How do we know if $\hat{\beta}$ is meaningful if we can only observe one value in the distribution?

8 Example Suppose we are interested in whether school size has an effect on student performance. Specifically, do students at small schools do better? We estimate the following equation: $math10_i = \beta_0 + \beta_1 enroll_i + \beta_2 staff_i + \beta_3 totcomp_i + \varepsilon_i$

9 Example $math10_i = \beta_0 + \beta_1 enroll_i + \beta_2 staff_i + \beta_3 totcomp_i + \varepsilon_i$, where:
math10 = % of students passing the 10th-grade math portion of the Michigan Educational Assessment Program (MEAP) test
enroll = school size
staff = number of staff per 1,000 students (to control for how much attention students get)
totcomp = average annual teacher compensation (to control for teacher quality)

10 Hypothesis Testing We need to develop a null and an alternative hypothesis before running the regression. Null Hypothesis ($H_0$): usually, you want to reject the null hypothesis. The most common null hypothesis is "there is no effect of X on Y", or $\beta_1 = 0$. Alternative Hypothesis ($H_A$ or $H_1$): usually, what you are trying to prove.

11 Hypothesis Testing In our example, we would pick $H_0: \beta_1 \geq 0$ ("there is no negative effect of school size on student performance") and $H_A: \beta_1 < 0$ ("there is a negative effect of school size on student performance"). Test this using meap93.gdt.

12 Example 2 Consider the wage equation $\log(wage_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 tenure_i + \varepsilon_i$. The null hypothesis $H_0: \beta_2 = 0$ says that once education and tenure have been accounted for, the number of years in the workforce has no effect on the hourly wage. If $\beta_2 > 0$, prior work experience contributes to productivity, and hence to wages.

13 Alternative Hypothesis Usually, we want to reject the null hypothesis. We form an alternative hypothesis: the values we do expect (the null hypothesis covers the values we don't expect). One-sided alternatives: based on our economic model, we expect a particular sign on a coefficient, e.g. $H_A: \beta_k > 0$.

14 Hypothesis Testing $\log(wage_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 tenure_i + \varepsilon_i$ In our example, we might set our hypotheses as $H_0: \beta_2 \leq 0$, $H_A: \beta_2 > 0$. We believe that the effect of experience on wages is positive, holding education and tenure fixed.

15 Hypothesis Testing $\log(wage_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 tenure_i + \varepsilon_i$ What should the null and alternative hypotheses for the other coefficients be? Education: $H_0: \beta_1 \leq 0$, $H_A: \beta_1 > 0$. Tenure: $H_0: \beta_3 \leq 0$, $H_A: \beta_3 > 0$.

16 Two-sided alternatives $Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \varepsilon_i$, $H_0: \beta_1 = 0$, $H_A: \beta_1 \neq 0$. Under the alternative, $X_{1i}$ has a nonzero effect on the dependent variable, without specifying whether it is positive or negative. You should use this if you don't know what sign $\beta_k$ has (it is not well defined by theory). Or, sometimes it is better to use it anyway, because it prevents us from forming our hypothesis after looking at the results.

17 Other Hypotheses Although $H_0: \beta_k = 0$ is the most common null hypothesis, sometimes we want to test whether $\beta_k$ is equal to some other constant, usually 1 or -1. Example: suppose we want to look at the effect of college enrollment on crime: $\log(crime_i) = \beta_0 + \beta_1 \log(enroll_i) + \varepsilon_i$. This is a constant elasticity model, where $\beta_1$ is the elasticity of crime with respect to enrollment.

18 Other hypotheses $\log(crime_i) = \beta_0 + \beta_1 \log(enroll_i) + \varepsilon_i$ We could test $H_0: \beta_1 = 0$ against $H_A: \beta_1 \neq 0$, but it would be more interesting to test whether $\beta_1 = 1$. If $\beta_1 > 1$, then a 1% increase in enrollment leads to a greater than 1% increase in crime, so crime is a bigger problem at large campuses. Set up our hypotheses as follows: $H_0: \beta_1 = 1$, $H_A: \beta_1 \neq 1$.

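19 t-test $Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \varepsilon_i$ t-statistic: $t_k = \dfrac{\hat{\beta}_k - \beta_{H_0}}{SE(\hat{\beta}_k)}$, where $\hat{\beta}_k$ = the estimated regression coefficient of the kth variable, $\beta_{H_0}$ = the border value (usually zero) implied by the null hypothesis, and $SE(\hat{\beta}_k)$ = the estimated standard error of the coefficient on the kth variable.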
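As a sketch of where $\hat{\beta}_k$ and $SE(\hat{\beta}_k)$ come from, the function below (a hypothetical helper, not Gretl's or any library's API) computes OLS coefficients with NumPy, estimates $\sigma^2$ with $N - K - 1$ degrees of freedom, and forms $t_k$ for any border values $\beta_{H_0}$. Passing a nonzero border value, e.g. 1 for the crime elasticity, gives the t-statistic for $H_0: \beta_1 = 1$. The data in the usage block are simulated.

```python
import numpy as np


def ols_t_stats(X, y, beta_null=None):
    """OLS coefficients, standard errors, and t-statistics.

    X         : (N, K) array of regressors, without the constant (one is added here)
    y         : (N,) array, the dependent variable
    beta_null : border values implied by H0, one per coefficient including the
                intercept; defaults to zero for every coefficient
    """
    n, k = X.shape                                  # k = number of slope coefficients
    Xc = np.column_stack([np.ones(n), X])           # prepend the intercept column
    beta_hat, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta_hat
    s2 = resid @ resid / (n - k - 1)                # estimate of sigma^2, N - K - 1 df
    cov = s2 * np.linalg.inv(Xc.T @ Xc)             # estimated Var(beta_hat)
    se = np.sqrt(np.diag(cov))
    if beta_null is None:
        beta_null = np.zeros_like(beta_hat)
    t = (beta_hat - np.asarray(beta_null, dtype=float)) / se
    return beta_hat, se, t


if __name__ == "__main__":
    # Simulated (hypothetical) data with two regressors and N = 23
    rng = np.random.default_rng(42)
    X = rng.normal(size=(23, 2))
    y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=23)
    b, se, t = ols_t_stats(X, y)
    print("beta_hat:", b.round(3))
    print("SE:      ", se.round(3))
    print("t (H0: beta_k = 0):", t.round(2))
```

Econometrics packages such as Gretl report the same coefficient, standard error, and t-ratio columns for $H_0: \beta_k = 0$; the explicit `beta_null` argument is only there to show the general formula.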

20 t-test For example, suppose our hypotheses were $H_0: \beta_1 = 0$, $H_A: \beta_1 > 0$. Then, suppose that we estimate $\hat{\beta}_1 = 6$ and $SE(\hat{\beta}_1) = 2$. We would calculate t as $t = \dfrac{6 - 0}{2} = 3$.

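21 How does the t-test work? [Figure: distribution of $\hat{\beta}_1$ if the null is true, centered at $\beta_1$.] Suppose we found a value of $\hat{\beta}_1$ way out in the tail of this distribution. Then it's not very likely that the null hypothesis is true...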

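22 t-test How does this look for our example? $\hat{\beta}_1 = 6$ and $SE(\hat{\beta}_1) = 2$. [Figure: distribution of $\hat{\beta}_1$ under $H_0$, centered at 0, with the estimate 6 marked.]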

23 t-test We want to know: if $H_0$ really is true (i.e. $\beta_1$ really is 0), how likely is it that we would have observed $\hat{\beta}_1 = 6$? Not very. We can probably say that $H_0$ is not true, but we need a rule to decide.

24 Hypothesis Testing How do we decide when to reject the null? Choose a level of significance. Rule of thumb: 5% level of significance. This means that we will reject $H_0$ if, were $H_0$ true, we would expect a value of $\hat{\beta}_1$ at least as extreme as 6 less than 5% of the time. Instead of trying to figure out this probability from the sampling distribution of $\hat{\beta}_1$ directly, we standardize $\hat{\beta}_1$ into the t-statistic, which follows the t-distribution. The t-distribution is very similar to the standard normal distribution, but it has fatter tails; it approaches the standard normal as the degrees of freedom increase.
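One way to see how close the t-distribution is to the standard normal is to compare critical values. The sketch below assumes SciPy is available and uses `scipy.stats.t.ppf` and `scipy.stats.norm.ppf`; the helper name is hypothetical. The two-sided 5% critical values should shrink toward the normal value of about 1.96 as the degrees of freedom grow, and the 20-df value should come out near 2.086, the number the lecture uses for its 95% confidence interval.

```python
from scipy import stats


def two_sided_critical_values(dfs, alpha=0.05):
    """Two-sided critical values of the t-distribution for each df in dfs."""
    return {df: stats.t.ppf(1 - alpha / 2, df) for df in dfs}


if __name__ == "__main__":
    z_c = stats.norm.ppf(0.975)                     # standard normal counterpart
    for df, t_c in two_sided_critical_values([5, 10, 20, 30, 120]).items():
        print(f"df={df:4d}  t_c={t_c:.3f}")
    print(f"normal    z_c={z_c:.3f}")
```

The fatter tails show up as larger critical values: with few degrees of freedom, a larger t-statistic is needed to reject at the same significance level.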

25 t-test In our example, $t = (6 - 0)/2 = 3$. Suppose our sample size was 23. We need to compare our t-statistic to the critical t-value, which separates the acceptance region from the rejection region (see the t-table on the inside cover of the book). We want the t-value for $N - K - 1 = 23 - 2 - 1 = 20$ degrees of freedom. For a one-sided test at the 5% significance level, this is $t_c = 1.725$. Decision rule: reject $H_0$ if $|t_k| > t_c$ and $t_k$ has the sign implied by $H_A$; otherwise, do not reject. Here, $t = 3 > 1.725$ and t is positive, so we reject the null in favor of the alternative, suggesting that $X_1$ has a statistically significant positive effect.
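The decision rule above can be sketched in a few lines, assuming SciPy's `scipy.stats.t.ppf` in place of the printed t-table. The function `one_sided_t_test` is a hypothetical helper; `k` is the number of slope coefficients, so the degrees of freedom are $N - K - 1$. For the lecture's numbers it should report $t = 3$, $t_c \approx 1.725$, and a rejection of $H_0$.

```python
from scipy import stats


def one_sided_t_test(beta_hat, se, beta_null, n, k, alpha=0.05, sign=+1):
    """One-sided t-test of H0 against H_A: beta > beta_null (sign=+1) or < (sign=-1).

    k is the number of slope coefficients, so df = n - k - 1.
    Returns the t-statistic, the critical value, and whether to reject H0.
    """
    t = (beta_hat - beta_null) / se
    df = n - k - 1
    t_c = stats.t.ppf(1 - alpha, df)               # one-sided critical value
    # Decision rule: |t| > t_c AND t has the sign implied by H_A
    reject = bool(abs(t) > t_c) and (t * sign > 0)
    return t, t_c, reject


# Lecture example: beta_hat = 6, SE = 2, H0: beta_1 = 0, N = 23, K = 2
t, t_c, reject = one_sided_t_test(6, 2, 0, n=23, k=2)
print(f"t = {t:.2f}, t_c = {t_c:.3f}, reject H0: {reject}")
```

The sign check matters: an estimate of -6 with the same standard error gives $|t| = 3 > t_c$, but it does not support $H_A: \beta_1 > 0$, so $H_0$ is not rejected.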

26 Choosing a Level of Significance Rule of thumb: significance level = 5%. If the significance level is too low, we risk a Type II error: failing to reject the null hypothesis when it is actually false. If it is too high, we risk a Type I error: rejecting the null hypothesis when it is actually true. If we reject $H_0$ at the 5% level, we say that the coefficient is "statistically significant at the 5% level". Researchers sometimes use asterisks: * means significant at 10%, ** means significant at 5%, *** means significant at 1%.
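The significance level is exactly the probability of a Type I error when the classical assumptions (including normal errors) hold. The Monte Carlo sketch below checks this under stated assumptions: a hypothetical one-regressor model in which $H_0: \beta_1 = 0$ is true, N = 23, and a two-sided t-test. The share of samples in which the true null is wrongly rejected should be close to $\alpha$, and it falls when $\alpha$ is lowered (at the cost of more Type II errors, which this sketch does not measure).

```python
import numpy as np
from scipy import stats


def type_i_error_rate(n=23, alpha=0.05, reps=4000, seed=1):
    """Share of simulated samples in which a TRUE H0: beta1 = 0 is rejected."""
    rng = np.random.default_rng(seed)
    t_c = stats.t.ppf(1 - alpha / 2, n - 2)        # two-sided critical value, K = 1
    rejections = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        y = 1.0 + rng.normal(size=n)               # beta1 = 0: X has no effect on Y
        xd = x - x.mean()
        b1 = np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)
        b0 = y.mean() - b1 * x.mean()
        resid = y - b0 - b1 * x
        s2 = resid @ resid / (n - 2)               # N - K - 1 with K = 1
        se_b1 = np.sqrt(s2 / np.sum(xd ** 2))
        if abs(b1 / se_b1) > t_c:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    for a in (0.10, 0.05, 0.01):
        print(f"alpha={a:.2f}  simulated Type I error rate={type_i_error_rate(alpha=a):.3f}")
```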

27 Confidence Intervals Confidence interval: a range, constructed from the sample, that contains the true population value a specified percent of the time in repeated samples. The two-sided t-critical value at a given significance level gives the (1 - significance level) confidence interval, $\hat{\beta}_k \pm t_c \cdot SE(\hat{\beta}_k)$. So the 5% significance level corresponds to the 95% CI.

28 Confidence Intervals For our example, the two-sided t-critical value at the 5% level with 20 degrees of freedom is 2.086. So the 95% CI = $6 \pm 2 \times 2.086 = 6 \pm 4.172$, or 1.828 to 10.172. We could say that with 95% confidence, the true value of $\beta_1$ is between 1.828 and 10.172. Notice that 0 is not in this range, so we can reject $H_0: \beta_1 = 0$ at the 5% level (two-sided).
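The interval arithmetic can be checked with a short sketch, assuming SciPy's `scipy.stats.t.ppf` for the two-sided critical value; `confidence_interval` is a hypothetical helper. With $\hat{\beta}_1 = 6$, $SE = 2$, and 20 degrees of freedom, it should reproduce the interval from about 1.828 to 10.172.

```python
from scipy import stats


def confidence_interval(beta_hat, se, df, level=0.95):
    """Two-sided CI: beta_hat +/- t_c * SE, with t_c from the t-distribution."""
    t_c = stats.t.ppf(1 - (1 - level) / 2, df)
    return beta_hat - t_c * se, beta_hat + t_c * se


# Lecture example: beta_hat = 6, SE = 2, 20 degrees of freedom
lo, hi = confidence_interval(6, 2, df=20)
print(f"95% CI: {lo:.3f} to {hi:.3f}")
print("0 inside the CI?", lo <= 0 <= hi)
```

Checking whether the border value lies inside the interval is equivalent to a two-sided t-test at the matching significance level.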

29 P-value An alternative to comparing t with a critical value: if the true population value were really 0, what is the probability that we would have observed a value of $\hat{\beta}_1$ at least as extreme as 6? If p is small, reject the null. The p-value is calculated automatically by most econometrics software. Reject the null if p is less than the significance level.
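As a sketch of what the software computes, the snippet below turns the lecture's t-statistic into p-values with SciPy's survival function `scipy.stats.t.sf`, assuming 20 degrees of freedom as in the earlier example. The one-sided p-value is the tail area beyond t = 3; the two-sided p-value doubles it. Both are well below 0.05, so the conclusion matches the critical-value approach.

```python
from scipy import stats

t_stat = (6 - 0) / 2        # lecture example: beta_hat = 6, SE = 2, H0: beta_1 = 0
df = 20                     # N - K - 1 = 23 - 2 - 1

# One-sided H_A: beta_1 > 0 -> area in the right tail beyond t
p_one_sided = stats.t.sf(t_stat, df)
# Two-sided H_A: beta_1 != 0 -> area in both tails
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)

alpha = 0.05
print(f"one-sided p = {p_one_sided:.4f}, reject at 5%: {p_one_sided < alpha}")
print(f"two-sided p = {p_two_sided:.4f}, reject at 5%: {p_two_sided < alpha}")
```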

30 Example: student performance and school size, using the MEAP data (meap93.gdt).

31 F-test (Appendix Ch. 5) What if you want to test a hypothesis that involves multiple coefficients? For example, suppose we run this regression (data7-2.gdt): $wage_i = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 clerical_i + \beta_4 maint_i + \beta_5 crafts_i + \varepsilon_i$. clerical, maint, and crafts are job-type dummies. We want to test whether job type matters, so we need to test whether $\beta_3$, $\beta_4$, and $\beta_5$ are "jointly significant": $H_0: \beta_3 = \beta_4 = \beta_5 = 0$; $H_A$: the null hypothesis is not true.

32 F-test Steps
1. Run the full regression and get RSS.
2. Run the constrained regression (without the job-type variables) and get $RSS_M$.
where:
RSS = RSS from step 1
$RSS_M$ = RSS from step 2
M = number of excluded coefficients
N = number of observations
K = number of slope coefficients in the full equation (not counting the intercept)

33 F-stat Calculate the F-statistic, $F = \dfrac{(RSS_M - RSS)/M}{RSS/(N - K - 1)}$, and compare it to the critical value of F from the F-table, with M degrees of freedom in the numerator and $N - K - 1$ degrees of freedom in the denominator. If $F > F_{crit}$, reject the null hypothesis. The variables are jointly significant if you can reject the null.
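The two-regression procedure can be sketched as follows, assuming NumPy for OLS and SciPy's `scipy.stats.f` for the p-value and critical value. The helpers `rss` and `f_test` are hypothetical, and the usage block does not load data7-2.gdt; it uses simulated stand-ins (two continuous regressors plus three binary columns that truly have no effect) purely to show the mechanics.

```python
import numpy as np
from scipy import stats


def rss(X, y):
    """Residual sum of squares from OLS of y on a constant and the columns of X."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    return resid @ resid


def f_test(rss_m, rss_full, m, n, k):
    """F = ((RSS_M - RSS) / M) / (RSS / (N - K - 1)) and its p-value.

    k is the number of slope coefficients in the full equation.
    """
    df_num, df_den = m, n - k - 1
    f = ((rss_m - rss_full) / df_num) / (rss_full / df_den)
    p = stats.f.sf(f, df_num, df_den)
    return f, p


if __name__ == "__main__":
    # Simulated, hypothetical stand-in for the wage data
    rng = np.random.default_rng(3)
    n = 200
    X_full = np.column_stack([
        rng.normal(12, 2, n),             # stand-in for educ
        rng.normal(10, 5, n),             # stand-in for exper
        rng.integers(0, 2, size=(n, 3)),  # three binary stand-ins for job type
    ])
    y = 2.0 + 0.5 * X_full[:, 0] + 0.1 * X_full[:, 1] + rng.normal(0, 1, n)
    rss_full = rss(X_full, y)                 # step 1: full regression
    rss_m = rss(X_full[:, :2], y)             # step 2: constrained, binary columns omitted
    f, p = f_test(rss_m, rss_full, m=3, n=n, k=5)
    print(f"F = {f:.3f}, p = {p:.3f}")
    print("F_crit (5%):", round(stats.f.ppf(0.95, 3, n - 5 - 1), 3))
```

Because dropping regressors can never lower the residual sum of squares, $RSS_M \geq RSS$ and F is never negative.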

34 F-test in Gretl: run the model, then select Tests > Omit variables. This gives the F-statistic and its p-value.
