Sampling distribution of OLS estimators
- We have learned that MLR.1 through MLR.4 guarantee that the OLS estimators are unbiased.
- In addition, we have learned that, by adding MLR.5, you can estimate the variance of the OLS estimators.
- However, in order to conduct hypothesis tests, we need to know the sampling distribution of the OLS estimators.
- To do so, we introduce one more assumption.
Assumption MLR.6: (i) The population error u is independent of the explanatory variables x_1, x_2, …, x_k, and (ii) u ~ N(0, σ²).

Classical linear model assumptions
- MLR.1 through MLR.6 are called the classical linear model (CLM) assumptions.
- Note that MLR.6(i) automatically implies MLR.4 (provided E(u) = 0, which we always assume), but MLR.4 does not necessarily imply MLR.6(i). In this sense, MLR.4 is redundant. However, to emphasize that we are making an additional assumption, MLR.1 through MLR.6 together are called the CLM assumptions.
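- Theorem 4.1: Under the CLM assumptions, conditional on X, we have
$$\hat{\beta}_j \sim N\left(\beta_j, \mathrm{Var}(\hat{\beta}_j)\right)$$
and
$$\frac{\hat{\beta}_j - \beta_j}{\mathrm{sd}(\hat{\beta}_j)} \sim N(0, 1)$$
Proof: See the front board.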

Hypothesis testing
- Consider the following multiple linear regression.
y = β_0 + β_1 x_1 + β_2 x_2 + … + β_k x_k + u
- Now, I present a well-known theorem.
Theorem 4.2: t-distribution for the standardized estimators. Under MLR.1 through MLR.6 (the CLM assumptions), we have
$$\frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \sim t_{n-k-1}$$
This means that the standardized coefficient follows a t-distribution with n-k-1 degrees of freedom.
Proof: See the front board.

One-sided test
- A one-sided test has the following form.
The null hypothesis: H_0: β_j = 0
The alternative hypothesis: H_1: β_j > 0
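- Test procedure.
1. Set the significance level α. Typically, it is set at 0.05.
2. Compute the t-statistic under H_0, that is,
$$t = \frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)}$$
Note: Under H_0, β_j = 0, so the statistic simplifies to the second expression.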
3. Find the cutoff number t_{n-k-1,α}, the value that a t-distribution with n-k-1 degrees of freedom exceeds with probability α.
[Figure: density of the t-distribution with n-k-1 degrees of freedom, with the cutoff number t_{n-k-1,α} marked.]
4. Reject the null hypothesis if the t-statistic falls in the rejection region. This is illustrated on the next page.
[Figure: the rejection decision. Density of the t-distribution with n-k-1 degrees of freedom; the rejection region (reject H_0) lies to the right of the cutoff t_{n-k-1,α}.]
If the t-statistic falls in the rejection region, you reject the null hypothesis. Otherwise, you fail to reject the null hypothesis.
Note: if you want to test whether β_j is negative, you have the following null and alternative hypotheses.
H_0: β_j = 0
H_1: β_j < 0
Then the rejection region is on the negative side, to the left of -t_{n-k-1,α}. Nothing else changes.
[Figure: density of the t-distribution with the rejection region to the left of -t_{n-k-1,α}.]
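The one-sided decision rule can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `one_sided_t_test`, the estimate -0.08, the standard error 0.03, and the cutoff 1.645 (roughly the 5% one-sided cutoff when n-k-1 is large) are all assumed values.

```python
def one_sided_t_test(beta_hat, se, critical_value, alternative="greater"):
    """Test H0: beta_j = 0 against a one-sided alternative.

    critical_value is the positive cutoff t_{n-k-1, alpha} read from a t table.
    Returns the t-statistic and whether H0 is rejected.
    """
    if se <= 0 or critical_value <= 0:
        raise ValueError("se and critical_value must be positive")
    t_stat = beta_hat / se  # under H0, beta_j = 0
    if alternative == "greater":    # H1: beta_j > 0, rejection region on the right
        reject = t_stat > critical_value
    elif alternative == "less":     # H1: beta_j < 0, rejection region on the left
        reject = t_stat < -critical_value
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return t_stat, reject


# Hypothetical numbers, chosen only to illustrate the rule.
t_less, reject_less = one_sided_t_test(-0.08, 0.03, 1.645, alternative="less")
t_greater, reject_greater = one_sided_t_test(-0.08, 0.03, 1.645, alternative="greater")
print(t_less, reject_less, reject_greater)
```

With a negative estimate, only the test against H_1: β_j < 0 can reject; the same estimate never falls in the right-hand rejection region.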

Example
- The next slide shows the estimated result of the log salary equation for 338 Japanese economists. (Estimation is done in STATA.)
- The estimated regression is
Log(salary) = β_0 + β_1 (female) + δ(other variables) + u
- Q1. Test whether female salary is lower than male salary at the 5% significance level (i.e., α = 0.05). That is, test
H_0: β_1 = 0
H_1: β_1 < 0

Two-sided test
- A two-sided test has the following form.
The null hypothesis: H_0: β_j = 0
The alternative hypothesis: H_1: β_j ≠ 0
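- Test procedure.
1. Set the significance level α. Typically, it is set at 0.05.
2. Compute the t-statistic under H_0, that is,
$$t = \frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)}$$
Note: Under H_0, β_j = 0, so the statistic simplifies to the second expression.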
3. Find the cutoff number t_{n-k-1,α/2}.
[Figure: density of the t-distribution with n-k-1 degrees of freedom, with rejection regions to the left of -t_{n-k-1,α/2} and to the right of t_{n-k-1,α/2}.]
4. Reject the null hypothesis if the t-statistic falls in either rejection region, that is, if |t| > t_{n-k-1,α/2}.
- When you reject the null hypothesis β_j = 0 in favor of β_j ≠ 0 using a two-sided test, we say that the variable x_j is statistically significant.

Exercise
- Consider again the following regression.
Log(salary) = β_0 + β_1 (female) + δ(other variables) + u
- This time, test whether the female coefficient is equal to zero, using a two-sided test at the 5% significance level. That is, test
H_0: β_1 = 0
H_1: β_1 ≠ 0

The p-value
- The p-value is the smallest significance level (α) at which the coefficient is statistically significant.
- STATA automatically computes this value for you.
- Take a look at the salary regression again.
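To make the definition concrete, here is a minimal sketch of a two-sided p-value computed directly from the t density with Simpson's rule, using only the Python standard library. The function names, the number of integration steps, and the inputs t = -2.1 and 300 degrees of freedom are hypothetical; in practice you would read the p-value from the STATA output.

```python
import math


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| > |t_stat|) for T ~ t_df, via Simpson's rule on [0, |t_stat|]."""
    b = abs(t_stat)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_mass = 2 * total * h / 3  # P(-b < T < b), using symmetry
    return 1.0 - central_mass


# Hypothetical t-statistic and degrees of freedom.
p = two_sided_p_value(-2.1, 300)
print(p, p < 0.05)  # the coefficient is significant at 5% exactly when p < 0.05
```

For a one-sided test whose estimate has the sign predicted by H_1, the p-value is half of this two-sided value.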

Other hypotheses about β_j
- You can test other hypotheses, such as β_j = 1 or β_j = -1. Consider the null hypothesis
β_j = a
Then all you have to do is compute the t-statistic as
$$t = \frac{\hat{\beta}_j - a}{se(\hat{\beta}_j)}$$
The rest of the test procedure is exactly the same.
- Consider the following regression results (standard errors in parentheses).
Log(crime) = -6.63 + 1.27 log(enroll)
             (1.03)  (0.11)
n = 97, R² = 0.585
Now, test whether the coefficient on log(enroll) is equal to 1, using a two-sided test at the 5% significance level.
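One way to work through this exercise is sketched below. The estimate 1.27, the standard error 0.11, and n = 97 come from the slide; the function name `t_test_against_value` and the cutoff 1.985 (an approximate table value for 97 - 1 - 1 = 95 degrees of freedom at α/2 = 0.025) are assumptions of this sketch.

```python
def t_test_against_value(beta_hat, se, a, critical_value):
    """Two-sided test of H0: beta_j = a against H1: beta_j != a.

    critical_value is the positive cutoff t_{n-k-1, alpha/2} from a t table.
    """
    t_stat = (beta_hat - a) / se
    return t_stat, abs(t_stat) > critical_value


# Numbers from the log(crime) regression; 1.985 is an approximate cutoff.
t_stat, reject = t_test_against_value(1.27, 0.11, a=1.0, critical_value=1.985)
print(round(t_stat, 2), reject)
```

The statistic is (1.27 - 1)/0.11, roughly 2.45, which exceeds the cutoff, so under these assumptions the hypothesis that the coefficient equals 1 is rejected at the 5% level.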

The F-test: Testing general linear restrictions
- You are often interested in more complicated hypothesis tests. First, I will show you some examples of such tests using the salary regression example.
- Example 1: Modified salary equation.
Log(salary) = β_0 + β_1 (female) + β_2 (female)×(Exp>20) + δ(other variables) + u
where (Exp>20) is the dummy variable for those with experience greater than 20 years. Then it is easy to show that the gender salary gap among those with experience greater than 20 years is given by β_1 + β_2. So you want to test the following.
H_0: β_1 + β_2 = 0
H_1: β_1 + β_2 ≠ 0
- Example 2: More on the modified salary equation.
Log(salary) = β_0 + β_1 (female) + β_2 (female)×(Exp) + δ(other variables) + u
where Exp is the years of experience. Then, if you want to test whether there is a gender salary gap at experience equal to 5, you test
H_0: β_1 + 5β_2 = 0
H_1: β_1 + 5β_2 ≠ 0
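A short derivation of why these linear combinations measure the gender gap, written as a sketch. It assumes E(u | regressors) = 0 and holds the other variables fixed; the notation gap(e) is introduced here for illustration and does not appear in the slides.

```latex
% Compare female = 1 with female = 0 at the same experience level Exp = e.
% All terms that do not involve female cancel in the difference.
\begin{align*}
\mathrm{gap}(e) &= E[\log(salary) \mid female = 1, Exp = e]
                 - E[\log(salary) \mid female = 0, Exp = e] \\
                &= \beta_1 + \beta_2 e, \\
\mathrm{gap}(5) &= \beta_1 + 5\beta_2 .
\end{align*}
% Example 1 is the same calculation with the dummy (Exp > 20) in place of e:
% the gap is \beta_1 + \beta_2 when the dummy equals 1 and \beta_1 when it equals 0.
```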
- Example 3: The price of houses.
Log(price) = β_0 + β_1 (assessed price) + β_2 (lot size) + β_3 (square footage) + β_4 (# bedrooms) + u
Then you would be interested in
H_0: β_1 = 1, β_2 = 0, β_3 = 0, β_4 = 0
H_1: H_0 is not true.
Note that in this case there are 4 equations in H_0.

The procedure for the F-test
- Linear restrictions are tested using the F-test. The general procedure can be explained using the following example.
Y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3 + β_4 x_4 + u --------------(1)
Suppose you want to test
H_0: β_1 = 1, β_2 = β_3, β_4 = 0
H_1: H_0 is not true
- First, plug the hypothesized coefficient values given by H_0 into equation (1). Then you get
Y = β_0 + 1·x_1 + β_2 x_2 + β_2 x_3 + 0·x_4 + u
(Y - x_1) = β_0 + β_2 (x_2 + x_3) + u ----------------------(2)
Equation (2) is called the restricted model. On the other hand, the original equation (1) is called the unrestricted model.
- In the restricted model, the dependent variable is (Y - x_1), and there is now only one explanatory variable, which is (x_2 + x_3).
- Now, I can describe the testing procedure.
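A minimal sketch of how the restricted model (2) could be estimated by hand: build the new dependent variable (Y - x_1) and the single regressor (x_2 + x_3), then apply the simple-regression OLS formulas. The function name `fit_restricted` and the data are hypothetical; the data are generated so that H_0 holds exactly with no error term, which makes the result easy to check.

```python
def fit_restricted(y, x1, x2, x3):
    """Fit (y - x1) = b0 + b2 * (x2 + x3) + u by simple OLS.

    Returns the intercept, the slope, and the restricted SSR.
    """
    dep = [yi - x1i for yi, x1i in zip(y, x1)]    # new dependent variable
    reg = [x2i + x3i for x2i, x3i in zip(x2, x3)]  # single regressor
    n = len(dep)
    mean_dep = sum(dep) / n
    mean_reg = sum(reg) / n
    sxx = sum((r - mean_reg) ** 2 for r in reg)
    sxy = sum((r - mean_reg) * (d - mean_dep) for r, d in zip(reg, dep))
    b2 = sxy / sxx
    b0 = mean_dep - b2 * mean_reg
    ssr = sum((d - b0 - b2 * r) ** 2 for d, r in zip(dep, reg))
    return b0, b2, ssr


# Hypothetical data satisfying H0 exactly: beta_1 = 1, beta_2 = beta_3 = 0.5, beta_4 = 0.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
x3 = [1.0, 3.0, 2.0, 5.0, 4.0]
y = [2.0 + a + 0.5 * b + 0.5 * c for a, b, c in zip(x1, x2, x3)]

b0, b2, ssr_r = fit_restricted(y, x1, x2, x3)
print(b0, b2, ssr_r)
```

Note that x_4 never enters the restricted regression, because H_0 sets its coefficient to zero.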
- Step 1: Estimate the unrestricted model (1), and compute SSR. Call this SSR_ur.
- Step 2: Estimate the restricted model (2), and compute SSR. Call this SSR_r.
- Step 3: Compute the F-statistic as
$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}$$
where q is the number of equations in H_0. Here q is the numerator degrees of freedom and (n-k-1) is the denominator degrees of freedom.
- Step 4: It is known that, under H_0, the F statistic follows the F distribution with (q, n-k-1) degrees of freedom. That is, F ~ F_{q,n-k-1}, where q is the numerator degrees of freedom and n-k-1 is the denominator degrees of freedom.
- Step 5: Set the significance level α. (Usually, it is set at 0.05.)
- Step 6: Find the cutoff value c such that P(F_{q,n-k-1} > c) = α. This is illustrated in the next slide.
[Figure: the density of F_{q,n-k-1}. The area to the left of c is 1-α; the rejection region lies to the right of c.]
- Step 7: Reject H_0 if the F statistic falls in the rejection region. The cutoff points can be found in the table in the next slide.

Example
Log(salary) = β_0 + β_1 (female) + β_2 (female)×(Exp>20) + δ(other variables) + u -----(1)
Now, let us test the following.
H_0: β_1 + β_2 = 0
H_1: β_1 + β_2 ≠ 0
- Under H_0, β_2 = -β_1, so the restricted model is
Log(salary) = β_0 + β_1 [(female) - (female)×(Exp>20)] + δ(other variables) + u ------------(2)
- The following slides show the estimated results for the unrestricted and restricted models.
- Since we have only one equation in H_0, q = 1. And you can see that (n-k-1) = (338-12-1) = 325.
F = [(9.54110486 - 9.54090327)/1] / [9.54090327/325] ≈ 0.0069
- The cutoff point at the 5% significance level is 3.84.
- Since the F statistic does not fall in the rejection region, we fail to reject the null hypothesis. In other words, we did not find evidence of a gender gap among those with experience greater than 20 years.
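The arithmetic in this example can be reproduced with a short Python sketch. The two SSR values, q = 1, n - k - 1 = 325, and the cutoff 3.84 are taken from the slide; the function name `f_statistic` is an assumption of the sketch.

```python
def f_statistic(ssr_r, ssr_ur, q, df_denominator):
    """F = [(SSR_r - SSR_ur) / q] / [SSR_ur / (n - k - 1)]."""
    if q <= 0 or df_denominator <= 0:
        raise ValueError("degrees of freedom must be positive")
    if ssr_r < ssr_ur:
        raise ValueError("the restricted SSR cannot be below the unrestricted SSR")
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_denominator)


# Values reported on the slide for the salary regression.
F = f_statistic(ssr_r=9.54110486, ssr_ur=9.54090327, q=1, df_denominator=338 - 12 - 1)
cutoff = 3.84  # 5% cutoff quoted on the slide
reject = F > cutoff
print(F, reject)
```

Because imposing restrictions can never lower the SSR, the numerator is never negative; a value this close to zero means the restriction costs almost nothing in fit.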
- In fact, STATA does the F-test automatically. After estimation, type the command shown in the STATA screenshot on this slide.

F-test for a special case: the exclusion restrictions
- Consider the following model.
Y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3 + β_4 x_4 + u -------(1)
- Often you would like to test whether a subset of the coefficients are all equal to zero. This type of restriction is called an "exclusion restriction".
- Suppose you want to test whether β_2, β_3, β_4 are jointly equal to zero. Then you test
H_0: β_2 = 0, β_3 = 0, β_4 = 0
H_1: H_0 is not true.
- In this special type of F-test, the unrestricted and restricted equations look like
Y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3 + β_4 x_4 + u -------(1) (unrestricted)
Y = β_0 + β_1 x_1 + u -------(2) (restricted)
- In this special case, the F statistic has the following representation.
$$F = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n-k-1)}$$
Proof: See the front board.
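A sketch of the board proof, assuming the representation meant here is the usual R-squared form of the F statistic. The key point is that exclusion restrictions leave the dependent variable unchanged, so both models share the same total sum of squares SST.

```latex
% In each model, SSR = SST (1 - R^2), with the same SST for both.
\begin{align*}
F &= \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} \\
  &= \frac{\left[ SST(1 - R^2_r) - SST(1 - R^2_{ur}) \right]/q}{SST(1 - R^2_{ur})/(n-k-1)} \\
  &= \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n-k-1)}
\end{align*}
```

This form does not apply when the restriction changes the dependent variable, as in the earlier example where the restricted model uses (Y - x_1); the SSR form must be used there.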
- When we reject this type of null hypothesis, we say that x_2, x_3 and x_4 are jointly significant.

Example of the test of exclusion restrictions
- Suppose you are estimating a salary equation for baseball players.
Log(salary) = β_0 + β_1 (years in league) + β_2 (average games played) + β_3 (batting average) + β_4 (homeruns) + β_5 (runs batted) + u
Do batting average, homeruns and runs batted matter for salary after years in league and average games played are controlled for? To answer this question, you test
H_0: β_3 = 0, β_4 = 0, β_5 = 0
H_1: H_0 is not true.
Unrestricted model
Variables               Coefficient   Standard error
Years in league         0.0689***     0.0121
Average games played    0.0126***     0.0026
Batting average         0.00098       0.0011
Homeruns                0.0144        0.016
Runs batted             0.0108        0.0072
Constant                11.19***      0.29
# obs                   353
R squared               0.6278
SSR                     181.186
As can be seen, batting average, homeruns and runs batted do not have statistically significant t-statistics at the 5% level.
- The F statistic is
F = [(198.311 - 181.186)/3] / [181.186/(353-5-1)] = 10.932
The cutoff number is about 2.60, so we reject the null hypothesis at the 5% significance level. This is a reminder that even if each coefficient is individually insignificant, they may be jointly significant.
Restricted model
Variables               Coefficient   Standard error
Years in league         0.0713***     0.0125
Average games played    0.0202***     0.0013
Constant                11.22***      0.11
# obs                   353
R squared               0.5971
SSR                     198.311