Chapter 14, part D: Statistical Significance
IV. Model Assumptions The error term ε is a normally distributed random variable with E(ε) = 0. The variance of ε, denoted σ², is constant for all values of x. All values of ε are independent: no error term is influenced by any other error term.
V. Testing for Significance The theoretical regression equation, E(y) = β0 + β1x, is a hypothesis about the linear relationship between x and y. If β1 = 0, then E(y) = β0 and there is no linear relationship between x and y. So we’ll do a hypothesis test on β1, but first we need an estimate of σ².
A. Estimating the Standard Error The estimate of σ², the variance of the error term, is called the Mean Square Error: MSE = SSE/(n-2). We divide by (n-2) because we are estimating 2 parameters (β0 and β1), so SSE has (n-2) degrees of freedom. The standard error of the estimate (s) is the square root of MSE. For the repair cost and car age model, MSE = 1955.8333, so s = √1955.8333 ≈ 44.22.
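As a quick numerical check of the standard error, here is a minimal Python sketch. It assumes n = 5 observations (implied by the 3 degrees of freedom used in the example) and SSE = 5867.5, a value back-computed from the MSE of 1955.8333 reported in the F-test calculation rather than taken from the raw data. The function name is illustrative only.

```python
import math

def standard_error(sse, n):
    """Return (MSE, s) for a simple linear regression with n observations."""
    mse = sse / (n - 2)          # two estimated parameters (b0, b1) cost 2 d.f.
    return mse, math.sqrt(mse)   # s is the square root of MSE

# Assumed inputs: SSE back-computed as MSE * (n - 2) = 1955.8333 * 3.
mse, s = standard_error(sse=5867.5, n=5)
print(f"MSE = {mse:.4f}, s = {s:.4f}")
```

The same function works for any simple regression once SSE and n are known from the spreadsheet output.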
B. The t-test Most tests of a parameter like β1 are 2-tailed: we want to test whether it is significantly different from zero. A failure to reject Ho means there is not enough evidence of a statistical relationship. Ho: β1 = 0 Ha: β1 ≠ 0 Since we use b1 to estimate β1, we need to use the properties of the sampling distribution of b1 as the basis for the test.
1. Sampling Distribution of b1 Under the model assumptions, b1 is normally distributed with mean β1 and standard deviation σ_b1 = σ/√Σ(x − x̄)². We just need a mean and a standard error and we’re off and running. Since σ is unknown, we use s to calculate the estimated standard deviation, s_b1 = s/√Σ(x − x̄)², and the test then relies on the t distribution with (n-2) degrees of freedom. In our example, s_b1 = 13.9851.
2. T-test for significance Like any other hypothesis test, we need a critical value, t_α/2, and a test statistic. In the 2-tailed test, if our test statistic exceeds (in absolute value) t_α/2, we reject the null and conclude there is a significant statistical relationship between x and y. Since we have hypothesized that β1 = 0, the test statistic is t = (b1 − 0)/s_b1 = b1/s_b1.
The Example If we are testing with α = .05 and (n-2) = 3 degrees of freedom, t.025 = 3.182. The test statistic is t = 75.50/13.9851 = 5.3986. Since 5.3986 > 3.182, we reject Ho and conclude at the .05 level of significance that the car’s age has a significant relationship with the repair cost.
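The decision rule in this example can be sketched in a few lines of Python. The inputs are the values quoted on the slide; the critical value 3.182 is the table value for t.025 with 3 degrees of freedom, and the function name is hypothetical.

```python
def slope_t_test(b1, s_b1, t_crit):
    """Two-tailed t-test of Ho: beta1 = 0 against Ha: beta1 != 0."""
    t_stat = (b1 - 0.0) / s_b1       # hypothesized slope is zero
    reject = abs(t_stat) > t_crit    # compare |t| with the critical value
    return t_stat, reject

# Values from the repair-cost example; 3.182 is the table value t.025 with 3 d.f.
t_stat, reject = slope_t_test(b1=75.50, s_b1=13.9851, t_crit=3.182)
print(f"t = {t_stat:.4f}, reject Ho: {reject}")
```

Passing the critical value in as an argument keeps the sketch free of any statistics library; in practice it would come from a t table or from software.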
Caution: What you can’t say. 1. This is NOT enough evidence to conclude that this is a “cause and effect” relationship. You need theoretical justification and more powerful techniques to establish causality. 2. Rejection of Ho doesn’t allow the conclusion that there is a linear relationship between x and y.
What you can say. You can say that x and y are related and that a linear relationship explains a significant portion of the variability in y over the range of values for x in the sample. Look at the Excel handout I gave you. Can you see where the t-test is reported? Look at the p-value as well: a p-value smaller than α also indicates significance.
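To see where a reported p-value comes from, here is a hedged Python sketch for the repair example. It relies on the closed-form CDF of the t distribution that holds only for exactly 3 degrees of freedom, so it is an illustration for this data set, not a general routine; for other degrees of freedom a statistics package would be needed. The function names are hypothetical.

```python
import math

def t_cdf_3df(t):
    """CDF of Student's t distribution with exactly 3 degrees of freedom."""
    u = t / math.sqrt(3.0)
    return 0.5 + (math.atan(u) + u / (1.0 + u * u)) / math.pi

def two_tailed_p_3df(t_stat):
    """Two-tailed p-value for a t statistic with 3 degrees of freedom."""
    return 2.0 * (1.0 - t_cdf_3df(abs(t_stat)))

# t statistic from the example: b1 / s_b1 = 75.50 / 13.9851
p_value = two_tailed_p_3df(75.50 / 13.9851)
print(f"p-value = {p_value:.4f}")
print("significant at .05:", p_value < 0.05)
```

A p-value below the chosen α leads to the same decision as comparing the test statistic with the critical value.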
C. F-test of overall significance The F-test sets up the null hypothesis that ALL slope coefficients are equal to zero. If Ho cannot be rejected, the model is said to be insignificant and possibly worthless. If you only have one independent variable, the situation in this chapter, the result of the F-test is the same as the result of the t-test.
Construct the F-statistic It boils down to calculating another independent estimate of σ². MSR = SSR/(# of independent variables), so MSR = SSR as long as you have only 1 x variable. F-stat = MSR/MSE.
The F-stat follows the F-distribution with 1 degree of freedom in the numerator and (n-2) degrees of freedom in the denominator. Tables begin on page A-8. Larger values of F will lead to a rejection of the Ho that all slope coefficients are = 0 and to the conclusion that the model is overall significant.
Calculation and Testing In the repair example, F = MSR/MSE = SSR/MSE = 57,002.5/1955.8333 = 29.1449. The critical value with 1 d.f. in the numerator and 3 d.f. in the denominator is F.05(1,3) = 10.13. Since 29.1449 > 10.13, we reject Ho and conclude that the model is overall significant. You can also see an F-stat, along with a p-value, on your Excel output.
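The F-test arithmetic can be reproduced with a short Python sketch. It uses the SSR and MSE quoted above and the table value F.05(1,3) = 10.13; the function name and argument names are illustrative. It also prints the square of the t statistic, which should be close to F when there is a single x variable.

```python
def f_test(ssr, mse, num_x, f_crit):
    """Overall F-test: Ho is that all slope coefficients equal zero."""
    msr = ssr / num_x             # MSR = SSR / (# of independent variables)
    f_stat = msr / mse
    return f_stat, f_stat > f_crit

# Values from the repair example; 10.13 is the table value F.05(1, 3).
f_stat, reject = f_test(ssr=57002.5, mse=1955.8333, num_x=1, f_crit=10.13)
t_stat = 75.50 / 13.9851          # t statistic from the slope test
print(f"F = {f_stat:.4f}, reject Ho: {reject}")
print(f"t squared = {t_stat ** 2:.4f}")  # close to F with one x variable
```

The small difference between F and t squared comes only from rounding in the quoted inputs.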
Can you read these p-values and determine whether the estimates are significant?