Significance Tests for Regression Analysis
A. Testing the Significance of Regression Models

The first important significance test is for the regression model as a whole, that is, a test of the model Y = f(X). The null hypothesis is:

H0: β = 0.0

In simple regression this test is redundant. Since there is only one variable in the model (X), it duplicates the information provided by the t-test for the regression coefficient.
In the case of multiple regression, however, the test becomes much more important for evaluating models with several independent variables. There the model is

Y = f(X1, X2, X3, ..., Xk)

and the null hypothesis is

H0: β1 = β2 = β3 = ... = βk = 0.0

Notice that this is similar to the null hypothesis in the analysis of variance with multiple treatment groups.
All the information we require is already available in the analysis of variance summary table. In our little time/temperature example, we have the two mean squares: one for the model (98.637) and one for the error term (0.030). The ratio of the two is 3287.90, clearly greater than one. Making the usual significance test with 1 and 1 degrees of freedom at the 0.05 level, we find the critical value of F to be 161.40 (Appendix 3, p. 544). Since 3287.90 is greater than 161.40, the F-ratio lies inside the region of rejection. Hence, we REJECT the null hypothesis that all of the regression coefficients in the model equal zero in favor of the alternate hypothesis that at least ONE of the regression coefficients differs from zero.
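The F-test just described can be sketched in a few lines of Python, a minimal check using only the mean squares and the tabled critical value quoted in the text:

```python
# Model F-test for the time/temperature example.
# The mean squares and the critical value (Appendix 3) are taken from the text.
ms_regression = 98.637   # mean square for the model
ms_error = 0.030         # mean square for the error term
f_ratio = ms_regression / ms_error

f_critical = 161.40      # tabled F at alpha = 0.05 with (1, 1) df
print(round(f_ratio, 1))       # 3287.9
print(f_ratio > f_critical)    # True -> reject H0
```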
B. Testing the Significance of the Regression Coefficient

The null hypothesis in the significance test for the regression coefficient (i.e., the slope) is:

H0: β = 0.0

This simple symbolic expression says more than might first appear. It says: if we begin by assuming that there is no relationship between X and Y in general (i.e., in the universe from which our sample data come), how likely is it that we would find a regression coefficient for our sample that is DIFFERENT FROM 0.0? Put the other way around: if we find a relationship between X and Y in the sample data, can we infer that there is a relationship between X and Y in general?
To test this null hypothesis, we use our old friend the t-test:

t = (b - β) / s_b

where the standard error of b is:

s_b = sqrt( MS_Error / ((N - 1) s²X) )
This is the standard deviation of the sampling distribution of all theoretically possible regression coefficients for samples of the same size drawn randomly from the same universe. Recall that the mean of this sampling distribution has a value equal to the population characteristic (parameter), in this case the value of the regression coefficient in the universe. Under the null hypothesis, we initially assume that this value is 0.0. To test the significance of the regression coefficient (and the model as a whole), we need the statistical information found in the usual analysis of variance summary table. We already have most of this information for our previous example.
Recall that R²YX, the Coefficient of Determination, was found from

R²YX = SS_Regression / SS_Total

From our time/temperature example, remember that R²YX was 0.9997. The total sum of squares can be found from

SS_Total = s²Y (N - 1)

From our previous calculations, remember that s²Y was 49.333. Thus, SS_Total is

SS_Total = (49.333)(3 - 1) = (49.333)(2) = 98.667
By rearranging the formula for R²YX, we get

SS_Regression = (R²YX)(SS_Total)

With our sample data,

SS_Regression = (0.9997)(98.667) = 98.637

Now, because of the identity among the three sums of squares, we can find the sum of squares for the error term (residual) by subtraction:

SS_Error = SS_Total - SS_Regression = 98.667 - 98.637 = 0.030
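The arithmetic above can be verified with a short script. This is a sketch; the small discrepancies in the third decimal place come from carrying s²Y and R²YX as the rounded values given in the text:

```python
# Sums of squares for the time/temperature example, from the values in the text.
s_y_squared = 49.333    # variance of Y (rounded, as in the text)
n = 3                   # number of observations
r_squared = 0.9997      # coefficient of determination

ss_total = s_y_squared * (n - 1)        # ~98.667
ss_regression = r_squared * ss_total    # ~98.637
ss_error = ss_total - ss_regression     # ~0.030

print(round(ss_total, 2), round(ss_regression, 2), round(ss_error, 3))
```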
All we need to do now is to determine the various numbers of degrees of freedom. Because we have three observations, we know that we still have two total degrees of freedom. Degrees of freedom for the model is the number of independent variables in the model, in this case one. Because of the identity between the three values of degrees of freedom, the total less the model gives us the error degrees of freedom, in this case 2 – 1, or one degree of freedom for the error term. Now we can complete an analysis of variance summary table for the regression example.
Table 1. Analysis of Variance Summary Table for Time-Temperature Example.
==========================================================
Source                    SS     df   Mean Square        F
----------------------------------------------------------
Regression (Between)  98.637      1        98.637  3287.90
Error (Within)         0.030      1         0.030
Total                 98.667      2
----------------------------------------------------------
Now we can return to the task of testing the significance of the regression coefficient. First, we need to estimate the standard error of b. This is

s_b = sqrt( MS_Error / ((N - 1) s²X) )

In our example, the variance of X, time, was 5.083.
Now we have the value of our "currency conversion" factor, which allows us to convert the difference between our sample regression coefficient and the mean of the sampling distribution into Student's t values that lie on the underlying x-axis. Recall that our regression coefficient had a value of -3.115. Remember also that, under the null hypothesis, β = 0.0. Thus, our t-statistic is

t = (b - β) / s_b = (-3.115 - 0.0) / s_b

This, for practical purposes, is the same as

t = b / s_b

Thus, the value of the t-statistic is

t = -54.342
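As a cross-check, the standard error and t-statistic can be recomputed in Python. This is a sketch assuming the formula s_b = sqrt(MS_Error / ((N - 1) s²X)); it uses the unrounded MS_Error from the SAS output (0.03279), which closely reproduces the printout's t of -54.848. The hand-computed -54.342 differs slightly because of rounding in the intermediate values:

```python
import math

# t-test for the regression coefficient in the time/temperature example.
# All inputs below are values given in the text or the SAS output.
ms_error = 0.03279   # unrounded error mean square (SAS output)
var_x = 5.083        # variance of X (time)
n = 3                # number of observations
b = -3.114754        # regression coefficient (SAS output)

s_b = math.sqrt(ms_error / ((n - 1) * var_x))   # ~0.0568
t = (b - 0.0) / s_b                             # ~ -54.8
print(round(s_b, 4), round(t, 2))
```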
Since we have not specified in advance whether our sample regression coefficient would have a positive or a negative value, we should perform a two-tailed test of significance. Let's again set alpha to 0.05. The appropriate sampling distribution of Student's t for this test is the one defined by the degrees of freedom for the error term, because we use MS_Error in calculating the value of the t-test. From Appendix 2, p. 543, we find the critical value to be 12.706 (row df = 1, two-tailed test column 0.05). Because this is a two-tailed test, we have two critical values, +12.706 and -12.706.
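The two-tailed decision rule amounts to comparing the absolute value of t with the tabled value, as in this minimal sketch using the numbers quoted in the text:

```python
# Two-tailed t-test decision at alpha = 0.05 with df = 1.
t_statistic = -54.342    # t value computed in the text
t_critical = 12.706      # tabled two-tailed critical value, df = 1 (Appendix 2)

reject_h0 = abs(t_statistic) > t_critical
print(reject_h0)   # True -> the coefficient is significant
```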
Since the absolute value of the t-statistic, |-54.342| = 54.342, is GREATER THAN the critical value 12.706, we know that the statistic lies within the region of rejection, and therefore we REJECT the null hypothesis. We conclude that the sample regression coefficient is statistically significant at the 0.05 level. This means that the association between time of first sun and afternoon high temperature probably holds in general, not just in our sample.
C. Significance Test for the Correlation Coefficient

We could calculate a critical value of rXY based on the general relationship between rXY and F, which is:

F = r²XY (N - 2) / (1 - r²XY)

The critical value can then be found by:

r_critical = sqrt( F_critical / (F_critical + df_Error) )

Alternatively, we could use a table such as Appendix 5, p. 548.
In the present case, the correlation coefficient is rXY = -0.9996. At α = 0.05 with df = 1 (df = N - 2), the critical value of rXY for a two-tailed test from Appendix 5 is 0.997. Since |-0.9996| is greater than 0.997, the correlation coefficient IS statistically significant at the 0.05 level.
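The tabled critical value of r can also be recovered from the critical F via the identity above, as this sketch shows (using the critical F quoted earlier in the text):

```python
import math

# Critical value of r from the critical F at alpha = 0.05 with (1, 1) df.
f_critical = 161.40   # tabled critical F (Appendix 3)
df_error = 1
r_critical = math.sqrt(f_critical / (f_critical + df_error))

print(round(r_critical, 3))        # 0.997
print(abs(-0.9996) > r_critical)   # True -> significant
```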
Simple Regression Analysis Example                      PPD 404

Model: MODEL1
Dependent Variable: TEMP

                      Analysis of Variance
                    Sum of        Mean
Source      DF     Squares      Square     F Value    Prob>F
Model        1    98.63388    98.63388    3008.333    0.0116
Error        1     0.03279     0.03279
C Total      2    98.66667

Root MSE     0.18107     R-square    0.9997
Dep Mean    82.66667     Adj R-sq    0.9993
C.V.         0.21904

                      Parameter Estimates
               Parameter      Standard    T for H0:
Variable  DF    Estimate         Error    Parameter=0   Prob > |T|
INTERCEP   1  107.065574    0.45696262        234.298       0.0027
TIME       1   -3.114754    0.05678855        -54.848       0.0116
Time and Temperature Example

Correlation Analysis

2 'VAR' Variables: TIME  TEMP

Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 3

            TIME        TEMP
TIME     1.00000    -0.99983
             0.0      0.0116
TEMP    -0.99983     1.00000
          0.0116         0.0
Significance Tests Exercise

A regression analysis produced the following analysis of variance summary table as well as the following regression results: the regression coefficient had a value of 3.14, and the variance of the independent variable (X) was 1.92. Complete the computations in the ANOVA summary table. Use Appendix 3 for the F-test and Appendix 2 for the t-test. Assume that α = 0.05. NOTE: Be sure to perform a two-tailed t-test for the regression coefficient.

==========================================================
Source          SS     df   Mean Square        F
----------------------------------------------------------
Regression   655.52     1
Error        195.80    29
Total        851.32    30
----------------------------------------------------------

1. What is the critical value of F? ______________
2. Is the model statistically significant? ______________
3. What is the value of the standard error for b? ______________
4. What is the value of the t statistic? ______________
5. What is the critical value of t? ______________
6. Is the regression coefficient statistically significant? ______________
Significance Tests Exercise Answers

A regression analysis produced the following analysis of variance summary table as well as the following regression results: the regression coefficient had a value of 3.14, and the variance of the independent variable (X) was 1.92. Complete the computations in the ANOVA summary table. Use Appendix 3 for the F-test and Appendix 2 for the t-test. Assume that α = 0.05. NOTE: Be sure to perform a two-tailed t-test for the regression coefficient.

==========================================================
Source          SS     df   Mean Square        F
----------------------------------------------------------
Regression   655.52     1       655.520   97.089
Error        195.80    29         6.752
Total        851.32    30
----------------------------------------------------------

1. What is the critical value of F? 4.18
2. Is the model statistically significant? Yes
3. What is the value of the standard error for b? 0.342
4. What is the value of the t statistic? 9.017
5. What is the critical value of t? 2.045
6. Is the regression coefficient statistically significant? Yes
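The exercise answers can be checked with the same formulas used in the worked example. This is a sketch; note that using (N - 1)s²X = 57.6 in the standard error gives t ≈ 9.17, slightly different from the key's 9.017 (which follows if df_Error × s²X = 55.68 is used instead), though the conclusion is the same either way:

```python
import math

# Checking the exercise's ANOVA entries and t-test, using values from the text.
ss_regression, df_regression = 655.52, 1
ss_error, df_error = 195.80, 29
b = 3.14          # regression coefficient
var_x = 1.92      # variance of X
n = 31            # total df = 30 = N - 1, so N = 31

ms_regression = ss_regression / df_regression   # 655.520
ms_error = ss_error / df_error                  # ~6.752
f_ratio = ms_regression / ms_error              # ~97.089

s_b = math.sqrt(ms_error / ((n - 1) * var_x))   # ~0.342
t = b / s_b                                     # ~9.17
print(round(ms_error, 3), round(f_ratio, 3), round(s_b, 3), round(t, 2))
```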