© 2002 Prentice-Hall, Inc. Chap 14-1 Introduction to Multiple Regression Model.


1 © 2002 Prentice-Hall, Inc. Chap 14-1 Introduction to Multiple Regression Model

2 © 2002 Prentice-Hall, Inc. Chap 14-2 Chapter Topics: Multiple linear regression (MLR) model; Residual analysis; Influence analysis; Testing for the significance of the regression model; Inferences on the population regression coefficients; Testing portions of the multiple regression model

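3 © 2002 Prentice-Hall, Inc. Chap 14-3 Multiple Linear Regression Model. The relationship between one dependent and two or more independent variables is a linear function. Population model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_p X_{pi} + \varepsilon_i$, where $\beta_0$ is the population Y-intercept, $\beta_1, \dots, \beta_p$ are the population slopes, and $\varepsilon_i$ is the random error. Sample model: $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_p X_{pi} + e_i$, where $Y_i$ is the dependent (response) variable, $X_{1i}, \dots, X_{pi}$ are the independent (explanatory) variables, and $e_i$ is the residual.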

4 © 2002 Prentice-Hall, Inc. Chap 14-4 Population Multiple Regression Model. Bivariate model (two explanatory variables): $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

5 © 2002 Prentice-Hall, Inc. Chap 14-5 Sample Multiple Regression Model. Bivariate model: $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$. Sample regression plane: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$

6 © 2002 Prentice-Hall, Inc. Chap 14-6 Simple and Multiple Linear Regression Compared: Example Two simple regressions: Multiple regression:

7 © 2002 Prentice-Hall, Inc. Chap 14-7 Multiple Linear Regression Equation Too complicated by hand! Ouch!
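The point of the slide above is that the coefficients are left to software. As a rough sketch of what that software does, the snippet below solves the least-squares normal equations $(X^T X) b = X^T y$ with NumPy. The data values are invented purely for illustration (they are not the heating-oil sample), and the variable names are assumptions of this sketch.

```python
import numpy as np

# Hypothetical data, invented for illustration only.
x1 = np.array([40.0, 27.0, 40.0, 73.0, 64.0, 34.0, 9.0, 8.0])    # first explanatory variable
x2 = np.array([3.0, 3.0, 10.0, 6.0, 6.0, 6.0, 6.0, 10.0])         # second explanatory variable
y = np.array([250.0, 360.0, 165.0, 45.0, 90.0, 255.0, 390.0, 320.0])  # response

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Normal equations: (X'X) b = X'y.
b_normal = np.linalg.solve(X.T @ X, X.T @ y)

# The same fit through NumPy's least-squares routine, as a cross-check.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residuals of the fitted plane; with an intercept they sum to (numerically) zero.
residuals = y - X @ b_normal

print("b0, b1, b2 =", b_normal)
```

Solving the normal equations directly is fine for a small, well-conditioned example like this one; for larger or nearly collinear designs, a routine such as `np.linalg.lstsq` is usually the safer choice.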

8 © 2002 Prentice-Hall, Inc. Chap 14-8 Interpretation of Estimated Coefficients. Slope ($b_i$): the average value of Y is estimated to change by $b_i$ for each one-unit increase in $X_i$, holding all other variables constant (ceteris paribus). Example: if $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated two gallons for each one-degree increase in temperature ($X_1$), given the inches of insulation ($X_2$). Y-intercept ($b_0$): the estimated average value of Y when all $X_i = 0$.

9 © 2002 Prentice-Hall, Inc. Chap 14-9 Multiple Regression Model: Example. Develop a model for estimating heating oil used for a single-family home in the month of January based on average temperature (°F) and amount of insulation in inches.

10 © 2002 Prentice-Hall, Inc. Chap 14-10 Sample Multiple Regression Equation: Example (Excel Output). For each one-degree increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant. For each one-inch increase in insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.

11 © 2002 Prentice-Hall, Inc. Chap 14-11 Venn Diagrams and Explanatory Power of Regression Oil Temp Variations in oil explained by temp or variations in temp used in explaining variation in oil Variations in oil explained by the error term Variations in temp not used in explaining variation in Oil

12 © 2002 Prentice-Hall, Inc. Chap 14-12 Venn Diagrams and Explanatory Power of Regression Oil Temp (continued)

13 © 2002 Prentice-Hall, Inc. Chap 14-13 Venn Diagrams and Explanatory Power of Regression (Oil, Temp, Insulation). Overlapping variation in both Temp and Insulation is used in explaining the variation in Oil but NOT in the estimation of $\beta_1$ nor $\beta_2$. The remaining variation in Oil is NOT explained by Temp nor Insulation.

14 © 2002 Prentice-Hall, Inc. Chap 14-14 Coefficient of Multiple Determination: $r^2_{Y \cdot 12 \dots p} = SSR / SST$. Proportion of total variation in Y explained by all X variables taken together. Never decreases when a new X variable is added to the model, which is a disadvantage when comparing models.

15 © 2002 Prentice-Hall, Inc. Chap 14-15 Venn Diagrams and Explanatory Power of Regression Oil Temp Insulation

16 © 2002 Prentice-Hall, Inc. Chap 14-16 Adjusted Coefficient of Multiple Determination: $r^2_{adj} = 1 - (1 - r^2)\frac{n-1}{n-p-1}$. Proportion of variation in Y explained by all X variables, adjusted for the number of X variables used. Penalizes excessive use of independent variables. Smaller than $r^2$. Useful in comparing among models.

17 © 2002 Prentice-Hall, Inc. Chap 14-17 Coefficient of Multiple Determination: Excel Output. Adjusted $r^2$ reflects the number of explanatory variables and sample size, and is smaller than $r^2$.

18 © 2002 Prentice-Hall, Inc. Chap 14-18 Interpretation of Coefficient of Multiple Determination 96.56% of the total variation in heating oil can be explained by difference in temperature and amount of insulation 95.99% of the total fluctuation in heating oil can be explained by difference in temperature and amount of insulation after adjusting for the number of explanatory variables and sample size
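The two percentages above are linked by the usual adjustment $r^2_{adj} = 1 - (1 - r^2)(n-1)/(n-p-1)$. The short check below assumes n = 15 observations, which is inferred from the degrees of freedom (2 and 12) quoted later in the deck rather than stated on this slide.

```python
# Values quoted on the slides; n is inferred from df = 2 and 12 (n - p - 1 = 12).
n = 15        # assumed sample size
p = 2         # explanatory variables: temperature and insulation
r2 = 0.9656   # coefficient of multiple determination

# Adjusted r-squared penalizes the number of explanatory variables.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(adj_r2, 4))
```

Under that assumption the result rounds to the 95.99% reported on the slide.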

19 © 2002 Prentice-Hall, Inc. Chap 14-19 Using the Model to Make Predictions. Predict the amount of heating oil used for a home if the average temperature is 30 °F and the insulation is six inches. The predicted heating oil used is 278.97 gallons.
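The transcript does not preserve the intercept, but it can be backed out from the two slopes (-5.437 and -20.012) and the prediction of 278.97 gallons at 30 °F and six inches. The sketch below does that and then reuses the equation for another input; the function name `predict` and the back-solved `b0` are illustrative assumptions, not values read from the Excel output.

```python
# Slopes as quoted on the slides.
b1 = -5.437    # gallons per degree Fahrenheit
b2 = -20.012   # gallons per inch of insulation

# Intercept implied by the slide's prediction (278.97 gallons at 30 F, 6 inches).
# This is back-solved, not taken from the original output.
b0 = 278.97 - (b1 * 30 + b2 * 6)

def predict(temp_f, insulation_in):
    """Predicted January heating oil use (gallons) from the sample regression equation."""
    return b0 + b1 * temp_f + b2 * insulation_in

print(round(b0, 3))              # implied intercept
print(round(predict(30, 6), 2))  # reproduces the slide's prediction
print(round(predict(30, 7), 3))  # one more inch of insulation lowers the prediction by 20.012
```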

20 © 2002 Prentice-Hall, Inc. Chap 14-20 Residual Plots. Residuals vs. $\hat{Y}$: may need to transform the Y variable. Residuals vs. $X_1$: may need to transform the $X_1$ variable. Residuals vs. $X_2$: may need to transform the $X_2$ variable. Residuals vs. time: may have autocorrelation.

21 © 2002 Prentice-Hall, Inc. Chap 14-21 Residual Plots: Example. No discernible pattern. May be some non-linear relationship.

22 © 2002 Prentice-Hall, Inc. Chap 14-22 Influence Analysis. Used to determine observations that have an influential effect on the fitted model. Potentially influential points become candidates for removal from the model. Criteria used are: the hat matrix elements $h_i$; the Studentized deleted residuals $t_i^*$; Cook's distance statistic $D_i$. All three criteria are complementary. Only when all three criteria provide consistent results should an observation be removed.

23 © 2002 Prentice-Hall, Inc. Chap 14-23 The Hat Matrix Element $h_i$. If $h_i > 2(p+1)/n$, $X_i$ is an influential point, and $X_i$ may be considered a candidate for removal from the model.

24 © 2002 Prentice-Hall, Inc. Chap 14-24 The Hat Matrix Element $h_i$: Heating Oil Example. No $h_i > 0.4$. No observation appears to be a candidate for removal from the model.

25 © 2002 Prentice-Hall, Inc. Chap 14-25 The Studentized Deleted Residuals $t_i^*$. $t_i^*$ is built from two quantities: $e_{(i)}$, the difference between the observed and predicted Y based on a model that includes all observations except observation i, and $S_{(i)}$, the standard error of the estimate for a model that includes all observations except observation i. An observation is considered influential if $|t_i^*| > t_{n-p-2}$, where $t_{n-p-2}$ is the critical value of a two-tail test at an $\alpha$ level of significance.

26 © 2002 Prentice-Hall, Inc. Chap 14-26 The Studentized Deleted Residuals $t_i^*$: Example. $t_{10}^*$ and $t_{13}^*$ mark observations 10 and 13 as influential points for potential removal from the model.

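27 © 2002 Prentice-Hall, Inc. Chap 14-27 Cook's Distance Statistic $D_i$: $D_i = \frac{SR_i^2 \, h_i}{(p+1)(1-h_i)}$, where $SR_i$ is the Studentized residual. If $D_i > F_{p+1,\,n-p-1}$, an observation is considered influential, where $F_{p+1,\,n-p-1}$ is the critical value of the F distribution at a 50% level of significance.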

28 © 2002 Prentice-Hall, Inc. Chap 14-28 Cook's Distance Statistic $D_i$: Heating Oil Example. No $D_i > 0.835$. No observation appears to be a candidate for removal from the model. Using the three criteria, there is insufficient evidence for the removal of any observation from the model.

29 © 2002 Prentice-Hall, Inc. Chap 14-29 Testing for Overall Significance. Shows if there is a linear relationship between all of the X variables together and Y. Use F test statistic. Hypotheses: $H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$ (no linear relationship); $H_1$: at least one $\beta_i \neq 0$ (at least one independent variable affects Y). The null hypothesis is a very strong statement. Almost always reject the null hypothesis.

30 © 2002 Prentice-Hall, Inc. Chap 14-30 Testing for Overall Significance (continued). Test statistic: $F = \frac{MSR}{MSE} = \frac{SSR/p}{SSE/(n-p-1)}$, where F has p numerator and (n-p-1) denominator degrees of freedom.

31 © 2002 Prentice-Hall, Inc. Chap 14-31 Test for Overall Significance: Excel Output Example. Annotations on the output: p = 2, the number of explanatory variables; n - 1, the total degrees of freedom; the p-value.

32 © 2002 Prentice-Hall, Inc. Chap 14-32 Test for Overall Significance: Example Solution. $H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$. $H_1$: at least one $\beta_i \neq 0$. $\alpha = .05$; df = 2 and 12. Critical value: F = 3.89. Test statistic: F = 168.47 (Excel output). Decision: Reject $H_0$ at $\alpha = 0.05$. Conclusion: There is evidence that at least one independent variable affects Y.
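Because $r^2 = SSR/SST$, the overall F statistic can also be written as $F = \frac{r^2/p}{(1-r^2)/(n-p-1)}$. The sketch below applies that identity to the rounded $r^2 = 0.9656$ from the earlier slide, again assuming n = 15 (inferred from df = 2 and 12). It lands close to, but not exactly on, the 168.47 from the Excel output because $r^2$ has been rounded.

```python
# Quantities taken from the slides; n is inferred from the degrees of freedom.
n = 15
p = 2
r2 = 0.9656          # rounded value from the slides
f_critical = 3.89    # critical value at alpha = 0.05 with df = 2 and 12 (from the slide)

# Overall F statistic expressed through r-squared.
f_stat = (r2 / p) / ((1 - r2) / (n - p - 1))

reject_h0 = f_stat > f_critical

print(round(f_stat, 2), reject_h0)
```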

33 © 2002 Prentice-Hall, Inc. Chap 14-33 Test for Significance: Individual Variables. Shows whether there is a linear relationship between the variable $X_i$ and Y. Use t test statistic. Hypotheses: $H_0: \beta_i = 0$ (no linear relationship); $H_1: \beta_i \neq 0$ (linear relationship between $X_i$ and Y).

34 © 2002 Prentice-Hall, Inc. Chap 14-34 t Test Statistic Excel Output: Example t Test Statistic for X 1 (Temperature) t Test Statistic for X 2 (Insulation)

35 © 2002 Prentice-Hall, Inc. Chap 14-35 t Test: Example Solution. Does temperature have a significant effect on monthly consumption of heating oil? Test at $\alpha = 0.05$. $H_0: \beta_1 = 0$. $H_1: \beta_1 \neq 0$. df = 12. Critical values: $\pm 2.1788$ (rejection regions of 0.025 in each tail). Test statistic: t = -16.1699. Decision: Reject $H_0$ at $\alpha = 0.05$. Conclusion: There is evidence of a significant effect of temperature on oil consumption.

36 © 2002 Prentice-Hall, Inc. Chap 14-36 Venn Diagrams and Estimation of Regression Model (Oil, Temp, Insulation). Only the variation that Temp shares with Oil alone is used in the estimation of $\beta_1$. The variation shared by Temp and Insulation is NOT used in the estimation of $\beta_1$ nor $\beta_2$.

37 © 2002 Prentice-Hall, Inc. Chap 14-37 Confidence Interval Estimate for the Slope. Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption): $-6.169 \le \beta_1 \le -4.704$. The estimated average consumption of oil is reduced by between 4.70 and 6.17 gallons for each increase of 1 °F.
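The interval above follows the usual form $b_1 \pm t_{n-p-1} S_{b_1}$. The standard error is not preserved in the transcript, so the sketch below backs it out from the slope and the t statistic quoted earlier ($t = b_1 / S_{b_1}$); that back-solved value is an assumption of this example, and small differences from the slide's limits come from rounding of the inputs.

```python
# Values quoted on the slides.
b1 = -5.437         # estimated slope for temperature
t_stat = -16.1699   # t test statistic for temperature
t_crit = 2.1788     # two-tail critical value, alpha = 0.05, df = 12

# Standard error of b1 implied by t = b1 / se (back-solved, not read from the output).
se_b1 = b1 / t_stat

# 95% confidence interval for the population slope beta_1.
lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1

print(round(se_b1, 4), round(lower, 3), round(upper, 3))
```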

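38 © 2002 Prentice-Hall, Inc. Chap 14-38 Contribution of a Single Independent Variable $X_k$. Let $X_k$ be the independent variable of interest. $SSR(X_k \mid \text{all others except } X_k) = SSR(\text{all}) - SSR(\text{all others except } X_k)$ measures the contribution of $X_k$ in explaining the total variation in Y (SST).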

39 © 2002 Prentice-Hall, Inc. Chap 14-39 Contribution of a Single Independent Variable. Measures the contribution of the variable of interest in explaining SST, using the ANOVA sections of the regression output for the model with and without that variable.

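40 © 2002 Prentice-Hall, Inc. Chap 14-40 Coefficient of Partial Determination of $X_k$: $r^2_{Yk \cdot \text{all others}} = \frac{SSR(X_k \mid \text{all others})}{SST - SSR(\text{all}) + SSR(X_k \mid \text{all others})}$. Measures the proportion of variation in the dependent variable that is explained by $X_k$ while controlling for (holding constant) the other independent variables.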

41 © 2002 Prentice-Hall, Inc. Chap 14-41 Coefficient of Partial Determination (continued). Example: Two Independent Variable Model: $r^2_{Y1 \cdot 2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 \mid X_2)}$

42 © 2002 Prentice-Hall, Inc. Chap 14-42 Venn Diagrams and Coefficient of Partial Determination [Venn diagram: Oil, Temp, Insulation]

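43 © 2002 Prentice-Hall, Inc. Chap 14-43 Contribution of a Subset of Independent Variables. Let $X_s$ be the subset of independent variables of interest. $SSR(X_s \mid \text{all others except } X_s) = SSR(\text{all}) - SSR(\text{all others except } X_s)$ measures the contribution of the subset $X_s$ in explaining SST.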

44 © 2002 Prentice-Hall, Inc. Chap 14-44 Contribution of a Subset of Independent Variables: Example. Let $X_s$ be $X_1$ and $X_3$. The required sums of squares come from the ANOVA sections of the regression output for the model with and without $X_s$.

45 © 2002 Prentice-Hall, Inc. Chap 14-45 Testing Portions of Model. Examines the contribution of a subset $X_s$ of explanatory variables to the relationship with Y. Null hypothesis: variables in the subset do not significantly improve the model when all other variables are included. Alternative hypothesis: at least one variable is significant.

46 © 2002 Prentice-Hall, Inc. Chap 14-46 Testing Portions of Model (continued). Always one-tailed rejection region. Requires comparison of two regressions: one regression includes everything; another regression includes everything except the portion to be tested.

47 © 2002 Prentice-Hall, Inc. Chap 14-47 Partial F Test for Contribution of a Subset of X Variables. Hypotheses: $H_0$: variables $X_s$ do not significantly improve the model given all other variables included; $H_1$: variables $X_s$ significantly improve the model given all others included. Test statistic: $F = \frac{SSR(X_s \mid \text{all others}) / m}{MSE(\text{all})}$, with df = m and (n-p-1), where m = number of variables in the subset $X_s$.

48 © 2002 Prentice-Hall, Inc. Chap 14-48 Partial F Test for Contribution of a Single Variable $X_j$. Hypotheses: $H_0$: variable $X_j$ does not significantly improve the model given all others included; $H_1$: variable $X_j$ significantly improves the model given all others included. Test statistic: $F = \frac{SSR(X_j \mid \text{all others})}{MSE(\text{all})}$, with df = 1 and (n-p-1); m = 1 here.

49 © 2002 Prentice-Hall, Inc. Chap 14-49 Testing Portions of Model: Example. Test at the $\alpha = .05$ level to determine whether the variable of average temperature significantly improves the model given that insulation is included.

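50 © 2002 Prentice-Hall, Inc. Chap 14-50 Testing Portions of Model: Example. $H_0$: $X_1$ (temperature) does not improve the model with $X_2$ (insulation) included. $H_1$: $X_1$ does improve the model. $\alpha = .05$, df = 1 and 12, critical value = 4.75. Test statistic: $F = \frac{SSR(X_1 \mid X_2)}{MSE(X_1 \text{ and } X_2)}$, computed from the ANOVA tables for the model with $X_1$ and $X_2$ and for the model with $X_2$ only. Conclusion: Reject $H_0$; $X_1$ does improve the model.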

51 © 2002 Prentice-Hall, Inc. Chap 14-51 When to Use the F test The F test for the inclusion of a single variable after all other variables are included in the model is IDENTICAL to the t test of the slope for that variable The only reason to do an F test is to test several variables together
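The equivalence stated above can be checked with numbers already in the deck: for a single variable, the partial F statistic equals the square of that variable's t statistic, and the F critical value with df = 1 and 12 equals the square of the two-tail t critical value with df = 12. The sketch below uses only the t values quoted for temperature; the partial F value it prints is derived from them, not read from the slides.

```python
# Values quoted on the slides for temperature (X1).
t_stat = -16.1699   # t test statistic
t_crit = 2.1788     # two-tail critical value, alpha = 0.05, df = 12
f_crit_slide = 4.75 # critical F with df = 1 and 12, as quoted on the slide

# For one variable, the partial F statistic is the square of the t statistic.
f_partial = t_stat ** 2

# The critical values are linked the same way.
f_crit_from_t = t_crit ** 2

print(round(f_partial, 2), round(f_crit_from_t, 2))
print(f_partial > f_crit_slide)  # same decision as the t test: reject H0
```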

52 © 2002 Prentice-Hall, Inc. Chap 14-52 Chapter Summary: Developed the multiple regression model; Discussed residual plots; Presented influence analysis; Addressed testing the significance of the multiple regression model; Discussed inferences on population regression coefficients; Addressed testing portions of the multiple regression model

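53 © 2002 Prentice-Hall, Inc. Chap 14-53 Multiple Linear Regression. Data Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_p X_{pi} + \varepsilon_i$. Matrix Model: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$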

54 © 2002 Prentice-Hall, Inc. Chap 14-54

55 © 2002 Prentice-Hall, Inc. Chap 14-55 Multiple Correlation Coefficient: $R$. Multiple Coefficient of Determination: $R^2$, which may be interpreted as the proportion of variance explained by the regression of Y on X.

56 © 2002 Prentice-Hall, Inc. Chap 14-56 Theorem:

57 © 2002 Prentice-Hall, Inc. Chap 14-57

58 © 2002 Prentice-Hall, Inc. Chap 14-58
DATA;
  INPUT X1 X2 Y;
  CARDS;
68 60 75
49 94 63
60 91 57
.
77 78 72
;
PROC PRINT;
PROC REG;
  MODEL Y=X1 X2 / COVB CORRB R INFLUENCE;
RUN;

59 © 2002 Prentice-Hall, Inc. Chap 14-59
Model: MODEL1
Dependent Variable: Y
Analysis of Variance
                             Sum of         Mean
Source             DF       Squares       Square   F Value   Pr > F
Model               2    1966.20840    983.10420     14.86   0.0002
Error              17    1124.79160     66.16421
Corrected Total    19    3091.00000
Root MSE          8.13414    R-Square   0.6361
Dependent Mean   74.50000    Adj R-Sq   0.5933
Coeff Var        10.91831
Parameter Estimates
                   Parameter    Standard
Variable    DF      Estimate       Error   t Value   Pr > |t|
Intercept    1      14.49614    14.20435      1.02     0.3218
X1           1       0.56319     0.11801      4.77     0.0002
X2           1       0.26736     0.15704      1.70     0.1069
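Every summary figure in the output above can be recomputed from the sums of squares, degrees of freedom, estimates, and standard errors it lists. The sketch below does so as a consistency check, using only numbers from the output; the variable names are illustrative.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table.
ssr, sse, sst = 1966.20840, 1124.79160, 3091.00000
df_model, df_error, df_total = 2, 17, 19

# Mean squares and the overall F statistic.
msr = ssr / df_model
mse = sse / df_error
f_value = msr / mse

# Goodness-of-fit summaries.
r_square = ssr / sst
adj_r_square = 1 - (sse / df_error) / (sst / df_total)
root_mse = math.sqrt(mse)
coeff_var = 100 * root_mse / 74.5   # Root MSE as a percentage of the dependent mean

# t values are the estimates divided by their standard errors.
t_x1 = 0.56319 / 0.11801
t_x2 = 0.26736 / 0.15704

print(round(f_value, 2), round(r_square, 4), round(adj_r_square, 4))
print(round(root_mse, 5), round(coeff_var, 3), round(t_x1, 2), round(t_x2, 2))
```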

60 © 2002 Prentice-Hall, Inc. Chap 14-60
Covariance of Estimates
COVB            Intercept              X1              X2
Intercept     201.7635339    -0.635820247    -1.851491131
X1           -0.635820247    0.0139252459    -0.003440529
X2           -1.851491131    -0.003440529    0.0246625524
Correlation of Estimates
CORRB           Intercept              X1              X2
Intercept          1.0000         -0.3793         -0.8300
X1                -0.3793          1.0000         -0.1857
X2                -0.8300         -0.1857          1.0000
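The two matrices above carry the same information in different units: each correlation is the covariance divided by the product of the two standard errors, and each standard error is the square root of a diagonal covariance entry. A small sketch of that conversion, using the covariance values as printed:

```python
import math

names = ["Intercept", "X1", "X2"]

# Covariance matrix of the estimates, as printed in the output.
cov = [
    [201.7635339, -0.635820247, -1.851491131],
    [-0.635820247, 0.0139252459, -0.003440529],
    [-1.851491131, -0.003440529, 0.0246625524],
]

# Standard errors are the square roots of the diagonal entries.
std_err = [math.sqrt(cov[i][i]) for i in range(3)]

# Correlation of estimates i and j: cov_ij / (se_i * se_j).
corr = [[cov[i][j] / (std_err[i] * std_err[j]) for j in range(3)] for i in range(3)]

for name, se in zip(names, std_err):
    print(name, round(se, 5))
for row in corr:
    print([round(v, 4) for v in row])
```

The standard errors recovered this way match the Standard Error column of the parameter estimates table.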

61 © 2002 Prentice-Hall, Inc. Chap 14-61
       Dep Var   Predicted   Std Error               Std Error    Student                    Cook's
Obs          Y       Value     Predict    Residual    Residual   Residual    -2-1 0 1 2           D
  1    75.0000     68.8346      4.2678      6.1654       6.925      0.890    | |* |           0.100
  2    63.0000     67.2242      3.3214     -4.2242       7.425     -0.569    | *| |           0.022
  3    57.0000     72.6172      2.2988    -15.6172       7.803     -2.002    | ****| |        0.116
  4    88.0000     74.4491      1.9107     13.5509       7.907      1.714    | |*** |         0.057
  5    88.0000     90.5143      4.2002     -2.5143       6.966     -0.361    | | |            0.016
  6    79.0000     85.2747      2.6984     -6.2747       7.674     -0.818    | *| |           0.028
  7    82.0000     67.5089      2.4898     14.4911       7.744      1.871    | |*** |         0.121
  8    73.0000     66.4506      2.8567      6.5494       7.616      0.860    | |* |           0.035
  9    90.0000     81.2755      2.5928      8.7245       7.710      1.132    | |** |          0.048
 10    62.0000     59.7208      3.8097      2.2792       7.187      0.317    | | |            0.009
 11    70.0000     77.4755      1.8990     -7.4755       7.909     -0.945    | *| |           0.017
 12    96.0000     93.1309      3.8760      2.8691       7.151      0.401    | | |            0.016
 13    76.0000     73.9825      2.5281      2.0175       7.731      0.261    | | |            0.002
 14    75.0000     80.1776      2.3793     -5.1776       7.778     -0.666    | *| |           0.014
 15    85.0000     84.6150      3.2590      0.3850       7.453     0.0517    | | |            0.000
 16    40.0000     50.3917      5.9936    -10.3917       5.499     -1.890    | ***| |         1.414
 17    74.0000     76.2637      2.1866     -2.2637       7.835     -0.289    | | |            0.002
 18    70.0000     69.0846      2.0768      0.9154       7.865      0.116    | | |            0.000
 19    75.0000     72.2929      2.6787      2.7071       7.680      0.352    | | |            0.005
 20    72.0000     78.7158      2.5093     -6.7158       7.737     -0.868    | *| |           0.026
 21          .     83.7560      3.0157           .           .          .                         .

62 © 2002 Prentice-Hall, Inc. Chap 14-62
                               Hat Diag
Obs    Residual    RStudent           H
  1      6.1654      0.8846      0.2753
  2     -4.2242     -0.5572      0.1667
  3    -15.6172     -2.2211      0.0799
  4     13.5509      1.8281      0.0552
  5     -2.5143     -0.3515      0.2666
  6     -6.2747     -0.8094      0.1100
  7     14.4911      2.0374      0.0937
  8      6.5494      0.8530      0.1233
  9      8.7245      1.1417      0.1016
 10      2.2792      0.3086      0.2194
 11     -7.4755     -0.9420      0.0545
 12      2.8691      0.3911      0.2271
 13      2.0175      0.2537      0.0966
 14     -5.1776     -0.6543      0.0856
 15      0.3850      0.0501      0.1605
 16    -10.3917     -2.0627      0.5429
 17     -2.2637     -0.2810      0.0723
 18      0.9154      0.1130      0.0652
 19      2.7071      0.3432      0.1084
 20     -6.7158     -0.8613      0.0952
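The influence measures in the SAS listings can be reproduced from three ingredients: the residual, the hat diagonal, and the ANOVA error terms (SSE = 1124.79160, Root MSE = 8.13414, n = 20, p = 2). The sketch below assumes the standard textbook formulas: the cutoff $h_i > 2(p+1)/n$ (which gives the 0.4 quoted for the heating-oil example with n = 15), the Studentized residual $e_i / (S\sqrt{1-h_i})$, the deleted Studentized residual $e_i\sqrt{(n-p-2)/(SSE(1-h_i)-e_i^2)}$, and Cook's $D_i = SR_i^2 h_i / ((p+1)(1-h_i))$. Only three observations are included to keep it short, and the function names are illustrative.

```python
import math

# Model-level quantities from the ANOVA table.
n, p = 20, 2
sse = 1124.79160
root_mse = 8.13414

# (residual, hat diagonal) for a few observations from the listings.
obs = {
    3: (-15.6172, 0.0799),
    7: (14.4911, 0.0937),
    16: (-10.3917, 0.5429),
}

# Rule-of-thumb cutoff for the hat diagonal (an assumed, commonly used rule).
hat_cutoff = 2 * (p + 1) / n

def student_residual(e, h):
    """Internally Studentized residual: e / (S * sqrt(1 - h))."""
    return e / (root_mse * math.sqrt(1 - h))

def rstudent(e, h):
    """Studentized deleted residual, computed without refitting the model."""
    return e * math.sqrt((n - p - 2) / (sse * (1 - h) - e ** 2))

def cooks_d(e, h):
    """Cook's distance from the Studentized residual and the hat diagonal."""
    sr = student_residual(e, h)
    return sr ** 2 * h / ((p + 1) * (1 - h))

high_leverage = [i for i, (e, h) in obs.items() if h > hat_cutoff]

for i, (e, h) in obs.items():
    print(i, round(student_residual(e, h), 3), round(rstudent(e, h), 4), round(cooks_d(e, h), 3))
print("hat cutoff:", hat_cutoff, "flagged:", high_leverage)
```

Observation 16 is the only one of the three whose hat value exceeds the cutoff, and it also has by far the largest Cook's D in the listing. Small differences in the last digit relative to the SAS output come from the rounded inputs.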

63 © 2002 Prentice-Hall, Inc. Chap 14-63 [Scatter plot: HOMEWORK (vertical axis, 50 to 100) versus MIDTERM (horizontal axis, 30 to 100)]

64 © 2002 Prentice-Hall, Inc. Chap 14-64

65 © 2002 Prentice-Hall, Inc. Chap 14-65 Goodness of Fit

66 © 2002 Prentice-Hall, Inc. Chap 14-66

67 © 2002 Prentice-Hall, Inc. Chap 14-67 Regression Effect

68 © 2002 Prentice-Hall, Inc. Chap 14-68 Goodness of Fit for using replicate observations

