

### 12-3 Multiple Regression

- 12.1 The Linear Regression Model
- 12.2 The Least Squares Estimates and Prediction
- 12.3 The Mean Squared Error and the Standard Error
- 12.4 Model Utility: R², Adjusted R², and the F Test
- 12.5 Testing the Significance of an Independent Variable
- 12.6 Confidence Intervals and Prediction Intervals
- 12.7 Dummy Variables
- 12.8 Model Building and the Effects of Multicollinearity
- 12.9 Residual Analysis in Multiple Regression

### 12-4 12.1 The Linear Regression Model

The linear regression model relating $y$ to $x_1, x_2, \ldots, x_k$ is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

where

- $\mu_y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$ is the mean value of the dependent variable $y$ when the values of the independent variables are $x_1, x_2, \ldots, x_k$.
- $\beta_0, \beta_1, \ldots, \beta_k$ are the regression parameters relating the mean value of $y$ to $x_1, x_2, \ldots, x_k$.
- $\varepsilon$ is an error term that describes the effects on $y$ of all factors other than the independent variables $x_1, x_2, \ldots, x_k$.
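A minimal simulation can make the model and its error term concrete. The parameter values and $\sigma$ below are assumed purely for illustration, not taken from the example:

```python
import random

random.seed(0)

# Hypothetical regression parameters (assumptions, not from the example)
b0, b1, b2 = 10.0, -0.1, 0.08
sigma = 0.5   # standard deviation of the error term

def draw_y(x1, x2):
    """One draw of y = beta0 + beta1*x1 + beta2*x2 + epsilon."""
    eps = random.gauss(0.0, sigma)   # error term: mean 0, variance sigma^2
    return b0 + b1 * x1 + b2 * x2 + eps

mean_y = b0 + b1 * 40 + b2 * 10   # mean value of y when x1 = 40, x2 = 10
avg = sum(draw_y(40, 10) for _ in range(10000)) / 10000
# avg is close to mean_y: the error term averages out to 0
```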

### 12-5 Example: The Linear Regression Model

Example 12.1: The Fuel Consumption Case

### 12-6 The Linear Regression Model Illustrated

Example 12.1: The Fuel Consumption Case

### 12-7 The Regression Model Assumptions

Assumptions about the model error terms, the $\varepsilon$'s:

- Mean zero: the mean of the error terms is equal to 0.
- Constant variance: the variance of the error terms is $\sigma^2$, the same for every combination of values of $x_1, x_2, \ldots, x_k$.
- Normality: the error terms follow a normal distribution for every combination of values of $x_1, x_2, \ldots, x_k$.
- Independence: the values of the error terms are statistically independent of each other.

### 12-8 12.2 Least Squares Estimates and Prediction

Estimation/prediction equation:

$$\hat{y} = b_0 + b_1 x_{01} + b_2 x_{02} + \cdots + b_k x_{0k}$$

- $b_1, b_2, \ldots, b_k$ are the least squares point estimates of the parameters $\beta_1, \beta_2, \ldots, \beta_k$.
- $x_{01}, x_{02}, \ldots, x_{0k}$ are specified values of the independent predictor variables $x_1, x_2, \ldots, x_k$.
- $\hat{y}$ is the point estimate of the mean value of the dependent variable when the values of the independent variables are $x_{01}, x_{02}, \ldots, x_{0k}$. It is also the point prediction of an individual value of the dependent variable at those same values.

### 12-9 Example: Least Squares Estimation

Example 12.3: The Fuel Consumption Case, Minitab output:

```
FuelCons = 13.1 - 0.0900 Temp + 0.0825 Chill

Predictor    Coef      StDev    T      P
Constant     13.1087   0.8557   15.32  0.000
Temp         -0.09001  0.01408  -6.39  0.001
Chill        0.08249   0.02200  3.75   0.013

S = 0.3671   R-Sq = 97.4%   R-Sq(adj) = 96.3%

Analysis of Variance
Source          DF  SS      MS      F      P
Regression      2   24.875  12.438  92.30  0.000
Residual Error  5   0.674   0.135
Total           7   25.549

Predicted Values (Temp = 40, Chill = 10)
Fit     StDev Fit  95.0% CI          95.0% PI
10.333  0.170      (9.895, 10.771)   (9.293, 11.374)
```
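The point prediction in the output can be reproduced directly from the least squares estimates; a minimal sketch, with values taken from the Minitab output above:

```python
# Least squares point estimates from the Minitab output above
b0, b_temp, b_chill = 13.1087, -0.09001, 0.08249

# Point estimate of mean fuel consumption (and point prediction of an
# individual value) when Temp = 40 and Chill = 10
y_hat = b0 + b_temp * 40 + b_chill * 10
print(round(y_hat, 3))  # 10.333, matching the "Fit" value
```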

### 12-10 Example: Point Predictions and Residuals

Example 12.3: The Fuel Consumption Case

### 12-11 12.3 Mean Square Error and Standard Error

Mean square error, the point estimate of the residual variance $\sigma^2$:

$$s^2 = \frac{SSE}{n - (k+1)}$$

Standard error, the point estimate of the residual standard deviation $\sigma$:

$$s = \sqrt{\frac{SSE}{n - (k+1)}}$$

Example 12.3, the Fuel Consumption Case. The sum of squared errors appears in the ANOVA table:

```
Analysis of Variance
Source          DF  SS      MS      F      P
Regression      2   24.875  12.438  92.30  0.000
Residual Error  5   0.674   0.135
Total           7   25.549
```
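These formulas can be checked against the ANOVA table; a small sketch using SSE = 0.674, n = 8, and k = 2:

```python
import math

sse = 0.674   # SSE from the ANOVA table
n, k = 8, 2   # 8 observations, 2 independent variables

mse = sse / (n - (k + 1))   # mean square error: 0.674 / 5 = 0.1348
s = math.sqrt(mse)          # standard error, about 0.367
```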

### 12-12 12.4 Model Utility: Multiple Coefficient of Determination, R²

The multiple coefficient of determination is

$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$

$R^2$ is the proportion of the total variation in $y$ explained by the linear regression model.
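For the fuel consumption output, $R^2$ follows directly from the sums of squares in the ANOVA table:

```python
ss_regression = 24.875   # explained variation (Regression SS)
ss_total = 25.549        # total variation (Total SS)

r_sq = ss_regression / ss_total   # about 0.974, i.e. R-Sq = 97.4%
```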

### 12-13 12.4 Model Utility: Adjusted R²

The adjusted multiple coefficient of determination is

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-(k+1)}$$

Fuel Consumption Case:

```
S = 0.3671   R-Sq = 97.4%   R-Sq(adj) = 96.3%

Analysis of Variance
Source          DF  SS      MS      F      P
Regression      2   24.875  12.438  92.30  0.000
Residual Error  5   0.674   0.135
Total           7   25.549
```
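The adjusted value penalizes $R^2$ for the number of predictors; a quick check against the output above:

```python
r_sq = 24.875 / 25.549   # R^2 from the ANOVA sums of squares
n, k = 8, 2

adj_r_sq = 1 - (1 - r_sq) * (n - 1) / (n - (k + 1))   # about 0.963
```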

### 12-14 12.4 Model Utility: F Test for the Linear Regression Model

To test $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ versus $H_a$: at least one of $\beta_1, \beta_2, \ldots, \beta_k$ is not equal to 0.

Test statistic:

$$F(\text{model}) = \frac{\text{Explained variation}/k}{SSE/(n-(k+1))}$$

Reject $H_0$ in favor of $H_a$ if $F(\text{model}) > F_\alpha$ or p-value $< \alpha$. $F_\alpha$ is based on $k$ numerator and $n-(k+1)$ denominator degrees of freedom.

### 12-15 Example: F Test for the Linear Regression Model

Example 12.5, the Fuel Consumption Case, Minitab output:

```
Analysis of Variance
Source          DF  SS      MS      F      P
Regression      2   24.875  12.438  92.30  0.000
Residual Error  5   0.674   0.135
Total           7   25.549
```

Test statistic: $F(\text{model}) = 12.438/0.135 = 92.30$, where $F_\alpha$ is based on 2 numerator and 5 denominator degrees of freedom. For the F test at the $\alpha = 0.05$ level of significance, reject $H_0$, since $F(\text{model}) > F_{0.05}$ and the p-value (0.000) is less than 0.05.
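The F statistic can be recomputed from the sums of squares; a sketch (the 5.79 critical value is the standard F table entry for α = 0.05 with 2 and 5 degrees of freedom):

```python
ms_regression = 24.875 / 2   # MSR = explained variation / k
ms_error = 0.674 / 5         # MSE = SSE / (n - (k + 1))

f_model = ms_regression / ms_error   # about 92.3 (Minitab: 92.30)
f_crit = 5.79                        # F_0.05 with 2 and 5 df
reject_h0 = f_model > f_crit         # True: the model is useful
```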

### 12-16 12.5 Testing the Significance of an Independent Variable

To test $H_0: \beta_j = 0$, the test statistic is

$$t = \frac{b_j}{s_{b_j}}$$

If the regression assumptions hold, we can reject $H_0: \beta_j = 0$ at the $\alpha$ level of significance (probability of Type I error equal to $\alpha$) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than $\alpha$:

- $H_a: \beta_j \neq 0$: reject $H_0$ if $|t| > t_{\alpha/2}$
- $H_a: \beta_j > 0$: reject $H_0$ if $t > t_\alpha$
- $H_a: \beta_j < 0$: reject $H_0$ if $t < -t_\alpha$

$t_\alpha$, $t_{\alpha/2}$ and p-values are based on $n-(k+1)$ degrees of freedom. A $100(1-\alpha)\%$ confidence interval for $\beta_j$ is

$$\left[\, b_j \pm t_{\alpha/2}\, s_{b_j} \,\right]$$

### 12-17 Example: Testing and Estimation for the β's

Example 12.6: The Fuel Consumption Case, Minitab output:

```
Predictor    Coef      StDev    T      P
Constant     13.1087   0.8557   15.32  0.000
Temp         -0.09001  0.01408  -6.39  0.001
Chill        0.08249   0.02200  3.75   0.013
```

$t_\alpha$, $t_{\alpha/2}$ and p-values are based on 5 degrees of freedom. Chill is significant at the $\alpha = 0.05$ level (p-value 0.013 < 0.05), but not at $\alpha = 0.01$.
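The t statistic and a 95% confidence interval for the Chill coefficient follow from the output; a sketch (2.571 is the standard t table value for α/2 = 0.025 with 5 degrees of freedom):

```python
b_chill, s_b_chill = 0.08249, 0.02200   # Coef and StDev for Chill

t_stat = b_chill / s_b_chill   # about 3.75, matching the Minitab T
t_crit = 2.571                 # t_0.025 with 5 degrees of freedom

# 95% confidence interval for beta_Chill; it excludes 0, consistent
# with Chill being significant at the 0.05 level
ci = (b_chill - t_crit * s_b_chill, b_chill + t_crit * s_b_chill)
```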

### 12-18 12.6 Confidence and Prediction Intervals

If the regression assumptions hold, a $100(1-\alpha)\%$ confidence interval for the mean value of $y$ is

$$\left[\, \hat{y} \pm t_{\alpha/2}\, s \sqrt{\text{distance value}} \,\right]$$

and a $100(1-\alpha)\%$ prediction interval for an individual value of $y$ is

$$\left[\, \hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \text{distance value}} \,\right]$$

$t_{\alpha/2}$ is based on $n-(k+1)$ degrees of freedom. (Computing the distance value requires matrix algebra.)

### 12-19 Example: Confidence and Prediction Intervals

Example 12.9: The Fuel Consumption Case, Minitab output:

```
FuelCons = 13.1 - 0.0900 Temp + 0.0825 Chill

Predicted Values (Temp = 40, Chill = 10)
Fit     StDev Fit  95.0% CI          95.0% PI
10.333  0.170      (9.895, 10.771)   (9.293, 11.374)
```

The 95% confidence interval is (9.895, 10.771); the 95% prediction interval is (9.293, 11.374).
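Both intervals can be rebuilt from the Fit, StDev Fit, and s values in the output (small differences from Minitab are rounding):

```python
import math

y_hat = 10.333   # Fit
s_fit = 0.170    # StDev Fit = s * sqrt(distance value)
s = 0.3671       # standard error
t_crit = 2.571   # t_0.025 with 5 degrees of freedom

# 95% CI for the mean value of y
ci = (y_hat - t_crit * s_fit, y_hat + t_crit * s_fit)

# 95% PI for an individual value of y: the extra s^2 term widens it
half = t_crit * math.sqrt(s**2 + s_fit**2)
pi = (y_hat - half, y_hat + half)
```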

### 12-20 12.7 Dummy Variables

Example 12.11: The Electronics World Case, with a location dummy variable.

### 12-21 Example: Regression with a Dummy Variable

Example 12.11: The Electronics World Case, Minitab output:

```
Sales = 17.4 + 0.851 Households + 29.2 DM

Predictor    Coef     StDev    T      P
Constant     17.360   9.447    1.84   0.109
Househol     0.85105  0.06524  13.04  0.000
DM           29.216   5.594    5.22   0.001

S = 7.329   R-Sq = 98.3%   R-Sq(adj) = 97.8%

Analysis of Variance
Source          DF  SS     MS     F       P
Regression      2   21412  10706  199.32  0.000
Residual Error  7   376    54
Total           9   21788
```
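The dummy coefficient has a direct interpretation: holding Households fixed, the two location types differ in mean sales by the DM coefficient. A sketch using the estimates above (the 200-household figure is an arbitrary illustrative value):

```python
b0, b_house, b_dm = 17.360, 0.85105, 29.216   # estimates from the output

def predicted_sales(households, dm):
    """Point prediction of sales; dm is the 0/1 location dummy."""
    return b0 + b_house * households + b_dm * dm

# Two stores with the same number of households differ by exactly b_dm
diff = predicted_sales(200, 1) - predicted_sales(200, 0)
```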

### 12-22 12.8 Model Building and the Effects of Multicollinearity

Example: The Sales Territory Performance Case

### 12-23 Correlation Matrix

Example: The Sales Territory Performance Case

### 12-24 Multicollinearity

Multicollinearity refers to the condition in which the independent variables (or predictors) in a model are dependent, related, or correlated with each other.

Effects:

- Hinders the ability to use the $b_j$'s, t statistics, and p-values to assess the relative importance of predictors.
- Does not hinder the ability to predict the dependent (or response) variable.

Detection:

- Scatter plot matrix
- Correlation matrix
- Variance inflation factors (VIF)
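A variance inflation factor can be computed by regressing each predictor on the others; a minimal NumPy sketch on synthetic data (the data and the `vif` helper are illustrative, not from the case):

```python
import numpy as np

def vif(x, j):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j of x on the remaining columns (with an intercept)."""
    y = x[:, j]
    others = np.delete(x, j, axis=1)
    a = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(a, y, rcond=None)
    resid = y - a @ coef
    r_sq = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 / (1.0 - r_sq)

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 0.1 * rng.normal(size=100)   # nearly collinear with x1
x3 = rng.normal(size=100)              # unrelated to x1 and x2
x = np.column_stack([x1, x2, x3])
# vif(x, 0) is large (multicollinearity); vif(x, 2) is close to 1
```

A common rule of thumb treats VIF values above 10 as signaling serious multicollinearity.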

### 12-25 12.9 Residual Analysis in Multiple Regression

For an observed value $y_i$, the residual is

$$e_i = y_i - \hat{y}_i$$

If the regression assumptions hold, the residuals should look like a random sample from a normal distribution with mean 0 and variance $\sigma^2$. Residual plots:

- Residuals versus each independent variable
- Residuals versus the predicted $\hat{y}$'s
- Residuals in time order (if the response is a time series)
- Histogram of the residuals
- Normal plot of the residuals
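Computing the residuals themselves takes only a few lines; the observed and predicted values below are hypothetical, for illustration only:

```python
# Hypothetical observed and predicted values (not from the example)
y_obs = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]
y_hat = [12.5, 11.8, 12.2, 11.0, 9.3, 9.2, 8.1, 7.6]

residuals = [yi - yh for yi, yh in zip(y_obs, y_hat)]
mean_resid = sum(residuals) / len(residuals)   # should be near 0
```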

### 12-26 Multiple Regression Summary

- 12.1 The Linear Regression Model
- 12.2 The Least Squares Estimates and Prediction
- 12.3 The Mean Squared Error and the Standard Error
- 12.4 Model Utility: R², Adjusted R², and the F Test
- 12.5 Testing the Significance of an Independent Variable
- 12.6 Confidence Intervals and Prediction Intervals
- 12.7 Dummy Variables
- 12.8 Model Building and the Effects of Multicollinearity
- 12.9 Residual Analysis in Multiple Regression