Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
Chapter 13: Introduction to Multiple Regression


Chapter Goals
After completing this chapter, you should be able to:
- apply multiple regression analysis to business decision-making situations
- analyze and interpret the computer output for a multiple regression model
- perform residual analysis for the multiple regression model
- test the significance of the independent variables in a multiple regression model

Chapter Goals (continued)
After completing this chapter, you should be able to:
- use a coefficient of partial determination to test portions of the multiple regression model
- incorporate qualitative variables into the regression model by using dummy variables
- use interaction terms in regression models

The Multiple Regression Model
Idea: examine the linear relationship between one dependent variable (Y) and two or more independent variables ($X_i$).
Multiple regression model with k independent variables:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$
where $\beta_0$ is the Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon_i$ is the random error.

Multiple Regression Equation
The coefficients of the multiple regression model are estimated using sample data.
Multiple regression equation with k independent variables:
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$
where $\hat{Y}_i$ is the estimated (or predicted) value of Y, $b_0$ is the estimated intercept, and $b_1, \ldots, b_k$ are the estimated slope coefficients.
In this chapter we will always use Excel to obtain the regression slope coefficients and other regression summary measures.

Multiple Regression Equation (continued)
Two-variable model: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$
[Figure: the fitted regression plane plotted against the $X_1$, $X_2$, and Y axes; $b_1$ is the slope for variable $X_1$ and $b_2$ is the slope for variable $X_2$.]

Example: 2 Independent Variables
- A distributor of frozen dessert pies wants to evaluate factors thought to influence demand
- Dependent variable: pie sales (units per week)
- Independent variables: price (in \$) and advertising (in \$100s)
- Data are collected for 15 weeks

Pie Sales Example
Week  Pie Sales  Price (\$)  Advertising (\$100s)
 1    350        5.50        3.3
 2    460        7.50        3.3
 3    350        8.00        3.0
 4    430        8.00        4.5
 5    350        6.80        3.0
 6    380        7.50        4.0
 7    430        4.50        3.0
 8    470        6.40        3.7
 9    450        7.00        3.5
10    490        5.00        4.0
11    340        7.20        3.5
12    300        7.90        3.2
13    440        5.90        4.0
14    450        5.00        3.5
15    300        7.00        2.7
Multiple regression equation: Sales = $b_0$ + $b_1$ (Price) + $b_2$ (Advertising)

Estimating a Multiple Linear Regression Equation
- Excel will be used to generate the coefficients and measures of goodness of fit for multiple regression
- Excel: Tools / Data Analysis... / Regression
- PHStat: PHStat / Regression / Multiple Regression…

Multiple Regression Output

Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

ANOVA
            df   SS          MS          F        Significance F
Regression   2   29460.027   14730.013   6.53861  0.01201
Residual    12   27033.306    2252.776
Total       14   56493.333

             Coefficients  Standard Error  t Stat    P-value  Lower 95%  Upper 95%
Intercept    306.52619     114.25389        2.68285  0.01993   57.58835  555.46404
Price        -24.97509      10.83213       -2.30565  0.03979  -48.57626   -1.37392
Advertising   74.13096      25.96732        2.85478  0.01449   17.55303  130.70888
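
The slides obtain these coefficients from Excel. As a cross-check outside Excel, the following is a minimal Python sketch that fits the same two-predictor model by solving the centered normal equations directly. The function name `fit_two_predictors` is illustrative and not part of the original material; the data are the 15 weeks from the pie sales example.

```python
# Pie sales data from the slides: (sales, price in $, advertising in $100s)
data = [
    (350, 5.50, 3.3), (460, 7.50, 3.3), (350, 8.00, 3.0), (430, 8.00, 4.5),
    (350, 6.80, 3.0), (380, 7.50, 4.0), (430, 4.50, 3.0), (470, 6.40, 3.7),
    (450, 7.00, 3.5), (490, 5.00, 4.0), (340, 7.20, 3.5), (300, 7.90, 3.2),
    (440, 5.90, 4.0), (450, 5.00, 3.5), (300, 7.00, 2.7),
]

def fit_two_predictors(rows):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 (hypothetical helper)."""
    n = len(rows)
    my = sum(r[0] for r in rows) / n
    m1 = sum(r[1] for r in rows) / n
    m2 = sum(r[2] for r in rows) / n
    # Centered sums of squares and cross products
    s11 = sum((r[1] - m1) ** 2 for r in rows)
    s22 = sum((r[2] - m2) ** 2 for r in rows)
    s12 = sum((r[1] - m1) * (r[2] - m2) for r in rows)
    s1y = sum((r[1] - m1) * (r[0] - my) for r in rows)
    s2y = sum((r[2] - m2) * (r[0] - my) for r in rows)
    # Solve the 2x2 normal equations by Cramer's rule
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

b0, b1, b2 = fit_two_predictors(data)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

The closed form only covers exactly two predictors; a general k-variable fit would need a linear solver, which is what Excel's Regression tool handles internally.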

The Multiple Regression Equation
Sales = 306.526 - 24.975 (Price) + 74.131 (Advertising)
where Sales is in number of pies per week, Price is in \$, and Advertising is in \$100s.
- $b_1 = -24.975$: sales will decrease, on average, by 24.975 pies per week for each \$1 increase in selling price, net of the effects of changes due to advertising
- $b_2 = 74.131$: sales will increase, on average, by 74.131 pies per week for each \$100 increase in advertising, net of the effects of changes due to price

Using the Equation to Make Predictions
Predict sales for a week in which the selling price is \$5.50 and advertising is \$350:
$$\widehat{\text{Sales}} = 306.526 - 24.975(5.50) + 74.131(3.5) = 428.62$$
Predicted sales is 428.62 pies.
Note that Advertising is in \$100s, so \$350 means that $X_2 = 3.5$.
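
The prediction arithmetic can be sketched in a few lines of Python. The function name `predict_sales` is an assumption for illustration; the coefficients are the rounded values from the Excel output, and advertising dollars are converted to \$100s before use.

```python
def predict_sales(price, advertising_dollars,
                  b0=306.526, b1=-24.975, b2=74.131):
    """Predicted weekly pie sales (hypothetical helper, coefficients from the slides)."""
    advertising_hundreds = advertising_dollars / 100  # model expects $100s
    return b0 + b1 * price + b2 * advertising_hundreds

predicted = predict_sales(5.50, 350)
print(round(predicted, 2))
```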

Predictions in PHStat
- PHStat | Regression | Multiple Regression…
- Check the “confidence and prediction interval estimates” box

Predictions in PHStat (continued)
[Screenshot: PHStat output showing the input values, the predicted value $\hat{Y}$, the confidence interval for the mean Y value given these X's, and the prediction interval for an individual Y value given these X's.]

Coefficient of Multiple Determination
- Reports the proportion of total variation in Y explained by all X variables taken together
$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$

Coefficient of Multiple Determination (continued)
From the Excel output, R Square = 0.52148:
$$r^2 = \frac{SSR}{SST} = \frac{29460.027}{56493.333} = 0.52148$$
52.1% of the variation in pie sales is explained by the variation in price and advertising.
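
As a quick check, R Square can be recomputed from the ANOVA sums of squares. This is a small Python sketch using the SSR and SST values reported in the Excel output; the variable names are illustrative.

```python
# Sums of squares copied from the ANOVA section of the Excel output
ssr = 29460.027   # regression sum of squares
sst = 56493.333   # total sum of squares

r_squared = ssr / sst
print(f"r^2 = {r_squared:.5f}")
```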

Adjusted $r^2$
- $r^2$ never decreases when a new X variable is added to the model
  - This can be a disadvantage when comparing models
- What is the net effect of adding a new variable?
  - We lose a degree of freedom when a new X variable is added
  - Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?

Adjusted $r^2$ (continued)
- Shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used:
$$r^2_{adj} = 1 - \left[(1 - r^2)\,\frac{n - 1}{n - k - 1}\right]$$
  (where n = sample size, k = number of independent variables)
- Penalizes excessive use of unimportant independent variables
- Smaller than $r^2$
- Useful in comparing among models

Adjusted $r^2$ (continued)
From the Excel output, Adjusted R Square = 0.44172.
44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.
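
The adjustment can be reproduced by hand from R Square, n, and k. The sketch below assumes the standard adjusted $r^2$ formula, $1 - (1 - r^2)(n - 1)/(n - k - 1)$; the function name `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Pie sales example: R Square = 0.52148, n = 15 weeks, k = 2 predictors
adj = adjusted_r2(0.52148, n=15, k=2)
print(f"adjusted r^2 = {adj:.5f}")
```

Because the penalty factor $(n - 1)/(n - k - 1)$ grows with k, a variable that adds little explanatory power can lower the adjusted value even though $r^2$ itself never decreases.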

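Residuals in Multiple Regression
Two-variable model: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$
Residual: $e_i = Y_i - \hat{Y}_i$, the difference between the sample observation $Y_i$ and the fitted value $\hat{Y}_i$ at $(x_{1i}, x_{2i})$.
The best-fit equation, $\hat{Y}$, is found by minimizing the sum of squared errors, $\sum e_i^2$.
[Figure: a sample observation above the fitted plane, with the residual drawn as the vertical distance between $Y_i$ and $\hat{Y}_i$.]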

Multiple Regression Assumptions
Errors (residuals) from the regression model: $e_i = Y_i - \hat{Y}_i$
Assumptions:
- The errors are normally distributed
- Errors have a constant variance
- The model errors are independent

Residual Plots Used in Multiple Regression
- These residual plots are used in multiple regression:
  - Residuals vs. $\hat{Y}_i$
  - Residuals vs. $X_{1i}$
  - Residuals vs. $X_{2i}$
  - Residuals vs. time (if time series data)
Use the residual plots to check for violations of regression assumptions.
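
To make the residual definition concrete, here is a minimal Python sketch that computes fitted values and residuals for the pie sales data, using the coefficients from the Excel output. It prints (fitted, residual) pairs as a text stand-in for a residuals vs. $\hat{Y}_i$ plot; any charting tool could be used on the same pairs.

```python
# Pie sales data from the slides: (sales, price in $, advertising in $100s)
data = [
    (350, 5.50, 3.3), (460, 7.50, 3.3), (350, 8.00, 3.0), (430, 8.00, 4.5),
    (350, 6.80, 3.0), (380, 7.50, 4.0), (430, 4.50, 3.0), (470, 6.40, 3.7),
    (450, 7.00, 3.5), (490, 5.00, 4.0), (340, 7.20, 3.5), (300, 7.90, 3.2),
    (440, 5.90, 4.0), (450, 5.00, 3.5), (300, 7.00, 2.7),
]
# Coefficients copied from the Excel output
b0, b1, b2 = 306.52619, -24.97509, 74.13096

fitted = [b0 + b1 * price + b2 * adv for _, price, adv in data]
residuals = [row[0] - f for row, f in zip(data, fitted)]
sse = sum(e ** 2 for e in residuals)  # should match the Residual SS in the ANOVA table

# Pairs sorted by fitted value, as they would appear left to right on the plot
for f, e in sorted(zip(fitted, residuals)):
    print(f"fitted {f:7.2f}   residual {e:8.2f}")
print("sum of residuals:", round(sum(residuals), 3))
print("SSE:", round(sse, 1))
```

With a least-squares fit that includes an intercept, the residuals sum to (approximately) zero, so a nonzero sum here would point to a data-entry or coefficient error.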

Is the Model Significant?
- F-Test for Overall Significance of the Model
- Shows if there is a linear relationship between all of the X variables considered together and Y
- Use F test statistic
- Hypotheses:
  - $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ (no linear relationship)
  - $H_1$: at least one $\beta_i \neq 0$ (at least one independent variable affects Y)

F-Test for Overall Significance
- Test statistic:
$$F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}$$
  where F has (numerator) = k and (denominator) = (n - k - 1) degrees of freedom

F-Test for Overall Significance (continued)
From the ANOVA section of the Excel output:
$$F = \frac{MSR}{MSE} = \frac{14730.013}{2252.776} = 6.5386$$
with 2 and 12 degrees of freedom. The P-value for the F-test is Significance F = 0.01201.

F-Test for Overall Significance (continued)
$H_0: \beta_1 = \beta_2 = 0$
$H_1$: $\beta_1$ and $\beta_2$ not both zero
$\alpha = .05$, $df_1 = 2$, $df_2 = 12$
Critical value: $F_{.05} = 3.885$
Test statistic: $F = MSR / MSE = 6.5386$
[Figure: F distribution with the rejection region of area $\alpha = .05$ to the right of $F_{.05} = 3.885$; do not reject $H_0$ to the left, reject $H_0$ to the right.]
Decision: Since the F test statistic is in the rejection region (p-value < .05), reject $H_0$.
Conclusion: There is evidence that at least one independent variable affects Y.
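
The overall F-test decision can be sketched in Python from the ANOVA sums of squares. The function name `overall_f` is illustrative, and the critical value 3.885 is taken from the slide rather than computed, since the Python standard library has no F distribution.

```python
def overall_f(ssr, sse, n, k):
    """F = MSR / MSE with k and n - k - 1 degrees of freedom."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse

# Values from the ANOVA section of the Excel output
f_stat = overall_f(ssr=29460.027, sse=27033.306, n=15, k=2)
f_critical = 3.885  # F(.05; 2, 12), as given on the slide

reject_h0 = f_stat > f_critical
print(f"F = {f_stat:.4f}, reject H0: {reject_h0}")
```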

Are Individual Variables Significant?
- Use t-tests of individual variable slopes
- Shows if there is a linear relationship between the variable $X_i$ and Y
- Hypotheses:
  - $H_0: \beta_i = 0$ (no linear relationship)
  - $H_1: \beta_i \neq 0$ (linear relationship does exist between $X_i$ and Y)

Are Individual Variables Significant? (continued)
$H_0: \beta_i = 0$ (no linear relationship)
$H_1: \beta_i \neq 0$ (linear relationship does exist between $X_i$ and Y)
Test statistic:
$$t = \frac{b_i - 0}{S_{b_i}} \qquad (df = n - k - 1)$$

Are Individual Variables Significant? (continued)
From the Excel output:
- t-value for Price is t = -2.306, with p-value .0398
- t-value for Advertising is t = 2.855, with p-value .0145

Inferences about the Slope: t Test Example
$H_0: \beta_i = 0$
$H_1: \beta_i \neq 0$
d.f. = 15 - 2 - 1 = 12, $\alpha = .05$, $t_{\alpha/2} = 2.1788$
From Excel output:
             Coefficients  Standard Error  t Stat    P-value
Price        -24.97509     10.83213        -2.30565  0.03979
Advertising   74.13096     25.96732         2.85478  0.01449
[Figure: t distribution with rejection regions of area $\alpha/2 = .025$ below $-t_{\alpha/2} = -2.1788$ and above $t_{\alpha/2} = 2.1788$; do not reject $H_0$ in between.]
Decision: Reject $H_0$ for each variable. The test statistic for each variable falls in the rejection region (p-values < .05).
Conclusion: There is evidence that both Price and Advertising affect pie sales at $\alpha = .05$.
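
The individual t statistics are just each coefficient divided by its standard error. A short Python sketch, using the values from the Excel output and the critical value 2.1788 quoted on the slide (names are illustrative):

```python
# (coefficient, standard error) pairs from the Excel output
coefficients = {
    "Price": (-24.97509, 10.83213),
    "Advertising": (74.13096, 25.96732),
}
t_critical = 2.1788  # t(.025; 12 d.f.), as given on the slide

# t = (b_i - 0) / S_b_i for each slope
t_stats = {name: b / se for name, (b, se) in coefficients.items()}
significant = {name: abs(t) > t_critical for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}, reject H0: {significant[name]}")
```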

Confidence Interval Estimate for the Slope
Confidence interval for the population slope $\beta_i$:
$$b_i \pm t_{n-k-1}\, S_{b_i}$$
where t has (n - k - 1) d.f. Here, t has (15 - 2 - 1) = 12 d.f.
             Coefficients  Standard Error
Intercept    306.52619     114.25389
Price        -24.97509      10.83213
Advertising   74.13096      25.96732
Example: Form a 95% confidence interval for the effect of changes in price ($X_1$) on pie sales:
-24.975 ± (2.1788)(10.832)
So the interval is (-48.576, -1.374).

Confidence Interval Estimate for the Slope (continued)
Confidence interval for the population slope $\beta_i$
Excel output also reports these interval endpoints:
             Coefficients  Standard Error  …  Lower 95%  Upper 95%
Intercept    306.52619     114.25389       …   57.58835  555.46404
Price        -24.97509      10.83213       …  -48.57626   -1.37392
Advertising   74.13096      25.96732       …   17.55303  130.70888
Example: Weekly sales are estimated to be reduced by between 1.37 and 48.58 pies for each increase of \$1 in the selling price.
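
The interval endpoints Excel reports can be reproduced from the coefficient, its standard error, and the critical t value. In this Python sketch the function name `slope_ci` is an assumption for illustration, and 2.1788 is the slide's critical value for 12 d.f.

```python
def slope_ci(b, se, t_crit):
    """Confidence interval b +/- t * S_b for a population slope."""
    margin = t_crit * se
    return b - margin, b + margin

# Price coefficient and standard error from the Excel output
low, high = slope_ci(b=-24.97509, se=10.83213, t_crit=2.1788)
print(f"95% CI for the price slope: ({low:.3f}, {high:.3f})")
```

Small differences in the last decimal places relative to Excel's Lower 95% and Upper 95% columns come from rounding the critical value to four decimals.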

Testing Portions of the Multiple Regression Model
- Contribution of a single independent variable $X_j$:
$$SSR(X_j \mid \text{all variables except } X_j) = SSR(\text{all variables}) - SSR(\text{all variables except } X_j)$$
- Measures the contribution of $X_j$ in explaining the total variation in Y (SST)

Testing Portions of the Multiple Regression Model (continued)
Contribution of a single independent variable $X_j$, assuming all other variables are already included (consider here a 3-variable model):
$$SSR(X_1 \mid X_2 \text{ and } X_3) = SSR(\text{all variables}) - SSR(X_2 \text{ and } X_3)$$
- SSR(all variables) comes from the ANOVA section of the regression for $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3$
- SSR($X_2$ and $X_3$) comes from the ANOVA section of the regression for $\hat{Y} = b_0 + b_2 X_2 + b_3 X_3$
This measures the contribution of $X_1$ in explaining SST.

The Partial F-Test Statistic
- Consider the hypothesis test:
  - $H_0$: variable $X_j$ does not significantly improve the model after all other variables are included
  - $H_1$: variable $X_j$ significantly improves the model after all other variables are included
- Test using the F-test statistic (with 1 and n - k - 1 d.f.):
$$F = \frac{SSR(X_j \mid \text{all variables except } X_j)}{MSE}$$

Testing Portions of Model: Example
Example: frozen dessert pies
Test at the $\alpha = .05$ level to determine whether the price variable significantly improves the model given that advertising is included.

Testing Portions of Model: Example (continued)
$H_0$: $X_1$ (price) does not improve the model with $X_2$ (advertising) included
$H_1$: $X_1$ does improve model
$\alpha = .05$, df = 1 and 12
F critical value = 4.75

ANOVA (for $X_1$ and $X_2$)
            df   SS            MS
Regression   2   29460.02687   14730.01343
Residual    12   27033.30647    2252.775539
Total       14   56493.33333

ANOVA (for $X_2$ only)
            df   SS
Regression   1   17484.22249
Residual    13   39009.11085
Total       14   56493.33333

Testing Portions of Model: Example (continued)
From the two ANOVA tables, SSR = 29460.02687 and MSE = 2252.775539 for the model with $X_1$ and $X_2$, and SSR = 17484.22249 for the model with $X_2$ only:
$$F = \frac{SSR(X_1 \mid X_2)}{MSE(\text{all})} = \frac{29460.03 - 17484.22}{2252.78} = 5.316$$
Since F = 5.316 exceeds the critical value of 4.75 (df = 1 and 12, $\alpha = .05$), the test statistic is in the rejection region.
Conclusion: Reject $H_0$; adding $X_1$ does improve model.
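
The partial F computation for price can be sketched in Python directly from the two ANOVA tables. Variable names are illustrative; the critical value 4.75 is the one quoted on the slide.

```python
# From the ANOVA table for the model with X1 (price) and X2 (advertising)
ssr_full = 29460.02687
mse_full = 2252.775539
# From the ANOVA table for the model with X2 (advertising) only
ssr_reduced = 17484.22249

# Contribution of X1 given that X2 is already in the model
ssr_x1_given_x2 = ssr_full - ssr_reduced
partial_f = ssr_x1_given_x2 / mse_full

f_critical = 4.75  # F(.05; 1, 12), as given on the slide
print(f"SSR(X1 | X2) = {ssr_x1_given_x2:.2f}")
print(f"partial F = {partial_f:.3f}, reject H0: {partial_f > f_critical}")
```

For a single added variable, the partial F statistic equals the square of that variable's t statistic, here $(-2.30565)^2 \approx 5.316$, so the two tests agree.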

Coefficient of Partial Determination for k Variable Model
$$r^2_{Yj.(\text{all variables except } j)} = \frac{SSR(X_j \mid \text{all variables except } j)}{SST - SSR(\text{all variables}) + SSR(X_j \mid \text{all variables except } j)}$$
- Measures the proportion of variation in the dependent variable that is explained by $X_j$ while controlling for (holding constant) the other explanatory variables

Coefficient of Partial Determination in Excel
- Coefficients of partial determination can be found using Excel:
  - PHStat | Regression | Multiple Regression…
  - Check the “coefficient of partial determination” box
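
Outside PHStat, the coefficient of partial determination can be computed from the same sums of squares used in the partial F-test. The sketch below assumes the usual textbook definition, SSR($X_j$ | others) divided by [SST - SSR(all) + SSR($X_j$ | others)], and applies it to price in the pie sales example; the function name is illustrative.

```python
def partial_determination(ssr_full, ssr_without_j, sst):
    """Share of the variation left unexplained by the other variables
    that is explained by X_j (assumed textbook definition)."""
    ssr_j_given_others = ssr_full - ssr_without_j
    return ssr_j_given_others / (sst - ssr_full + ssr_j_given_others)

# Price (X1) given advertising (X2), using the ANOVA values from the slides
r2_price = partial_determination(
    ssr_full=29460.02687,       # SSR for X1 and X2
    ssr_without_j=17484.22249,  # SSR for X2 only
    sst=56493.33333,
)
print(f"partial r^2 for price given advertising = {r2_price:.4f}")
```

Read this as: holding advertising constant, price explains roughly 31% of the variation in pie sales that advertising alone leaves unexplained.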

Using Dummy Variables
- A dummy variable is a categorical explanatory variable with two levels:
  - yes or no, on or off, male or female
  - coded as 0 or 1
- Regression intercepts are different if the variable is significant
- Assumes equal slopes for other variables
- If more than two levels, the number of dummy variables needed is (number of levels - 1)

Dummy-Variable Example (with 2 Levels)
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$$
Let:
- Y = pie sales
- $X_1$ = price
- $X_2$ = holiday ($X_2 = 1$ if a holiday occurred during the week; $X_2 = 0$ if there was no holiday that week)

Dummy-Variable Example (with 2 Levels) (continued)
- Holiday ($X_2 = 1$): $\hat{Y} = b_0 + b_1 X_1 + b_2(1) = (b_0 + b_2) + b_1 X_1$
- No holiday ($X_2 = 0$): $\hat{Y} = b_0 + b_1 X_1 + b_2(0) = b_0 + b_1 X_1$
The two lines have different intercepts and the same slope.
[Figure: Y (sales) against $X_1$ (price); two parallel lines with intercept $b_0 + b_2$ for holiday weeks and $b_0$ for non-holiday weeks.]
If $H_0: \beta_2 = 0$ is rejected, then “Holiday” has a significant effect on pie sales.

Interpreting the Dummy Variable Coefficient (with 2 Levels)
Example:
- Sales: number of pies sold per week
- Price: pie price in \$
- Holiday: 1 if a holiday occurred during the week, 0 if no holiday occurred
$b_2 = 15$: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price.

Dummy-Variable Models (more than 2 Levels)
- The number of dummy variables is one less than the number of levels
- Example: Y = house price; $X_1$ = square feet
- If style of the house is also thought to matter: Style = ranch, split level, condo
- Three levels, so two dummy variables are needed

Dummy-Variable Models (more than 2 Levels) (continued)
- Example: Let “condo” be the default category, and let $X_2$ and $X_3$ be used for the other two categories:
  - Y = house price
  - $X_1$ = square feet
  - $X_2$ = 1 if ranch, 0 otherwise
  - $X_3$ = 1 if split level, 0 otherwise
The multiple regression equation is:
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3$$

Interpreting the Dummy Variable Coefficients (with 3 Levels)
Consider the regression equation, with price in thousands of dollars:
$$\hat{Y} = b_0 + b_1 X_1 + 23.53 X_2 + 18.84 X_3$$
- For a condo: $X_2 = X_3 = 0$
- For a ranch: $X_2 = 1$; $X_3 = 0$
- For a split level: $X_2 = 0$; $X_3 = 1$
With the same square feet, a ranch will have an estimated average price of 23.53 thousand dollars more than a condo.
With the same square feet, a split-level will have an estimated average price of 18.84 thousand dollars more than a condo.
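
The dummy coding for the three house styles can be sketched in Python as follows. The helper names `encode_style` and `style_premium` are hypothetical; only the two style coefficients (23.53 and 18.84, in thousands of dollars) come from the slides, so the sketch computes the price difference relative to a condo rather than a full predicted price.

```python
STYLES = ("condo", "ranch", "split level")  # "condo" is the baseline category

def encode_style(style):
    """Return (X2, X3): X2 = 1 if ranch, X3 = 1 if split level, both 0 for condo."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style!r}")
    return (int(style == "ranch"), int(style == "split level"))

def style_premium(style, b2=23.53, b3=18.84):
    """Estimated price difference vs. a condo of the same size, in $1000s."""
    x2, x3 = encode_style(style)
    return b2 * x2 + b3 * x3

for style in STYLES:
    print(style, encode_style(style), style_premium(style))
```

Three levels need only two dummies: a third column for "condo" would be perfectly collinear with the intercept, which is why one level is left as the default.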

Interaction Between Explanatory Variables
- Hypothesizes interaction between pairs of X variables
  - Response to one X variable may vary at different levels of another X variable
- Contains two-way cross product terms:
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2$$

Effect of Interaction
- Given:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$$
- Without interaction term, effect of $X_1$ on Y is measured by $\beta_1$
- With interaction term, effect of $X_1$ on Y is measured by $\beta_1 + \beta_3 X_2$
- Effect changes as $X_2$ changes

Interaction Example
Suppose $X_2$ is a dummy variable and the estimated regression equation is $\hat{Y} = 1 + 2X_1 + 3X_2 + 4X_1 X_2$.
- $X_2 = 1$: $\hat{Y} = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$
- $X_2 = 0$: $\hat{Y} = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$
[Figure: the two lines plotted against $X_1$ (0 to 1.5), with Y from 0 to 12.]
Slopes are different if the effect of $X_1$ on Y depends on the $X_2$ value.
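
A short Python sketch of the interaction example, showing that the slope in $X_1$ equals $b_1 + b_3 X_2$ and therefore changes with the dummy variable. The function names are illustrative; the coefficients 1, 2, 3, 4 are the ones in the slide's equation.

```python
def y_hat(x1, x2):
    """Estimated equation from the slide: Y = 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

def slope_in_x1(x2):
    """Change in predicted Y per unit change in X1, at a fixed X2."""
    return y_hat(1, x2) - y_hat(0, x2)

for x2 in (0, 1):
    print(f"X2 = {x2}: intercept = {y_hat(0, x2)}, slope = {slope_in_x1(x2)}")
```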

Significance of Interaction Term
- Can perform a partial F-test for the contribution of a variable to see if the addition of an interaction term improves the model
- Multiple interaction terms can be included
  - Use a partial F-test for the simultaneous contribution of multiple variables to the model

Simultaneous Contribution of Explanatory Variables
- Use partial F-test for the simultaneous contribution of multiple variables to the model
  - Let m variables be an additional set of variables added simultaneously
  - To test the hypothesis that the set of m variables improves the model:
$$F = \frac{\left[SSR(\text{all}) - SSR(\text{all except new set of } m \text{ variables})\right] / m}{MSE(\text{all})}$$
    (where F has m and n - k - 1 d.f.)
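
The slides give no numeric example for a set of m > 1 variables, so the following Python sketch only shows the shape of the calculation. The function name `partial_f_set` is hypothetical, and the check uses the pie sales figures with m = 1 (price added to a model that already contains advertising), where the statistic reduces to the single-variable partial F.

```python
def partial_f_set(ssr_full, ssr_reduced, m, mse_full):
    """Partial F for m variables added together:
    [(SSR(all) - SSR(all except the m new variables)) / m] / MSE(all)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return ((ssr_full - ssr_reduced) / m) / mse_full

# Sanity check with m = 1, using the ANOVA values from the pie sales example
f_set = partial_f_set(
    ssr_full=29460.02687,
    ssr_reduced=17484.22249,
    m=1,
    mse_full=2252.775539,
)
print(f"partial F (m = 1) = {f_set:.3f}")
```

The result is compared against an F critical value with m and n - k - 1 degrees of freedom, where k counts the variables in the full model.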

Chapter Summary
- Developed the multiple regression model
- Tested the significance of the multiple regression model
- Discussed adjusted $r^2$
- Discussed using residual plots to check model assumptions
- Tested individual regression coefficients
- Tested portions of the regression model
- Used dummy variables
- Evaluated interaction effects
