Presentation is loading. Please wait.

Presentation is loading. Please wait.

Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc.. Chap 14-1 Chapter 14 Introduction to Multiple Regression Basic Business Statistics 10 th Edition.

Similar presentations


Presentation on theme: "Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc.. Chap 14-1 Chapter 14 Introduction to Multiple Regression Basic Business Statistics 10 th Edition."— Presentation transcript:

1 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc.. Chap 14-1 Chapter 14 Introduction to Multiple Regression Basic Business Statistics 10 th Edition

2 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-2 Learning Objectives In this chapter, you learn:  How to develop a multiple regression model  How to interpret the regression coefficients  How to determine which independent variables to include in the regression model  How to determine which independent variables are more important in predicting a dependent variable  How to use categorical variables in a regression model  How to predict a categorical dependent variable using logistic regression

3 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-3 The Multiple Regression Model Idea: Examine the linear relationship between 1 dependent (Y) & 2 or more independent variables (X i ) Multiple Regression Model with k Independent Variables: Y-intercept Population slopesRandom Error

4 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-4 Multiple Regression Equation The coefficients of the multiple regression model are estimated using sample data Estimated (or predicted) value of Y Estimated slope coefficients Multiple regression equation with k independent variables: Estimated intercept In this chapter we will always use Excel to obtain the regression slope coefficients and other regression summary measures.

5 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-5 Two variable model Y X1X1 X2X2 Slope for variable X 1 Slope for variable X 2 Multiple Regression Equation (continued)

6 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-6 Example: 2 Independent Variables  A distributor of frozen desert pies wants to evaluate factors thought to influence demand  Dependent variable: Pie sales (units per week)  Independent variables: Price (in $) Advertising ($100’s)  Data are collected for 15 weeks

7 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-7 Pie Sales Example Sales = b 0 + b 1 (Price) + b 2 (Advertising) Week Pie Sales Price ($) Advertising ($100s) 13505.503.3 24607.503.3 33508.003.0 44308.004.5 53506.803.0 63807.504.0 74304.503.0 84706.403.7 94507.003.5 104905.004.0 113407.203.5 123007.903.2 134405.904.0 144505.003.5 153007.002.7 Multiple regression equation:

8 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-8 Multiple Regression Output Regression Statistics Multiple R0.72213 R Square0.52148 Adjusted R Square0.44172 Standard Error47.46341 Observations15 ANOVA dfSSMSFSignificance F Regression229460.02714730.0136.538610.01201 Residual1227033.3062252.776 Total1456493.333 CoefficientsStandard Errort StatP-valueLower 95%Upper 95% Intercept306.52619114.253892.682850.0199357.58835555.46404 Price-24.9750910.83213-2.305650.03979-48.57626-1.37392 Advertising74.1309625.967322.854780.0144917.55303130.70888

9 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-9 The Multiple Regression Equation b 1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising b 2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price where Sales is in number of pies per week Price is in $ Advertising is in $100’s.

10 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-10 Using The Equation to Make Predictions Predict sales for a week in which the selling price is $5.50 and advertising is $350: Predicted sales is 428.62 pies Note that Advertising is in $100’s, so $350 means that X 2 = 3.5

11 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-11 Predictions in Excel using PHStat  PHStat | regression | multiple regression … Check the “confidence and prediction interval estimates” box

12 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-12 Input values Predictions in PHStat (continued) Predicted Y value < Confidence interval for the mean Y value, given these X’s < Prediction interval for an individual Y value, given these X’s <

13 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-13 Coefficient of Multiple Determination  Reports the proportion of total variation in Y explained by all X variables taken together

14 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-14 Regression Statistics Multiple R0.72213 R Square0.52148 Adjusted R Square0.44172 Standard Error47.46341 Observations15 ANOVA dfSSMSFSignificance F Regression229460.02714730.0136.538610.01201 Residual1227033.3062252.776 Total1456493.333 CoefficientsStandard Errort StatP-valueLower 95%Upper 95% Intercept306.52619114.253892.682850.0199357.58835555.46404 Price-24.9750910.83213-2.305650.03979-48.57626-1.37392 Advertising74.1309625.967322.854780.0144917.55303130.70888 52.1% of the variation in pie sales is explained by the variation in price and advertising Multiple Coefficient of Determination (continued)

15 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-15 Adjusted r 2  r 2 never decreases when a new X variable is added to the model  This can be a disadvantage when comparing models  What is the net effect of adding a new variable?  We lose a degree of freedom when a new X variable is added  Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?

16 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-16  Shows the proportion of variation in Y explained by all X variables adjusted for the number of X variables used (where n = sample size, k = number of independent variables)  Penalize excessive use of unimportant independent variables  Smaller than r 2  Useful in comparing among models Adjusted r 2 (continued)

17 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-17 Regression Statistics Multiple R0.72213 R Square0.52148 Adjusted R Square0.44172 Standard Error47.46341 Observations15 ANOVA dfSSMSFSignificance F Regression229460.02714730.0136.538610.01201 Residual1227033.3062252.776 Total1456493.333 CoefficientsStandard Errort StatP-valueLower 95%Upper 95% Intercept306.52619114.253892.682850.0199357.58835555.46404 Price-24.9750910.83213-2.305650.03979-48.57626-1.37392 Advertising74.1309625.967322.854780.0144917.55303130.70888 44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables (continued) Adjusted r 2

18 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-18 Is the Model Significant?  F Test for Overall Significance of the Model  Shows if there is a linear relationship between all of the X variables considered together and Y  Use F-test statistic  Hypotheses: H 0 : β 1 = β 2 = … = β k = 0 (no linear relationship) H 1 : at least one β i ≠ 0 (at least one independent variable affects Y)

19 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-19 F Test for Overall Significance  Test statistic: where F has (numerator) = k and (denominator) = (n – k - 1) degrees of freedom

20 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-20 Regression Statistics Multiple R0.72213 R Square0.52148 Adjusted R Square0.44172 Standard Error47.46341 Observations15 ANOVA dfSSMSFSignificance F Regression229460.02714730.0136.538610.01201 Residual1227033.3062252.776 Total1456493.333 CoefficientsStandard Errort StatP-valueLower 95%Upper 95% Intercept306.52619114.253892.682850.0199357.58835555.46404 Price-24.9750910.83213-2.305650.03979-48.57626-1.37392 Advertising74.1309625.967322.854780.0144917.55303130.70888 (continued) F Test for Overall Significance With 2 and 12 degrees of freedom P-value for the F Test

21 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-21 H 0 : β 1 = β 2 = 0 H 1 : β 1 and β 2 not both zero  =.05 df 1 = 2 df 2 = 12 Test Statistic: Decision: Conclusion: Since F test statistic is in the rejection region (p- value <.05), reject H 0 There is evidence that at least one independent variable affects Y 0  =.05 F.05 = 3.885 Reject H 0 Do not reject H 0 Critical Value: F  = 3.885 F Test for Overall Significance (continued) F

22 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-22 Two variable model Y X1X1 X2X2 YiYi Y i < x 2i x 1i The best fit equation, Y, is found by minimizing the sum of squared errors,  e 2 < Sample observation Residuals in Multiple Regression Residual = e i = (Y i – Y i ) <

23 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-23 Multiple Regression Assumptions Assumptions:  The errors are normally distributed  Errors have a constant variance  The model errors are independent e i = (Y i – Y i ) < Errors ( residuals ) from the regression model:

24 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-24 Residual Plots Used in Multiple Regression  These residual plots are used in multiple regression:  Residuals vs. Y i  Residuals vs. X 1i  Residuals vs. X 2i  Residuals vs. time (if time series data) < Use the residual plots to check for violations of regression assumptions

25 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-25 Are Individual Variables Significant?  Use t tests of individual variable slopes  Shows if there is a linear relationship between the variable X j and Y  Hypotheses:  H 0 : β j = 0 (no linear relationship)  H 1 : β j ≠ 0 (linear relationship does exist between X j and Y)

26 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-26 Are Individual Variables Significant? H 0 : β j = 0 (no linear relationship) H 1 : β j ≠ 0 (linear relationship does exist between x j and y) Test Statistic: ( df = n – k – 1) (continued)

27 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-27 Regression Statistics Multiple R0.72213 R Square0.52148 Adjusted R Square0.44172 Standard Error47.46341 Observations15 ANOVA dfSSMSFSignificance F Regression229460.02714730.0136.538610.01201 Residual1227033.3062252.776 Total1456493.333 CoefficientsStandard Errort StatP-valueLower 95%Upper 95% Intercept306.52619114.253892.682850.0199357.58835555.46404 Price-24.9750910.83213-2.305650.03979-48.57626-1.37392 Advertising74.1309625.967322.854780.0144917.55303130.70888 t-value for Price is t = -2.306, with p-value.0398 t-value for Advertising is t = 2.855, with p-value.0145 (continued) Are Individual Variables Significant?

28 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-28 d.f. = 15-2-1 = 12  =.05 t  /2 = 2.1788 Inferences about the Slope: t Test Example H 0 : β i = 0 H 1 : β i  0 The test statistic for each variable falls in the rejection region (p-values <.05) There is evidence that both Price and Advertising affect pie sales at  =.05 From Excel output: Reject H 0 for each variable CoefficientsStandard Errort StatP-value Price-24.9750910.83213-2.305650.03979 Advertising74.1309625.967322.854780.01449 Decision: Conclusion: Reject H 0  /2=.025 -t α/2 Do not reject H 0 0 t α/2  /2=.025 -2.17882.1788

29 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-29 Confidence Interval Estimate for the Slope Confidence interval for the population slope β j Example: Form a 95% confidence interval for the effect of changes in price (X 1 ) on pie sales: -24.975 ± (2.1788)(10.832) So the interval is (-48.576, -1.374) (This interval does not contain zero, so price has a significant effect on sales) CoefficientsStandard Error Intercept306.52619114.25389 Price-24.9750910.83213 Advertising74.1309625.96732 where t has (n – k – 1) d.f. Here, t has (15 – 2 – 1) = 12 d.f.

30 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-30 Confidence Interval Estimate for the Slope Confidence interval for the population slope β i Example: Excel output also reports these interval endpoints: Weekly sales are estimated to be reduced by between 1.37 to 48.58 pies for each increase of $1 in the selling price CoefficientsStandard Error…Lower 95%Upper 95% Intercept306.52619114.25389…57.58835555.46404 Price-24.9750910.83213…-48.57626-1.37392 Advertising74.1309625.96732…17.55303130.70888 (continued)

31 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-31  Contribution of a Single Independent Variable X j SSR(X j | all variables except X j ) = SSR (all variables) – SSR(all variables except X j )  Measures the contribution of X j in explaining the total variation in Y (SST) Testing Portions of the Multiple Regression Model

32 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-32 Measures the contribution of X 1 in explaining SST From ANOVA section of regression for Testing Portions of the Multiple Regression Model Contribution of a Single Independent Variable X j, assuming all other variables are already included (consider here a 3-variable model): SSR(X 1 | X 2 and X 3 ) = SSR (all variables) – SSR(X 2 and X 3 ) (continued)

33 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-33 The Partial F-Test Statistic  Consider the hypothesis test: H 0 : variable X j does not significantly improve the model after all other variables are included H 1 : variable X j significantly improves the model after all other variables are included  Test using the F-test statistic: (with 1 and n-k-1 d.f.)

34 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-34 Testing Portions of Model: Example Test at the  =.05 level to determine whether the price variable significantly improves the model given that advertising is included Example: Frozen desert pies

35 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-35 Testing Portions of Model: Example H 0 : X 1 (price) does not improve the model with X 2 (advertising) included H 1 : X 1 does improve model  =.05, df = 1 and 12 F critical Value = 4.75 (For X 1 and X 2 )(For X 2 only) ANOVA dfSSMS Regression229460.0268714730.01343 Residual1227033.306472252.775539 Total1456493.33333 ANOVA dfSS Regression117484.22249 Residual1339009.11085 Total1456493.33333 (continued)

36 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-36 Testing Portions of Model: Example Conclusion: Reject H 0 ; adding X 1 does improve model (continued) (For X 1 and X 2 )(For X 2 only) ANOVA dfSSMS Regression229460.0268714730.01343 Residual1227033.306472252.775539 Total1456493.33333 ANOVA dfSS Regression117484.22249 Residual1339009.11085 Total1456493.33333

37 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-37 Coefficient of Partial Determination for k variable model  Measures the proportion of variation in the dependent variable that is explained by X j while controlling for (holding constant) the other explanatory variables

38 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-38 Coefficient of Partial Determination in Excel  Coefficients of Partial Determination can be found using Excel:  PHStat | regression | multiple regression …  Check the “coefficient of partial determination” box

39 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-39 Using Dummy Variables  A dummy variable is a categorical explanatory variable with two levels:  yes or no, on or off, male or female  coded as 0 or 1  Regression intercepts are different if the variable is significant  Assumes equal slopes for other variables  If more than two levels, the number of dummy variables needed is (number of levels - 1)

40 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-40 Dummy-Variable Example (with 2 Levels) Let: Y = pie sales X 1 = price X 2 = holiday (X 2 = 1 if a holiday occurred during the week) (X 2 = 0 if there was no holiday that week)

41 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-41 Same slope Dummy-Variable Example (with 2 Levels) (continued) X 1 (Price) Y (sales) b 0 + b 2 b0b0 Holiday No Holiday Different intercept Holiday (X 2 = 1) No Holiday (X 2 = 0) If H 0 : β 2 = 0 is rejected, then “Holiday” has a significant effect on pie sales

42 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-42 Sales: number of pies sold per week Price: pie price in $ Holiday: Interpreting the Dummy Variable Coefficient (with 2 Levels) Example: 1 If a holiday occurred during the week 0 If no holiday occurred b 2 = 15: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price

43 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-43 Dummy-Variable Models (more than 2 Levels)  The number of dummy variables is one less than the number of levels  Example: Y = house price ; X 1 = square feet  If style of the house is also thought to matter: Style = ranch, split level, condo Three levels, so two dummy variables are needed

44 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-44 Dummy-Variable Models (more than 2 Levels)  Example: Let “condo” be the default category, and let X 2 and X 3 be used for the other two categories: Y = house price X 1 = square feet X 2 = 1 if ranch, 0 otherwise X 3 = 1 if split level, 0 otherwise The multiple regression equation is: (continued)

45 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-45 Interpreting the Dummy Variable Coefficients (with 3 Levels) With the same square feet, a ranch will have an estimated average price of 23.53 thousand dollars more than a condo With the same square feet, a split-level will have an estimated average price of 18.84 thousand dollars more than a condo. Consider the regression equation: For a condo: X 2 = X 3 = 0 For a ranch: X 2 = 1; X 3 = 0 For a split level: X 2 = 0; X 3 = 1

46 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-46 Interaction Between Independent Variables  Hypothesizes interaction between pairs of X variables  Response to one X variable may vary at different levels of another X variable  Contains two-way cross product terms 

47 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-47 Effect of Interaction  Given:  Without interaction term, effect of X 1 on Y is measured by β 1  With interaction term, effect of X 1 on Y is measured by β 1 + β 3 X 2  Effect changes as X 2 changes

48 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-48 X 2 = 1: Y = 1 + 2X 1 + 3(1) + 4X 1 (1) = 4 + 6X 1 X 2 = 0: Y = 1 + 2X 1 + 3(0) + 4X 1 (0) = 1 + 2X 1 Interaction Example Slopes are different if the effect of X 1 on Y depends on X 2 value X1X1 4 8 12 0 010.51.5 Y = 1 + 2X 1 + 3X 2 + 4X 1 X 2 Suppose X 2 is a dummy variable and the estimated regression equation is

49 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-49 Significance of Interaction Term  Can perform a partial F test for the contribution of a variable to see if the addition of an interaction term improves the model  Multiple interaction terms can be included  Use a partial F test for the simultaneous contribution of multiple variables to the model

50 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-50 Simultaneous Contribution of Independent Variables  Use partial F test for the simultaneous contribution of multiple variables to the model  Let m variables be an additional set of variables added simultaneously  To test the hypothesis that the set of m variables improves the model: (where F has m and n-k-1 d.f.)

51 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-51 Logistic Regression  Used when the dependent variable Y is binary (i.e., Y takes on only two values)  Examples  Customer prefers Brand A or Brand B  Employee chooses to work full-time or part-time  Loan is delinquent or is not delinquent  Person voted in last election or did not  Logistic regression allows you to predict the probability of a particular categorical response

52 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-52 Logistic Regression  Logistic regression is based on the odds ratio, which represents the probability of a success compared with the probability of failure  The logistic regression model is based on the natural log of this odds ratio (continued)

53 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-53 Logistic Regression Where k = number of independent variables in the model ε i = random error in observation i Logistic Regression Model: Logistic Regression Equation: (continued)

54 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-54 Estimated Odds Ratio and Probability of Success  Once you have the logistic regression equation, compute the estimated odds ratio:  The estimated probability of success is

55 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 14-55 Chapter Summary  Developed the multiple regression model  Tested the significance of the multiple regression model  Discussed adjusted r 2  Discussed using residual plots to check model assumptions  Tested individual regression coefficients  Tested portions of the regression model  Used dummy variables  Evaluated interaction effects  Discussed logistic regression


Download ppt "Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc.. Chap 14-1 Chapter 14 Introduction to Multiple Regression Basic Business Statistics 10 th Edition."

Similar presentations


Ads by Google