Presentation is loading. Please wait.

Presentation is loading. Please wait.

Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-1 Chapter 15 Multiple Regression Model Building Basic Business Statistics 10 th Edition.

Similar presentations


Presentation on theme: "Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-1 Chapter 15 Multiple Regression Model Building Basic Business Statistics 10 th Edition."— Presentation transcript:

1 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-1 Chapter 15 Multiple Regression Model Building Basic Business Statistics 10 th Edition

2 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-2 Learning Objectives In this chapter, you learn:  To use quadratic terms in a regression model  To use transformed variables in a regression model  To examine the effect of each observation on the regression model  To measure the correlation among the independent variables  To build a regression model using either the stepwise or best-subsets approach  To avoid the pitfalls involved in developing a multiple regression model

3 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-3  The relationship between the dependent variable and an independent variable may not be linear  Can review the scatter diagram to check for non-linear relationships  Example: Quadratic model  The second independent variable is the square of the first variable Nonlinear Relationships

4 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-4 Quadratic Regression Model  where: β 0 = Y intercept β 1 = regression coefficient for linear effect of X on Y β 2 = regression coefficient for quadratic effect on Y ε i = random error in Y for observation i Model form:

5 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-5 Linear fit does not give random residuals Linear vs. Nonlinear Fit Nonlinear fit gives random residuals X residuals X Y X Y X

6 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-6 Quadratic Regression Model Quadratic models may be considered when the scatter diagram takes on one of the following shapes: X1X1 Y X1X1 X1X1 YYY β 1 < 0β 1 > 0β 1 < 0β 1 > 0 β 1 = the coefficient of the linear term β 2 = the coefficient of the squared term X1X1 β 2 > 0 β 2 < 0

7 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-7 Testing the Overall Quadratic Model  Test for Overall Relationship H 0 : β 1 = β 2 = 0 (no overall relationship between X and Y) H 1 : β 1 and/or β 2 ≠ 0 (there is a relationship between X and Y)  F-test statistic =  Estimate the quadratic model to obtain the regression equation:

8 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-8 Testing for Significance: Quadratic Effect  Testing the Quadratic Effect  Compare quadratic regression equation with the linear regression equation

9 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-9 Testing for Significance: Quadratic Effect  Testing the Quadratic Effect  Consider the quadratic regression equation Hypotheses  (The quadratic term does not improve the model)  (The quadratic term improves the model) H 0 : β 2 = 0 H 1 : β 2  0 (continued)

10 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-10 Testing for Significance: Quadratic Effect  Testing the Quadratic Effect Hypotheses  (The quadratic term does not improve the model)  (The quadratic term improves the model)  The test statistic is H 0 : β 2 = 0 H 1 : β 2  0 (continued) where: b 2 = squared term slope coefficient β 2 = hypothesized slope (zero) S b = standard error of the slope 2

11 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-11 Testing for Significance: Quadratic Effect  Testing the Quadratic Effect Compare r 2 from simple regression to adjusted r 2 from the quadratic model  If adj. r 2 from the quadratic model is larger than the r 2 from the simple model, then the quadratic model is a better model (continued)

12 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-12 Example: Quadratic Model  Purity increases as filter time increases: Purity Filter Time 31 72 83 155 227 338 4010 5412 6713 7014 7815 8515 8716 9917

13 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-13 Example: Quadratic Model (continued) Regression Statistics R Square0.96888 Adjusted R Square0.96628 Standard Error6.15997  Simple regression results: Y = -11.283 + 5.985 Time Coefficients Standard Errort StatP-value Intercept-11.282673.46805-3.253320.00691 Time5.985200.3096619.328192.078E-10 FSignificance F 373.579042.0778E-10 ^ t statistic, F statistic, and r 2 are all high, but the residuals are not random:

14 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-14 Coefficients Standard Errort StatP-value Intercept1.538702.244650.685500.50722 Time1.564960.601792.600520.02467 Time-squared0.245160.032587.524061.165E-05 Regression Statistics R Square0.99494 Adjusted R Square0.99402 Standard Error2.59513 FSignificance F 1080.73302.368E-13  Quadratic regression results: Y = 1.539 + 1.565 Time + 0.245 (Time) 2 ^ Example: Quadratic Model (continued) The quadratic term is significant and improves the model: adj. r 2 is higher and S YX is lower, residuals are now random

15 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-15 Using Transformations in Regression Analysis Idea:  non-linear models can often be transformed to a linear form  Can be estimated by least squares if transformed  transform X or Y or both to get a better fit or to deal with violations of regression assumptions  Can be based on theory, logic or scatter diagrams

16 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-16 The Square Root Transformation  The square-root transformation  Used to  overcome violations of the constant variance assumption  fit a non-linear relationship

17 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-17  Shape of original relationship X The Square Root Transformation (continued) b 1 > 0 b 1 < 0 X Y Y Y Y  Relationship when transformed

18 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-18  Original multiplicative model The Log Transformation  Transformed multiplicative model The Multiplicative Model:  Original multiplicative model  Transformed exponential model The Exponential Model:

19 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-19 Interpretation of coefficients For the multiplicative model:  When both dependent and independent variables are logged:  The coefficient of the independent variable X k can be interpreted as : a 1 percent change in X k leads to an estimated b k percentage change in the average value of Y. Therefore b k is the elasticity of Y with respect to a change in X k.

20 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-20 Influence Analysis  Several methods can be used to measure the influence of individual observations:  The hat matrix elements h i  The Studentized deleted residuals t i  Cook’s distance statistic D i

21 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-21 The Hat Matrix Elements h i  The hat matrix diagonal element for observation i, denoted h i, reflects the possible influence of X i on the regression equation  If potentially influential observations are present, you may need to reevaluate the need to keep them in the model  Rule: if h i > 2(k + 1)/n, then X i is an influential observation and is a candidate for removal from the model

22 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-22 The Studentized Deleted Residuals t i  The Studentized deleted residual measures the difference of each Y i from the value predicted by a model that includes all observations except observation i  Expressed as a t statistic

23 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-23 The Studentized Deleted Residuals t i  The Studentized deleted residual: where e i = the residual for observation i k = number of independent variables SSE = error sum of squares h i = hat matrix diagonal element for observation i  If t i > t n-k-2 or t i < - t n-k-2 (using a two tail test at  = 0.10), then observation i is highly influential and a candidate for removal (continued)

24 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-24 Cook’s distance statistic D i  Cook’s distance statistic D i is based on both h i and the Studentized residual  Used to decide whether an observation flagged by either the h i or t i criterion is unduly affecting the model

25 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-25 Cook’s distance statistic D i  Cook’s D i statistic where e i = the residual for observation i k = number of independent variables SSE = error sum of squared h i = hat matrix diagonal element for observation i  If D i > F k+1, n-k-1 at  = 0.05, then the observation is highly influential on the regression equation and is a candidate for removal (continued)

26 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-26 Collinearity  Collinearity: High correlation exists among two or more independent variables  This means the correlated variables contribute redundant information to the multiple regression model

27 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-27 Collinearity  Including two highly correlated independent variables can adversely affect the regression results  No new information provided  Can lead to unstable coefficients (large standard error and low t-values)  Coefficient signs may not match prior expectations (continued)

28 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-28 Some Indications of Strong Collinearity  Incorrect signs on the coefficients  Large change in the value of a previous coefficient when a new variable is added to the model  A previously significant variable becomes non- significant when a new independent variable is added  The estimate of the standard deviation of the model increases when a variable is added to the model

29 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-29 Detecting Collinearity (Variance Inflationary Factor) VIF j is used to measure collinearity: If VIF j > 5, X j is highly correlated with the other independent variables where R 2 j is the coefficient of determination of variable X j with all other X variables

30 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-30 Example: Pie Sales Sales = b 0 + b 1 (Price) + b 2 (Advertising) Week Pie Sales Price ($) Advertising ($100s) 13505.503.3 24607.503.3 33508.003.0 44308.004.5 53506.803.0 63807.504.0 74304.503.0 84706.403.7 94507.003.5 104905.004.0 113407.203.5 123007.903.2 134405.904.0 144505.003.5 153007.002.7 Recall the multiple regression equation of chapter 13:

31 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-31 Detecting Collinearity in Excel using PHStat Output for the pie sales example:  Since there are only two explanatory variables, only one VIF is reported  VIF is < 5  There is no evidence of collinearity between Price and Advertising Regression Analysis Price and all other X Regression Statistics Multiple R0.030438 R Square0.000926 Adjusted R Square-0.075925 Standard Error1.21527 Observations15 VIF1.000927 PHStat / regression / multiple regression … Check the “variance inflationary factor (VIF)” box

32 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-32 Model Building  Goal is to develop a model with the best set of independent variables  Easier to interpret if unimportant variables are removed  Lower probability of collinearity  Stepwise regression procedure  Provide evaluation of alternative models as variables are added  Best-subset approach  Try all combinations and select the best using the highest adjusted r 2 and lowest standard error

33 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-33  Idea: develop the least squares regression equation in steps, adding one explanatory variable at a time and evaluating whether existing variables should remain or be removed  The coefficient of partial determination is the measure of the marginal contribution of each independent variable, given that other independent variables are in the model Stepwise Regression

34 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-34 Best Subsets Regression  Idea: estimate all possible regression equations using all possible combinations of independent variables  Choose the best fit by looking for the highest adjusted r 2 and lowest standard error Stepwise regression and best subsets regression can be performed using PHStat

35 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-35 Alternative Best Subsets Criterion  Calculate the value C p for each potential regression model  Consider models with C p values close to or below k + 1  k is the number of independent variables in the model under consideration

36 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-36 Alternative Best Subsets Criterion  The C p Statistic Where k = number of independent variables included in a particular regression model T = total number of parameters to be estimated in the full regression model = coefficient of multiple determination for model with k independent variables = coefficient of multiple determination for full model with all T estimated parameters (continued)

37 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-37 Steps in Model Building 1. Compile a listing of all independent variables under consideration 2. Estimate full model and check VIFs 3. Check if any VIFs > 5  If no VIF > 5, go to step 5  If one VIF > 5, remove this variable  If more than one, eliminate the variable with the highest VIF and go back to step 2 4. Perform best subsets regression with remaining variables …

38 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-38 Steps in Model Building 5. List all models with C p close to or less than (k + 1) 6. Choose the best model  Consider parsimony  Do extra variables make a significant contribution? 7. Perform complete analysis with chosen model, including residual analysis 8. Transform the model if necessary to deal with violations of linearity or other model assumptions 9. Use the model for prediction and inference (continued)

39 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-39 Model Building Flowchart Choose X 1,X 2,…X k Run regression to find VIFs Remove variable with highest VIF Any VIF>5? Run subsets regression to obtain “best” models in terms of C p Do complete analysis Add quadratic term and/or transform variables as indicated Perform predictions No More than one? Remove this X Yes No Yes

40 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-40 Pitfalls and Ethical Considerations  Understand that interpretation of the estimated regression coefficients are performed holding all other independent variables constant  Evaluate residual plots for each independent variable  Evaluate interaction terms To avoid pitfalls and address ethical considerations:

41 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-41 Additional Pitfalls and Ethical Considerations  Obtain VIFs for each independent variable before determining which variables should be included in the model  Examine several alternative models using best- subsets regression  Use other methods when the assumptions necessary for least-squares regression have been seriously violated To avoid pitfalls and address ethical considerations: (continued)

42 Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-42 Chapter Summary  Developed the quadratic regression model  Discussed using transformations in regression models  The multiplicative model  The exponential model  Described collinearity  Discussed model building  Stepwise regression  Best subsets  Addressed pitfalls in multiple regression and ethical considerations


Download ppt "Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-1 Chapter 15 Multiple Regression Model Building Basic Business Statistics 10 th Edition."

Similar presentations


Ads by Google