© 2011 Pearson Education, Inc. Statistics for Business and Economics Chapter 11 Multiple Regression and Model Building.

1 © 2011 Pearson Education, Inc

2 Statistics for Business and Economics Chapter 11 Multiple Regression and Model Building

3 Content 11.1 Multiple Regression Models. Part I: First-Order Models with Quantitative Independent Variables: 11.2 Estimating and Making Inferences about the Individual β Parameters; 11.3 Evaluating Overall Model Utility; 11.4 Using the Model for Estimation and Prediction.

4 Content Part II: Model Building in Multiple Regression: 11.5 Interaction Models; 11.6 Quadratic and Other Higher-Order Models; 11.7 Qualitative (Dummy) Variable Models; 11.8 Models with Both Quantitative and Qualitative Variables; 11.9 Comparing Nested Models.

5 Content 11.10 Stepwise Regression. Part III: Multiple Regression Diagnostics: 11.11 Residual Analysis: Checking the Regression Assumptions; 11.12 Some Pitfalls: Estimability, Multicollinearity, and Extrapolation.

6 Learning Objectives Introduce the multiple regression model as a means of relating a dependent variable y to two or more independent variables. Present several different multiple regression models involving both quantitative and qualitative independent variables.

7 Learning Objectives Assess how well the multiple regression model fits the sample data. Show how an analysis of the model's residuals can aid in detecting violations of model assumptions and in identifying model modifications.

8 11.1 Multiple Regression Models

9 The General Multiple Regression Model y = β0 + β1x1 + β2x2 + … + βkxk + ε, where y is the dependent variable (response variable); x1, x2, …, xk are the independent variables (predictor variables); E(y) = β0 + β1x1 + β2x2 + … + βkxk is the deterministic portion of the model; and βi determines the contribution of the independent variable xi. Note: The symbols x1, x2, …, xk may represent higher-order terms for quantitative predictors or terms that represent qualitative predictors.

10 Analyzing a Multiple Regression Model Step 1: Hypothesize the deterministic component of the model. This component relates the mean, E(y), to the independent variables x1, x2, …, xk, and involves the choice of the independent variables to be included in the model. Step 2: Use the sample data to estimate the unknown model parameters β0, β1, β2, …, βk in the model.

11 Analyzing a Multiple Regression Model Step 3: Specify the probability distribution of the random error term, ε, and estimate the standard deviation of this distribution, σ. Step 4: Check that the assumptions on ε are satisfied, and make model modifications if necessary.

12 Analyzing a Multiple Regression Model Step 5: Statistically evaluate the usefulness of the model. Step 6: When satisfied that the model is useful, use it for prediction, estimation, and other purposes.

13 Assumptions for Random Error ε For any given set of values of x1, x2, …, xk, the random error ε has a probability distribution with the following properties: (1) mean equal to 0; (2) variance equal to σ² (constant); (3) normal distribution; (4) random errors are independent (in a probabilistic sense).

14 Part I: First-Order Models with Quantitative Independent Variables

15 11.2 Estimating and Making Inferences about the β Parameters

16 First-Order Model in Five Quantitative Independent (Predictor) Variables E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5, where x1, x2, …, x5 are all quantitative variables that are not functions of other independent variables. Note: βi represents the slope of the line relating y to xi when all the other x's are held fixed.

17 Estimator of σ² for a Multiple Regression Model with k Independent Variables: s² = SSE / [n − (k + 1)], where n − (k + 1) is the error degrees of freedom.

18 Interpretation of Estimated Coefficients 1. Slope (β̂k): the estimated y changes by β̂k for each 1-unit increase in xk, holding all other variables constant. 2. y-intercept (β̂0): the average value of y when all xk's = 0.

19 Interpretation of Estimated Coefficients In first-order models, the relationship between E(y) and any one of the variables, holding the others constant, is a straight line, and we get parallel straight lines as the values of the other variables change.

20 A 100(1 − α)% Confidence Interval for a β Parameter: β̂i ± tα/2 · s(β̂i), where tα/2 is based on n − (k + 1) degrees of freedom, n = number of observations, and k + 1 = number of β parameters in the model.
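The interval formula above can be checked numerically. A minimal Python sketch (not part of the original slides), plugging in the ADSIZE estimate and standard error that appear later in the deck's ad-response example, with t.025 = 3.182 for 3 degrees of freedom taken from a t table:

```python
# beta_hat_i +/- t_{alpha/2} * s(beta_hat_i), using the ADSIZE coefficient
# from the ad-response example (n = 6, k = 2, so df = 6 - (2 + 1) = 3).
beta_hat = 0.2049   # estimate from the printout
se_beta = 0.0588    # its standard error
t_crit = 3.182      # t_{.025} with 3 df, from a t table

lower = beta_hat - t_crit * se_beta
upper = beta_hat + t_crit * se_beta
print(round(lower, 4), round(upper, 4))   # interval excludes 0
```

Since the interval lies entirely above 0, it agrees with the printout's significant t-test for ADSIZE.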

21 Test of an Individual Parameter Coefficient in the Multiple Regression Model One-Tailed Test H0: βi = 0; Ha: βi < 0 (or Ha: βi > 0). Test statistic: t = β̂i / s(β̂i). Rejection region: t < −tα (or t > tα when Ha: βi > 0), where tα is based on n − (k + 1) degrees of freedom, n = number of observations, and k + 1 = number of β parameters in the model.

22 Test of an Individual Parameter Coefficient in the Multiple Regression Model Two-Tailed Test H0: βi = 0; Ha: βi ≠ 0. Test statistic: t = β̂i / s(β̂i). Rejection region: |t| > tα/2, where tα/2 is based on n − (k + 1) degrees of freedom, n = number of observations, and k + 1 = number of β parameters in the model.

23 Conditions Required for Valid Inferences about the β Parameters For any given set of values of x1, x2, …, xk, the random error ε has a probability distribution with the following properties: (1) mean equal to 0; (2) variance equal to σ²; (3) normal distribution; (4) random errors are independent (in a probabilistic sense).

24 First-Order Multiple Regression Model The relationship between one dependent variable and two or more independent variables is a linear function: y = β0 + β1x1 + β2x2 + … + βkxk + ε, with population y-intercept β0, population slopes β1, …, βk, dependent (response) variable y, independent (explanatory) variables x1, …, xk, and random error ε.

25 1st-Order Model Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) and newspaper circulation (000) on the number of ad responses (00). Estimate the unknown parameters. You've collected the following data (Resp y, Size x1, Circ x2): (1, 1, 2), (4, 8, 8), (1, 3, 1), (3, 5, 7), (2, 6, 4), (4, 10, 6).

26 Parameter Estimation Computer Output (Variable, DF, Estimate, Standard Error, T for H0: Param=0, Prob>|T|): INTERCEP 1 0.0640 0.2599 0.246 0.8214; ADSIZE 1 0.2049 0.0588 3.656 0.0399; CIRC 1 0.2805 0.0686 4.089 0.0264. (β̂0 = 0.0640, β̂1 = 0.2049, β̂2 = 0.2805.)

27 Interpretation of Coefficients Solution 1. Slope (β̂1): The number of responses to the ad is expected to increase by 20.49 (y is in hundreds) for each 1 sq. in. increase in ad size, holding circulation constant. 2. Slope (β̂2): The number of responses to the ad is expected to increase by 28.05 for each 1-unit (1,000) increase in circulation, holding ad size constant.

28 Calculating s² and s Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Find SSE, s², and s.

29 Analysis of Variance Computer Output (Source, DF, SS, MS, F, P): Regression 2 9.249736 4.624868 55.44 .0043; Residual Error 3 .250264 .083421; Total 5 9.5. (SSE = .250264, s² = .083421, s = .2888.)
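The estimates and ANOVA quantities above can be reproduced from the example's raw data with ordinary least squares. A hedged Python cross-check using numpy (the slides show SAS-style output; this sketch is not part of the original deck):

```python
import numpy as np

# Ad-response data from the example: y = responses (00), x1 = ad size (sq. in.),
# x2 = circulation (000).
y  = np.array([1, 4, 1, 3, 2, 4], dtype=float)
x1 = np.array([1, 8, 3, 5, 6, 10], dtype=float)
x2 = np.array([2, 8, 1, 7, 4, 6], dtype=float)

X = np.column_stack([np.ones_like(y), x1, x2])    # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares estimates

resid = y - X @ beta_hat
sse = float(resid @ resid)                        # SSE
n, kp1 = X.shape                                  # n = 6, k + 1 = 3
s2 = sse / (n - kp1)                              # s^2 = SSE / [n - (k + 1)]
s = s2 ** 0.5
print(beta_hat.round(4), round(sse, 4), round(s2, 4))
```

The printed values match the printout: β̂ ≈ (0.0640, 0.2049, 0.2805), SSE ≈ 0.2503, s² ≈ 0.0834.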

30 11.3 Evaluating Overall Model Utility

31 Use Caution When Conducting t-Tests on the β Parameters It is dangerous to conduct t-tests on the individual β parameters in a first-order linear model for the purpose of determining which independent variables are useful for predicting y and which are not. If you fail to reject H0: βi = 0, several conclusions are possible: 1. There is no relationship between y and xi. 2. A straight-line relationship between y and xi exists (holding the other x's in the model fixed), but a Type II error occurred.

32 Use Caution When Conducting t-Tests on the β Parameters 3. A relationship between y and xi (holding the other x's in the model fixed) exists but is more complex than a straight-line relationship (e.g., a curvilinear relationship may be appropriate). The most you can say about a β-parameter test is that there is either sufficient (if you reject H0: βi = 0) or insufficient (if you do not reject H0: βi = 0) evidence of a linear (straight-line) relationship between y and xi.

33 The Multiple Coefficient of Determination, R², is defined as R² = 1 − SSE/SSyy = (SSyy − SSE)/SSyy.

34 The Multiple Coefficient of Determination, R² Proportion of variation in y 'explained' by all x variables taken together. R² never decreases when a new x variable is added to the model (only the y values determine SSyy), a disadvantage when comparing models.

35 The Adjusted Multiple Coefficient of Determination: Ra² = 1 − [(n − 1)/(n − (k + 1))](1 − R²). It takes into account n and the number of parameters (as k increases, Ra² decreases) and has a similar interpretation to R².
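Both statistics follow directly from SSE and SSyy. A short Python sketch (not from the slides) using the ANOVA values of the ad-response example:

```python
# R^2 and adjusted R^2 from the ad-response example's ANOVA quantities:
# SSE = 0.2503, SS_yy (total sum of squares) = 9.5, n = 6, k = 2.
sse, ss_yy, n, k = 0.2503, 9.5, 6, 2

r2 = 1 - sse / ss_yy                               # multiple coefficient of determination
r2_adj = 1 - (1 - r2) * (n - 1) / (n - (k + 1))    # penalizes extra parameters
print(round(r2, 4), round(r2_adj, 4))
```

As expected, Ra² is slightly smaller than R² because of the penalty for the two estimated slopes.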

36 Estimation of R² and Ra² Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Find R² and Ra².

37 Excel Computer Output Solution R² = 1 − .2503/9.5 ≈ .9737; Ra² ≈ .9561.

38 Testing Global Usefulness of the Model: The Analysis of Variance F-Test H0: β1 = β2 = … = βk = 0 (all model terms are unimportant for predicting y); Ha: At least one βi ≠ 0 (at least one model term is useful for predicting y).

39 Testing Global Usefulness of the Model: The Analysis of Variance F-Test Test statistic: F = [(SSyy − SSE)/k] / [SSE/(n − (k + 1))] = Mean Square (Model) / Mean Square (Error), where n is the sample size and k is the number of terms in the model. Rejection region: F > Fα, with k numerator degrees of freedom and [n − (k + 1)] denominator degrees of freedom.
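The F statistic can be computed either from the ANOVA mean squares or, equivalently, from R². A short Python cross-check (not from the slides) with the ad-response numbers:

```python
# Global F statistic two equivalent ways for the ad-response example
# (n = 6, k = 2, SSE = 0.2503, SS_yy = 9.5).
sse, ss_yy, n, k = 0.2503, 9.5, 6, 2

ms_model = (ss_yy - sse) / k                      # mean square for the model
ms_error = sse / (n - (k + 1))                    # mean square error, s^2
f_anova = ms_model / ms_error

r2 = 1 - sse / ss_yy
f_r2 = (r2 / k) / ((1 - r2) / (n - (k + 1)))      # same F expressed via R^2

print(round(f_anova, 2), round(f_r2, 2))
# Both forms exceed the critical value F_.05 = 9.55 (2 and 3 df), so reject H0.
```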

40 Recommendation for Checking the Utility of a Multiple Regression Model 1. First, conduct a test of overall model adequacy using the F-test; that is, test H0: β1 = β2 = … = βk = 0. If the model is deemed adequate (that is, if you reject H0), then proceed to step 2. Otherwise, you should hypothesize and fit another model. The new model may include more independent variables or higher-order terms.

41 Recommendation for Checking the Utility of a Multiple Regression Model 2. Conduct t-tests on those β parameters in which you are particularly interested (that is, the "most important" β's). These usually involve only the β's associated with higher-order terms (x², x1x2, etc.). However, it is a safe practice to limit the number of β's that are tested. Conducting a series of t-tests leads to a high overall Type I error rate α.

42 Testing Overall Significance Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Conduct the global F-test of model usefulness. Use α = .05.

43 Testing Overall Significance Solution H0: β1 = β2 = 0; Ha: At least one βi is not zero; α = .05; numerator df = 2, denominator df = 3; critical value: F.05 = 9.55.

44 Testing Overall Significance Computer Output Analysis of Variance (Source, DF, Sum of Squares, Mean Square, F Value, Prob>F): Model 2 9.2497 4.6249 55.440 0.0043; Error 3 0.2503 0.0834; C Total 5 9.5000. (DF: k = 2 and n − (k + 1) = 3; F = MS(Model)/MS(Error).)

45 Testing Overall Significance Solution H0: β1 = β2 = 0; Ha: At least one βi is not zero; α = .05; df = 2 and 3; critical value F.05 = 9.55. Test statistic: F = 55.44. Decision: Reject H0 at α = .05. Conclusion: There is evidence that at least one of the coefficients is not zero.

46 Testing Overall Significance Computer Output Solution F = MS(Model)/MS(Error) = 4.6249/0.0834 = 55.440; p-value (Prob>F) = 0.0043.

47 11.4 Using the Model for Estimation and Prediction

48 Example: Estimation and Prediction A collector of antique grandfather clocks sold at auction knows that the price y received for the clocks increases linearly with the age x1 of the clocks and the number of bidders x2, and is modeled with the first-order equation E(y) = β0 + β1x1 + β2x2, where y = auction price of grandfather clock, x1 = age of clock, and x2 = number of bidders.

49 Example: Estimation and Prediction

50 Example: Estimation and Prediction a. Estimate the average auction price for all 150-year-old clocks sold at auctions with 10 bidders using a 95% confidence interval. Interpret the result. Here, the key words average and for all imply we want to estimate the mean of y, E(y). We want a 95% confidence interval for E(y) when x1 = 150 years and x2 = 10 bidders. A Minitab printout for this analysis is shown...

51 Example: Estimation and Prediction

52 Example: Estimation and Prediction

53 Example: Estimation and Prediction The confidence interval (highlighted under "95% CI") is (1,381.4, 1,481.9). Thus, we are 95% confident that the mean auction price for all 150-year-old clocks sold at an auction with 10 bidders lies between $1,381.40 and $1,481.90.

54 Example: Estimation and Prediction b. Predict the auction price for a single 150-year-old clock sold at an auction with 10 bidders using a 95% prediction interval. Interpret the result. The key words predict and for a single imply that we want a 95% prediction interval for y when x1 = 150 years and x2 = 10 bidders. This interval (highlighted under "95% PI" on the Minitab printout) is (1,154.1, 1,709.3). We say, with 95% confidence, that the auction price for a single 150-year-old clock sold at an auction with 10 bidders falls between $1,154.10 and $1,709.30.

55 Example: Estimation and Prediction c. Suppose you want to predict the auction price for one clock that is 50 years old and has 2 bidders. How should you proceed? Now, we want to predict the auction price, y, for a single (one) grandfather clock when x1 = 50 years and x2 = 2 bidders. Consequently, we desire a 95% prediction interval for y. However, before we form this prediction interval, we should check to make sure that the selected values of the independent variables, x1 = 50 and x2 = 2, are both reasonable and within their respective sample ranges.

56 Example: Estimation and Prediction If you examine the sample data shown in Table 11.1, you will see that the range for age is 108 ≤ x1 ≤ 194, and the range for number of bidders is 5 ≤ x2 ≤ 15. Thus, both selected values fall well outside their respective ranges. Recall the Caution warning about the dangers of using the model to predict y for a value of an independent variable that is not within the range of the sample data. Doing so may lead to an unreliable prediction.

57 Part II: Model Building in Multiple Regression

58 11.5 Interaction Models

59 An Interaction Model Relating E(y) to Two Quantitative Independent Variables E(y) = β0 + β1x1 + β2x2 + β3x1x2, where (β1 + β3x2) represents the change in E(y) for every 1-unit increase in x1, holding x2 fixed, and (β2 + β3x1) represents the change in E(y) for every 1-unit increase in x2, holding x1 fixed.

60 An Interaction Model Relating E(y) to Two Quantitative Independent Variables A three-dimensional graph of an interaction model in two quantitative x's is a twisted plane. If we slice the twisted plane at a fixed value of x2, we obtain a straight line (trace) relating E(y) to x1; however, the slope of the line changes as we change the value of x2.

61 Interaction Model With 2 Independent Variables Hypothesizes interaction between pairs of x variables: the response to one x variable varies at different levels of another x variable. Contains two-way cross-product terms. Can be combined with other models (example: the dummy-variable model).

62 Effect of Interaction Given E(y) = β0 + β1x1 + β2x2 + β3x1x2: without the interaction term, the effect of x1 on y is measured by β1; with the interaction term, the effect of x1 on y is measured by β1 + β3x2, which increases as x2 increases (when β3 > 0).

63 Interaction Model Relationships The effect (slope) of x1 on E(y) depends on the value of x2. For E(y) = 1 + 2x1 + 3x2 + 4x1x2: at x2 = 0, E(y) = 1 + 2x1 + 3(0) + 4x1(0) = 1 + 2x1; at x2 = 1, E(y) = 1 + 2x1 + 3(1) + 4x1(1) = 4 + 6x1.
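The two lines above can be generated programmatically. A tiny Python sketch of the slide's model:

```python
# The slide's interaction model: E(y) = 1 + 2*x1 + 3*x2 + 4*x1*x2.
# Rearranged as E(y) = (1 + 3*x2) + (2 + 4*x2)*x1, the slope on x1 is
# beta1 + beta3*x2, so it depends on the value of x2.
b0, b1, b2, b3 = 1, 2, 3, 4

def intercept(x2):
    return b0 + b2 * x2

def slope_x1(x2):
    return b1 + b3 * x2

print(intercept(0), slope_x1(0))   # x2 = 0: E(y) = 1 + 2*x1
print(intercept(1), slope_x1(1))   # x2 = 1: E(y) = 4 + 6*x1
```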

64 Interaction Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Conduct a test for interaction. Use α = .05.

65 Excel Computer Output Solution The global F-test indicates at least one parameter is not zero.

66 Interaction Test Solution H0: β3 = 0; Ha: β3 ≠ 0; α = .05; df = 6 − 4 = 2; critical values: ±t.025 = ±4.303 (reject H0 if |t| > 4.303).

67 Excel Computer Output Solution

68 Interaction Test Solution H0: β3 = 0; Ha: β3 ≠ 0; α = .05; df = 6 − 4 = 2; critical values: ±4.303. Test statistic: t = 1.8528. Decision: Do not reject H0 at α = .05. Conclusion: There is no evidence of interaction.

69 11.6 Quadratic and Other Higher-Order Models

70 A Quadratic (Second-Order) Model in a Single Quantitative Independent Variable E(y) = β0 + β1x + β2x², where β0 is the y-intercept of the curve, β1 is a shift parameter, and β2 is the rate of curvature.

71 Second-Order Model Relationships A curve with β2 > 0 opens upward (concave up); a curve with β2 < 0 opens downward (concave down).

72 2nd-Order Model Example The data show the number of weeks employed and the number of errors made per day for a sample of assembly-line workers. Find a 2nd-order model, conduct the global F-test, and test whether β2 ≠ 0. Use α = .05 for all tests. Errors (y), Weeks (x): (20, 1), (18, 1), (16, 2), (10, 4), (8, 4), (4, 5), (3, 6), (1, 8), (2, 10), (1, 11), (0, 12), (1, 12).

73 Excel Computer Output Solution

74 Overall Model Test Solution The global F-test indicates at least one parameter is not zero.

75 β2 Parameter Test Solution The β2 test indicates that a curvilinear relationship exists.

76 A Complete Second-Order Model with Two Quantitative Independent Variables E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2². Comments on the parameters: β0 is the y-intercept, the value of E(y) when x1 = x2 = 0; changing β1 and β2 causes the surface to shift along the x1- and x2-axes; β3 controls the rotation of the surface; the signs and values of β4 and β5 control the type of surface and the rates of curvature.

77 Second-Order Model Relationships β4 + β5 > 0: bowl-shaped surface; β4 + β5 < 0: mound-shaped surface; β3² > 4β4β5: saddle-shaped surface.

78 11.7 Qualitative (Dummy) Variable Models

79 A Model Relating E(y) to a Qualitative Independent Variable with Two Levels E(y) = β0 + β1x, where x = 1 if level A and x = 0 if level B. Interpretation of β's: β0 = μB (mean for the base level); β1 = μA − μB.

80 Dummy-Variable Model Involves a categorical x variable with 2 levels (e.g., male/female; college/no college). Variable levels are coded 0 and 1. The number of dummy variables is 1 less than the number of levels of the variable. May be combined with a quantitative variable (1st-order or 2nd-order model).

81 Interpreting the Dummy-Variable Model Equation Given y = starting salary of college graduates, x1 = GPA, and x2 = 0 if male, 1 if female: E(y) = β0 + β1x1 + β2x2. Male (x2 = 0): E(y) = β0 + β1x1. Female (x2 = 1): E(y) = (β0 + β2) + β1x1. The two lines have the same slope.

82 Dummy-Variable Model Example Computer Output: with x2 = 0 if male and 1 if female, the fitted male (x2 = 0) and female (x2 = 1) lines have the same slope.

83 Dummy-Variable Model Relationships The fitted model gives two parallel lines (same slope β̂1): the male line with y-intercept β̂0 and the female line with y-intercept β̂0 + β̂2.

84 A Model Relating E(y) to One Qualitative Independent Variable with k Levels For k = 4 levels A, B, C, D with base level A: E(y) = β0 + β1x1 + β2x2 + β3x3, where xi is the dummy variable for level i + 1 (xi = 1 if that level is observed, 0 otherwise). Then, for this system of coding: μA = β0, μB = β0 + β1, μC = β0 + β2, μD = β0 + β3, so that β1 = μB − μA, β2 = μC − μA, and β3 = μD − μA.
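The coding scheme above can be illustrated in a few lines of Python. The level means here are hypothetical, chosen only to show that each β equals a mean difference from the base level:

```python
# Dummy coding for one qualitative variable with k = 4 levels (A, B, C, D)
# and base level A, as on the slide. The means in mu are hypothetical.
levels = ["A", "B", "C", "D"]
mu = {"A": 10.0, "B": 12.5, "C": 9.0, "D": 14.0}   # hypothetical level means

def dummies(level):
    # (x1, x2, x3) for levels B, C, D; base level A is all zeros
    return tuple(1 if level == lv else 0 for lv in levels[1:])

beta0 = mu["A"]                                    # beta0 = mu_A
beta = [mu[lv] - mu["A"] for lv in levels[1:]]     # beta_i = mu(level i+1) - mu_A

def e_y(level):
    x = dummies(level)
    return beta0 + sum(b * xi for b, xi in zip(beta, x))

print([e_y(lv) for lv in levels])   # recovers the four level means
```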

85 11.8 Models with Both Quantitative and Qualitative Variables

86 Example Substitute the appropriate values of the dummy variables in the model to obtain the equations of the three response lines in the figure.

87 Example The complete model that characterizes the three lines in the figure is E(y) = β0 + β1x1 + β2x2 + β3x3, where x1 = advertising expenditure, x2 = 1 if radio medium (0 if not), and x3 = 1 if television medium (0 if not).

88 Example Examining the coding, you can see that x2 = x3 = 0 when the advertising medium is newspaper. Substituting these values into the expression for E(y), we obtain the newspaper medium line: E(y) = β0 + β1x1.

89 Example Similarly, we substitute the appropriate values of x2 and x3 into the expression for E(y) to obtain the radio medium line (x2 = 1, x3 = 0): E(y) = (β0 + β2) + β1x1, with slope β1 and y-intercept β0 + β2.

90 Example and the television medium line (x2 = 0, x3 = 1): E(y) = (β0 + β3) + β1x1, with slope β1 and y-intercept β0 + β3.

91 Example Why bother fitting a model that combines all three lines (model 3) into the same equation? The answer is that you need to use this procedure if you wish to use statistical tests to compare the three media lines. We need to be able to express a practical question about the lines in terms of a hypothesis that a set of parameters in the model equals 0. You could not do this if you were to perform three separate regression analyses and fit a line to each set of media data.

92 11.9 Comparing Nested Models

93 Nested Models Two models are nested if one model contains all the terms of the second model and at least one additional term. The more complex of the two models is called the complete (or full) model, and the simpler of the two is called the reduced model.

94 Comparing Nested Models The reduced model contains a subset of the terms in the complete (full) model. The test assesses the contribution of a set of x variables to the relationship with y. Null hypothesis H0: βg+1 = … = βk = 0 (the variables in the set do not significantly improve the model when all other variables are included). Used in selecting x variables or models; part of most computer programs.

95 F-Test for Comparing Nested Models Reduced model: E(y) = β0 + β1x1 + … + βgxg. Complete model: E(y) = β0 + β1x1 + … + βgxg + βg+1xg+1 + … + βkxk. H0: βg+1 = … = βk = 0; Ha: At least one of the β parameters under test is nonzero.

96 F-Test for Comparing Nested Models Test statistic: F = [(SSER − SSEC)/(k − g)] / [SSEC/(n − (k + 1))] = [(SSER − SSEC)/(k − g)] / MSEC.

97 F-Test for Comparing Nested Models where SSER = sum of squared errors for the reduced model; SSEC = sum of squared errors for the complete model; MSEC = mean square error (s²) for the complete model; k − g = number of β parameters specified in H0 (i.e., number of β parameters tested).

98 F-Test for Comparing Nested Models where k + 1 = number of β parameters in the complete model (including β0) and n = total sample size. Rejection region: F > Fα, where Fα is based on ν1 = k − g numerator degrees of freedom and ν2 = n − (k + 1) denominator degrees of freedom.
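The nested-model test statistic translates directly into code. A small Python sketch (the SSE values and sample sizes below are hypothetical, for illustration only):

```python
# Nested-model F statistic from the slides:
# F = [(SSE_R - SSE_C) / (k - g)] / [SSE_C / (n - (k + 1))].
def nested_f(sse_r, sse_c, n, k, g):
    numerator = (sse_r - sse_c) / (k - g)    # drop in SSE per tested parameter
    mse_c = sse_c / (n - (k + 1))            # MSE of the complete model
    return numerator / mse_c

# Hypothetical values: reduced model with g = 3 terms, complete with k = 5.
f = nested_f(sse_r=160.0, sse_c=100.0, n=30, k=5, g=3)
print(round(f, 2))
```

A large F here means the extra terms reduced SSE by more than chance alone would explain, favoring the complete model.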

99 Parsimonious Models A parsimonious model is a general linear model with a small number of β parameters. In situations where two competing models have essentially the same predictive power (as determined by an F-test), choose the more parsimonious of the two.

100 Guidelines for Selecting the Preferred Model in a Nested Model F-Test: if you reject H0, prefer the complete model; if you fail to reject H0, prefer the reduced model.

101 11.10 Stepwise Regression

102 Stepwise Regression The user first identifies the response, y, and the set of potentially important independent variables, x1, x2, …, xk, where k is generally large. The response and independent variables are then entered into the computer software, and the stepwise procedure begins.

103 Stepwise Regression Step 1: The software fits all possible one-variable models of the form E(y) = β0 + β1xi to the data, where xi is the ith independent variable, i = 1, 2, …, k. For each, test the null hypothesis H0: β1 = 0 against the alternative Ha: β1 ≠ 0. The independent variable that produces the largest (absolute) t-value is declared the best one-variable predictor of y; call it x1.

104 Stepwise Regression Step 2: The stepwise program now begins to search through the remaining (k − 1) independent variables for the best two-variable model of the form E(y) = β0 + β1x1 + β2xi. Again, the variable having the largest t-value is retained; call it x2.

105 Stepwise Regression Step 3: The stepwise procedure now checks for a third independent variable to include in the model with x1 and x2; that is, we seek the best model of the form E(y) = β0 + β1x1 + β2x2 + β3xi. Again, the variable having the largest t-value is retained; call it x3.
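The steps above can be sketched as a simple forward-selection loop. This is a simplified illustration, not the textbook's software: it ranks candidates by the |t| of the newly added coefficient, demonstrated here on the deck's ad-response data:

```python
import numpy as np

def coef_t(X, y, j):
    """t statistic for coefficient j in the OLS fit of y on X (X includes intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    s2 = (resid @ resid) / (n - p)             # estimate of sigma^2
    cov = s2 * np.linalg.inv(X.T @ X)          # covariance matrix of beta-hat
    return beta[j] / np.sqrt(cov[j, j])

def forward_step(X_current, candidates, y):
    """Try adding each candidate column; keep the name with the largest |t|."""
    best, best_t = None, 0.0
    for name, x in candidates.items():
        Xc = np.column_stack([X_current, x])
        t = abs(coef_t(Xc, y, Xc.shape[1] - 1))
        if t > best_t:
            best, best_t = name, t
    return best, best_t

# Ad-response example data: which variable enters first?
y = np.array([1, 4, 1, 3, 2, 4], dtype=float)
size = np.array([1, 8, 3, 5, 6, 10], dtype=float)
circ = np.array([2, 8, 1, 7, 4, 6], dtype=float)

X0 = np.ones((len(y), 1))                      # intercept-only starting model
best, best_t = forward_step(X0, {"size": size, "circ": circ}, y)
print(best)
```

On this data, circulation produces the larger one-variable |t| and enters first; a full stepwise run would repeat `forward_step` on the remaining candidates.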

106 Stepwise Regression The result of the stepwise procedure is a model containing only those terms with t-values that are significant at the specified α level. Thus, in most practical situations, only several of the large number of independent variables remain. However, we have very probably included some unimportant independent variables in the model (Type I errors, P(reject H0 | H0 is true)) and eliminated some important ones (Type II errors, P(do not reject H0 | Ha is true)).

107 Stepwise Regression There is a second reason why we might not have arrived at a good model. When we choose the variables to be included in the stepwise regression, we may often omit higher-order terms (to keep the number of variables manageable). Consequently, we may have initially omitted several important terms from the model. Thus, we should recognize stepwise regression for what it is: an objective variable screening procedure. (Objective does not mean infallible!)

108 Part III: Multiple Regression Diagnostics

109 11.11 Residual Analysis: Checking the Regression Assumptions

110 Regression Residual A regression residual, ε̂, is defined as the difference between an observed y value and its corresponding predicted value: ε̂ = y − ŷ.

111 Properties of Regression Residuals 1. The mean of the residuals is equal to 0. This property follows from the fact that the sum of the differences between the observed y values and their least squares predicted values is equal to 0.

112 Properties of Regression Residuals 2. The standard deviation of the residuals is equal to the standard deviation, s, of the fitted regression model. This property follows from the fact that the sum of the squared residuals is equal to SSE, which when divided by the error degrees of freedom is equal to the variance of the fitted regression model, s².

113 Properties of Regression Residuals The square root of this variance is both the standard deviation of the residuals and the standard deviation of the regression model.

114 Regression Outlier A regression outlier is a residual that is larger than 3s (in absolute value).
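The 3s rule (along with the 2s check described in the residual-analysis steps later in the deck) can be coded directly. The residuals and s below are hypothetical, for illustration:

```python
import numpy as np

# A regression outlier is a residual larger than 3s in absolute value;
# residual analysis also checks that no more than about 5% exceed 2s.
def flag_outliers(residuals, s):
    residuals = np.asarray(residuals, dtype=float)
    beyond_3s = np.abs(residuals) > 3 * s                 # outliers by the 3s rule
    share_beyond_2s = float(np.mean(np.abs(residuals) > 2 * s))
    return beyond_3s, share_beyond_2s

# Hypothetical residuals with s = 1.0.
flags, share = flag_outliers(
    [0.2, -0.5, 1.1, -3.4, 0.3, 0.9, -0.1, 2.2, 0.4, -0.6], s=1.0)
print(int(flags.sum()), share)
```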

115 Residual Analysis Graphical analysis of residuals: plot estimated errors versus xi values; plot a histogram or stem-and-leaf display of the residuals. Purposes: examine functional form (linear vs. nonlinear model); evaluate violations of assumptions.

116 Residual Plot for Functional Form A curved pattern in the residuals plotted against x signals the need to add an x² term; a random scatter indicates correct specification.

117 Residual Plot for Equal Variance A fan-shaped pattern in the residuals (typically standardized residuals plotted against ŷ) indicates unequal variance; a uniform band indicates correct specification.

118 Residual Plot for Independence The plots reflect the sequence in which the data were collected: a systematic pattern over the collection order indicates the errors are not independent; a random scatter indicates correct specification.

119 Residual Analysis Computer Output (plot of standardized (Studentized) residuals omitted; columns: Obs, SALES, Predicted Value, Residual, Student Residual): 1 1.0000 0.6000 0.4000 1.044; 2 1.0000 1.3000 -0.3000 -0.592; 3 2.0000 2.0000 0 0.000; 4 2.0000 2.7000 -0.7000 -1.382; 5 4.0000 3.4000 0.6000 1.567.

120 Steps in a Residual Analysis 1. Check for a misspecified model by plotting the residuals against each of the quantitative independent variables. Analyze each plot, looking for a curvilinear trend. This shape signals the need for a quadratic term in the model. Try a second-order term in the variable against which the residuals are plotted.

121 Steps in a Residual Analysis 2. Examine the residual plots for outliers. Draw lines on the residual plots at 2- and 3-standard-deviation distances below and above the 0 line. Examine residuals outside the 3-standard-deviation lines as potential outliers, and check that no more than 5% of the residuals exceed the 2-standard-deviation lines. Determine whether each outlier can be explained as an error in data collection or transcription, corresponds to a member of a population different from that of the remainder of the sample, or simply represents an unusual observation.

122 Steps in a Residual Analysis 2. (continued) If the observation is determined to be an error, fix it or remove it. Even if you cannot determine the cause, you may want to rerun the regression analysis without the observation to determine its effect on the analysis.

123 Steps in a Residual Analysis 3. Check for nonnormal errors by plotting a frequency distribution of the residuals, using a stem-and-leaf display, a histogram, or a QQ-plot. Check to see if obvious departures from normality exist. Extreme skewness of the frequency distribution may be due to outliers (which should be removed) or could indicate the need for a transformation of the dependent variable. (Normalizing transformations are beyond the scope of this book, but you can find information in the references.)

124 Steps in a Residual Analysis 4. Check for unequal error variances by plotting the residuals against the predicted values, ŷ. If you detect a cone-shaped pattern or some other pattern that indicates that the variance of ε is not constant, refit the model using an appropriate variance-stabilizing transformation on y, such as ln(y). (Consult the references for other useful variance-stabilizing transformations.)

125 11.12 Some Pitfalls: Estimability, Multicollinearity, and Extrapolation

126 Regression Pitfalls
- Parameter estimability: the number of levels of the observed x-values must be at least one more than the order of the polynomial in x
- Multicollinearity: two or more x-variables in the model are correlated
- Extrapolation: predicting y-values outside the sampled range
- Correlated errors

127 Multicollinearity
- High correlation between x variables
- Coefficients measure a combined effect
- Leads to unstable coefficient estimates that depend on which x variables are in the model (and on rounding errors)
- Always exists to some extent; it is a matter of degree
- Example: using both age and height as explanatory variables in the same model

128 Detecting Multicollinearity
- Significant correlations between pairs of independent variables
- Nonsignificant t-tests for some of the individual β parameters when the F-test for overall model adequacy is significant
- Estimated β parameters with signs opposite from what is expected

129 Using the Correlation Coefficient r to Detect Multicollinearity
- Extreme multicollinearity: |r| ≥ .8
- Moderate multicollinearity: .2 ≤ |r| < .8
- Low multicollinearity: |r| < .2
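The rule of thumb above is easy to apply directly. The sketch below uses hypothetical age and height measurements (the age/height example from the earlier slide), which are nearly perfectly correlated:

```python
import numpy as np

def multicollinearity_level(r):
    """Classify |r| between two predictors using the slides' rule of thumb."""
    r = abs(r)
    if r >= 0.8:
        return "extreme"
    if r >= 0.2:
        return "moderate"
    return "low"

# Hypothetical predictors: age and height in a sample of children
age    = np.array([4, 5, 6, 7, 8, 9, 10, 11])
height = np.array([100, 108, 114, 121, 127, 133, 139, 144])  # cm

r = np.corrcoef(age, height)[0, 1]
print(multicollinearity_level(r))   # "extreme" for these near-linear data
```

With many predictors, the same classification would be applied to each pairwise correlation in the correlation matrix.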

130 Solutions to Some Problems Created by Multicollinearity in Regression
1. Drop one or more of the correlated independent variables from the model. One way to decide which variables to keep in the model is to employ stepwise regression.

131 Solutions to Some Problems Created by Multicollinearity in Regression
2. If you decide to keep all the independent variables in the model:
a. Avoid making inferences about the individual β parameters based on the t-tests.
b. Restrict inferences about E(y) and future y-values to values of the x's that fall within the range of the sample data.

132 Extrapolation
(Figure: predicting y within the sampled range of x is interpolation; predicting y outside the sampled range is extrapolation.)

133 Key Ideas
Multiple Regression Variables
y = dependent variable (quantitative)
x1, x2, …, xk = independent variables (quantitative or qualitative)
First-Order Model in k Quantitative x's
E(y) = β0 + β1x1 + β2x2 + … + βkxk
Each βi represents the change in y for every 1-unit increase in xi, holding all other x's fixed.
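The first-order model can be sketched with numpy's least squares. The data below are hypothetical and noiseless, so the estimated coefficients recover the β's exactly, making the "holding all other x's fixed" interpretation concrete:

```python
import numpy as np

# Hypothetical noiseless data from E(y) = 2 + 3*x1 - 1.5*x2
rng = np.random.default_rng(2)
x1 = rng.uniform(0, 10, 30)
x2 = rng.uniform(0, 10, 30)
y = 2 + 3 * x1 - 1.5 * x2

# Design matrix: intercept column plus one column per predictor
X = np.column_stack([np.ones_like(x1), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# beta_hat[1] is the change in y per 1-unit increase in x1, x2 held fixed
print(beta_hat)   # recovers [2, 3, -1.5]
```

With real (noisy) data the estimates would only approximate the β's, and the inference tools of Sections 11.2-11.3 would quantify that uncertainty.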

134 Key Ideas
Interaction Model in 2 Quantitative x's
E(y) = β0 + β1x1 + β2x2 + β3x1x2
(β1 + β3x2) represents the change in y for every 1-unit increase in x1, for a fixed value of x2
(β2 + β3x1) represents the change in y for every 1-unit increase in x2, for a fixed value of x1
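The interaction slopes above are straightforward to compute once the model is fit. A minimal sketch with hypothetical coefficient values:

```python
def slope_x1(b1, b3, x2):
    """Change in E(y) per 1-unit increase in x1, at a fixed value of x2,
    in the interaction model E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b1 + b3 * x2

# Hypothetical coefficients: the x1 slope depends on where x2 is held
b1, b3 = 2.0, 0.5
print(slope_x1(b1, b3, x2=0))   # 2.0
print(slope_x1(b1, b3, x2=4))   # 4.0
```

This is exactly what "interaction" means: the effect of x1 on y is not one number but a function of x2.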

135 Key Ideas
Quadratic Model in 1 Quantitative x
E(y) = β0 + β1x + β2x²
β2 represents the rate of curvature in y for x
β2 > 0 implies upward curvature
β2 < 0 implies downward curvature

136 Key Ideas
Complete Second-Order Model in 2 Quantitative x's
E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²
β4 represents the rate of curvature in y for x1, holding x2 fixed
β5 represents the rate of curvature in y for x2, holding x1 fixed

137 Key Ideas
Dummy Variable Model for 1 Qualitative x with k Levels
E(y) = β0 + β1x1 + β2x2 + … + βk-1xk-1
x1 = {1 if level 1, 0 if not}
x2 = {1 if level 2, 0 if not}
…
xk-1 = {1 if level k-1, 0 if not}
β0 = E(y) for level k (base level) = μk
β1 = μ1 - μk
β2 = μ2 - μk
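The coefficient interpretations above can be verified numerically. The sketch below uses hypothetical data with three levels (level 3 as the base), so that the fitted intercept equals the base-level mean and each dummy coefficient equals a difference of level means:

```python
import numpy as np

# Hypothetical data: group means are mu1 = 11, mu2 = 8, mu3 = 5
levels = np.array([1, 1, 2, 2, 3, 3])
y      = np.array([10.0, 12.0, 7.0, 9.0, 4.0, 6.0])

x1 = (levels == 1).astype(float)   # 1 if level 1, 0 if not
x2 = (levels == 2).astype(float)   # 1 if level 2, 0 if not
X = np.column_stack([np.ones_like(y), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# b[0] = mu3 (base level), b[1] = mu1 - mu3, b[2] = mu2 - mu3
print(b)   # [5, 6, 3]
```

Note that only k - 1 dummies are used for k levels; adding a dummy for the base level as well would make the model inestimable (the columns would sum to the intercept column).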

138 Key Ideas
Complete Second-Order Model in 1 Quantitative x (x1) and 1 Qualitative x (Two Levels, A and B)
E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x1x2 + β5x1²x2
x2 = {1 if level A, 0 if level B}

139 Key Ideas
Adjusted Coefficient of Determination, Ra²
Cannot be "forced" to 1 by adding independent variables to the model.
Interaction between x1 and x2
Implies that the relationship between y and one x depends on the value of the other x.
Parsimonious Model
A model with a small number of β parameters.
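The standard formula behind the adjusted coefficient of determination, Ra² = 1 - [(n - 1)/(n - (k + 1))](1 - R²), makes the "cannot be forced to 1" property concrete: the penalty grows with the number of predictors k. A minimal sketch with hypothetical R² values:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding several useless predictors nudges R^2 up but drives Ra^2 down
print(adjusted_r2(0.80, n=30, k=3))   # ≈ 0.777
print(adjusted_r2(0.81, n=30, k=8))   # ≈ 0.738
```

This is why Ra² is the preferred summary when comparing models with different numbers of terms.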

140 Key Ideas
Recommendation for Assessing Model Adequacy
1. Conduct the global F-test; if significant, then:
2. Conduct t-tests on only the most important β's (interaction or squared terms)
3. Interpret the value of 2s
4. Interpret the value of Ra²
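The global F statistic in step 1 can be computed directly from R²; the sketch below uses hypothetical values for R², n, and k:

```python
def global_f(r2, n, k):
    """F statistic for H0: beta_1 = ... = beta_k = 0, based on R^2,
    n observations, and k independent variables."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical fit: R^2 = .85 with k = 4 predictors and n = 25 observations
F = global_f(0.85, n=25, k=4)
print(F)   # about 28.3
```

The computed F is then compared with the critical value of the F distribution with k numerator and n - (k + 1) denominator degrees of freedom, taken from a table or statistical software.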

141 Key Ideas
Recommendations for Testing Individual β's
1. If curvature (x²) is deemed important, do not conduct the test for the first-order (x) term in the model.
2. If interaction (x1x2) is deemed important, do not conduct tests for the first-order terms (x1 and x2) in the model.

142 Key Ideas
Extrapolation
Occurs when you predict y for values of the x's that are outside the range of the sample data.
Nested Models
Models where one model (the complete model) contains all the terms of another model (the reduced model) plus at least one additional term.

143 Key Ideas
Multicollinearity
Occurs when two or more x's are correlated.
Indicators of multicollinearity:
1. Highly correlated x's
2. Significant global F-test, but all t-tests nonsignificant
3. Signs on β's opposite from expected

144 Key Ideas
Problems with Using the Stepwise Regression Model as the "Final" Model
1. The extremely large number of t-tests inflates the overall probability of at least one Type I error.
2. No higher-order terms (interactions or squared terms) are included in the model.

145 Key Ideas
Analysis of Residuals
1. Detect a misspecified model: plot residuals vs. each quantitative x (look for trends, e.g., a curvilinear trend)
2. Detect nonconstant error variance: plot residuals vs. ŷ (look for patterns, e.g., a cone shape)

146 Key Ideas
Analysis of Residuals
3. Detect nonnormal errors: histogram, stem-and-leaf display, or normal probability plot of residuals (look for strong departures from normality)
4. Identify outliers: residuals greater than 3s in absolute value (investigate outliers before deleting them)

