## Presentation transcript: "Adapted by Peter Au, George Brown College, McGraw-Hill Ryerson. Copyright © 2011 McGraw-Hill Ryerson Limited."

- Part 1: Basic Multiple Regression
- Part 2: Using Squared and Interaction Terms
- Part 3: Dummy Variables and Advanced Statistical Inferences (Optional)

- 12.1 The Multiple Regression Model
- 12.2 Model Assumptions and the Standard Error
- 12.3 The Least Squares Estimates and Point Estimation and Prediction
- 12.4 R² and Adjusted R²
- 12.5 The Overall F Test
- 12.6 Testing the Significance of an Independent Variable
- 12.7 Confidence and Prediction Intervals

- 12.10 Using Dummy Variables to Model Qualitative Independent Variables
- 12.11 The Partial F Test: Testing the Significance of a Portion of a Regression Model

Simple linear regression uses one independent variable to explain the dependent variable. Some relationships are too complex to be described using a single independent variable. Multiple regression models use two or more independent variables to describe the dependent variable, which allows them to handle more complex situations. In principle, there is no limit to the number of independent variables a model can use, although the number of observations must exceed the number of independent variables plus one so that the error degrees of freedom are positive. Like simple regression, multiple regression has only one dependent variable.

The linear regression model relating $y$ to $x_1, x_2, \ldots, x_k$ is

$$y = \mu_{y|x_1,x_2,\ldots,x_k} + \varepsilon = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

where:

- $\mu_{y|x_1,x_2,\ldots,x_k} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$ is the mean value of the dependent variable $y$ when the values of the independent variables are $x_1, x_2, \ldots, x_k$
- $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are the regression parameters relating the mean value of $y$ to $x_1, x_2, \ldots, x_k$
- $\varepsilon$ is an error term that describes the effects on $y$ of all factors other than the independent variables $x_1, x_2, \ldots, x_k$
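As a rough illustration of the model above, the following Python sketch evaluates the mean value $\mu_{y|x_1,\ldots,x_k}$ for given parameters and then adds an error term to produce one observed $y$. The function names and the parameter values ($\beta_0 = 1$, $\beta_1 = 2$, $\beta_2 = 3$) are hypothetical and chosen only for illustration; in practice the $\beta$'s are unknown and must be estimated from data.

```python
from typing import Sequence


def mean_value(betas: Sequence[float], xs: Sequence[float]) -> float:
    """Return beta_0 + beta_1*x_1 + ... + beta_k*x_k.

    betas holds (beta_0, beta_1, ..., beta_k); xs holds (x_1, ..., x_k).
    """
    if len(betas) != len(xs) + 1:
        raise ValueError("need exactly one more beta than x values")
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))


def observed_value(betas: Sequence[float], xs: Sequence[float], error: float) -> float:
    """An observed y is the mean value plus the error term epsilon."""
    return mean_value(betas, xs) + error


# Hypothetical parameters beta_0 = 1, beta_1 = 2, beta_2 = 3 at x_1 = 4, x_2 = 5
print(mean_value([1.0, 2.0, 3.0], [4.0, 5.0]))            # 24.0
print(observed_value([1.0, 2.0, 3.0], [4.0, 5.0], -0.5))  # 23.5
```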

Consider the data in Table 12.1, which relate two independent variables $x_1$ and $x_2$ to the dependent variable $y$.

The plot shows that $y$ tends to decrease in a straight-line fashion as $x_1$ increases. This suggests that, if we wish to predict $y$ on the basis of $x_1$ only, the simple linear regression model $y = \beta_0 + \beta_1 x_1 + \varepsilon$ relates $y$ to $x_1$.

This plot shows that $y$ tends to increase in a straight-line fashion as $x_2$ increases. This suggests that, if we wish to predict $y$ on the basis of $x_2$ only, the simple linear regression model $y = \beta_0 + \beta_1 x_2 + \varepsilon$ relates $y$ to $x_2$.

The experimental region is defined to be the range of the combinations of the observed values of $x_1$ and $x_2$.

The mean value of $y$ when independent variable 1 (IV 1) equals $x_1$ and IV 2 equals $x_2$ is $\mu_{y|x_1,x_2}$ (read "mu of $y$ given $x_1$ and $x_2$"). Consider the equation $\mu_{y|x_1,x_2} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, which relates mean $y$ values to $x_1$ and $x_2$. This is a linear equation in two variables; geometrically, it is the equation of a plane in three-dimensional space.

We need to make certain assumptions about the error term $\varepsilon$. At any given combination of values of $x_1, x_2, \ldots, x_k$, there is a population of error term values that could occur.

The model is $y = \mu_{y|x_1,x_2,\ldots,x_k} + \varepsilon = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$. The assumptions for multiple regression are stated about the model error terms, the $\varepsilon$'s.

1. Mean of Zero Assumption: The mean of the error terms is equal to 0.
2. Constant Variance Assumption: The variance of the error terms, $\sigma^2$, is the same for every combination of values of $x_1, x_2, \ldots, x_k$.
3. Normality Assumption: The error terms follow a normal distribution for every combination of values of $x_1, x_2, \ldots, x_k$.
4. Independence Assumption: The values of the error terms are statistically independent of each other.

The point estimate of the error term variance $\sigma^2$ is the mean square error:

$$s^2 = \text{MSE} = \frac{\text{SSE}}{n-(k+1)}$$

This formula is slightly different from the one for simple regression, whose denominator is $n-2$.

The point estimate of the error term standard deviation $\sigma$ is

$$s = \sqrt{\text{MSE}} = \sqrt{\frac{\text{SSE}}{n-(k+1)}}$$

where MSE is the mean square error from the previous slide. This formula, too, is slightly different from the one for simple regression. Here $n-(k+1)$ is the number of degrees of freedom associated with the SSE.
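A minimal Python sketch of these two formulas, assuming the fitted values $\hat{y}_i$ are already available from some fitting routine. The helper name `standard_error` and the data are hypothetical.

```python
import math


def standard_error(y, y_hat, k):
    """Return (SSE, MSE, s) for a model with k independent variables.

    SSE = sum of squared residuals, MSE = SSE / (n - (k + 1)), s = sqrt(MSE).
    """
    n = len(y)
    df = n - (k + 1)  # degrees of freedom associated with SSE
    if df <= 0:
        raise ValueError("need more than k + 1 observations")
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    mse = sse / df
    return sse, mse, math.sqrt(mse)


# Hypothetical observed and fitted values from a model with k = 2
sse, mse, s = standard_error([3, 5, 7, 10, 12], [3.5, 4.5, 7.5, 9.5, 12.0], k=2)
print(sse, mse, round(s, 4))  # 1.0 0.5 0.7071
```

With $n = 5$ and $k = 2$ there are only $5 - 3 = 2$ degrees of freedom, so MSE is SSE divided by 2 rather than by $n$.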

Using Table 12.6, compute the SSE as $\text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.

The estimation/prediction equation

$$\hat{y} = b_0 + b_1 x_{01} + b_2 x_{02} + \cdots + b_k x_{0k}$$

is the point estimate of the mean value of the dependent variable when the values of the independent variables are $x_{01}, x_{02}, \ldots, x_{0k}$. It is also the point prediction of an individual value of the dependent variable when the independent variables take those values. Here $b_0, b_1, b_2, \ldots, b_k$ are the least squares point estimates of the parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$, and $x_{01}, x_{02}, \ldots, x_{0k}$ are specified values of the independent (predictor) variables $x_1, x_2, \ldots, x_k$.

A formula exists for computing the least squares estimates for multiple regression. This formula is written using matrix algebra and is presented in Appendix F, available on Connect. In practice, the estimates can be easily computed using Excel, MegaStat, or many other computer packages.
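For readers curious about what the software does, here is a small pure-Python sketch that solves the normal equations $(X'X)\mathbf{b} = X'\mathbf{y}$ by Gaussian elimination with partial pivoting. This is one standard way to obtain least squares estimates, assumed here for illustration; it is not necessarily the algorithm used by Excel or MegaStat, and it is not tuned for large or ill-conditioned data. The function name and the test data (generated exactly from $y = 1 + 2x_1 + 3x_2$) are hypothetical.

```python
def least_squares(rows, y):
    """Return [b0, b1, ..., bk] minimizing SSE for y = b0 + b1*x1 + ... + bk*xk.

    rows is a list of observations, each a list [x1, ..., xk].
    Solves the normal equations (X'X) b = X'y by Gaussian elimination.
    """
    X = [[1.0] + list(r) for r in rows]  # prepend a column of 1s for b0
    p = len(X[0])
    # Build X'X (p x p) and X'y (length p)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    a = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        # Partial pivoting: bring the row with the largest entry into place
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (e.g., perfectly collinear x's)")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution on the upper-triangular system
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (a[i][p] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    return b


rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [1, 3, 4, 6, 14]
print([round(b, 6) for b in least_squares(rows, y)])  # [1.0, 2.0, 3.0]
```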

1. Total variation is given by $\sum_{i=1}^{n} (y_i - \bar{y})^2$.
2. Explained variation is given by $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$.
3. Unexplained variation is given by $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ (the SSE).
4. Total variation is the sum of explained and unexplained variation.
5. $R^2$ is the ratio of explained variation to total variation: $R^2 = \dfrac{\text{Explained variation}}{\text{Total variation}}$.
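The decomposition above can be checked numerically. This Python sketch assumes the fitted values come from a least squares fit that includes an intercept (otherwise total = explained + unexplained need not hold). The data are hypothetical: $\hat{y} = 1.3 + 0.8x$ is the least squares line for $y = 1, 3, 2, 4$ at $x = 0, 1, 2, 3$.

```python
def variation_summary(y, y_hat):
    """Return (total, explained, unexplained, r_squared) for observed y and fitted y_hat."""
    n = len(y)
    y_bar = sum(y) / n
    total = sum((yi - y_bar) ** 2 for yi in y)
    explained = sum((fi - y_bar) ** 2 for fi in y_hat)
    unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # this is the SSE
    return total, explained, unexplained, explained / total


total, explained, unexplained, r2 = variation_summary([1, 3, 2, 4], [1.3, 2.1, 2.9, 3.7])
print(round(total, 4), round(explained, 4), round(unexplained, 4), round(r2, 4))
# 5.0 3.2 1.8 0.64
```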

The multiple coefficient of determination, $R^2$, is the proportion of the total variation in the $n$ observed values of the dependent variable that is explained by the multiple regression model.

The multiple correlation coefficient $R$ is the square root of $R^2$. In simple linear regression, $r$ takes on the sign of $b_1$. A multiple regression model has several $b_j$'s, so there is no single sign to use; for this reason, $R$ is always taken to be nonnegative. To interpret the direction of the relationship between an $x_j$ and $y$, look at the sign of the corresponding coefficient $b_j$.

Adding an independent variable to a multiple regression model never lowers $R^2$ and usually raises it; $R^2$ will typically rise slightly even if the new variable has no relationship to $y$. The adjusted $R^2$ corrects for this tendency by accounting for the number of independent variables:

$$\bar{R}^2 = \left(R^2 - \frac{k}{n-1}\right)\left(\frac{n-1}{n-(k+1)}\right)$$

As a result, it gives a better estimate of the importance of the independent variables. The bar notation, $\bar{R}^2$, denotes the adjusted $R^2$.
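A minimal Python sketch of the adjusted $R^2$ formula. The $R^2$, $n$, and $k$ values below are hypothetical; they show that, for the same $R^2$, a model with more independent variables receives a lower adjusted $R^2$.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = (R^2 - k/(n-1)) * ((n-1)/(n-(k+1)))."""
    if n - (k + 1) <= 0:
        raise ValueError("need more than k + 1 observations")
    return (r2 - k / (n - 1)) * ((n - 1) / (n - (k + 1)))


print(round(adjusted_r_squared(0.9, n=8, k=2), 4))  # 0.86
print(round(adjusted_r_squared(0.9, n=8, k=3), 4))  # 0.825
```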

[Figure: Excel multiple regression output for the Table 12.1 data, with the explained variation, total variation, $n$, and $k$ labeled.]

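Hypotheses: $H_0\colon \beta_1 = \beta_2 = \cdots = \beta_k = 0$ versus $H_a\colon$ at least one of $\beta_1, \beta_2, \ldots, \beta_k \neq 0$.

Test statistic:

$$F(\text{model}) = \frac{(\text{Explained variation})/k}{(\text{Unexplained variation})/(n-(k+1))}$$

Reject $H_0$ in favor of $H_a$ if $F(\text{model}) > F_\alpha$ or if the p-value $< \alpha$. $F_\alpha$ is based on $k$ numerator and $n-(k+1)$ denominator degrees of freedom.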
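The overall F statistic is simple arithmetic once the explained and unexplained variation are known. This Python sketch uses hypothetical variation values; the critical value $F_{.05} = 5.79$ for 2 numerator and 5 denominator degrees of freedom is taken from a standard F table rather than computed, since the Python standard library has no F distribution.

```python
def overall_f(explained, unexplained, n, k):
    """F(model) = (explained variation / k) / (unexplained variation / (n - (k + 1)))."""
    return (explained / k) / (unexplained / (n - (k + 1)))


# Hypothetical: n = 8 observations, k = 2 independent variables
f_model = overall_f(explained=90.0, unexplained=10.0, n=8, k=2)
F_05 = 5.79  # F table value: 2 numerator, 5 denominator degrees of freedom
print(f_model, f_model > F_05)  # 22.5 True
```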

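For the Table 12.1 data, compute the test statistic $F(\text{model})$ and carry out the F test at the $\alpha = 0.05$ level of significance. $F_{.05}$ is based on 2 numerator and 5 denominator degrees of freedom ($F_{.05} = 5.79$). Since $F(\text{model})$ exceeds $F_{.05}$, we reject $H_0$ at the $\alpha = 0.05$ level of significance.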

The F test tells us that at least one independent variable is significant. The natural question is: which one(s)? That question is addressed in the next section.

A variable in a multiple regression model is not likely to be useful unless there is a significant relationship between it and $y$. Significance test hypotheses: $H_0\colon \beta_j = 0$ versus $H_a\colon \beta_j \neq 0$.

If the regression assumptions hold, we can reject $H_0\colon \beta_j = 0$ at the $\alpha$ level of significance (probability of a Type I error equal to $\alpha$) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than $\alpha$.

| Alternative | Reject $H_0$ if | p-value |
|---|---|---|
| $H_a\colon \beta_j \neq 0$ | $\lvert t \rvert > t_{\alpha/2}$ \* | Twice the area under the t distribution to the right of $\lvert t \rvert$ |
| $H_a\colon \beta_j > 0$ | $t > t_\alpha$ | Area under the t distribution to the right of $t$ |
| $H_a\colon \beta_j < 0$ | $t < -t_\alpha$ | Area under the t distribution to the left of $t$ |

\* That is, $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$. $t_{\alpha/2}$, $t_\alpha$, and all p-values are based on $n-(k+1)$ degrees of freedom.

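Test statistic: $t = \dfrac{b_j}{s_{b_j}}$, where $s_{b_j}$ is the standard error of the estimate $b_j$. A $100(1-\alpha)\%$ confidence interval for $\beta_j$ is $\left[b_j \pm t_{\alpha/2}\, s_{b_j}\right]$. Here $t_\alpha$, $t_{\alpha/2}$, and the p-values are based on $n-(k+1)$ degrees of freedom.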
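A minimal Python sketch of the t test and confidence interval for a single $\beta_j$. It assumes $b_j$ and $s_{b_j}$ are read from regression output and $t_{\alpha/2}$ from a t table; the values $b_j = 2.5$ and $s_{b_j} = 0.5$ are hypothetical, and $t_{.025} = 2.571$ is the table value for 5 degrees of freedom.

```python
def coefficient_inference(b_j, s_bj, t_half_alpha):
    """Return (t, (lower, upper)) for H0: beta_j = 0 and a 100(1-alpha)% CI for beta_j.

    t_half_alpha is t_{alpha/2} based on n - (k + 1) degrees of freedom.
    """
    t = b_j / s_bj
    margin = t_half_alpha * s_bj
    return t, (b_j - margin, b_j + margin)


def reject_two_sided(t, t_half_alpha):
    """Reject H0: beta_j = 0 in favor of Ha: beta_j != 0 if |t| > t_{alpha/2}."""
    return abs(t) > t_half_alpha


t, (lower, upper) = coefficient_inference(b_j=2.5, s_bj=0.5, t_half_alpha=2.571)
print(t, round(lower, 4), round(upper, 4), reject_two_sided(t, 2.571))
# 5.0 1.2145 3.7855 True
```

Note that the interval excludes 0 exactly when the two-sided test rejects $H_0$ at the same $\alpha$.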

It is customary to test the significance of every independent variable in a regression model. If we can reject $H_0\colon \beta_j = 0$ at the 0.05 level of significance, then we have strong evidence that the independent variable $x_j$ is significantly related to $y$. If we can reject $H_0\colon \beta_j = 0$ at the 0.01 level of significance, we have very strong evidence that $x_j$ is significantly related to $y$. The smaller the significance level $\alpha$ at which $H_0$ can be rejected, the stronger the evidence that $x_j$ is significantly related to $y$.

Whether the independent variable $x_j$ is significantly related to $y$ in a particular regression model depends on which other independent variables are included in the model. That is, changing the set of independent variables can cause a significant variable to become insignificant, or an insignificant variable to become significant. This issue is addressed in a later section on multicollinearity.

A sales manager evaluates the performance of sales representatives by using a multiple regression model that predicts sales performance on the basis of five independent variables:

- $x_1$ = number of months the representative has been employed by the company
- $x_2$ = sales of the company's product and competing products in the sales territory (market potential)
- $x_3$ = dollar advertising expenditure in the territory
- $x_4$ = weighted average of the company's market share in the territory for the previous four years
- $x_5$ = change in the company's market share in the territory over the previous four years

The model is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon$.

A regression model was computed with MegaStat using the collected data. Because the p-values associated with Time, MktPoten, Adver, and MktShare are all less than 0.01, we have very strong evidence that these variables are significantly related to $y$ and, thus, are important in this model. The p-value associated with Change is 0.0530, which suggests weaker evidence that this variable is important.

The point on the regression plane (or surface) corresponding to particular values $x_{01}, x_{02}, \ldots, x_{0k}$ of the independent variables is

$$\hat{y} = b_0 + b_1 x_{01} + b_2 x_{02} + \cdots + b_k x_{0k}$$

It is unlikely that this value will exactly equal the mean value of $y$ for these $x$ values. Therefore, we need to place bounds on how far the predicted value might be from the actual value. We can do this by calculating a confidence interval for the mean value of $y$ and a prediction interval for an individual value of $y$.

Both the confidence interval for the mean value of $y$ and the prediction interval for an individual value of $y$ employ a quantity called the distance value. With simple regression, we were able to calculate the distance value fairly easily. However, for multiple regression, calculating the distance value requires matrix algebra. See Appendix F on Connect for more details.

Assume that the regression assumptions hold. The formula for a $100(1-\alpha)\%$ confidence interval for the mean value of $y$ is

$$\left[\hat{y} \pm t_{\alpha/2}\, s\sqrt{\text{distance value}}\right]$$

where $t_{\alpha/2}$ is based on $n-(k+1)$ degrees of freedom.

Assume that the regression assumptions hold. The formula for a $100(1-\alpha)\%$ prediction interval for an individual value of $y$ is

$$\left[\hat{y} \pm t_{\alpha/2}\, s\sqrt{1 + \text{distance value}}\right]$$

where $t_{\alpha/2}$ is based on $n-(k+1)$ degrees of freedom.
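Given the distance value from software output (computing it requires matrix algebra), both intervals follow directly. This Python sketch uses hypothetical inputs: $\hat{y} = 10$, $s = 2$, a distance value of 0.25, and $t_{.025} = 2.571$ (5 degrees of freedom). The function name is hypothetical.

```python
import math


def mean_ci_and_pi(y_hat, s, distance_value, t_half_alpha):
    """Return (confidence interval for mean y, prediction interval for an individual y)."""
    ci_margin = t_half_alpha * s * math.sqrt(distance_value)
    pi_margin = t_half_alpha * s * math.sqrt(1 + distance_value)
    return (y_hat - ci_margin, y_hat + ci_margin), (y_hat - pi_margin, y_hat + pi_margin)


ci, pi = mean_ci_and_pi(y_hat=10.0, s=2.0, distance_value=0.25, t_half_alpha=2.571)
print([round(v, 3) for v in ci])  # [7.429, 12.571]
print([round(v, 3) for v in pi])  # [4.251, 15.749]
```

Because of the extra 1 under the square root, the prediction interval is always wider than the confidence interval at the same $\hat{y}$.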

Using the Sales Territory Performance Case, consider the point prediction of sales corresponding to Time = 85.42, MktPoten = 35,182.73, Adver = 7,281.65, MktShare = 9.64, and Change = 0.28. Using the regression model from before:

$$\hat{y} = -1{,}113.7879 + 3.6121(85.42) + 0.0421(35{,}182.73) + 0.1289(7{,}281.65) + 256.9555(9.64) + 324.5334(0.28) \approx 4{,}181.74$$

(that is, 418,174 units, as reported by MegaStat). This point prediction is given at the bottom of the MegaStat output in Figure 12.7, which we repeat here:
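The point prediction above can be reproduced with a few lines of Python. Note that, using the rounded four-decimal coefficients shown on the slide, the arithmetic gives about 4,182.48 rather than 4,181.74; the MegaStat figure presumably reflects unrounded coefficients. The dictionary layout is just an illustrative choice.

```python
# Least squares estimates as printed on the slide (rounded to four decimals)
coefficients = {
    "intercept": -1113.7879,
    "Time": 3.6121,
    "MktPoten": 0.0421,
    "Adver": 0.1289,
    "MktShare": 256.9555,
    "Change": 324.5334,
}
# Values of the independent variables for the prediction
x0 = {"Time": 85.42, "MktPoten": 35182.73, "Adver": 7281.65, "MktShare": 9.64, "Change": 0.28}

y_hat = coefficients["intercept"] + sum(coefficients[name] * value for name, value in x0.items())
print(round(y_hat, 2))  # 4182.48
```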

[Figure: 95% confidence interval and 95% prediction interval.]

One useful form of linear regression is the quadratic regression model. Assume that we have $n$ observations of $x$ and $y$. The quadratic regression model relating $y$ to $x$ is

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$$

where:

- $\beta_0 + \beta_1 x + \beta_2 x^2$ is the mean value of the dependent variable $y$ when the value of the independent variable is $x$
- $\beta_0$, $\beta_1$, and $\beta_2$ are unknown regression parameters relating the mean value of $y$ to $x$
- $\varepsilon$ is an error term that describes the effects on $y$ of all factors other than $x$ and $x^2$

Even though the quadratic model employs the squared term $x^2$ and, as a result, assumes a curved relationship between the mean value of $y$ and $x$, this model is a linear regression model. This is because $\beta_0 + \beta_1 x + \beta_2 x^2$ expresses the mean value of $y$ as a linear function of the parameters $\beta_0$, $\beta_1$, and $\beta_2$. As long as the mean value of $y$ is a linear function of the regression parameters, we have a linear regression model.
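To see why the quadratic model is still a linear regression model, note that each observation simply contributes the row $[1, x, x^2]$ to the design matrix, and the least squares estimates are then found exactly as in multiple regression. This pure-Python sketch solves the resulting 3-by-3 normal equations by Cramer's rule (a choice made here for brevity, not the textbook's method). The function names are hypothetical, and the data are generated exactly from $y = 2 + 3x - 0.5x^2$.

```python
def det3(m):
    """Determinant of a 3-by-3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def quadratic_fit(x, y):
    """Least squares estimates [b0, b1, b2] for y = b0 + b1*x + b2*x^2."""
    # Each design row is [1, x, x^2]; the model is linear in b0, b1, b2
    rows = [[1.0, xi, xi * xi] for xi in x]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    d = det3(xtx)
    if abs(d) < 1e-12:
        raise ValueError("normal equations are singular")
    estimates = []
    for col in range(3):
        # Cramer's rule: replace column `col` of X'X with X'y
        m = [row[:] for row in xtx]
        for i in range(3):
            m[i][col] = xty[i]
        estimates.append(det3(m) / d)
    return estimates


x = [0, 1, 2, 3, 4]
y = [2.0, 4.5, 6.0, 6.5, 6.0]
print(quadratic_fit(x, y))  # [2.0, 3.0, -0.5]
```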

The human resources department administers a stress questionnaire to 15 employees, in which people rate their stress level on a scale from 0 (no stress) to 4 (high stress). Work performance was measured as the average number of projects completed by the employee per year over the last five years.

So far, we have looked only at the simple case with $y$ and a single independent variable $x$, which gave us the quadratic regression model $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$. However, we are not limited to a single independent variable. The following would also be a valid quadratic regression model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_3 + \varepsilon$.

Multiple regression models often contain interaction variables. These are variables formed by multiplying two independent variables together, for example, $x_1 x_2$. In this case, the $x_1 x_2$ variable would appear in the model along with both $x_1$ and $x_2$. We use interaction variables when the relationship between the mean value of $y$ and one of the independent variables depends on the value of another independent variable.

Consider a company that runs both radio and television ads for its products. It is reasonable to assume that raising either ad amount would raise sales. However, it is also reasonable to assume that the effectiveness of television ads depends, in part, on how often consumers hear the radio ads. Thus, an interaction variable would be appropriate.

These last two figures imply that the more is spent on one type of advertising, the smaller the slope for the other type of advertising. That is, the slope of one line depends on the value of the other variable, which says that there is interaction between $x_1$ and $x_2$.
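The effect of an interaction term on the slopes can be seen directly. For the model $\mu_y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$, the slope with respect to $x_1$ at a fixed $x_2$ is $\beta_1 + \beta_3 x_2$. This Python sketch uses hypothetical coefficients with a negative $\beta_3$, which reproduces the pattern described above: the larger $x_2$ is, the smaller the slope for $x_1$.

```python
def mean_y(b, x1, x2):
    """Mean of y for the model b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    b0, b1, b2, b3 = b
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_in_x1(b, x2):
    """Change in mean y per unit increase in x1 with x2 held fixed: b1 + b3*x2."""
    _, b1, _, b3 = b
    return b1 + b3 * x2


b = (10.0, 4.0, 3.0, -0.5)  # hypothetical coefficients
for x2 in (0, 2, 4):
    print(x2, slope_in_x1(b, x2))
# 0 4.0
# 2 3.0
# 4 2.0
```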

It is fairly easy to construct data plots to check for interaction when a careful experiment is carried out. With less structured data, it is often not possible to construct the necessary plots. If an interaction is suspected, we can include the interaction term in the model and check whether it is significant.

When an interaction term (say $x_1 x_2$) is important to a model, it is the usual practice to leave the corresponding linear terms ($x_1$ and $x_2$) in the model no matter what their p-values are.

So far, we have only looked at including quantitative data in a regression model. However, we may wish to include descriptive qualitative data as well; for example, we might want to include the sex of respondents. We can model the effects of different levels of a qualitative variable by using what are called dummy variables, also known as indicator variables.

A dummy variable always has a value of either 0 or 1. For example, to model sales at two locations, we would code the first location as 0 and the second as 1. Operationally, it does not matter which location is coded 0 and which is coded 1.

Suppose that Electronics World, a chain of stores that sells audio and video equipment, has gathered the data in Table 12.13. These data concern store sales volume in July of last year ($y$, measured in thousands of dollars), the number of households in the store's area ($x$, measured in thousands), and the location of the store.

Consider a qualitative variable with three categories, say A, B, and C. We cannot code it using one dummy variable: coding A = 0, B = 1, and C = 2 would be invalid, because it assumes the difference between A and B is the same as the difference between B and C. Instead, we must use multiple dummy variables. Specifically, a qualitative variable with $a$ categories requires $a - 1$ dummy variables. For A, B, and C, we need two dummy variables:

- $x_1$ is 1 for A and 0 otherwise
- $x_2$ is 1 for B and 0 otherwise

If $x_1$ and $x_2$ are both 0, the category must be C. This is why a third dummy variable is not needed.
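The $a - 1$ coding rule can be written as a short Python helper. The function name `dummy_encode` is hypothetical; it treats the last listed category as the baseline, coded with all dummies equal to 0, matching the A, B, C example above.

```python
def dummy_encode(value, categories):
    """Return a - 1 dummy variables for a qualitative value with a categories.

    The last category in `categories` is the baseline: all dummies are 0.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories[:-1]]


for level in ["A", "B", "C"]:
    print(level, dummy_encode(level, ["A", "B", "C"]))
# A [1, 0]
# B [0, 1]
# C [0, 0]
```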

[Figure: Geometrical interpretation of the sales volume model $y = \beta_0 + \beta_1 x + \beta_2 D_M + \beta_3 x D_M + \varepsilon$.]

So far, we have considered dummy variables only as stand-alone variables. The model so far is $y = \beta_0 + \beta_1 x + \beta_2 D + \varepsilon$, where $D$ is a dummy variable. However, we can also look at the interaction between a dummy variable and other variables. That model takes the form $y = \beta_0 + \beta_1 x + \beta_2 D + \beta_3 xD + \varepsilon$. With an interaction term, both the intercept and the slope are shifted: when $D = 1$, the intercept becomes $\beta_0 + \beta_2$ and the slope becomes $\beta_1 + \beta_3$.
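A small Python sketch of how the dummy variable and its interaction term shift the line of mean $y$ against $x$ in the model $y = \beta_0 + \beta_1 x + \beta_2 D + \beta_3 xD + \varepsilon$. The function name and coefficient values are hypothetical.

```python
def group_line(b, d):
    """Intercept and slope of mean y versus x for y = b0 + b1*x + b2*D + b3*x*D."""
    b0, b1, b2, b3 = b
    return b0 + b2 * d, b1 + b3 * d


b = (5.0, 2.0, 3.0, 1.0)  # hypothetical coefficients
print(group_line(b, 0))  # (5.0, 2.0): D = 0, intercept b0, slope b1
print(group_line(b, 1))  # (8.0, 3.0): D = 1, intercept b0 + b2, slope b1 + b3
```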

So far, we have seen dummy variables used to code categorical variables. Dummy variables can also be used to flag unusual events that have an important impact on the dependent variable. These unusual events can be one-time events, such as the impact of a strike on sales or the impact of a major sporting event coming to town. Or they can be recurring events, such as hot temperatures affecting soft drink sales or cold temperatures affecting coat sales.

So far, we have tested single slope coefficients using the t test, and all of the coefficients at once using the overall F test. The partial F test allows us to test the significance of any set of independent variables in a regression model.

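We can use this F test to test the significance of a portion of a regression model. Suppose the complete model has $k$ independent variables and the reduced model keeps $g$ of them, so that $k - g$ regression parameters are set equal to 0. With $\text{SSE}_C$ and $\text{SSE}_R$ denoting the SSE of the complete and reduced models, the partial F statistic is

$$F = \frac{(\text{SSE}_R - \text{SSE}_C)/(k-g)}{\text{SSE}_C/(n-(k+1))}$$

We reject $H_0$ at level $\alpha$ if $F > F_\alpha$, based on $k - g$ numerator and $n-(k+1)$ denominator degrees of freedom.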

The model $y = \beta_0 + \beta_1 x + \beta_2 D_M + \beta_3 D_D + \varepsilon$, where $D_M$ and $D_D$ are dummy variables, is called the complete model. We now look at the reduced model $y = \beta_0 + \beta_1 x + \varepsilon$. The hypotheses to test are $H_0\colon \beta_2 = \beta_3 = 0$ versus $H_a\colon$ at least one of $\beta_2$ and $\beta_3$ does not equal zero. The SSE for the complete model is $\text{SSE}_C = 443.4650$, and the SSE for the reduced model is $\text{SSE}_R = 2{,}467.8067$.

We compare $F$ with $F_{.01} = 7.21$, which is based on $k - g = 2$ numerator and $n - (k+1) = 11$ denominator degrees of freedom. Note that $k - g$ denotes the number of regression parameters set equal to 0 under $H_0$. Since $F = 25.1066 > 7.21$, we reject the null hypothesis at $\alpha = 0.01$. We conclude that at least two locations appear to have different effects on mean sales volume.
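The partial F statistic for this example can be reproduced from the two SSE values using $F = \dfrac{(\text{SSE}_R - \text{SSE}_C)/(k-g)}{\text{SSE}_C/(n-(k+1))}$. This Python sketch uses the numbers given above; the function name is hypothetical, and $F_{.01} = 7.21$ is the table value quoted on the slide.

```python
def partial_f(sse_reduced, sse_complete, k_minus_g, df_complete):
    """Partial F = ((SSE_R - SSE_C) / (k - g)) / (SSE_C / (n - (k + 1)))."""
    return ((sse_reduced - sse_complete) / k_minus_g) / (sse_complete / df_complete)


F = partial_f(sse_reduced=2467.8067, sse_complete=443.4650, k_minus_g=2, df_complete=11)
F_01 = 7.21  # F table value for 2 and 11 degrees of freedom
print(round(F, 4), F > F_01)  # 25.1066 True
```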

The multiple regression model employs at least two independent variables to explain the dependent variable. Some ways to judge a model's overall utility are the standard error, the multiple coefficient of determination, the adjusted multiple coefficient of determination, and the overall F test. Squared terms can be used to model quadratic relationships, while cross-product terms can be used to model interaction relationships. Dummy variables can be used to model qualitative independent variables. The partial F test can be used to evaluate a portion of the regression model.

[Figure: F table excerpt showing $F_{.01} = 7.21$ for 2 numerator and 11 denominator degrees of freedom.]