
1 Chapter 15 Multiple Regression

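2 Regression
Multiple Regression Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$
Multiple Regression Equation: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$
Estimated Multiple Regression Equation: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$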

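3 Car Data

| MPG | Weight | Year | Cylinders |
|---|---|---|---|
| 18 | 3504 | 70 | 8 |
| 15 | 3693 | 70 | 8 |
| 18 | 3436 | 70 | 8 |
| 16 | 3433 | 70 | 8 |
| 17 | 3449 | 70 | 8 |
| 15 | 4341 | 70 | 8 |
| 14 | 4354 | 70 | 8 |
| 14 | 4312 | 70 | 8 |
| 14 | 4425 | 70 | 8 |
| 15 | 3850 | 70 | 8 |
| ... | ... | ... | ... |

Continuing on for 397 observations.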

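4 Multiple Regression, Example

MPG regressed on Weight:

| | Coefficients | Standard Error | t Stat |
|---|---|---|---|
| Intercept | 46.3 | 0.800 | 57.8 |
| Weight | -0.00765 | 0.000259 | -29.4 |

R Square: 0.687

MPG regressed on Weight and Year:

| | Coefficients | Standard Error | t Stat |
|---|---|---|---|
| Intercept | -14.7 | 3.96 | -3.71 |
| Weight | -0.00665 | 0.000214 | -31.0 |
| Year | 0.763 | 0.0490 | 15.5 |

R Square: 0.807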

5 Multiple Regression, Example

| | Coefficients | Standard Error | t Stat |
|---|---|---|---|
| Intercept | -14.4 | 4.03 | -3.58 |
| Weight | -0.00652 | 0.000460 | -14.1 |
| Year | 0.760 | 0.0498 | 15.2 |
| Cylinders | -0.0741 | 0.232 | -0.319 |

R Square: 0.807

Predicted MPG for a car weighing 4000 lbs, built in 1980, with 6 cylinders:
-14.4 - 0.00652(4000) + 0.76(80) - 0.0741(6) = -14.4 - 26.08 + 60.8 - 0.4446 = 19.88
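The prediction above is simply the estimated equation evaluated at one set of inputs. A minimal Python sketch of that arithmetic follows; the names `COEFFS` and `predict_mpg` are made up for illustration, and the year is coded with two digits (80 for 1980), as in the car data.

```python
# Coefficients copied from the three-variable regression output on this slide.
COEFFS = {"intercept": -14.4, "weight": -0.00652, "year": 0.760, "cylinders": -0.0741}


def predict_mpg(weight, year, cylinders, coeffs=COEFFS):
    """Evaluate the estimated regression equation for a single car.

    year is the two-digit model year (for example 80 for 1980).
    """
    return (
        coeffs["intercept"]
        + coeffs["weight"] * weight
        + coeffs["year"] * year
        + coeffs["cylinders"] * cylinders
    )


mpg = predict_mpg(weight=4000, year=80, cylinders=6)
print(round(mpg, 2))  # 19.88, matching the hand calculation
```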

6 Sums of Squares
SST = SSR + SSE
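The slide gives only the identity. For reference, the usual textbook definitions of the three sums of squares are sketched below; the notation ($\hat{y}_i$ for the predicted value, $\bar{y}$ for the sample mean of the dependent variable) is assumed here and does not appear on the slide.

```latex
\begin{align*}
\text{SST} &= \sum_{i=1}^{n} (y_i - \bar{y})^2        && \text{total sum of squares} \\
\text{SSR} &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2  && \text{sum of squares due to regression} \\
\text{SSE} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2      && \text{sum of squares due to error} \\
\text{SST} &= \text{SSR} + \text{SSE}
\end{align*}
```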

7 Multiple Coefficient of Determination
The share of the variation explained by the estimated model: $R^2 = SSR/SST$
Multiple Correlation Coefficient
The correlation coefficient of the actual and predicted values.

8 Adjusted Multiple Coefficient of Determination

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.898 |
| R Square | 0.807 |
| Adjusted R Square | 0.805 |
| Standard Error | 3.44 |
| Observations | 397 |
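The slide shows the adjusted value without its formula. A sketch using the standard definition, adjusted R Square = 1 - (1 - R Square)(n - 1)/(n - p - 1), reproduces the whole Regression Statistics table from the sums of squares reported in the ANOVA example later in the deck. The function name `regression_statistics` is illustrative only.

```python
import math


def regression_statistics(ssr, sse, n, p):
    """Summary statistics from the sums of squares.

    n is the number of observations, p the number of independent variables.
    """
    sst = ssr + sse  # the ANOVA table's total (24021) differs only by rounding
    r_square = ssr / sst
    adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - p - 1)
    return {
        "multiple_r": math.sqrt(r_square),
        "r_square": r_square,
        "adj_r_square": adj_r_square,
        "standard_error": math.sqrt(sse / (n - p - 1)),  # square root of MSE
    }


# SSR and SSE as reported in the ANOVA example for the three-variable model.
stats = regression_statistics(ssr=19382, sse=4638, n=397, p=3)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Because the adjustment factor (n - 1)/(n - p - 1) is greater than 1 whenever p is at least 1, the adjusted value always falls below R Square, and the gap widens as variables are added without a matching drop in SSE.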

9 F Test for Overall Significance
$H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$
$H_a$: One or more of the parameters is not equal to zero
Test statistic: $F = MSR/MSE$
Reject $H_0$ if $F > F_\alpha$, or equivalently if p-value $< \alpha$

10 ANOVA Table for Multiple Regression Model

| Source | Sum of Squares | Degrees of Freedom | Mean Square | F |
|---|---|---|---|---|
| Regression | SSR | p | MSR = SSR/p | F = MSR/MSE |
| Error | SSE | n - p - 1 | MSE = SSE/(n - p - 1) | |
| Total | SST | n - 1 | | |

11 ANOVA Example

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 19382 | 6460 | 547 | 6.42E-140 |
| Residual | 393 | 4638 | 11.8 | | |
| Total | 396 | 24021 | | | |
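The mean squares and the F statistic in this table follow directly from the sums of squares and degrees of freedom. A short sketch is below; `anova_f` is a hypothetical helper name, and the p-value (Significance F) is left out because it needs an F distribution, which the Python standard library does not provide.

```python
def anova_f(ssr, sse, n, p):
    """Return (MSR, MSE, F) for a regression with p independent variables."""
    msr = ssr / p              # regression mean square, p degrees of freedom
    mse = sse / (n - p - 1)    # error mean square, n - p - 1 degrees of freedom
    return msr, mse, msr / mse


# Values from the ANOVA example: n = 397 observations, p = 3 variables.
msr, mse, f_stat = anova_f(ssr=19382, sse=4638, n=397, p=3)
print(f"MSR = {msr:.1f}, MSE = {mse:.1f}, F = {f_stat:.0f}")
```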

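12 t Test for Coefficients
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
Test statistic: $t = b_1 / s_{b_1}$, with a t distribution of n - p - 1 degrees of freedom
Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, or equivalently if p-value $< \alpha$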

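13 t Test Example

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | -14.48 | 4.038 | -3.587 | 0.0003769 |
| Weight | -0.006525 | 0.0004603 | -14.18 | 3.892E-37 |
| Year | 0.7608 | 0.04985 | 15.26 | 1.258E-41 |
| Cylinders | -0.07420 | 0.2322 | -0.3196 | 0.7494 |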
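Each t Stat in the table is the coefficient divided by its standard error. The sketch below recomputes them and applies a two-sided test at the 0.05 level. One assumption to note: the critical value comes from the standard normal distribution (about 1.96) as a large-sample stand-in for the t distribution with 393 degrees of freedom, since the standard library has no t quantile function.

```python
from statistics import NormalDist

# (coefficient, standard error) pairs copied from the regression output.
ROWS = {
    "Intercept": (-14.48, 4.038),
    "Weight": (-0.006525, 0.0004603),
    "Year": (0.7608, 0.04985),
    "Cylinders": (-0.07420, 0.2322),
}

ALPHA = 0.05
# Normal approximation to t_{alpha/2} with n - p - 1 = 393 degrees of freedom.
critical = NormalDist().inv_cdf(1 - ALPHA / 2)

t_stats = {name: b / se for name, (b, se) in ROWS.items()}
significant = {name: abs(t) > critical for name, t in t_stats.items()}

for name, t in t_stats.items():
    verdict = "reject H0" if significant[name] else "fail to reject H0"
    print(f"{name:10s} t = {t:8.3f}  {verdict}")
```

Only Cylinders fails to reach significance, which agrees with its P-value of 0.7494 in the table.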

14 Multicollinearity Multicollinearity occurs when two or more independent variables are highly correlated. When multicollinearity is severe, the estimated coefficients will be unreliable.

15 Multicollinearity
Two guidelines for identifying multicollinearity:
- The absolute value of the correlation coefficient for two independent variables exceeds 0.7.
- The correlation between an independent variable and some other independent variable is greater than the correlation between that variable and the dependent variable.

16 Multicollinearity
Table of correlation coefficients:

| | MPG | Weight | Year | Cylinders |
|---|---|---|---|---|
| MPG | 1 | | | |
| Weight | -0.829 | 1 | | |
| Year | 0.578 | -0.300 | 1 | |
| Cylinders | -0.773 | 0.895 | -0.344 | 1 |
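The two guidelines from the previous slide can be applied mechanically to this correlation table. The sketch below is one possible reading of them; in particular, comparing absolute values in the second guideline is an assumption, since the slide does not say how negative correlations should be handled.

```python
from itertools import combinations

DEPENDENT = "MPG"
INDEPENDENTS = ["Weight", "Year", "Cylinders"]

# Lower triangle of the correlation table above.
CORR = {
    ("Weight", "MPG"): -0.829,
    ("Year", "MPG"): 0.578,
    ("Year", "Weight"): -0.300,
    ("Cylinders", "MPG"): -0.773,
    ("Cylinders", "Weight"): 0.895,
    ("Cylinders", "Year"): -0.344,
}


def r(a, b):
    """Look up a correlation regardless of the order of the two names."""
    return CORR[(a, b)] if (a, b) in CORR else CORR[(b, a)]


# Guideline 1: pairs of independent variables with |r| above 0.7.
high_pairs = [(a, b) for a, b in combinations(INDEPENDENTS, 2) if abs(r(a, b)) > 0.7]

# Guideline 2: variables more strongly correlated with another independent
# variable than with the dependent variable (absolute values, by assumption).
flagged = [
    x for x in INDEPENDENTS
    if any(abs(r(x, other)) > abs(r(x, DEPENDENT)) for other in INDEPENDENTS if other != x)
]

print(high_pairs)  # [('Weight', 'Cylinders')]
print(flagged)     # ['Weight', 'Cylinders']
```

Both guidelines point at the same pair, Weight and Cylinders, whose correlation of 0.895 is larger in magnitude than either variable's correlation with MPG.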

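17 Multicollinearity

With Weight, Year, and Cylinders:

| | Coefficients | Standard Error | t Stat |
|---|---|---|---|
| Intercept | -14.4 | 4.03 | -3.58 |
| Weight | -0.00652 | 0.000460 | -14.1 |
| Year | 0.760 | 0.0498 | 15.2 |
| Cylinders | -0.0741 | 0.232 | -0.319 |

R Square: 0.807

With Year and Cylinders only:

| | Coefficients | Standard Error | t Stat |
|---|---|---|---|
| Intercept | -16.9 | 4.95 | -3.42 |
| Year | 0.747 | 0.0612 | 12.21 |
| Cylinders | -2.99 | 0.133 | -22.46 |

R Square: 0.708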

18 Qualitative Variables and Regression Quantitative variable – A variable that can be measured numerically (interval or ratio scale of measurement) Qualitative variable – A variable where labels or names are used to identify some attribute (nominal or ordinal scale of measurement)

19 Qualitative Variables and Regression The effect of a qualitative variable can be estimated using a dummy variable. A dummy variable equals either 0 or 1; it creates different y-intercepts for groups with different attributes.

20 Qualitative Variables and Regression Assume we estimate a regression model for the number of sick days an employee takes per year. A dummy variable is included that equals 1 if the individual smokes and 0 if they do not. Age is also included in the model.

21 Qualitative Variables and Regression
Estimated model: Sick days taken = -1 + 3(Smoker) + 0.1(Age)
Example of how data would be coded:

| Sick Days | Smoker | Age |
|---|---|---|
| 3 | 0 | 45 |
| 6 | 1 | 50 |
| 0 | 0 | 20 |
| 5 | 0 | 65 |
| 10 | 1 | 60 |

22 Dummy Variables
Sick days taken = -1 + 3(Smoker) + 0.1(Age)
What is the y-intercept for nonsmokers? -1
What is the y-intercept for smokers? 2
What is the predicted number of sick days for a 40-year-old smoker? 6
What is the average difference in the number of sick days taken by smokers and nonsmokers? 3
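The answers above can be checked by evaluating the estimated model directly. In this sketch the function name `sick_days` is illustrative, and `smoker` is the dummy variable coded 1 for smokers and 0 for nonsmokers.

```python
def sick_days(smoker, age):
    """Estimated model: Sick days taken = -1 + 3(Smoker) + 0.1(Age)."""
    return -1 + 3 * smoker + 0.1 * age


# Setting age to 0 isolates the y-intercept for each group.
intercept_nonsmoker = sick_days(smoker=0, age=0)
intercept_smoker = sick_days(smoker=1, age=0)

# Prediction for a 40-year-old smoker.
prediction = sick_days(smoker=1, age=40)

# At any given age the gap between the groups is the dummy's coefficient.
difference = sick_days(smoker=1, age=40) - sick_days(smoker=0, age=40)

print(intercept_nonsmoker, intercept_smoker, prediction, difference)
```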

23 Dummy Variables If an attribute has three or more possible values you must include k-1 dummy variables in the model, where k is the number of possible values.

24 Dummy Variables
Suppose we have three job classifications: manager, operator, and secretary.
- Operator dummy equals 1 if the person is an operator, 0 otherwise
- Secretary dummy equals 1 if the person is a secretary, 0 otherwise
- Manager is the omitted group (the choice of omitted group will not alter the predicted values)

25 Dummy Variables
Sick days taken = -1 + 1(Operator) + 1.5(Secretary) + 0.1(Age)
What are the y-intercepts for each job classification? Managers = -1, Operators = 0, Secretaries = 0.5
What is the predicted number of sick days for a 40-year-old secretary? 4.5
What is the average difference in the number of sick days taken by operators and secretaries? 0.5

26 Dummy Variables
In some cases there will be multiple sets of dummy variables, such as:
Sick days taken = -1 + 3(Smoker) + 1(Operator) + 1.5(Secretary) + 0.1(Age)
Note that there are now 6 different intercepts:
- Nonsmoker, Manager: -1 (omitted group)
- Smoker, Manager: 2
- Nonsmoker, Operator: 0
- Smoker, Operator: 3
- Nonsmoker, Secretary: 0.5
- Smoker, Secretary: 3.5
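The six intercepts come from adding the relevant dummy coefficients to the base intercept for every combination of smoking status and job classification. A small sketch that enumerates them is below; the constant and function names are made up for illustration, and Manager has an effect of 0 because it is the omitted group.

```python
from itertools import product

BASE = -1.0            # intercept for the omitted group (nonsmoking manager)
SMOKER_EFFECT = 3.0    # coefficient on the Smoker dummy
JOB_EFFECT = {"Manager": 0.0, "Operator": 1.0, "Secretary": 1.5}


def intercept(smoker, job):
    """Group-specific intercept: base plus whichever dummies equal 1."""
    return BASE + SMOKER_EFFECT * smoker + JOB_EFFECT[job]


intercepts = {
    ("Smoker" if smoker else "Nonsmoker", job): intercept(smoker, job)
    for smoker, job in product((0, 1), JOB_EFFECT)
}

for group, value in intercepts.items():
    print(group, value)
```

With one two-level attribute and one three-level attribute there are 2 x 3 = 6 groups, yet only 1 + 2 = 3 dummy variables are needed, in line with the k-1 rule.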

27 Dummy Variables Note that when dummy variables are used we are assuming that the coefficients of the other variables are the same for all groups. In this example the increase in sick days used from aging a year is equal to 0.1 for all of the groups. If there is reason to believe the effect of an independent variable differs by group, you may want to estimate separate equations for each group.

28 Nonlinear Relationships Nonlinear relationships can be modeled by including a variable that is a nonlinear function of an independent variable. For example it is usually assumed that health care expenditures increase at an increasing rate as people age.

29 Nonlinear Relationships
In that case you might try including age squared in the model:
Health expend = 500 + 5(Age) + 0.5(AgeSQ)

| Age | Health Expend |
|---|---|
| 10 | 600 |
| 20 | 800 |
| 30 | 1100 |
| 40 | 1500 |
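Evaluating the quadratic model at the four ages reproduces the table and makes the "increasing at an increasing rate" pattern visible in the successive differences. The function name `health_expend` is illustrative.

```python
def health_expend(age):
    """Estimated model: Health expend = 500 + 5(Age) + 0.5(Age squared)."""
    return 500 + 5 * age + 0.5 * age ** 2


ages = (10, 20, 30, 40)
table = {age: health_expend(age) for age in ages}

# Change in predicted expenditure over each successive 10-year step.
increments = [table[b] - table[a] for a, b in zip(ages, ages[1:])]

print(table)       # {10: 600.0, 20: 800.0, 30: 1100.0, 40: 1500.0}
print(increments)  # [200.0, 300.0, 400.0]
```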

30 Nonlinear Relationships If the dependent variable increases at a decreasing rate as the independent variable rises you might want to include the square root of the independent variable. If you are unsure of the nature of the relationship you can use dummy variables for different ranges of values of the independent variable.

31 Non-continuous Relationships
If the relationship between the dependent variable and an independent variable is non-continuous, a slope dummy variable can be used to estimate two sets of coefficients for the independent variable. For example, if natural gas usage is not affected by temperature when the temperature rises above 60 degrees, we could have:
Gas usage = b0 + b1(GT60) + b2(Temp) + b3(GT60)(Temp)
where GT60 is a dummy variable equal to 1 when the temperature is above 60 degrees and 0 otherwise.

32 Non-continuous Relationships

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 53.002 | 2.415 | 21.95 | 7.48E-18 |
| GT60 | -46.623 | 16.682 | -2.79 | 0.0098 |
| Temp | -0.866 | 0.0595 | -14.56 | 1.02E-13 |
| (GT60)(Temp) | 0.810 | 0.255 | 3.18 | 0.0039 |

Note that at temperatures above 60 degrees the net effect of a 1 degree increase in temperature on gas usage is -0.056 (-0.866 + 0.810).
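The estimated coefficients define two straight lines, one for each temperature regime. The sketch below evaluates the model and recovers the net slope quoted on the slide. It assumes GT60 switches to 1 only when the temperature is strictly above 60 degrees; the slide does not say how a reading of exactly 60 is coded.

```python
# Coefficients from the regression output on this slide.
B0, B1, B2, B3 = 53.002, -46.623, -0.866, 0.810


def gas_usage(temp):
    """Gas usage = b0 + b1(GT60) + b2(Temp) + b3(GT60)(Temp)."""
    gt60 = 1 if temp > 60 else 0  # assumption: exactly 60 counts as "not above"
    return B0 + B1 * gt60 + B2 * temp + B3 * gt60 * temp


# The slope dummy shifts both the intercept and the slope above 60 degrees.
intercept_above_60 = B0 + B1
slope_above_60 = B2 + B3

print(round(slope_above_60, 3))   # -0.056
print(round(gas_usage(50), 3))    # 9.702
print(round(gas_usage(70), 3))    # 2.459
```

Below 60 degrees each extra degree cuts predicted usage by 0.866 units, while above 60 degrees the slope is nearly flat, consistent with usage that barely responds to temperature once heating is no longer needed.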

