
Published by Charles McNulty. Modified over 3 years ago.

1
Multiple Regression: Dummy Variables, Multicollinearity, Interaction Effects, Heteroscedasticity

2
Lecture Objectives
You should be able to:
1. Convert categorical variables into dummies.
2. Identify and eliminate multicollinearity.
3. Use interaction terms and interpret their coefficients.
4. Identify heteroscedasticity.

3
I. Using Categorical Data: Dummy Variables
Consider insurance company data on accidents and their relationship to the age of the driver and the color of the car driven. A portion of the data (see spreadsheet for the complete set):

Obs  Accidents per 10,000 Drivers (Y)  Age (X1)  Color (X2)
 1                89                     17       Red
 2                70                     17       Black
 3                75                     17       Blue
 4                85                     18       Red
 5                74                     18       Black
 6                76                     18       Blue
 7                90                     19       Red
 8                78                     19       Black
 9                70                     19       Blue

4
Coding a Categorical Variable
The slide compares an "Original Coding" and an "Alternate Coding," each of which assigns a single arbitrary numeric code to Color (the code values were not preserved). This is the incorrect way: output from the two codings gives inconsistent results, because the regression treats the arbitrary codes as points on a meaningful numeric scale with equal spacing between colors.
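The inconsistency can be demonstrated on the accident data from the earlier slide. The two code assignments below (Red/Black/Blue as 1/2/3 versus 3/1/2) are illustrative assumptions, since the slide's actual codes were lost; any two non-equivalent assignments behave the same way.

```python
import numpy as np

# Accident data from the earlier slide: Y = accidents per 10,000 drivers.
y = np.array([89, 70, 75, 85, 74, 76, 90, 78, 70], dtype=float)
age = np.array([17, 17, 17, 18, 18, 18, 19, 19, 19], dtype=float)
colors = ["Red", "Black", "Blue"] * 3

# Two arbitrary numeric codings for Color (illustrative values).
code_a = np.array([{"Red": 1, "Black": 2, "Blue": 3}[c] for c in colors])
code_b = np.array([{"Red": 3, "Black": 1, "Blue": 2}[c] for c in colors])

def fitted(y, *cols):
    """Ordinary least squares with an intercept; returns fitted values."""
    X = np.column_stack([np.ones_like(y)] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

fit_a = fitted(y, age, code_a)
fit_b = fitted(y, age, code_b)

# The two codings span different column spaces, so the forecasts disagree.
print(np.round(fit_a, 2))
print(np.round(fit_b, 2))
print("codings agree?", np.allclose(fit_a, fit_b))  # False
```

Because the single Color column forces the three colors onto an equally spaced line, reordering the codes changes the model itself, not just its labels.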

5
Original Coding: Partial Output and Forecasts
[Excel regression output; numeric values not preserved.] Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error; Observations = 27. Coefficients: Intercept, Age, Color. RESIDUAL OUTPUT: Predicted Accidents and Residuals by observation.

6
Modified Coding: Partial Output and Forecasts
[Excel regression output; numeric values not preserved.] Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error; Observations = 27. Coefficients: Intercept, Age, Color. RESIDUAL OUTPUT: Predicted Accidents and Residuals by observation.

7
Coding with Dummies
Original dummies: Y = accidents per 10,000 drivers, X1 = Age, X2 = D1 (Red), X3 = D2 (Black). Alternately coded dummies: X2 = D1 (Black), X3 = D2 (Blue). In each case the omitted color serves as the base category. This is the correct way: output from either way of coding gives the same forecasts.
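A quick check of this claim, using the accident data from the earlier slide: the two dummy choices (base category Blue versus base category Red) produce identical fitted values, because together with the intercept they span the same column space.

```python
import numpy as np

y = np.array([89, 70, 75, 85, 74, 76, 90, 78, 70], dtype=float)
age = np.array([17, 17, 17, 18, 18, 18, 19, 19, 19], dtype=float)
colors = ["Red", "Black", "Blue"] * 3

def dummies(colors, kept):
    """One 0/1 column per kept level; the omitted level is the base."""
    return [np.array([1.0 if c == k else 0.0 for c in colors]) for k in kept]

def fitted(y, cols):
    X = np.column_stack([np.ones_like(y)] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

# Original dummies: D1 = Red, D2 = Black (Blue is the base category).
fit_1 = fitted(y, [age] + dummies(colors, ["Red", "Black"]))
# Alternate dummies: D1 = Black, D2 = Blue (Red is the base category).
fit_2 = fitted(y, [age] + dummies(colors, ["Black", "Blue"]))

print("forecasts agree?", np.allclose(fit_1, fit_2))  # True
```

The individual coefficients do change (each dummy is measured relative to its base category), but the forecasts do not.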

8
Original Dummy Coding
[Excel regression output; numeric values not preserved.] Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error; Observations = 27. Coefficients: Intercept, Age, D1 (Red), D2 (Black). RESIDUAL OUTPUT: Predicted Accidents and Residuals by observation.

9
Modified Dummy Coding
[Excel regression output; numeric values not preserved.] Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error; Observations = 27. Coefficients: Intercept, Age, D1 (Black), D2 (Blue). RESIDUAL OUTPUT: Predicted Accidents and Residuals by observation.

10
II. Multicollinearity
We wish to forecast a person's height from the length of his/her feet. Consider data with columns Height, Right (foot length), and Left (foot length); the numeric values were not preserved.

11
Regression with Right Foot
[Excel regression output; numeric values not preserved. Observations = 105; coefficients for Intercept and Right.] As right foot length increases by an inch, height increases on average by 3.99 inches.

12
Regression with Left Foot
[Excel regression output; numeric values not preserved. Observations = 105; coefficients for Intercept and Left.] As left foot length increases by an inch, height increases on average by 3.99 inches.

13
Regression with Both
[Excel regression output; numeric values not preserved. Observations = 105; coefficients for Intercept, Right, and Left.] As right foot length increases by an inch, height increases on average by 8.52 inches (assuming left foot is constant!) while lengthening of the left foot makes a person shorter by 4.55 inches!

14
The Reason? Multicollinearity.
[Correlation matrix of Height (y), Right (X1), Left (X2); most values not preserved.] While both feet (Xs) are correlated with height (y), they are also highly correlated with each other (0.999). In other words, the second foot adds no extra information to the prediction of y. One of the two Xs is sufficient.
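A minimal sketch of the phenomenon, on synthetic data (the slide's actual dataset was not preserved, so the sizes and noise levels below are assumptions): two nearly identical predictors each give a stable slope alone, but become erratic when entered together.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 105
foot = rng.normal(10.5, 0.8, n)           # underlying foot size, inches
right = foot + rng.normal(0, 0.02, n)     # two nearly identical measurements
left = foot + rng.normal(0, 0.02, n)
height = 25.0 + 4.0 * foot + rng.normal(0, 1.5, n)

def ols(y, *cols):
    """OLS with intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    return np.linalg.lstsq(X, y, rcond=None)[0]

print("corr(right, left) =", round(np.corrcoef(right, left)[0, 1], 4))

b_right = ols(height, right)        # slope near 4, stable
b_left = ols(height, left)          # slope near 4, stable
b_both = ols(height, right, left)   # individual slopes can be erratic

print("right alone:", np.round(b_right, 2))
print("left alone: ", np.round(b_left, 2))
print("both feet:  ", np.round(b_both, 2))
```

Only the sum of the two joint slopes is well determined; how it is split between the nearly collinear columns is essentially arbitrary, which is why the signs can even flip.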

15
III. Interaction Effects
Scores on a test of reflexes, with columns Obs, Score (Y), Age (X1), and Gender (X2); a portion of the data appeared on the slide, but the values were not preserved. Do reflexes slow down with age? Are there gender differences?

16
Scatterplots with Age, Gender Does age seem related? How about Gender?

17
Correlation, Regression
[Correlation matrix of Score, Age, Gender and Excel regression output; numeric values not preserved. Observations = 20; coefficients, standard errors, t statistics, and p-values for Intercept, Age, Gender, with the Age p-value on the order of 10^-7.] Age is related, gender is not.

18
Interaction Term
Columns: Obs, Score (Y), Age (X1), Gender (X2), and Age*Gender (X1*X2). A 2-way interaction term is the product of the two variables.
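Building the product column and fitting the regression can be sketched as follows, on simulated data (the reflex dataset itself was not preserved; the group slopes of -0.5 and -1.0 below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
age = rng.uniform(20, 70, n)
gender = rng.integers(0, 2, n).astype(float)   # 0/1 indicator

# Illustrative model: reflex scores decline with age, and decline
# faster for the gender = 1 group (slopes -0.5 vs -1.0).
score = 100 - 0.5 * age - 0.5 * age * gender + rng.normal(0, 2.0, n)

# The 2-way interaction term is simply the product of the two columns.
interaction = age * gender

X = np.column_stack([np.ones(n), age, gender, interaction])
b0, b_age, b_gender, b_inter = np.linalg.lstsq(X, score, rcond=None)[0]

# b_age is the age slope for gender 0; b_age + b_inter is the slope
# for gender 1, so b_inter is the *difference* in slopes.
print("age slope, gender 0:", round(b_age, 2))
print("age slope, gender 1:", round(b_age + b_inter, 2))
```

So the interaction coefficient answers the question directly: it measures how much the effect of age differs between the two gender groups.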

19
Regression with Interaction
[Excel regression output; numeric values not preserved. Observations = 20; coefficients, standard errors, t statistics, and p-values for Intercept, Age, Gender, Age*Gender.] How do we interpret the coefficient for the interaction term?

20
Meaning of Interaction X1 and X2 are said to interact with each other if the impact of X1 on y changes as the value of X2 changes. In this example, the impact of age (X1) on reflexes (y) is different for males and females (changing values of X2). Hence age and gender are said to interact. Explain how this is different from multicollinearity.

21
IV. Heteroscedasticity Consider the water levels in Lake Lanier. There is a trend that can be used to forecast. However, the variability around the trendline is not consistent. The increase in variation makes the prediction margin of error unreliable.

22
Example 2: Income and Spending As income grows, the ability to spend on luxury goods grows with it, and so does the variation in how much is actually spent. Once again, forecasts become less reliable due to changing variation (heteroscedasticity).

23
Solution When heteroscedasticity is identified, data may need to be transformed (change to a log scale, for instance) to reduce its impact. The type of transformation needed depends on the data.
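As a sketch of why a log transformation can help, consider simulated income and spending data with multiplicative noise (an assumption standing in for the slide's example). On the raw scale, residual spread grows with income; after taking logs of both variables, the spread is roughly constant.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
income = rng.uniform(20, 200, n)
# Multiplicative noise: spending varies more at higher incomes.
spending = 0.3 * income * np.exp(rng.normal(0, 0.2, n))

def spread_ratio(x, y):
    """Residual std in the top half of x divided by the bottom half."""
    X = np.column_stack([np.ones(len(x)), x])
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    hi = x > np.median(x)
    return resid[hi].std() / resid[~hi].std()

print("raw scale spread ratio:", round(spread_ratio(income, spending), 2))
print("log scale spread ratio:",
      round(spread_ratio(np.log(income), np.log(spending)), 2))
```

A spread ratio well above 1 on the raw scale signals heteroscedasticity; a ratio near 1 after the transformation indicates the log scale has stabilized the variance, which is why this transformation suits data with roughly percentage-scale variation.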
