STA302/1001 - week 11: Multicollinearity


1 Multicollinearity
Multicollinearity occurs when explanatory variables are highly correlated, in which case it is difficult or impossible to measure their individual influence on the response. The fitted regression equation is unstable: the estimated regression coefficients vary widely from data set to data set (even if the data sets are very similar) and depend on which predictor variables are in the model. The estimated regression coefficients may even have the opposite sign from what is expected (e.g., the bedroom coefficient in the house price example).

2 Multicollinearity (continued)
The regression coefficients may not be statistically significantly different from 0 even when the corresponding explanatory variable is known to have a relationship with the response. When some X's are perfectly correlated, we can't estimate β because X'X is singular. When X'X is not singular but close to it, its determinant is close to 0 and the standard errors of the estimated coefficients are large.
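The effect of a nearly singular X'X can be sketched numerically for the two-predictor case. The example below is only an illustration under stated assumptions: the data are made up (not from the course), the function names are hypothetical, and both predictors are mean-centered, which corresponds to a model with an intercept. It compares det(X'X) and the diagonal of (X'X)^-1, which scales the coefficient variances, for a weakly correlated pair and a nearly collinear pair.

```python
def centered(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def det_and_inverse_diagonal(x1, x2):
    """Return det(X'X) and the diagonal of (X'X)^-1 for two centered predictors."""
    c1, c2 = centered(x1), centered(x2)
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    det = s11 * s22 - s12 ** 2
    # Var(b_j) = sigma^2 * [(X'X)^-1]_jj, so these entries scale the standard errors.
    return det, (s22 / det, s11 / det)


# Hypothetical data, invented for illustration only.
x1 = [1, 2, 3, 4, 5, 6]
x2_unrelated = [3, 6, 1, 5, 2, 4]                # weakly correlated with x1
x2_near_copy = [1.1, 1.9, 3.05, 4.0, 4.95, 6.1]  # almost a copy of x1

det_far, inv_far = det_and_inverse_diagonal(x1, x2_unrelated)
det_near, inv_near = det_and_inverse_diagonal(x1, x2_near_copy)

print("weakly correlated:", det_far, inv_far)
print("nearly collinear: ", det_near, inv_near)
```

With the nearly collinear pair the determinant shrinks toward 0 and the diagonal entries of the inverse grow by orders of magnitude, which is the mechanism behind the large standard errors.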

3 Quantitative Assessment of Multicollinearity
To assess multicollinearity, we calculate the variance inflation factor (VIF) for each of the predictor variables in the model. The variance inflation factor for the i-th predictor variable is defined as

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2},$$

where $R_i^2$ is the coefficient of multiple determination obtained when the i-th predictor variable is regressed against the other p-1 predictor variables. A large value of $\mathrm{VIF}_i$ is a sign of multicollinearity.
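A minimal sketch of the VIF calculation follows, using only the Python standard library. It regresses each predictor on the remaining ones (with an intercept), takes the resulting R², and returns 1 / (1 - R²). The helper names (`solve`, `r_squared`, `vif`) and the data are assumptions made for illustration; in practice a statistics package would normally be used instead of hand-written least squares.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def r_squared(y, columns):
    """R^2 from least-squares regression of y on the columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    y_mean = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - sse / sst


def vif(predictors):
    """Return VIF_i = 1 / (1 - R_i^2) for each predictor column."""
    result = []
    for i, column in enumerate(predictors):
        others = predictors[:i] + predictors[i + 1:]
        result.append(1.0 / (1.0 - r_squared(column, others)))
    return result


# Hypothetical predictors: x2 is nearly a copy of x1, x3 is largely unrelated.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.2, 1.9, 3.1, 3.8, 5.2, 6.1, 6.9, 8.1]
x3 = [5, 3, 8, 1, 7, 2, 6, 4]

vifs = vif([x1, x2, x3])
print(vifs)
```

In this made-up example the two nearly identical predictors get very large VIFs, while the third stays close to 1.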

4 Rainfall Example
The data set contains corn yield (bushels per acre) and rainfall (inches) in six US corn-producing states (Iowa, Nebraska, Illinois, Indiana, Missouri and Ohio). A straight-line model is not adequate: yield increases up to 12 inches of rainfall and then starts to decrease. A better model for these data is a quadratic model: $\text{Yield} = \beta_0 + \beta_1 \cdot \text{rain} + \beta_2 \cdot \text{rain}^2 + \varepsilon$. This is still a multiple linear regression model, since it is linear in the coefficients $\beta_0, \beta_1, \beta_2$. However, we cannot interpret the individual coefficients, since we can't change rain while holding $\text{rain}^2$ constant…
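The reason the linear and quadratic coefficients cannot be read separately can be seen in a small sketch: over a typical range of rainfall values, rain and rain² are almost perfectly correlated, so the two coefficients only make sense together, for example through the rainfall at which the fitted curve peaks, -β1 / (2·β2). The rainfall values and coefficients below are illustrative assumptions, not the course data or its estimates, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def turning_point(b1, b2):
    """Rainfall at which b0 + b1*rain + b2*rain^2 peaks (assumes b2 < 0)."""
    return -b1 / (2 * b2)


# Hypothetical rainfall values in inches.
rain = list(range(7, 17))
rain_squared = [r * r for r in rain]

r_lin_quad = pearson(rain, rain_squared)
print("correlation between rain and rain^2:", r_lin_quad)

# Illustrative coefficients only; they are not fitted to any real data.
peak = turning_point(6.0, -0.25)
print("rainfall at the peak of the fitted curve:", peak)
```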

5 More on Rainfall Example
Examination of the residuals (from the quadratic model) versus year suggested a possible pattern of increase over time. Fit a model with year… To assess whether yield's relationship with rainfall depends on year, we include an interaction term in the model…

6 Interaction
Two predictor variables are said to interact if the effect that one of them has on the response depends on the value of the other. To include an interaction term in a model, we simply take the product of the two predictor variables and include the resulting variable in the model as an additional predictor. Interaction terms should not routinely be added to the model. Why? We should add interaction terms when the question of interest has to do with interaction or when we suspect that interaction exists (e.g., from a plot of residuals versus the interaction term). If an interaction term for two predictor variables is in the model, we should also include the terms for the individual predictor variables, even if their coefficients are not statistically significantly different from 0.
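As a rough sketch of the construction described above, the interaction column is just the elementwise product of the two predictors, and it is added alongside both main-effect columns. In the model y = b0 + b1·x1 + b2·x2 + b3·x1·x2, the slope with respect to x1 is b1 + b3·x2, so it changes with x2. All names, data, and coefficients below are hypothetical.

```python
def interaction_column(x1, x2):
    """Elementwise product of two predictor columns."""
    return [a * b for a, b in zip(x1, x2)]


def design_rows(x1, x2):
    """Rows [1, x1, x2, x1*x2]: intercept, both main effects, and the interaction."""
    product = interaction_column(x1, x2)
    return [[1.0, a, b, ab] for a, b, ab in zip(x1, x2, product)]


def slope_of_x1(b1, b3, x2_value):
    """Effect of a unit change in x1 at a given x2 when the model contains x1*x2."""
    return b1 + b3 * x2_value


# Hypothetical predictor values.
x1 = [1, 2, 3]
x2 = [4, 5, 6]
rows = design_rows(x1, x2)
print(rows)

# Illustrative coefficients: the effect of x1 shrinks as x2 grows.
for x2_value in (0, 2, 4):
    print(x2_value, slope_of_x1(2.0, -0.5, x2_value))
```

Keeping the main-effect columns in `design_rows` mirrors the advice to retain the individual predictors whenever their product is in the model.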

