# 13 Multiple Regression Chapter Multiple Regression

- Assessing Overall Fit
- Predictor Significance
- Confidence Intervals for Y
- Binary Predictors
- Tests for Nonlinearity and Interaction
- Multicollinearity
- Violations of Assumptions
- Other Regression Topics

Multiple Regression Bivariate or Multivariate?
Multiple regression is an extension of bivariate regression to include more than one independent variable. Limitations of bivariate regression:
- often simplistic
- biased estimates if relevant predictors are omitted
- lack of fit does not show that X is unrelated to Y

McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.

Multiple Regression Regression Terminology
Y is the response variable and is assumed to be related to the k predictors (X1, X2, …, Xk) by a linear equation called the population regression model:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$

The fitted regression equation is:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$
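As a minimal sketch of how the coefficients of a fitted regression equation can be obtained, the example below solves the least squares normal equations in plain Python. The helper names `solve` and `fit_ols` and the small data set are illustrative assumptions, not part of the chapter; in practice a statistical package would do this work.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_ols(X, y):
    """Return [b0, b1, ..., bk] that minimize the sum of squared residuals."""
    Z = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(XtX, Xty)


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term)
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
b = fit_ols(X, y)  # approximately [1.0, 2.0, 3.0]
```

Because the made-up response has no error term, the fitted coefficients recover the generating values up to rounding; with real data the residuals would be nonzero.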

Multiple Regression Data Format
n observed values of the response variable Y and its proposed predictors X1, X2, …, Xk are presented in the form of an n x k matrix:

Multiple Regression Illustration: Home Prices
Consider the following data of the selling price of a home (Y, the response variable) and three potential explanatory variables:
- X1 = SqFt
- X2 = LotSize
- X3 = Baths

Multiple Regression Logic of Variable Selection

Multiple Regression Fitted Regressions
Use Excel, MegaStat, MINITAB, or any other statistical package. For n = 30 home sales, here are the fitted regressions and their statistics of fit. R² is the coefficient of determination and SE is the standard error of the regression.

Multiple Regression Common Misconceptions about Fit
A common mistake is to assume that the model with the best fit is preferred. Principle of Occam’s Razor: When two explanations are otherwise equivalent, we prefer the simpler, more parsimonious one.

Multiple Regression Regression Modeling
Four Criteria for Regression Assessment
- Logic: Is there an a priori reason to expect a causal relationship between the predictors and the response variable?
- Fit: Does the overall regression show a significant relationship between the predictors and the response variable?

Multiple Regression Regression Modeling
Four Criteria for Regression Assessment
- Parsimony: Does each predictor contribute significantly to the explanation? Are some predictors not worth the trouble?
- Stability: Are the predictors related to one another so strongly that regression estimates become erratic?

Assessing Overall Fit F Test for Significance
For a regression with k predictors, the hypotheses to be tested are
- H0: All the true coefficients are zero
- H1: At least one of the coefficients is nonzero

In other words,
- H0: β1 = β2 = … = βk = 0
- H1: At least one of the coefficients is nonzero

Assessing Overall Fit F Test for Significance
The ANOVA table decomposes variation of the response variable around its mean into an explained part (the regression sum of squares, SSR) and an unexplained part (the error sum of squares, SSE):

$$SST = SSR + SSE$$
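The usual F statistic for the overall test is the ratio of the regression mean square to the error mean square, with k and n − k − 1 degrees of freedom. The sketch below shows that arithmetic; the function name `f_statistic` and the sums of squares are hypothetical values chosen for illustration, not results from the home price data.

```python
def f_statistic(ssr, sse, n, k):
    """Overall F statistic: (SSR / k) / (SSE / (n - k - 1))."""
    msr = ssr / k            # mean square due to regression
    mse = sse / (n - k - 1)  # mean square error
    return msr / mse


# Hypothetical ANOVA sums of squares for n = 25 observations, k = 4 predictors
F = f_statistic(ssr=80.0, sse=20.0, n=25, k=4)
print(F)  # 20.0
```

A large F relative to the critical value for (k, n − k − 1) degrees of freedom would lead to rejecting the hypothesis that all true coefficients are zero.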

Assessing Overall Fit Coefficient of Determination (R2)
R², the coefficient of determination, is a common measure of overall fit. It can be calculated one of two ways:

$$R^2 = 1 - \frac{SSE}{SST} \qquad \text{or} \qquad R^2 = \frac{SSR}{SST}$$

For example, for the home price data,

It is generally possible to raise the coefficient of determination R² by including additional predictors. The adjusted coefficient of determination penalizes the inclusion of useless predictors. For n observations and k predictors,

$$R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}$$

For the home price data, the adjusted R² is
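The two fit measures can be sketched in a few lines of Python. The function names and the tiny data set are assumptions made for illustration; the adjusted value uses the standard formula 1 − (1 − R²)(n − 1)/(n − k − 1).

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST for observed y and fitted y_hat."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Penalize R^2 for the number of predictors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Made-up observed and fitted values from a one-predictor model (k = 1)
y = [2, 4, 6, 8, 10]
y_hat = [3, 4, 6, 8, 9]
r2 = r_squared(y, y_hat)                         # SSE = 2, SST = 40
r2_adj = adjusted_r_squared(r2, n=len(y), k=1)
```

Here R² is about 0.95 and the adjusted value is about 0.933; the gap widens as more predictors are added without a matching drop in SSE.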

Assessing Overall Fit How Many Predictors?
Limit the number of predictors based on the sample size. When n/k is small, R² no longer gives a reliable indication of fit. Suggested rules are:
- Evan’s Rule (conservative): n/k ≥ 10 (at least 10 observations per predictor)
- Doane’s Rule (relaxed): n/k ≥ 5 (at least 5 observations per predictor)

Predictor Significance
t Test for Significance
Test each fitted coefficient to see whether it is significantly different from zero. The hypothesis tests for predictor Xj are

$$H_0: \beta_j = 0 \qquad H_1: \beta_j \neq 0$$

If we cannot reject the hypothesis that a coefficient is zero, then the corresponding predictor does not appear to contribute to the prediction of Y.

Predictor Significance
Test Statistic
The test statistic for the coefficient of predictor Xj is

$$t_j = \frac{b_j - 0}{s_{b_j}}$$

Find the two-tailed critical value tα for a chosen level of significance α from Appendix D, using n − k − 1 degrees of freedom. Reject H0 if |tj| > tα or if p-value < α. The 95% confidence interval for coefficient βj is

$$b_j \pm t_{n-k-1}\, s_{b_j}$$
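A short sketch of the coefficient test follows. The function name, the coefficient, and its standard error are hypothetical; the critical value 2.086 is the two-tailed 5% t value for 20 degrees of freedom and is passed in by hand because the Python standard library has no t table.

```python
def coefficient_test(b_j, se_bj, t_crit):
    """Return the t statistic, the reject decision, and the confidence interval."""
    t_j = (b_j - 0) / se_bj
    reject = abs(t_j) > t_crit  # two-tailed test
    ci = (b_j - t_crit * se_bj, b_j + t_crit * se_bj)
    return t_j, reject, ci


# Hypothetical fitted coefficient with n - k - 1 = 20 degrees of freedom
t_j, reject, ci = coefficient_test(b_j=2.5, se_bj=1.0, t_crit=2.086)
```

In this made-up case the interval runs from about 0.414 to 4.586. It excludes zero, which agrees with the decision to reject H0.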

Confidence Intervals for Y
Standard Error
The standard error of the regression (SE) is another important measure of fit. For n observations and k predictors,

$$SE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - k - 1}} = \sqrt{\frac{SSE}{n - k - 1}}$$

If all predictions were perfect, the SE would be 0.

Confidence Intervals for Y
Standard Error
Approximate 95% confidence interval for the conditional mean of Y:

$$\hat{y}_i \pm t_{n-k-1}\,\frac{SE}{\sqrt{n}}$$

Approximate 95% prediction interval for an individual Y value:

$$\hat{y}_i \pm t_{n-k-1}\, SE$$

Confidence Intervals for Y
Very Quick Prediction Interval for Y
The t-values for 95% confidence are typically near 2 (as long as n is not too small). A very quick interval without using a t table is therefore:

Approximate 95% confidence interval for the conditional mean of Y:

$$\hat{y}_i \pm 2\,\frac{SE}{\sqrt{n}}$$

Approximate 95% prediction interval for an individual Y value:

$$\hat{y}_i \pm 2\, SE$$
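The quick intervals amount to simple arithmetic, sketched below. The function name and the inputs (a fitted value of 100, SE of 10, n of 25) are made up for illustration.

```python
import math


def quick_intervals(y_hat, se, n):
    """Very quick 95% intervals using t = 2 in place of a table value."""
    half_mean = 2 * se / math.sqrt(n)  # half-width for the conditional mean
    half_pred = 2 * se                 # half-width for an individual value
    mean_ci = (y_hat - half_mean, y_hat + half_mean)
    pred_int = (y_hat - half_pred, y_hat + half_pred)
    return mean_ci, pred_int


mean_ci, pred_int = quick_intervals(y_hat=100.0, se=10.0, n=25)
print(mean_ci)   # (96.0, 104.0)
print(pred_int)  # (80.0, 120.0)
```

The prediction interval for an individual value is much wider than the interval for the conditional mean, since it does not shrink with the square root of n.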

Binary Predictors What Is a Binary Predictor?
A binary predictor has two values (usually 0 and 1) to denote the presence or absence of a condition. For example, for n graduates from an MBA program: Employed = 1, Unemployed = 0. These variables are also called dummy or indicator variables. For clarity, name the binary variable after the characteristic that corresponds to the value 1.

Binary Predictors Effects of a Binary Predictor
A binary predictor is sometimes called a shift variable because it shifts the regression plane up or down. Suppose X1 is a binary predictor which can take on only the values of 0 or 1. Its contribution to the regression is either b1 or nothing, resulting in an intercept of either b0 (when X1 = 0) or b0 + b1 (when X1 = 1).

Binary Predictors Testing a Binary for Significance
In multiple regression, binary predictors require no special treatment. They are tested as any other predictor using a t test.

Binary Predictors More Than One Binary
More than one binary occurs when the number of categories to be coded exceeds two. For example, for the variable GPA by class level, each category is a binary variable:
- Freshman = 1 if a freshman, 0 otherwise
- Sophomore = 1 if a sophomore, 0 otherwise
- Junior = 1 if a junior, 0 otherwise
- Senior = 1 if a senior, 0 otherwise
- Masters = 1 if a master’s candidate, 0 otherwise
- Doctoral = 1 if a PhD candidate, 0 otherwise

Binary Predictors More Than One Binary
If there are c mutually exclusive and collectively exhaustive categories, then only c − 1 binaries are needed to code each observation. Any one of the categories can be omitted because the remaining c − 1 binary values uniquely determine the omitted binary.

Binary Predictors What if I Forget to Exclude One Binary?
Including all c binaries for c categories would introduce a serious problem for the regression estimation. One column in the X data matrix will be a perfect linear combination of the other column(s). The least squares estimation would fail because the data matrix would be singular (i.e., would have no inverse).

Binary Predictors Regional Binaries
Binaries are commonly used to code regions. For example,
- Midwest = 1 if in the Midwest, 0 otherwise
- Neast = 1 if in the Northeast, 0 otherwise
- Seast = 1 if in the Southeast, 0 otherwise
- West = 1 if in the West, 0 otherwise
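The c − 1 coding rule and the problem caused by including all c binaries can be sketched with the regional example. The helper `encode`, its `drop_first` parameter, and the five observations are assumptions for illustration; the omitted category here is simply the first one listed.

```python
def encode(values, categories, drop_first=True):
    """Code each value as 0/1 binaries, optionally omitting the first category."""
    kept = categories[1:] if drop_first else categories
    return [[1 if v == c else 0 for c in kept] for v in values]


regions = ["Midwest", "Neast", "Seast", "West"]
observations = ["West", "Midwest", "Seast", "Neast", "West"]

full = encode(observations, regions, drop_first=False)  # all c = 4 binaries
reduced = encode(observations, regions)                 # c - 1 = 3 binaries

# With all c binaries, every row sums to 1, which duplicates the intercept
# column of ones: a perfect linear combination.
row_sums = [sum(row) for row in full]
```

In the reduced coding, a Midwest observation is the row of all zeros, so its effect is absorbed into the intercept and the other coefficients are read relative to it.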

Tests for Nonlinearity and Interaction
Sometimes the effect of a predictor is nonlinear. To test for nonlinearity of any predictor, include its square in the regression. For example,

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_2 + \beta_4 X_2^2 + \varepsilon$$

If the linear model is the correct one, the coefficients of the squared predictors, β2 and β4, would not differ significantly from zero. Otherwise a quadratic relationship would exist between Y and the respective predictor variable.

Tests for Nonlinearity and Interaction
Tests for Interaction
Test for interaction between two predictors by including their product in the regression:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$$

If we reject the hypothesis H0: β3 = 0, then we conclude that there is a significant interaction between X1 and X2. Interaction effects require careful interpretation and cost 1 degree of freedom per interaction.
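Both tests only require extra columns in the predictor data before the regression is run. The sketch below builds those columns; the function names and the three data rows are hypothetical.

```python
def quadratic_terms(x1, x2):
    """Columns X1, X1^2, X2, X2^2 for the nonlinearity test."""
    return [[a, a * a, b, b * b] for a, b in zip(x1, x2)]


def interaction_terms(x1, x2):
    """Columns X1, X2, X1*X2 for the interaction test."""
    return [[a, b, a * b] for a, b in zip(x1, x2)]


x1 = [1.0, 2.0, 3.0]
x2 = [4.0, 5.0, 6.0]
quad_rows = quadratic_terms(x1, x2)     # second row: [2.0, 4.0, 5.0, 25.0]
inter_rows = interaction_terms(x1, x2)  # third row: [3.0, 6.0, 18.0]
```

Each added column is then tested with an ordinary t test on its coefficient, like any other predictor.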

Multicollinearity What is Multicollinearity?
Multicollinearity occurs when the independent variables X1, X2, …, Xm are intercorrelated instead of being independent. Collinearity occurs if only two predictors are correlated. The degree of multicollinearity is the real concern.

Multicollinearity Variance Inflation
Multicollinearity induces variance inflation when predictors are strongly intercorrelated. This results in wider confidence intervals for the true coefficients β1, β2, …, βm and makes the t statistics less reliable. The separate contribution of each predictor in “explaining” the response variable is difficult to identify.

Multicollinearity Correlation Matrix
To check whether two predictors are correlated (collinearity), inspect the correlation matrix using Excel, MegaStat, or MINITAB. For example,

Multicollinearity Correlation Matrix
A quick rule: A sample correlation whose absolute value exceeds 2/√n probably differs significantly from zero in a two-tailed test at α = .05. This applies to samples that are not too small (say, 20 or more).
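The quick rule can be sketched as follows. The function names and data are illustrative assumptions; the five-point sample only demonstrates the correlation calculation, since the rule itself is meant for n of about 20 or more.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def quick_rule_significant(r, n):
    """True if |r| exceeds 2 / sqrt(n), the quick two-tailed 5% rule."""
    return abs(r) > 2 / math.sqrt(n)


r = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])  # approximately 0.8
# For n = 25 the cutoff is 2 / 5 = 0.4
flag_high = quick_rule_significant(0.45, 25)  # True
flag_low = quick_rule_significant(0.35, 25)   # False
```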

Multicollinearity Predictor Matrix Plots
The collinearity for the squared predictors can often be seen in scatter plots. For example,

Multicollinearity Variance Inflation Factor (VIF)
The matrix scatter plots and correlation matrix only show correlations between any two predictors. The variance inflation factor (VIF) is a more comprehensive test for multicollinearity. For a given predictor j, the VIF is defined as

$$VIF_j = \frac{1}{1 - R_j^2}$$

where R_j² is the coefficient of determination when predictor j is regressed against all other predictors.
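As a small illustration of the standard definition VIF = 1/(1 − R_j²), the sketch below converts a given R_j² into a VIF. The function name `vif_from_r2` and the R_j² values are assumptions; obtaining R_j² itself requires regressing predictor j on the other predictors, which is not shown here.

```python
def vif_from_r2(r2_j):
    """Variance inflation factor for predictor j given its R_j^2."""
    if not 0 <= r2_j < 1:
        raise ValueError("R_j^2 must be in [0, 1)")
    return 1 / (1 - r2_j)


vif_none = vif_from_r2(0.0)    # 1.0: predictor j is unrelated to the others
vif_half = vif_from_r2(0.5)    # 2.0
vif_strong = vif_from_r2(0.9)  # about 10
```

The VIF grows without bound as R_j² approaches 1, which is why there is no upper limit on its magnitude.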

Multicollinearity Rules of Thumb
There is no limit on the magnitude of the VIF. A VIF of 10 says that the other predictors “explain” 90% of the variation in predictor j. This indicates that predictor j is strongly related to the other predictors. However, it is not necessarily indicative of instability in the least squares estimate. A large VIF is a warning to consider whether predictor j really belongs to the model.

Multicollinearity Are Coefficients Stable?
Evidence of instability is
- when X1 and X2 have a high pairwise correlation with Y, yet one or both predictors have insignificant t statistics in the fitted multiple regression, and/or
- if X1 and X2 are positively correlated with Y, yet one has a negative slope in the multiple regression.

Multicollinearity Are Coefficients Stable?
As a test, try dropping a collinear predictor from the regression and seeing what happens to the fitted coefficients in the re-estimated model. If they don’t change much, then multicollinearity is not a concern. If it causes sharp changes in one or more of the remaining coefficients in the model, then the multicollinearity may be causing instability.

Violations of Assumptions
The least squares method makes several assumptions about the (unobservable) random errors εi. Clues about these errors may be found in the residuals ei.
- Assumption 1: The errors are normally distributed.
- Assumption 2: The errors have constant variance (i.e., they are homoscedastic).
- Assumption 3: The errors are independent (i.e., they are nonautocorrelated).

Violations of Assumptions
Non-Normal Errors
Except when there are major outliers, non-normal residuals are usually considered a mild violation. Regression coefficients and variance remain unbiased and consistent. Confidence intervals for the parameters may be unreliable since they are based on the normality assumption. The confidence intervals are generally OK with a large sample size (e.g., n > 30) and no outliers.

Violations of Assumptions
Non-Normal Errors
Test
- H0: Errors are normally distributed
- H1: Errors are not normally distributed

Create a histogram of residuals (plain or standardized) to visually reveal any outliers or serious asymmetry. The normal probability plot will also visually test for normality.

Violations of Assumptions
Nonconstant Variance (Heteroscedasticity)
If the error variance is constant, the errors are homoscedastic. If the error variance is nonconstant, the errors are heteroscedastic. This violation is potentially serious. The least squares regression parameter estimates are unbiased and consistent. Estimated variances are biased (understated) and not efficient, resulting in overstated t statistics and narrow confidence intervals.

Violations of Assumptions
Nonconstant Variance (Heteroscedasticity)
The hypotheses are:
- H0: Errors have constant variance (homoscedastic)
- H1: Errors have nonconstant variance (heteroscedastic)

Constant variance can be visually tested by examining scatter plots of the residuals against each predictor. Ideally there will be no pattern.

Violations of Assumptions
Autocorrelation
Autocorrelation is a pattern of nonindependent errors that violates the assumption that each error is independent of its predecessor. This is a problem with time series data. Autocorrelated errors result in biased estimated variances, which in turn produce narrow confidence intervals and large t statistics. The model’s fit may be overstated.

Violations of Assumptions
Autocorrelation
Test the hypotheses:
- H0: Errors are nonautocorrelated
- H1: Errors are autocorrelated

We will use the observable residuals e1, e2, …, en for evidence of autocorrelation and the Durbin-Watson test statistic DW:

$$DW = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$

Violations of Assumptions
Autocorrelation
The DW statistic lies between 0 and 4. When H0 is true (no autocorrelation), the DW statistic will be near 2. A DW < 2 suggests positive autocorrelation. A DW > 2 suggests negative autocorrelation. Ignore the DW statistic for cross-sectional data.
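The Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. A minimal sketch, using made-up residual series and a hypothetical function name, shows how the statistic moves away from 2 in each direction:

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2 for t = 2..n) / sum(e_t^2 for t = 1..n)."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Residuals that alternate in sign: DW above 2 (negative autocorrelation)
dw_alternating = durbin_watson([1, -1, 1, -1])    # 12 / 4 = 3.0

# Residuals that come in runs: DW below 2 (positive autocorrelation)
dw_runs = durbin_watson([1, 1, 1, -1, -1, -1])    # 4 / 6, about 0.667
```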

Violations of Assumptions
Unusual Observations
An observation may be unusual
1. because the fitted model’s prediction is poor (unusual residuals), or
2. because one or more predictors may be having a large influence on the regression estimates (unusual leverage).

Violations of Assumptions
Unusual Observations
To check for unusual residuals, simply inspect the residuals to find instances where the model does not predict well. To check for unusual leverage, look at the leverage statistic (how far each observation is from the mean(s) of the predictors) for each observation. For n observations and k predictors, look for observations whose leverage exceeds 2(k + 1)/n.
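The leverage check can be sketched for the special case of a single predictor (k = 1), where leverage has the closed form 1/n + (x_i − x̄)²/Σ(x − x̄)². With several predictors the leverages come from the diagonal of the hat matrix, which a statistical package reports. The function names and data below are illustrative assumptions.

```python
def leverages_one_predictor(x):
    """Leverage of each observation in a regression with one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]


def high_leverage(x, k=1):
    """Indices of observations whose leverage exceeds 2(k + 1)/n."""
    cutoff = 2 * (k + 1) / len(x)
    return [i for i, h in enumerate(leverages_one_predictor(x)) if h > cutoff]


# Made-up predictor values; the last one lies far from the others
x = [1, 2, 3, 4, 5, 20]
h = leverages_one_predictor(x)
flagged = high_leverage(x)  # only the last observation exceeds 4/6
```

The leverages always sum to k + 1 (here 2), so the cutoff 2(k + 1)/n is twice the average leverage.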

Other Regression Topics
Outliers: Causes and Cures
An outlier may be due to an error in recording the data and, if so, the observation should be deleted. It is reasonable to discard an observation on the grounds that it represents a different population than the other observations.

Other Regression Topics
Missing Predictors
An outlier may also be an observation that has been influenced by an unspecified “lurking” variable that should have been controlled but wasn’t. Try to identify the lurking variable and formulate a multiple regression model including both predictors. Unspecified “lurking” variables cause inaccurate predictions from the fitted regression.

Other Regression Topics
Ill-Conditioned Data
All variables in the regression should be of the same general order of magnitude. Do not mix very large data values with very small data values. To avoid mixing magnitudes, adjust the decimal point in both variables. Be consistent throughout the data column. The decimal adjustments for each data column need not be the same.

Other Regression Topics
Significance in Large Samples
Statistical significance may not imply practical importance. With a large enough sample, even a very small effect can be made statistically significant.

Other Regression Topics
Model Specification Errors
A misspecified model occurs when you estimate a linear model when actually a nonlinear model is required or when a relevant predictor is omitted. To detect misspecification:
- Plot the residuals against estimated Y (should be no discernible pattern).
- Plot the residuals against actual Y (should be no discernible pattern).
- Plot the fitted Y against the actual Y (should be a 45° line).

Other Regression Topics
Missing Data
Discard a variable if many data values are missing. If a Y value is missing, discard the observation to be conservative. Other options would be to use the mean of the X data column for the missing values or to use a regression procedure to “fit” the missing X-value from the complete observations.

Other Regression Topics
Binary Dependent Variable
When the response variable Y is binary (0, 1), the least squares estimation method is no longer appropriate. Use logit and probit regression methods.

Other Regression Topics
Stepwise and Best Subsets Regression
The stepwise regression procedure finds the best fitting model using 1, 2, 3, …, k predictors. This procedure is appropriate only when there is no theoretical model that specifies which predictors should be used. Perform best subsets regression using all possible combinations of predictors.

Applied Statistics in Business and Economics
End of Chapter 13