Published by Nicole Clark. Modified over 3 years ago.

2
**Chapter 13 Multiple Regression**

- Multiple Regression
- Assessing Overall Fit
- Predictor Significance
- Confidence Intervals for Y
- Binary Predictors
- Tests for Nonlinearity and Interaction
- Multicollinearity
- Violations of Assumptions
- Other Regression Topics

3
**Multiple Regression Bivariate or Multivariate?**

Multiple regression is an extension of bivariate regression to include more than one independent variable.

Limitations of bivariate regression:
- often simplistic
- biased estimates if relevant predictors are omitted
- lack of fit does not show that X is unrelated to Y

McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.

4
**Multiple Regression Regression Terminology**

Y is the response variable and is assumed to be related to the k predictors (X1, X2, …, Xk) by a linear equation called the population regression model:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$$

The fitted regression equation is:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$$
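As a rough illustration of how a fitted regression equation can be obtained and used, here is a minimal sketch that solves the least squares normal equations in plain Python. The function names `fit_least_squares` and `predict` and the tiny data set are made up for this example; in practice a statistical package would do this work.

```python
def fit_least_squares(X, y):
    """Return [b0, b1, ..., bk] that minimize the sum of squared residuals.

    X is a list of rows (one row of k predictor values per observation),
    y is the list of observed responses.
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]  # intercept column
    p = len(rows[0])
    # Augmented normal equations: (X'X) b = X'y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda idx: abs(A[idx][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfectly collinear predictors)")
        A[col], A[pivot] = A[pivot], A[col]
        for idx in range(p):
            if idx != col:
                factor = A[idx][col] / A[col][col]
                A[idx] = [a - factor * c for a, c in zip(A[idx], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(b, x):
    """Evaluate the fitted equation b0 + b1*x1 + ... + bk*xk."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [9, 8, 19, 18, 26]
b = fit_least_squares(X, y)
print(b, predict(b, [2, 2]))
```

Because the made-up responses contain no error term, the fitted coefficients should reproduce the generating values up to rounding; with real data the residuals would be nonzero.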

5
**Multiple Regression Data Format**

n observed values of the response variable Y and its proposed predictors X1, X2, … Xk are presented in the form of an n x k matrix:

6
**Multiple Regression Illustration: Home Prices**

Consider the following data on the selling price of a home (Y, the response variable) and three potential explanatory variables:
- X1 = SqFt
- X2 = LotSize
- X3 = Baths

7
**Multiple Regression Illustration: Home Prices**

Intuitively, the regression models are

8
**Multiple Regression Logic of Variable Selection**

State the hypotheses about the sign of the coefficients in the model.

9
**Multiple Regression Fitted Regressions**

Use Excel, MegaStat, MINITAB, or any other statistical package. For n = 30 home sales, here are the fitted regressions and their statistics of fit. R2 is the coefficient of determination and SE is the standard error of the regression.

10
**Multiple Regression Common Misconceptions about Fit**

A common mistake is to assume that the model with the best fit is preferred. Principle of Occam’s Razor: When two explanations are otherwise equivalent, we prefer the simpler, more parsimonious one.

11
**Multiple Regression Regression Modeling**

Four Criteria for Regression Assessment
- Logic: Is there an a priori reason to expect a causal relationship between the predictors and the response variable?
- Fit: Does the overall regression show a significant relationship between the predictors and the response variable?

12
**Multiple Regression Regression Modeling**

Four Criteria for Regression Assessment
- Parsimony: Does each predictor contribute significantly to the explanation? Are some predictors not worth the trouble?
- Stability: Are the predictors related to one another so strongly that regression estimates become erratic?

13
**Assessing Overall Fit F Test for Significance**

For a regression with k predictors, the hypotheses to be tested are
- H0: All the true coefficients are zero
- H1: At least one of the coefficients is nonzero

In other words,
- H0: β1 = β2 = … = βk = 0
- H1: At least one of the coefficients is nonzero

14
**Assessing Overall Fit F Test for Significance**

The ANOVA table decomposes variation of the response variable around its mean into an explained part (due to the regression) and an unexplained part (error): SST = SSR + SSE.

15
**Assessing Overall Fit F Test for Significance**

The ANOVA calculations for a k-predictor model can be summarized as
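The standard ANOVA quantities for a k-predictor regression (sums of squares, degrees of freedom, mean squares, and the F statistic) can be sketched as follows. The function name `anova_table` and the five-point data set are hypothetical; the fitted values are assumed to come from a least squares fit that includes an intercept, which is what makes SSR = SST − SSE valid.

```python
def anova_table(y, y_hat, k):
    """Summarize the regression ANOVA for n observations and k predictors."""
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained
    ssr = sst - sse                                         # explained
    df_regression, df_error = k, n - k - 1
    msr = ssr / df_regression
    mse = sse / df_error
    return {
        "SSR": ssr, "SSE": sse, "SST": sst,
        "df_regression": df_regression,
        "df_error": df_error,
        "df_total": n - 1,
        "MSR": msr, "MSE": mse,
        "F": msr / mse,
    }


# Made-up example: y regressed on x = 1..5 gives fitted values 0.6 + 0.8x
y = [1, 3, 2, 5, 4]
y_hat = [1.4, 2.2, 3.0, 3.8, 4.6]
for name, value in anova_table(y, y_hat, k=1).items():
    print(name, round(value, 4))
```

The F statistic is then compared with the critical value of the F distribution with k and n − k − 1 degrees of freedom.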

16
**Assessing Overall Fit F Test for Significance**

Here are the ANOVA calculations for the home price data

17
**Assessing Overall Fit Coefficient of Determination (R2)**

R2, the coefficient of determination, is a common measure of overall fit. It can be calculated in one of two ways:

$$R^2 = 1 - \frac{SSE}{SST} \quad \text{or} \quad R^2 = \frac{SSR}{SST}$$

For example, for the home price data,

18
**Assessing Overall Fit Adjusted R2**

It is generally possible to raise the coefficient of determination R2 by including additional predictors. The adjusted coefficient of determination is designed to penalize the inclusion of useless predictors. For n observations and k predictors,

$$R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}$$

For the home price data, the adjusted R2 is
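A small sketch of the two fit measures, using the usual textbook formulas R² = 1 − SSE/SST and adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). The function names and the input numbers are illustrative only and are not the home price results.

```python
def r_squared(sse, sst):
    """Coefficient of determination from the error and total sums of squares."""
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Penalize R-squared for using k predictors with only n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical values: SSE = 3.6, SST = 10, n = 5 observations, k = 1 predictor
r2 = r_squared(3.6, 10.0)
print(r2, adjusted_r_squared(r2, n=5, k=1))
```

The adjusted value never exceeds R², and the gap widens as k grows relative to n.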

19
**Assessing Overall Fit How Many Predictors?**

Limit the number of predictors based on the sample size. When n/k is small, R2 no longer gives a reliable indication of fit. Suggested rules are:
- Evans’ Rule (conservative): n/k ≥ 10 (at least 10 observations per predictor)
- Doane’s Rule (relaxed): n/k ≥ 5 (at least 5 observations per predictor)
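The two rules of thumb can be checked mechanically. This sketch reads "at least 10" and "at least 5" observations per predictor as n/k ≥ 10 and n/k ≥ 5; the function name `predictor_rules` is made up for illustration.

```python
def predictor_rules(n, k):
    """Check the conservative and relaxed observations-per-predictor rules."""
    ratio = n / k
    return {
        "n_per_predictor": ratio,
        "conservative_ok": ratio >= 10,  # at least 10 observations per predictor
        "relaxed_ok": ratio >= 5,        # at least 5 observations per predictor
    }


# n = 30 observations with 3 or 4 candidate predictors
print(predictor_rules(30, 3))
print(predictor_rules(30, 4))
```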

20
**Predictor Significance**

t Test for Significance

Test each fitted coefficient to see whether it is significantly different from zero. The hypotheses for predictor Xj are H0: βj = 0 versus H1: βj ≠ 0. If we cannot reject the hypothesis that a coefficient is zero, then the corresponding predictor does not contribute significantly to the prediction of Y.

21
**Predictor Significance**

Test Statistic

The test statistic for the coefficient of predictor Xj is

$$t_j = \frac{b_j - 0}{s_{b_j}}$$

where $s_{b_j}$ is the standard error of the fitted coefficient $b_j$. Find the two-tailed critical value $t_{\alpha/2}$ (with n − k − 1 degrees of freedom) for a chosen level of significance α from Appendix D. Reject H0 if $|t_j| > t_{\alpha/2}$ or if p-value < α. The 95% confidence interval for coefficient βj is

$$b_j \pm t_{.025}\, s_{b_j}$$
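The coefficient test and its confidence interval reduce to a few lines of arithmetic once the coefficient, its standard error, and a critical value are known. In this sketch the critical value is passed in by hand (the standard library has no t distribution), and all names and numbers are hypothetical; 2.056 is the two-tailed 5% table value for 26 degrees of freedom, which corresponds to n = 30 and k = 3.

```python
def t_statistic(b_j, se_bj):
    """t statistic for H0: beta_j = 0."""
    return b_j / se_bj


def is_significant(b_j, se_bj, t_crit):
    """Two-tailed decision: reject H0 when |t_j| exceeds the critical value."""
    return abs(t_statistic(b_j, se_bj)) > t_crit


def coefficient_interval(b_j, se_bj, t_crit):
    """Confidence interval b_j +/- t_crit * se_bj for the true coefficient."""
    half_width = t_crit * se_bj
    return (b_j - half_width, b_j + half_width)


T_CRIT = 2.056  # looked up in a t table: alpha = .05 two-tailed, 26 d.f.
print(t_statistic(2.0, 0.5))
print(is_significant(2.0, 0.5, T_CRIT))
print(coefficient_interval(2.0, 0.5, T_CRIT))
```

An interval that excludes zero leads to the same conclusion as rejecting H0 in the two-tailed test at the matching α.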

22
**Confidence Intervals for Y**

Standard Error

The standard error of the regression (SE) is another important measure of fit. For n observations and k predictors,

$$SE = \sqrt{\frac{SSE}{n - k - 1}}$$

If all predictions were perfect, then SE = 0.

23
**Confidence Intervals for Y**

Standard Error

Approximate 95% confidence interval for the conditional mean of Y:

$$\hat{y}_i \pm t_{n-k-1}\,\frac{SE}{\sqrt{n}}$$

Approximate 95% prediction interval for an individual Y value:

$$\hat{y}_i \pm t_{n-k-1}\,SE$$

24
**Confidence Intervals for Y**

Very Quick Prediction Interval for Y

The t-values for 95% confidence are typically near 2 (as long as n is not too small). Very quick intervals that do not require a t table are:

Approximate 95% confidence interval for the conditional mean of Y: $\hat{y}_i \pm 2\,SE/\sqrt{n}$

Approximate 95% prediction interval for an individual Y value: $\hat{y}_i \pm 2\,SE$
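A minimal sketch of the standard error of the regression and the "t is about 2" shortcut. It assumes the usual approximations of ŷ ± 2·SE/√n for the conditional mean and ŷ ± 2·SE for an individual value; the function names and numbers are invented for illustration.

```python
import math


def standard_error(sse, n, k):
    """Standard error of the regression for n observations and k predictors."""
    return math.sqrt(sse / (n - k - 1))


def quick_intervals(y_hat, se, n):
    """Approximate 95% intervals using t = 2 instead of a table lookup."""
    ci_half = 2 * se / math.sqrt(n)   # conditional mean of Y
    pi_half = 2 * se                  # individual Y value
    return {
        "mean_ci": (y_hat - ci_half, y_hat + ci_half),
        "prediction": (y_hat - pi_half, y_hat + pi_half),
    }


print(standard_error(3.6, n=5, k=1))
print(quick_intervals(y_hat=100.0, se=10.0, n=25))
```

The prediction interval is always the wider of the two, because an individual value varies around the conditional mean.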

25
**Binary Predictors What Is a Binary Predictor?**

A binary predictor has two values (usually 0 and 1) to denote the presence or absence of a condition. For example, for n graduates from an MBA program:
- Employed = 1
- Unemployed = 0

These variables are also called dummy or indicator variables. For clarity, name the binary variable after the characteristic that corresponds to the value 1.

26
**Binary Predictors Effects of a Binary Predictor**

A binary predictor is sometimes called a shift variable because it shifts the regression plane up or down. Suppose X1 is a binary predictor which can take on only the values of 0 or 1. Its contribution to the regression is either b1 or nothing, resulting in an intercept of either b0 (when X1 = 0) or b0 + b1 (when X1 = 1).

27
**Binary Predictors Effects of a Binary Predictor**

The slope does not change; only the intercept is shifted. For example,

28
**Binary Predictors Testing a Binary for Significance**

In multiple regression, binary predictors require no special treatment. They are tested as any other predictor using a t test.

29
**Binary Predictors More Than One Binary**

More than one binary occurs when the number of categories to be coded exceeds two. For example, for the variable GPA by class level, each category is a binary variable:
- Freshman = 1 if a freshman, 0 otherwise
- Sophomore = 1 if a sophomore, 0 otherwise
- Junior = 1 if a junior, 0 otherwise
- Senior = 1 if a senior, 0 otherwise
- Masters = 1 if a master’s candidate, 0 otherwise
- Doctoral = 1 if a PhD candidate, 0 otherwise

30
**Binary Predictors More Than One Binary**

If there are c mutually exclusive and collectively exhaustive categories, then only c − 1 binaries are needed to code each observation. Any one of the categories can be omitted because the remaining c − 1 binary values uniquely determine the omitted binary.

31
**Binary Predictors What if I Forget to Exclude One Binary?**

Including all c binaries for c categories would introduce a serious problem for the regression estimation. One column in the X data matrix will be a perfect linear combination of the other column(s). The least squares estimation would fail because the X′X matrix would be singular (i.e., would have no inverse).

32
**Binary Predictors Regional Binaries**

Binaries are commonly used to code regions. For example,
- Midwest = 1 if in the Midwest, 0 otherwise
- Neast = 1 if in the Northeast, 0 otherwise
- Seast = 1 if in the Southeast, 0 otherwise
- West = 1 if in the West, 0 otherwise
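Coding c categories with only c − 1 binaries can be sketched as below, using the region names from the slide. The helper `code_binaries` and the five sample observations are hypothetical; which category to omit is the analyst's choice, and the omitted one becomes the baseline absorbed by the intercept.

```python
def code_binaries(labels, omit):
    """Code categorical labels as c - 1 binary columns, omitting one category."""
    categories = sorted(set(labels))
    if omit not in categories:
        raise ValueError(f"{omit!r} is not one of the categories")
    kept = [c for c in categories if c != omit]
    rows = [[1 if label == c else 0 for c in kept] for label in labels]
    return kept, rows


# Five made-up observations; West is the omitted (baseline) region
regions = ["Midwest", "West", "Neast", "Seast", "West"]
names, rows = code_binaries(regions, omit="West")
print(names)
for row in rows:
    print(row)
```

An observation from the omitted region is coded as all zeros, so no information is lost and the columns are not perfectly collinear with the intercept.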

33
**Tests for Nonlinearity and Interaction**

Sometimes the effect of a predictor is nonlinear. To test for nonlinearity of any predictor, include its square in the regression. For example,

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_2 + \beta_4 X_2^2 + \varepsilon$$

If the linear model is the correct one, the coefficients of the squared predictors, β2 and β4, would not differ significantly from zero. Otherwise a quadratic relationship would exist between Y and the respective predictor variable.

34
**Tests for Nonlinearity and Interaction**

Tests for Interaction

Test for interaction between two predictors by including their product in the regression:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$$

If we reject the hypothesis H0: β3 = 0, then we conclude that there is a significant interaction between X1 and X2. Interaction effects require careful interpretation and cost 1 degree of freedom per interaction.
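Both tests amount to adding derived columns to the predictor matrix before fitting. The sketch below builds those columns for two predictors; the function names are made up, and the column order is an assumption chosen so that the squared terms sit in positions 2 and 4 and the product sits in position 3, matching the coefficient numbering used in the text.

```python
def quadratic_terms(x1, x2):
    """Predictor row for the nonlinearity test: X1, X1^2, X2, X2^2."""
    return [x1, x1 ** 2, x2, x2 ** 2]


def interaction_terms(x1, x2):
    """Predictor row for the interaction test: X1, X2, X1*X2."""
    return [x1, x2, x1 * x2]


# Hypothetical raw observations (x1, x2)
data = [(2, 3), (4, 1), (5, 5)]
print([quadratic_terms(a, b) for a, b in data])
print([interaction_terms(a, b) for a, b in data])
```

Each added column costs one degree of freedom, and a squared column is usually highly correlated with its parent predictor.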

35
**Multicollinearity What is Multicollinearity?**

Multicollinearity occurs when the independent variables X1, X2, …, Xk are intercorrelated instead of being independent. Collinearity occurs if only two predictors are correlated. The degree of multicollinearity is the real concern.

36
**Multicollinearity Variance Inflation**

Multicollinearity induces variance inflation when predictors are strongly intercorrelated. This results in wider confidence intervals for the true coefficients β1, β2, …, βk and makes the t statistics less reliable. The separate contribution of each predictor in “explaining” the response variable is difficult to identify.

37
**Multicollinearity Correlation Matrix**

To check whether two predictors are correlated (collinearity), inspect the correlation matrix using Excel, MegaStat, or MINITAB. For example,

38
**Multicollinearity Correlation Matrix**

A quick rule: A sample correlation whose absolute value exceeds 2/√n probably differs significantly from zero in a two-tailed test at α = .05. This applies to samples that are not too small (say, 20 or more).
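The quick rule is easy to apply in code. This sketch takes the cutoff to be 2/√n (two divided by the square root of the sample size); the function names and the sample values are illustrative.

```python
import math


def quick_threshold(n):
    """Approximate cutoff for |r| at the 5% level, two-tailed."""
    return 2 / math.sqrt(n)


def probably_significant(r, n):
    """Apply the quick rule; intended for samples of about 20 or more."""
    return abs(r) > quick_threshold(n)


print(quick_threshold(25))
print(probably_significant(0.45, 25), probably_significant(0.35, 25))
```

Because the cutoff shrinks as n grows, even modest correlations between predictors count as statistically significant in large samples.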

39
**Multicollinearity Predictor Matrix Plots**

The collinearity for the squared predictors can often be seen in scatter plots. For example,

40
**Multicollinearity Variance Inflation Factor (VIF)**

The matrix scatter plots and correlation matrix only show correlations between any two predictors. The variance inflation factor (VIF) is a more comprehensive test for multicollinearity. For a given predictor j, the VIF is defined as

$$VIF_j = \frac{1}{1 - R_j^2}$$

where $R_j^2$ is the coefficient of determination when predictor j is regressed against all other predictors.

41
**Multicollinearity Variance Inflation Factor (VIF)**

Some possible situations are:

42
**Multicollinearity Rules of Thumb**

There is no limit on the magnitude of the VIF. A VIF of 10 says that the other predictors “explain” 90% of the variation in predictor j. This indicates that predictor j is strongly related to the other predictors. However, it is not necessarily indicative of instability in the least squares estimate. A large VIF is a warning to consider whether predictor j really belongs to the model.
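The relationship between the VIF and the auxiliary R² can be sketched as follows, using the standard definition VIF = 1/(1 − R²), where R² comes from regressing predictor j on the other predictors. The function names are made up, and the R² values are illustrative inputs rather than results from any data set.

```python
def vif(r2_j):
    """Variance inflation factor for a predictor with auxiliary R-squared r2_j."""
    if not 0 <= r2_j < 1:
        raise ValueError("r2_j must satisfy 0 <= r2_j < 1")
    return 1.0 / (1.0 - r2_j)


def r2_from_vif(vif_j):
    """Share of predictor j's variation 'explained' by the other predictors."""
    return 1.0 - 1.0 / vif_j


for r2 in (0.0, 0.5, 0.9, 0.99):
    print(r2, vif(r2))
print(r2_from_vif(10))
```

The VIF grows without bound as the auxiliary R² approaches 1, which is why there is no upper limit on its magnitude.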

43
**Multicollinearity Are Coefficients Stable?**

Evidence of instability is
- when X1 and X2 have a high pairwise correlation with Y, yet one or both predictors have insignificant t statistics in the fitted multiple regression, and/or
- if X1 and X2 are positively correlated with Y, yet one has a negative slope in the multiple regression.

44
**Multicollinearity Are Coefficients Stable?**

As a test, try dropping a collinear predictor from the regression and seeing what happens to the fitted coefficients in the re-estimated model. If they don’t change much, then multicollinearity is not a concern. If it causes sharp changes in one or more of the remaining coefficients in the model, then the multicollinearity may be causing instability.

45
**Violations of Assumptions**

The least squares method makes several assumptions about the (unobservable) random errors εi. Clues about these errors may be found in the residuals ei.
- Assumption 1: The errors are normally distributed.
- Assumption 2: The errors have constant variance (i.e., they are homoscedastic).
- Assumption 3: The errors are independent (i.e., they are nonautocorrelated).

46
**Violations of Assumptions**

Non-Normal Errors

Except when there are major outliers, non-normal residuals are usually considered a mild violation. Regression coefficients and variance remain unbiased and consistent. Confidence intervals for the parameters may be unreliable since they are based on the normality assumption. The confidence intervals are generally OK with a large sample size (e.g., n > 30) and no outliers.

47
**Violations of Assumptions**

Non-Normal Errors

Test
- H0: Errors are normally distributed
- H1: Errors are not normally distributed

Create a histogram of residuals (plain or standardized) to visually reveal any outliers or serious asymmetry. The normal probability plot will also visually test for normality.

48
**Violations of Assumptions**

Nonconstant Variance (Heteroscedasticity)

If the error variance is constant, the errors are homoscedastic. If the error variance is nonconstant, the errors are heteroscedastic. This violation is potentially serious. The least squares regression parameter estimates are still unbiased and consistent, but the estimated variances are biased (understated) and not efficient, resulting in overstated t statistics and artificially narrow confidence intervals.

49
**Violations of Assumptions**

Nonconstant Variance (Heteroscedasticity)

The hypotheses are:
- H0: Errors have constant variance (homoscedastic)
- H1: Errors have nonconstant variance (heteroscedastic)

Constant variance can be visually tested by examining scatter plots of the residuals against each predictor. Ideally there will be no pattern.

50
**Violations of Assumptions**

Nonconstant Variance (Heteroscedasticity)

51
**Violations of Assumptions**

Autocorrelation

Autocorrelation is a pattern of nonindependent errors that violates the assumption that each error is independent of its predecessor. This is a problem with time series data. Autocorrelated errors result in biased estimated variances, which in turn produce artificially narrow confidence intervals and large t statistics. The model’s fit may be overstated.

52
**Violations of Assumptions**

Autocorrelation

Test the hypotheses:
- H0: Errors are nonautocorrelated
- H1: Errors are autocorrelated

We will use the observable residuals e1, e2, …, en for evidence of autocorrelation and the Durbin-Watson test statistic DW:

$$DW = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$

53
**Violations of Assumptions**

Autocorrelation

The DW statistic lies between 0 and 4. When H0 is true (no autocorrelation), the DW statistic will be near 2. A DW < 2 suggests positive autocorrelation. A DW > 2 suggests negative autocorrelation. Ignore the DW statistic for cross-sectional data.
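A short sketch of the Durbin-Watson statistic, computed as the sum of squared successive differences of the residuals divided by the sum of squared residuals. The function name and the three residual series are invented to show the two extremes and the neutral case.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


print(durbin_watson([1, 1, 1, 1]))     # residuals that never change sign
print(durbin_watson([1, -1, 1, -1]))   # residuals that alternate in sign
print(durbin_watson([1, -1, -1, 1]))   # no consistent pattern
```

Runs of same-signed residuals pull the statistic toward 0 (positive autocorrelation), while alternating signs push it toward 4 (negative autocorrelation).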

54
**Violations of Assumptions**

Unusual Observations

An observation may be unusual
1. because the fitted model’s prediction is poor (unusual residuals), or
2. because one or more of its predictor values may have a large influence on the regression estimates (unusual leverage).

55
**Violations of Assumptions**

Unusual Observations

To check for unusual residuals, simply inspect the residuals to find instances where the model does not predict well. To check for unusual leverage, look at the leverage statistic (how far each observation is from the mean(s) of the predictors) for each observation. For n observations and k predictors, look for observations whose leverage exceeds 2(k + 1)/n.
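The leverage cutoff 2(k + 1)/n can be applied as sketched below. To keep the example self-contained, leverage is computed only for the special case of a single predictor (k = 1), where it equals 1/n + (x − x̄)²/Σ(x − x̄)²; with several predictors a statistical package would supply the leverage values. Function names and data are hypothetical.

```python
def leverages_one_predictor(x):
    """Leverage of each observation in a regression with one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def high_leverage(leverages, k):
    """Indexes of observations whose leverage exceeds 2(k + 1)/n."""
    n = len(leverages)
    cutoff = 2 * (k + 1) / n
    return [i for i, h in enumerate(leverages) if h > cutoff]


# Made-up predictor values; the last one lies far from the others
x = [1, 2, 3, 4, 20]
h = leverages_one_predictor(x)
print([round(v, 3) for v in h])
print(high_leverage(h, k=1))
```

High leverage flags an observation as potentially influential; whether it actually distorts the fit also depends on its residual.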

56
**Other Regression Topics**

Outliers: Causes and Cures

An outlier may be due to an error in recording the data and, if so, the observation should be deleted. It is reasonable to discard an observation on the grounds that it represents a different population than the other observations.

57
**Other Regression Topics**

Missing Predictors

An outlier may also be an observation that has been influenced by an unspecified “lurking” variable that should have been controlled but wasn’t. Try to identify the lurking variable and formulate a multiple regression model including both predictors. Unspecified “lurking” variables cause inaccurate predictions from the fitted regression.

58
**Other Regression Topics**

Ill-Conditioned Data

All variables in the regression should be of the same general order of magnitude. Do not mix very large data values with very small data values. To avoid mixing magnitudes, adjust the decimal point in both variables. Be consistent throughout the data column. The decimal adjustments for each data column need not be the same.

59
**Other Regression Topics**

Significance in Large Samples

Statistical significance may not imply practical importance. Anything can be made significant if you get a large enough sample.

60
**Other Regression Topics**

Model Specification Errors

A misspecified model occurs when you estimate a linear model when actually a nonlinear model is required or when a relevant predictor is omitted. To detect misspecification:
- Plot the residuals against estimated Y (should be no discernible pattern).
- Plot the residuals against actual Y (should be no discernible pattern).
- Plot the fitted Y against the actual Y (should be a 45° line).

61
**Other Regression Topics**

Missing Data

Discard a variable if many data values are missing. If a Y value is missing, discard the observation to be conservative. Other options would be to use the mean of the X data column for the missing values or to use a regression procedure to “fit” the missing X-value from the complete observations.
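The simplest of these options, replacing missing X values with the column mean, might look like the sketch below. It assumes missing values are marked with `None`; the function name and data are illustrative, and this is only one of the approaches mentioned.

```python
def impute_column_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


# Hypothetical predictor column with one missing value
print(impute_column_mean([1200, None, 1800, 1500]))
```

Mean imputation keeps the observation in the sample but understates the variability of that predictor, so it is best reserved for columns with few gaps.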

62
**Other Regression Topics**

Binary Dependent Variable

When the response variable Y is binary (0, 1), the least squares estimation method is no longer appropriate. Use logit and probit regression methods.

63
**Other Regression Topics**

Stepwise and Best Subsets Regression

The stepwise regression procedure finds the best fitting model using 1, 2, 3, …, k predictors. This procedure is appropriate only when there is no theoretical model that specifies which predictors should be used. Perform best subsets regression using all possible combinations of predictors.

64
**Applied Statistics in Business and Economics**

End of Chapter 13
