Presentation on theme: "Multiple Regression — Chapter 13" — Presentation transcript:

2 Multiple Regression — Chapter 13. Topics: Multiple Regression; Assessing Overall Fit; Predictor Significance; Confidence Intervals for Y; Binary Predictors; Tests for Nonlinearity and Interaction; Multicollinearity; Violations of Assumptions; Other Regression Topics.

3 Multiple Regression — Bivariate or Multivariate? Multiple regression extends bivariate regression to include more than one independent variable. Limitations of bivariate regression: it is often simplistic; estimates are biased if relevant predictors are omitted; and a lack of fit does not show that X is unrelated to Y.

4 Multiple Regression — Regression Terminology. Y is the response variable and is assumed to be related to the k predictors (X1, X2, …, Xk) by a linear equation called the population regression model: Y = β0 + β1X1 + β2X2 + … + βkXk + ε. The fitted regression equation is ŷ = b0 + b1X1 + b2X2 + … + bkXk.
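
A minimal numpy sketch of fitting such an equation by least squares, using made-up data rather than the textbook's example (all variable names and values here are illustrative assumptions):

```python
import numpy as np

# Hypothetical data: fit yhat = b0 + b1*X1 + b2*X2 by least squares.
rng = np.random.default_rng(42)
n = 30
X = rng.uniform(1, 10, size=(n, 2))              # two made-up predictors
y = 5 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 1, n)

X_design = np.column_stack([np.ones(n), X])      # column of ones gives the intercept b0
b, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("fitted coefficients b0, b1, b2:", b)
```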

5 Multiple Regression — Data Format. The n observed values of the response variable Y and its proposed predictors X1, X2, …, Xk are arranged in columns: a Y column alongside an n × k matrix of predictor values.

6 Multiple Regression — Illustration: Home Prices. Consider the following data on the selling price of a home (Y, the response variable) and three potential explanatory variables: X1 = SqFt, X2 = LotSize, X3 = Baths.

7 Multiple Regression — Illustration: Home Prices. Intuitively, the regression models are Price = β0 + β1 SqFt + β2 LotSize + β3 Baths + ε (population) and estimated Price = b0 + b1 SqFt + b2 LotSize + b3 Baths (fitted).

8 Multiple Regression — Logic of Variable Selection. State the hypotheses about the sign of each coefficient in the model (e.g., we would expect the coefficient of SqFt to be positive: larger homes should sell for more).

9 Multiple Regression — Fitted Regressions. Use Excel, MegaStat, MINITAB, or any other statistical package. For n = 30 home sales, the fitted regressions and their statistics of fit are tabulated on the slide. R² is the coefficient of determination and SE is the standard error of the regression.

10 Multiple Regression — Common Misconceptions about Fit. A common mistake is to assume that the model with the best fit is preferred. Principle of Occam's Razor: when two explanations are otherwise equivalent, we prefer the simpler, more parsimonious one.

11 Multiple Regression — Regression Modeling. Four criteria for regression assessment: Logic — is there an a priori reason to expect a causal relationship between the predictors and the response variable? Fit — does the overall regression show a significant relationship between the predictors and the response variable?

12 Multiple Regression — Regression Modeling. Parsimony — does each predictor contribute significantly to the explanation? Are some predictors not worth the trouble? Stability — are the predictors related to one another so strongly that regression estimates become erratic?

13 Assessing Overall Fit — F Test for Significance. For a regression with k predictors, the hypotheses to be tested are H0: all the true coefficients are zero vs. H1: at least one of the coefficients is nonzero. In other words, H0: β1 = β2 = … = βk = 0; H1: at least one βj ≠ 0.

14 Assessing Overall Fit — F Test for Significance. The ANOVA table decomposes the variation of the response variable around its mean into explained (regression) variation and unexplained (error) variation: SST = SSR + SSE.

15 Assessing Overall Fit — F Test for Significance. The ANOVA calculations for a k-predictor model can be summarized as: SSR (regression) with k degrees of freedom, SSE (error) with n − k − 1, and SST (total) with n − 1. The test statistic is F = MSR/MSE = (SSR/k) / (SSE/(n − k − 1)).
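
A hedged numpy sketch of this decomposition and the F statistic, using invented data (not the home-price numbers):

```python
import numpy as np

# SST = SSR + SSE, and F = (SSR/k) / (SSE/(n-k-1)).
rng = np.random.default_rng(0)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.uniform(1, 10, size=(n, k))])
y = X @ np.array([10.0, 2.0, -1.0, 0.5]) + rng.normal(0, 2, n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b
SSE = np.sum((y - yhat) ** 2)            # unexplained (error) variation
SSR = np.sum((yhat - y.mean()) ** 2)     # explained (regression) variation
SST = np.sum((y - y.mean()) ** 2)        # total variation around the mean
F = (SSR / k) / (SSE / (n - k - 1))
print(f"SSR={SSR:.1f}  SSE={SSE:.1f}  SST={SST:.1f}  F={F:.2f}")
```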

16 Assessing Overall Fit — F Test for Significance. Here are the ANOVA calculations for the home price data (tabulated on the slide).

17 Assessing Overall Fit — Coefficient of Determination (R²). R², the coefficient of determination, is a common measure of overall fit. It can be calculated one of two ways: R² = SSR/SST or, equivalently, R² = 1 − SSE/SST. The slide shows the calculation for the home price data.

18 Assessing Overall Fit — Adjusted R². It is generally possible to raise the coefficient of determination R² by including additional predictors. The adjusted coefficient of determination penalizes the inclusion of useless predictors. For n observations and k predictors, R²adj = 1 − (1 − R²)(n − 1)/(n − k − 1). The slide shows the adjusted R² for the home price data.
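
A self-contained sketch of R² and adjusted R² on made-up data (again an assumption, not the textbook's dataset):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.uniform(size=(n, k))])
y = X @ np.array([1.0, 3.0, 0.0, -2.0]) + rng.normal(0, 0.5, n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
SSE = np.sum((y - X @ b) ** 2)
SST = np.sum((y - y.mean()) ** 2)
R2 = 1 - SSE / SST
R2_adj = 1 - (SSE / (n - k - 1)) / (SST / (n - 1))  # penalizes extra predictors
print(f"R2={R2:.4f}  adjusted R2={R2_adj:.4f}")
```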

19 Assessing Overall Fit — How Many Predictors? Limit the number of predictors based on the sample size. When n/k is small, R² no longer gives a reliable indication of fit. Suggested rules: Evans' Rule (conservative): n/k ≥ 10 (at least 10 observations per predictor); Doane's Rule (relaxed): n/k ≥ 5 (at least 5 observations per predictor).

20 Predictor Significance — Test for Significance. Test each fitted coefficient to see whether it is significantly different from zero. The hypothesis tests for predictor Xj are H0: βj = 0 vs. H1: βj ≠ 0. If we cannot reject the hypothesis that a coefficient is zero, then the corresponding predictor does not contribute to the prediction of Y.

21 Predictor Significance — Test Statistic. The test statistic for the coefficient of predictor Xj is tj = bj / sbj, with n − k − 1 degrees of freedom. Find the critical value tα/2 for a chosen level of significance α from Appendix D. Reject H0 if |tj| > tα/2 or if the p-value < α. The 95% confidence interval for coefficient βj is bj ± t.025 sbj.
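
A sketch of these t statistics and confidence intervals in numpy, assuming made-up data; the standard errors come from the diagonal of s²(X′X)⁻¹, and the critical t value is copied from a t table rather than computed:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.uniform(1, 10, size=(n, k))])
y = X @ np.array([4.0, 1.5, 0.0]) + rng.normal(0, 1, n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
s2 = resid @ resid / (n - k - 1)                 # estimated error variance
se_b = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
t_stats = b / se_b
t_crit = 2.052                                   # t_.025 with df = n-k-1 = 27
for j in range(k + 1):
    lo, hi = b[j] - t_crit * se_b[j], b[j] + t_crit * se_b[j]
    print(f"b{j}={b[j]:+.3f}  t={t_stats[j]:+.2f}  95% CI=({lo:.3f}, {hi:.3f})")
```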

22 Confidence Intervals for Y — Standard Error. The standard error of the regression (SE) is another important measure of fit. For n observations and k predictors, SE = √(SSE / (n − k − 1)). If all predictions were perfect, SE = 0.

23 Confidence Intervals for Y — Standard Error. Approximate 95% confidence interval for the conditional mean of Y: ŷ ± t.025 SE/√n. Approximate 95% prediction interval for an individual Y value: ŷ ± t.025 SE √(1 + 1/n).

24 Confidence Intervals for Y — Very Quick Prediction Interval for Y. The t-values for 95% confidence are typically near 2 (as long as n is not too small). A very quick 95% prediction interval, without using a t table, is ŷ ± 2 SE.
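
A sketch of the quick interval on made-up data; the new observation x_new is a hypothetical point, not from the text:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.uniform(1, 10, size=(n, k))])
y = X @ np.array([4.0, 1.5, -0.5]) + rng.normal(0, 1, n)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
SE = np.sqrt(np.sum((y - X @ b) ** 2) / (n - k - 1))   # standard error of the regression

x_new = np.array([1.0, 5.0, 3.0])     # hypothetical new observation (leading 1 = intercept)
yhat = x_new @ b
print(f"quick 95% PI: {yhat - 2*SE:.2f} to {yhat + 2*SE:.2f}")
```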

25 Binary Predictors — What Is a Binary Predictor? A binary predictor has two values (usually 0 and 1) to denote the presence or absence of a condition. For example, for n graduates from an MBA program: Employed = 1, Unemployed = 0. These variables are also called dummy or indicator variables. For readability, name the binary variable for the characteristic that corresponds to a value of 1 (here, Employed).

26 Binary Predictors — Effects of a Binary Predictor. A binary predictor is sometimes called a shift variable because it shifts the regression plane up or down. Suppose X1 is a binary predictor that can take on only the values 0 or 1. Its contribution to the regression is either b1 or nothing, resulting in an intercept of either b0 (when X1 = 0) or b0 + b1 (when X1 = 1).

27 Binary Predictors — Effects of a Binary Predictor. The slope does not change; only the intercept is shifted. For example, ŷ = b0 + b1X1 + b2X2 has intercept b0 when X1 = 0 and intercept b0 + b1 when X1 = 1, with the same slope b2 on X2.
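
A sketch of the intercept shift on simulated data (all values assumed for illustration):

```python
import numpy as np

# A 0/1 predictor shifts the intercept, not the slope.
rng = np.random.default_rng(4)
n = 40
x = rng.uniform(0, 10, n)                 # continuous predictor
d = rng.integers(0, 2, n)                 # binary (dummy) predictor
y = 3 + 5 * d + 2 * x + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), d, x])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"intercept when d=0: {b0:.2f}; when d=1: {b0 + b1:.2f}; common slope: {b2:.2f}")
```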

28 Binary Predictors — Testing a Binary for Significance. In multiple regression, binary predictors require no special treatment. They are tested like any other predictor, using a t test.

29 Binary Predictors — More Than One Binary. More than one binary is needed when the number of categories to be coded exceeds two. For example, to model GPA by class level, each category is a binary variable: Freshman = 1 if a freshman, 0 otherwise; Sophomore = 1 if a sophomore, 0 otherwise; Junior = 1 if a junior, 0 otherwise; Senior = 1 if a senior, 0 otherwise; Masters = 1 if a master's candidate, 0 otherwise; Doctoral = 1 if a PhD candidate, 0 otherwise.

30 Binary Predictors — More Than One Binary. If there are c mutually exclusive and collectively exhaustive categories, then only c − 1 binaries are needed to code each observation. Any one of the categories can be omitted, because the remaining c − 1 binary values uniquely determine the omitted one (see the sketch below).
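
A sketch of c − 1 dummy coding in numpy; the category names and sample rows are invented:

```python
import numpy as np

# Coding c categories with c-1 dummy columns; the omitted category is the baseline.
classes = np.array(["Freshman", "Sophomore", "Junior", "Senior"])
data = np.array(["Junior", "Freshman", "Senior", "Junior", "Sophomore"])

# Drop the first category ("Freshman") as the baseline:
dummies = np.column_stack([(data == c).astype(int) for c in classes[1:]])
print(classes[1:])   # column names
print(dummies)       # each row has at most one 1; a row of zeros means Freshman
```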

31 Binary Predictors — What if I Forget to Exclude One Binary? Including all c binaries for c categories would introduce a serious problem for the regression estimation: one column in the X data matrix would be a perfect linear combination of the other column(s). The least squares estimation would fail because the data matrix would be singular (i.e., it would have no inverse).

32 Binary Predictors — Regional Binaries. Binaries are commonly used to code regions. For example: Midwest = 1 if in the Midwest, 0 otherwise; Neast = 1 if in the Northeast, 0 otherwise; Seast = 1 if in the Southeast, 0 otherwise; West = 1 if in the West, 0 otherwise.

33 Tests for Nonlinearity and Interaction — Tests for Nonlinearity. Sometimes the effect of a predictor is nonlinear. To test for nonlinearity of any predictor, include its square in the regression. For example, Y = β0 + β1X1 + β2X1² + β3X2 + β4X2² + ε. If the linear model is the correct one, the coefficients of the squared predictors (β2 and β4) would not differ significantly from zero. Otherwise, a quadratic relationship would exist between Y and the respective predictor variable.
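
A sketch of the squared-term test on deliberately quadratic simulated data (all values assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
x = rng.uniform(0, 10, n)
y = 2 + 1.5 * x + 0.3 * x**2 + rng.normal(0, 1, n)   # truly quadratic

X = np.column_stack([np.ones(n), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
s2 = resid @ resid / (n - 3)
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
print(f"t statistic for squared term: {b[2] / se[2]:.2f}")   # large => nonlinearity
```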

34 Tests for Nonlinearity and Interaction — Tests for Interaction. Test for interaction between two predictors by including their product in the regression: Y = β0 + β1X1 + β2X2 + β3X1X2 + ε. If we reject the hypothesis H0: β3 = 0, then we conclude that there is a significant interaction between X1 and X2. Interaction effects require careful interpretation and cost one degree of freedom per interaction.
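
The same t-test pattern applies to the product term; a compact sketch on simulated data with a genuine interaction:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 60
X1, X2 = rng.uniform(0, 5, n), rng.uniform(0, 5, n)
y = 1 + 2 * X1 + 3 * X2 + 1.5 * X1 * X2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), X1, X2, X1 * X2])   # product column tests interaction
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
s2 = resid @ resid / (n - 4)
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
print(f"t statistic for interaction term: {b[3] / se[3]:.2f}")
```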

35 Multicollinearity — What Is Multicollinearity? Multicollinearity occurs when the independent variables X1, X2, …, Xm are intercorrelated instead of being independent. Collinearity occurs if only two predictors are correlated. The degree of multicollinearity is the real concern.

36 Multicollinearity — Variance Inflation. Multicollinearity induces variance inflation when predictors are strongly intercorrelated. This results in wider confidence intervals for the true coefficients β1, β2, …, βm and makes the t statistic less reliable. The separate contribution of each predictor in explaining the response variable becomes difficult to identify.

37 Multicollinearity — Correlation Matrix. To check whether two predictors are correlated (collinearity), inspect the correlation matrix using Excel, MegaStat, or MINITAB (an example appears on the slide).

38 Multicollinearity — Correlation Matrix. A quick rule: a sample correlation whose absolute value exceeds 2/√n probably differs significantly from zero in a two-tailed test at α = .05. This applies to samples that are not too small (say, 20 or more).
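
A sketch of the quick rule in numpy, with one deliberately collinear pair of made-up predictors:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 30
X1 = rng.normal(size=n)
X2 = 0.8 * X1 + 0.2 * rng.normal(size=n)      # deliberately collinear with X1
X3 = rng.normal(size=n)
X = np.column_stack([X1, X2, X3])

R = np.corrcoef(X, rowvar=False)              # sample correlation matrix
cutoff = 2 / np.sqrt(n)
for i in range(3):
    for j in range(i + 1, 3):
        flag = "  <-- exceeds 2/sqrt(n)" if abs(R[i, j]) > cutoff else ""
        print(f"corr(X{i+1}, X{j+1}) = {R[i, j]:+.3f}{flag}")
```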

39 Multicollinearity — Predictor Matrix Plots. Collinearity — for example, between a predictor and its square — can often be seen in matrix scatter plots (an example appears on the slide).

40 Multicollinearity — Variance Inflation Factor (VIF). The matrix scatter plots and correlation matrix only show correlations between any two predictors. The variance inflation factor (VIF) is a more comprehensive test for multicollinearity. For a given predictor j, the VIF is defined as VIFj = 1 / (1 − Rj²), where Rj² is the coefficient of determination when predictor j is regressed against all the other predictors.
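
A sketch computing VIFs directly from that definition, on invented data with one nearly collinear pair:

```python
import numpy as np

def vif(X):
    # VIF_j = 1 / (1 - Rj^2), Rj^2 from regressing column j on the others.
    n, m = X.shape
    out = []
    for j in range(m):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ b
        r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(1 / (1 - r2))
    return out

rng = np.random.default_rng(8)
X1 = rng.normal(size=40)
X2 = 0.9 * X1 + 0.1 * rng.normal(size=40)     # nearly collinear pair
X3 = rng.normal(size=40)
print([f"{v:.1f}" for v in vif(np.column_stack([X1, X2, X3]))])
```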

41 Multicollinearity — Variance Inflation Factor (VIF). Some possible situations are tabulated on the slide (for example, Rj² = 0 gives VIF = 1, while Rj² = .90 gives VIF = 10).

42 Multicollinearity — Rules of Thumb. There is no limit on the magnitude of the VIF. A VIF of 10 says that the other predictors explain 90% of the variation in predictor j, which indicates that predictor j is strongly related to the other predictors. However, this is not necessarily indicative of instability in the least squares estimates. A large VIF is a warning to consider whether predictor j really belongs in the model.

43 Multicollinearity — Are Coefficients Stable? Evidence of instability: (1) X1 and X2 have a high pairwise correlation with Y, yet one or both predictors have insignificant t statistics in the fitted multiple regression; and/or (2) X1 and X2 are positively correlated with Y, yet one has a negative slope in the multiple regression.

44 Multicollinearity — Are Coefficients Stable? As a test, try dropping a collinear predictor from the regression and see what happens to the fitted coefficients in the re-estimated model. If they don't change much, then multicollinearity is not a concern. If dropping the predictor causes sharp changes in one or more of the remaining coefficients, then multicollinearity may be causing instability.

45 Violations of Assumptions. The least squares method makes several assumptions about the (unobservable) random errors εi; clues about these errors may be found in the residuals ei. Assumption 1: the errors are normally distributed. Assumption 2: the errors have constant variance (i.e., they are homoscedastic). Assumption 3: the errors are independent (i.e., they are nonautocorrelated).

46 Violations of Assumptions — Non-Normal Errors. Except when there are major outliers, non-normal residuals are usually considered a mild violation. Regression coefficient and variance estimates remain unbiased and consistent. Confidence intervals for the parameters may be unreliable, since they are based on the normality assumption, but they are generally OK with a large sample size (e.g., n > 30) and no outliers.

47 Violations of Assumptions — Non-Normal Errors. Test H0: errors are normally distributed vs. H1: errors are not normally distributed. Create a histogram of residuals (plain or standardized) to visually reveal any outliers or serious asymmetry. A normal probability plot will also visually test for normality.

48 Violations of Assumptions — Nonconstant Variance (Heteroscedasticity). If the error variance is constant, the errors are homoscedastic; if it is nonconstant, the errors are heteroscedastic. This violation is potentially serious. The least squares parameter estimates remain unbiased and consistent, but the estimated variances are biased (understated) and not efficient, resulting in overstated t statistics and confidence intervals that are too narrow.

49 Violations of Assumptions — Nonconstant Variance (Heteroscedasticity). The hypotheses are H0: errors have constant variance (homoscedastic) vs. H1: errors have nonconstant variance (heteroscedastic). Constant variance can be checked visually by examining scatter plots of the residuals against each predictor; ideally there will be no pattern.

50 Violations of Assumptions — Nonconstant Variance (Heteroscedasticity). (Residual plots illustrating constant and nonconstant variance appear on the slide.)

51 Violations of Assumptions — Autocorrelation. Autocorrelation is a pattern of nonindependent errors that violates the assumption that each error is independent of its predecessor. This is a problem with time series data. Autocorrelated errors result in biased estimated variances, which in turn yield confidence intervals that are too narrow and t statistics that are too large, so the model's fit may be overstated.

52 Violations of Assumptions — Autocorrelation. Test the hypotheses H0: errors are nonautocorrelated vs. H1: errors are autocorrelated. We use the observable residuals e1, e2, …, en for evidence of autocorrelation, via the Durbin–Watson test statistic: DW = Σt=2..n (et − et−1)² / Σt=1..n et².
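
A sketch of the DW statistic on a simulated residual series (the AR(1)-style construction is an assumption chosen to show positive autocorrelation):

```python
import numpy as np

def durbin_watson(e):
    # DW = sum of squared successive differences / sum of squared residuals
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(9)
e = np.zeros(100)
for t in range(1, 100):
    e[t] = 0.7 * e[t - 1] + rng.normal()       # positively autocorrelated residuals
print(f"DW = {durbin_watson(e):.2f}  (near 2 would mean no autocorrelation)")
```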

53 Violations of Assumptions — Autocorrelation. The DW statistic lies between 0 and 4. When H0 is true (no autocorrelation), the DW statistic will be near 2. DW < 2 suggests positive autocorrelation; DW > 2 suggests negative autocorrelation. Ignore the DW statistic for cross-sectional data.

54 Violations of Assumptions — Unusual Observations. An observation may be unusual (1) because the fitted model's prediction is poor (unusual residual), or (2) because one or more predictors may be having a large influence on the regression estimates (unusual leverage).

55 Violations of Assumptions — Unusual Observations. To check for unusual residuals, simply inspect the residuals to find instances where the model does not predict well. To check for unusual leverage, look at each observation's leverage statistic (how far the observation is from the mean(s) of the predictors). For n observations and k predictors, look for observations whose leverage exceeds 2(k + 1)/n.
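
A sketch computing leverage from the hat matrix H = X(X′X)⁻¹X′ on made-up data, with one extreme x-value planted to trip the cutoff:

```python
import numpy as np

rng = np.random.default_rng(10)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, k))])
X[0, 1] = 40.0                                  # plant one extreme predictor value

H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)                           # h_ii for each observation
cutoff = 2 * (k + 1) / n
print("high-leverage rows:", np.where(leverage > cutoff)[0])
```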

56 Other Regression Topics — Outliers: Causes and Cures. An outlier may be due to an error in recording the data; if so, the observation should be deleted. It is reasonable to discard an observation on the grounds that it represents a different population than the other observations.

57 Other Regression Topics — Missing Predictors. An outlier may also be an observation that has been influenced by an unspecified lurking variable that should have been controlled for but wasn't. Try to identify the lurking variable and formulate a multiple regression model that includes it as a predictor. Unspecified lurking variables cause inaccurate predictions from the fitted regression.

58 Other Regression Topics — Ill-Conditioned Data. All variables in the regression should be of the same general order of magnitude; do not mix very large data values with very small ones. To avoid mixing magnitudes, adjust the decimal point in the affected variables, and be consistent throughout each data column. The decimal adjustments need not be the same for every column.

59 Other Regression Topics — Significance in Large Samples. Statistical significance may not imply practical importance: almost anything can be made significant with a large enough sample.

60 Other Regression Topics — Model Specification Errors. A misspecified model occurs when you estimate a linear model where a nonlinear model is actually required, or when a relevant predictor is omitted. To detect misspecification: plot the residuals against the estimated Y (there should be no discernible pattern); plot the residuals against the actual Y (again, no discernible pattern); and plot the fitted Y against the actual Y (the points should fall along a 45° line).

61 Other Regression Topics — Missing Data. Discard a variable if many of its data values are missing. If a Y value is missing, discard the observation to be conservative. Other options are to substitute the mean of the X data column for the missing values, or to use a regression procedure to fit the missing X value from the complete observations.

62 Other Regression Topics — Binary Dependent Variable. When the response variable Y is binary (0, 1), the least squares estimation method is no longer appropriate. Use logit or probit regression methods instead.

63 Other Regression Topics — Stepwise and Best Subsets Regression. The stepwise regression procedure finds the best-fitting model using 1, 2, 3, …, k predictors. This procedure is appropriate only when there is no theoretical model that specifies which predictors should be used. Best subsets regression fits all possible combinations of predictors (a sketch follows below).
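
A sketch of a brute-force best-subsets search comparing adjusted R² across all predictor combinations, on invented data where two of four predictors are irrelevant (feasible only for small k):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(11)
n = 40
X = rng.normal(size=(n, 4))
y = 2 + 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(0, 1, n)   # columns 1 and 3 irrelevant

def adj_r2(cols):
    # Adjusted R^2 for the model using the given predictor columns.
    Xd = np.column_stack([np.ones(n), X[:, list(cols)]])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    SSE = np.sum((y - Xd @ b) ** 2)
    SST = np.sum((y - y.mean()) ** 2)
    k = len(cols)
    return 1 - (SSE / (n - k - 1)) / (SST / (n - 1))

best = max((c for r in range(1, 5) for c in combinations(range(4), r)), key=adj_r2)
print("best subset (0-indexed columns):", best, f"adjusted R2 = {adj_r2(best):.3f}")
```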

64 Applied Statistics in Business and Economics End of Chapter 13

