Week 12 November 17-21 Four Mini-Lectures QMM 510 Fall 2014.


1 Week 12, November 17-21: Four Mini-Lectures, QMM 510, Fall 2014

2 Chapter 13 Multiple Regression (ML 12.1)
Chapter Contents
13.1 Multiple Regression
13.2 Assessing Overall Fit
13.3 Predictor Significance
13.4 Confidence Intervals for Y
13.5 Categorical Predictors
13.6 Tests for Nonlinearity and Interaction
13.7 Multicollinearity
13.8 Violations of Assumptions
13.9 Other Regression Topics
Much of this is like Chapter 12, except that we have more than one predictor.

3 Multiple Regression: Simple or Multivariate?
Multiple regression is an extension of simple regression to include more than one independent variable.
Limitations of simple regression:
- often simplistic
- biased estimates if relevant predictors are omitted
- lack of fit does not show that X is unrelated to Y if the true model is multivariate

4 Multiple Regression: Visualizing a Multiple Regression

5 Multiple Regression: Regression Terminology
Y is the response variable and is assumed to be related to the k predictors ($X_1, X_2, \dots, X_k$) by a linear equation called the population regression model:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$
The estimated (fitted) regression equation is:
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$
Use Greek letters for population parameters and Roman letters for sample estimates.

6 Multiple Regression: Fitted Regression: Simple versus Multivariate
If we have more than two predictors, there is no way to visualize it …

7 Multiple Regression: Data Format
The n observed values of the response variable Y and its proposed predictors $X_1, X_2, \dots, X_k$ are presented in the form of an n x k matrix.
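
To make the data format concrete, here is a minimal sketch of fitting a k-predictor regression by least squares in plain Python. It assumes the usual normal-equations approach, (X'X)b = X'y, with a column of 1s added for the intercept; the helper names and the tiny data set are hypothetical, and in practice Excel, MINITAB, or MegaStat would do this work.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so each row operation updates both sides.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_regression(rows, y):
    """Return [b0, b1, ..., bk] for n data rows of k predictor values each."""
    # Prepend a column of 1s so that b0 is the intercept.
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
b0, b1, b2 = fit_multiple_regression(rows, y)
```

Because the illustrative data contain no error term, the fitted coefficients recover the generating equation up to rounding.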

8 Multiple Regression: Common Misconceptions about Fit
A common mistake is to assume that the model with the best fit is preferred.
Sometimes a model with a low $R^2$ may give useful predictions, while a model with a high $R^2$ may conceal problems.
Thoroughly analyze the results before choosing the model.

9 Multiple Regression: Four Criteria for Regression Assessment
- Logic: Is there an a priori reason to expect a causal relationship between the predictors and the response variable?
- Fit: Does the overall regression show a significant relationship between the predictors and the response variable?
- Parsimony: Does each predictor contribute significantly to the explanation? Are some predictors not worth the trouble?
- Stability: Are the predictors related to one another so strongly that the regression estimates become erratic?

10 Assessing Overall Fit: F Test for Significance
For a regression with k predictors, the hypotheses to be tested are
$H_0$: All the true coefficients are zero
$H_1$: At least one of the coefficients is nonzero
In other words,
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
$H_1$: At least one of the coefficients is nonzero

11 Assessing Overall Fit: F Test for Significance
The ANOVA calculations for a k-predictor model resemble those for a simple regression, except for degrees of freedom:
Regression: SSR with k d.f., MSR = SSR/k
Error: SSE with n - k - 1 d.f., MSE = SSE/(n - k - 1)
Total: SST with n - 1 d.f.
$F_{calc} = MSR/MSE$
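
As a rough illustration of the ANOVA arithmetic, the sketch below computes the F statistic from the regression and error sums of squares, assuming the standard degrees of freedom (k for regression, n - k - 1 for error). The function name and the numbers are hypothetical.

```python
def f_statistic(ssr, sse, n, k):
    """Return (F, regression d.f., error d.f.) for a k-predictor model."""
    df_regression = k
    df_error = n - k - 1
    msr = ssr / df_regression   # mean square due to regression
    mse = sse / df_error        # mean square error
    return msr / mse, df_regression, df_error


# Hypothetical sums of squares for n = 24 observations and k = 3 predictors.
f_calc, df1, df2 = f_statistic(ssr=600.0, sse=400.0, n=24, k=3)
```

The resulting F would then be compared with the critical value of F for (df1, df2) degrees of freedom, or judged by its p-value.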

12 Assessing Overall Fit: Coefficient of Determination ($R^2$)
$R^2$, the coefficient of determination, is a common measure of overall fit. It can be calculated in one of two ways (always done by computer):
$R^2 = 1 - \dfrac{SSE}{SST}$ or $R^2 = \dfrac{SSR}{SST}$
For example, for the home price data: [worked values not preserved in transcript]

13 Assessing Overall Fit: Adjusted $R^2$
It is generally possible to raise the coefficient of determination $R^2$ by including additional predictors. The adjusted coefficient of determination is used to penalize the inclusion of useless predictors. For n observations and k predictors:
$R^2_{adj} = 1 - (1 - R^2)\left(\dfrac{n - 1}{n - k - 1}\right)$
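
The penalty can be seen numerically in a short sketch. It assumes the usual definitions $R^2 = 1 - SSE/SST$ and $R^2_{adj} = 1 - (1 - R^2)(n - 1)/(n - k - 1)$; the function names and input values are hypothetical.

```python
def r_squared(sse, sst):
    """Coefficient of determination from error and total sums of squares."""
    return 1 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical fit: SSE = 400, SST = 1000, n = 24 observations.
r2 = r_squared(sse=400.0, sst=1000.0)
adj_k3 = adjusted_r_squared(r2, n=24, k=3)
# A fourth predictor that leaves R^2 unchanged lowers the adjusted value.
adj_k4 = adjusted_r_squared(r2, n=24, k=4)
```

With the same $R^2$, the four-predictor model has the lower adjusted value, which is how the measure discourages predictors that add nothing.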

14 Assessing Overall Fit: How Many Predictors?
Limit the number of predictors based on the sample size.
A large sample size permits many predictors.
When n/k is small, $R^2$ no longer gives a reliable indication of fit.
Suggested rules are:
- Evan’s Rule (conservative): n/k ≥ 10 (at least 10 observations per predictor)
- Doane’s Rule (relaxed): n/k ≥ 5 (at least 5 observations per predictor)
These are just guidelines; use your judgment.
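
The two rules of thumb are easy to check mechanically. The sketch below assumes the thresholds n/k ≥ 10 (conservative) and n/k ≥ 5 (relaxed); the function name and dictionary keys are hypothetical.

```python
def predictor_rules(n, k):
    """Check the observations-per-predictor guidelines for n and k."""
    ratio = n / k
    return {
        "n_per_k": ratio,
        "evans": ratio >= 10,  # conservative: at least 10 observations per predictor
        "doane": ratio >= 5,   # relaxed: at least 5 observations per predictor
    }


roomy = predictor_rules(n=50, k=3)    # about 16.7 observations per predictor
tight = predictor_rules(n=30, k=5)    # 6 observations per predictor
too_many = predictor_rules(n=20, k=5)  # 4 observations per predictor
```

A model that fails both checks is not automatically wrong, but its $R^2$ deserves extra skepticism.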

15 Predictor Significance
Test each fitted coefficient to see whether it is significantly different from zero. The hypothesis tests for the coefficient of predictor $X_j$ are
$H_0: \beta_j = 0$
$H_1: \beta_j \neq 0$
If we cannot reject the hypothesis that a coefficient is zero, then the corresponding predictor does not contribute to the prediction of Y.

16 Predictor Significance: Test Statistic
Excel reports the test statistic for the coefficient of predictor $X_j$:
$t_{calc} = \dfrac{b_j - 0}{s_{b_j}}$ with d.f. = n - k - 1
Find the critical value $t_\alpha$ for the chosen level of significance α from Appendix D or from Excel using =T.INV.2T(α,df) for a two-tailed test.
For the two-tailed test, reject $H_0$ if $|t_{calc}| > t_\alpha$ (or reject if p-value ≤ α).
The 95% confidence interval for coefficient $\beta_j$ is
$b_j - t_\alpha s_{b_j} \le \beta_j \le b_j + t_\alpha s_{b_j}$, using α = .05
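
The test and the interval can be sketched together. This assumes the usual statistic $t_{calc} = b_j / s_{b_j}$ and a two-tailed critical value supplied by the caller (for example from a t table or Excel's T.INV.2T), because the Python standard library has no t distribution. The function name and the coefficient values are hypothetical; 2.086 is the two-tailed 5% critical value for 20 degrees of freedom.

```python
def coefficient_test(b_j, se_b_j, t_crit):
    """Return (t_calc, confidence interval, reject H0?) for one coefficient."""
    t_calc = b_j / se_b_j                 # tests H0: beta_j = 0
    lower = b_j - t_crit * se_b_j
    upper = b_j + t_crit * se_b_j
    reject = abs(t_calc) > t_crit         # two-tailed decision rule
    return t_calc, (lower, upper), reject


# Hypothetical estimate and standard error, with d.f. = n - k - 1 = 20.
t_calc, (lower, upper), reject = coefficient_test(b_j=1.5, se_b_j=0.5, t_crit=2.086)
```

Here the interval excludes zero, which agrees with the decision to reject $H_0$; the two views always agree for a two-tailed test at the same α.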

17 Confidence Intervals for Y: Standard Error
The standard error of the regression ($s_e$) is another important measure of fit. Except for d.f., the formula for $s_e$ resembles $s_e$ for simple regression. For n observations and k predictors:
$s_e = \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - k - 1}} = \sqrt{\dfrac{SSE}{n - k - 1}}$
If all predictions were perfect (SSE = 0), then $s_e = 0$.
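
A minimal sketch of the standard error calculation follows, assuming $s_e = \sqrt{SSE/(n - k - 1)}$. The function name and the observed and fitted values are hypothetical.

```python
import math


def standard_error(y, y_hat, k):
    """Standard error of the regression for k predictors."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of squared residuals
    return math.sqrt(sse / (n - k - 1))


# Hypothetical observed and fitted values from a model with k = 2 predictors.
y = [10, 12, 15, 19, 24, 30]
y_hat = [11, 11, 16, 18, 25, 29]
s_e = standard_error(y, y_hat, k=2)

# A perfect fit has SSE = 0, so the standard error is 0.
s_e_perfect = standard_error(y, y, k=2)
```

Each residual in this example is ±1, so SSE = 6 and the error d.f. is 6 - 2 - 1 = 3.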

18 Confidence Intervals for Y: Approximate Confidence and Prediction Intervals for Y
Approximate 95% confidence interval for conditional mean of Y:
$\hat{y}_i \pm t \, \dfrac{s_e}{\sqrt{n}}$
Approximate 95% prediction interval for individual Y value:
$\hat{y}_i \pm t \, s_e$
Here t is the two-tailed Student's t value for 95% confidence with d.f. = n - k - 1.

19 Confidence Intervals for Y: Quick 95 Percent Confidence and Prediction Interval for Y
The t-values for 95% confidence are typically near 2 (as long as n is not too small). Very quick confidence and prediction intervals for Y, without using a t table, are:
Quick 95% confidence interval for conditional mean of Y: $\hat{y}_i \pm 2 \, \dfrac{s_e}{\sqrt{n}}$
Quick 95% prediction interval for individual Y value: $\hat{y}_i \pm 2 \, s_e$
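
The quick intervals take only a few lines to compute. This sketch assumes the approximations $\hat{y} \pm 2 s_e/\sqrt{n}$ for the conditional mean and $\hat{y} \pm 2 s_e$ for an individual value; the function name and inputs are hypothetical.

```python
import math


def quick_intervals(y_hat, s_e, n):
    """Return (confidence interval, prediction interval) using t of about 2."""
    half_ci = 2 * s_e / math.sqrt(n)  # half-width for the conditional mean of Y
    half_pi = 2 * s_e                 # half-width for an individual Y value
    return (y_hat - half_ci, y_hat + half_ci), (y_hat - half_pi, y_hat + half_pi)


# Hypothetical fitted value 100 with s_e = 10 from n = 25 observations.
ci, pi = quick_intervals(y_hat=100.0, s_e=10.0, n=25)
```

The prediction interval is wider than the confidence interval by a factor of $\sqrt{n}$, since predicting one observation is harder than estimating a mean.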

20 Unusual Observations (ML 12.2)
Standardized Residuals
Use Excel, MINITAB, MegaStat or other software to compute standardized residuals. If the absolute value of any standardized residual is at least 2, then it is classified as unusual (as in simple regression).
Leverage and Influence
A high leverage statistic indicates unusual X values in one or more predictors.
Such observations are influential because they are near the edge(s) of the fitted regression plane.
Leverage for observation i is denoted $h_i$ (computed by MegaStat).

21 Unusual Observations: Leverage
For a regression model with k predictors, an observation whose leverage exceeds 2(k+1)/n is unusual. In Chapter 12, the leverage rule was 4/n. With k = 1 predictor, we get 2(k+1)/n = 2(1+1)/n = 4/n. So this leverage criterion applies to simple regression as a special case.

22 Unusual Observations: Example: Heart Death Rate in 50 States
n = 50 states, k = 3 predictors
The high leverage criterion is 2(k+1)/n = 2(3+1)/50 = 0.16.
MegaStat highlights the high leverage observations (> .160).
4 states (FL, HI, OK, WV) have unusual residuals (> 2 $s_e$), also highlighted by MegaStat.
standard error $s_e$ = [value not preserved in transcript]
Note: Only unusual observations are shown (there were n = 50 observations).
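
The screening rules for unusual observations can be sketched as follows, assuming the leverage cutoff 2(k+1)/n and the rule that a standardized residual of at least 2 in absolute value is unusual. The function names and the leverage and residual values are hypothetical; real values would come from software such as MegaStat.

```python
def leverage_cutoff(n, k):
    """High-leverage threshold for n observations and k predictors."""
    return 2 * (k + 1) / n


def flag_unusual(leverages, std_residuals, n, k):
    """Return (indices with high leverage, indices with unusual residuals)."""
    cutoff = leverage_cutoff(n, k)
    high_leverage = [i for i, h in enumerate(leverages) if h > cutoff]
    unusual_resid = [i for i, r in enumerate(std_residuals) if abs(r) >= 2]
    return high_leverage, unusual_resid


# Cutoff for the 50-state example with k = 3 predictors.
cutoff_states = leverage_cutoff(n=50, k=3)
# With k = 1 the rule reduces to 4/n, the simple regression criterion.
cutoff_simple = leverage_cutoff(n=50, k=1)

# Hypothetical leverages and standardized residuals for four observations.
high, unusual = flag_unusual(
    leverages=[0.05, 0.20, 0.12, 0.31],
    std_residuals=[0.4, -2.3, 1.9, 2.0],
    n=50,
    k=3,
)
```

An observation can be flagged on one criterion, both, or neither; the two checks answer different questions (unusual X values versus a poorly fitted Y value).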

23 Categorical Predictors (ML 12.3): What Is a Binary or Categorical Predictor?
A binary predictor has two values (usually 0 and 1) to denote the presence or absence of a condition. For example, for n graduates from an MBA program:
Employed = 1
Unemployed = 0
These variables are also called dummy, dichotomous, or indicator variables. For clarity, name the binary variable after the characteristic that corresponds to the value 1.

24 Categorical Predictors: Effects of a Binary Predictor
A binary predictor is sometimes called a shift variable because it shifts the regression plane up or down. Suppose $X_1$ is a binary predictor that can take on only the values of 0 or 1. Its contribution to the regression is either $b_1$ or nothing, resulting in an intercept of either $b_0$ (when $X_1 = 0$) or $b_0 + b_1$ (when $X_1 = 1$). The slope does not change: only the intercept is shifted.
For example: [example not preserved in transcript]
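
The shift effect can be seen in a small sketch with one binary predictor and one quantitative predictor. The fitted equation $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$ and all coefficient values here are hypothetical.

```python
def predict(b0, b1, b2, x1, x2):
    """Fitted value when x1 is a 0/1 binary and x2 is quantitative."""
    return b0 + b1 * x1 + b2 * x2


# Hypothetical coefficients: intercept 50, binary shift 8, slope 2.
b0, b1, b2 = 50.0, 8.0, 2.0

# The gap between the two groups equals b1 at every value of x2.
gap_at_10 = predict(b0, b1, b2, 1, 10) - predict(b0, b1, b2, 0, 10)
gap_at_40 = predict(b0, b1, b2, 1, 40) - predict(b0, b1, b2, 0, 40)

# The slope with respect to x2 is b2 in both groups.
slope_group0 = predict(b0, b1, b2, 0, 11) - predict(b0, b1, b2, 0, 10)
slope_group1 = predict(b0, b1, b2, 1, 11) - predict(b0, b1, b2, 1, 10)
```

The constant gap and the identical slopes are what "only the intercept is shifted" means in practice.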

25 Categorical Predictors
Testing a Binary for Significance
In multiple regression, binary predictors require no special treatment. They are tested as any other predictor using a t test.
More Than One Binary
More than one binary occurs when the number of categories to be coded exceeds two. For example, for the variable GPA by class level, each category is a binary variable:
Freshman = 1 if a freshman, 0 otherwise
Sophomore = 1 if a sophomore, 0 otherwise
Junior = 1 if a junior, 0 otherwise
Senior = 1 if a senior, 0 otherwise
Masters = 1 if a master’s candidate, 0 otherwise
Doctoral = 1 if a PhD candidate, 0 otherwise

26 Categorical Predictors: What if I Forget to Exclude One Binary?
Including all binaries for all categories may introduce a serious problem of collinearity for the regression estimation. Collinearity occurs when there are redundant independent variables. When the value of one independent variable can be determined from the values of other independent variables, one column in the X data matrix will be a perfect linear combination of the other column(s). The least squares estimation would fail because the data matrix would be singular (i.e., would have no inverse).
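
The redundancy is easy to demonstrate. The sketch below codes a class-level variable into binaries, optionally omitting the first category as the baseline, and shows that when every category is included the binaries in each row sum to 1, which duplicates the intercept column. The function name, the choice of the first category as baseline, and the sample of students are hypothetical.

```python
def encode_binaries(values, categories, drop_first=True):
    """Code each value as 0/1 binaries, omitting the first category if asked."""
    kept = categories[1:] if drop_first else categories
    return [[1 if v == c else 0 for c in kept] for v in values]


levels = ["Freshman", "Sophomore", "Junior", "Senior", "Masters", "Doctoral"]
students = ["Junior", "Freshman", "Doctoral", "Senior"]  # hypothetical sample

# All six binaries: every row sums to 1, the same as the intercept column.
all_binaries = encode_binaries(students, levels, drop_first=False)
row_sums = [sum(row) for row in all_binaries]

# Five binaries: Freshman becomes the baseline (all zeros), so no column
# is a perfect linear combination of the others and the intercept.
safe_binaries = encode_binaries(students, levels)
```

With one category omitted, each remaining coefficient is read as a shift relative to the baseline category.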

27 Other Regression Problems
- Outliers? (omit only if clearly errors)
- Missing Predictors? (usually you can’t tell)
- Ill-Conditioned Data (adjust decimals or take logs)
- Significance in Large Samples? (if n is huge, almost any regression will be significant)
- Model Specification Errors? (may show up in residual patterns)
- Missing Data? (we may have to live without it)
- Binary Response? (if Y = 0,1 we use logistic regression)
- Stepwise and Best Subsets Regression (MegaStat does these)

