Session 6 Applied Regression -- Prof. Juran.

Outline
- Residual Analysis
  - Are they normal?
  - Do they have a common variance?
- Multicollinearity
- Autocorrelation, serial correlation

Residual Analysis
Assumptions about regression models:
- The Form of the Model
- The Residual Errors
- The Predictor Variables
- The Data

Regression Assumptions
Recall the assumptions about regression models:
The Form of the Model
- The relationship between Y and each X is assumed to be linear.

The Residual Errors
- The residuals are normally distributed.
- The residuals have a mean of zero.
- The residuals have the same variance.
- The residuals are independent of each other.

The Predictor Variables
- The X variables are nonrandom (i.e., fixed or selected in advance). This assumption is rarely true in business regression analysis.
- The data are measured without error. This assumption is rarely true in business regression analysis.
- The X variables are linearly independent of each other (uncorrelated, or orthogonal). This assumption is rarely true in business regression analysis.

The Data
- The observations are equally reliable and have equal weights in determining the regression model.

Because many of these assumptions center on the residuals, we need to spend some time studying the residuals in our model, to assess the degree to which these assumptions are valid.

Example: Anscombe’s Quartet
Here are four bivariate data sets, devised by F. J. Anscombe.
Anscombe, F. J. (1973), “Graphs in Statistical Analysis,” The American Statistician, 27, 17-21.

Three observations:
- These data sets are clearly different from each other.
- The differences would not be made obvious by any descriptive statistics or summary regression statistics.
- We need tools to identify characteristics such as those which differentiate these four data sets.
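The second observation can be checked numerically. The sketch below uses the quartet values as published in Anscombe (1973) and fits a least-squares line to each set; the helper name `fit_line` is only illustrative. All four sets come out with nearly the same intercept, slope, and R-squared, even though their scatter plots look nothing alike.

```python
# Anscombe's quartet (values as published in Anscombe, 1973).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Least-squares intercept, slope, and R-squared for one predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


summary = {name: fit_line(x, y) for name, (x, y) in quartet.items()}
for name, (b0, b1, r2) in summary.items():
    print(f"Set {name}: intercept={b0:.2f}, slope={b1:.2f}, R^2={r2:.2f}")
```

Because the summary numbers agree, only a look at the residuals (or the raw scatter plots) separates the four cases.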

The differences can be detected in the different ways that these data sets violate the basic regression assumptions regarding residual errors.

Assumption: The residuals have a mean of zero.
This assumption is not likely to be a problem, because the regression procedure ensures that this will be true, unless there is a serious skewness problem.

Assumption: The residuals are normally distributed.
We can check this with a number of methods. We might plot a histogram of the residuals to see if they “look” reasonably normal. For this purpose we might want to “standardize” the residuals, so that their values can be compared with our expectations in terms of the standard error.

Standardized Residuals
In order to judge whether residuals are outliers or have an inordinate impact on the regression, they are commonly standardized. The variance of the ith residual $e_i$, perhaps surprisingly, is not $\sigma^2$, though this is in many examples a reasonable approximation.

The correct variance is $\mathrm{Var}(e_i) = \sigma^2 (1 - h_{ii})$, where $h_{ii}$ is the ith diagonal element of the hat matrix $X (X^T X)^{-1} X^T$. One way to go, therefore, is to calculate the so-called standardized residual for each observation. Alternatively, we could use the so-called studentized residuals. (The formulas for both appeared as images on the original slides and are not reproduced in this transcript.)

These are both measures of how far individual observations are from their predicted values, and large values of either are signals of concern. Excel (and any other stats software package) produces standardized residuals on command.
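A minimal sketch of both measures for a simple regression, under stated assumptions. The exact formulas from the slides are not in this transcript, so the code uses one common convention: the standardized residual is e_i / s and the studentized residual is e_i / (s * sqrt(1 - h_ii)), where s is the standard error of the regression and h_ii is the leverage of observation i. The data are hypothetical.

```python
import math

# Hypothetical data; the last point sits far from the others in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 12.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 20.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Standard error of the regression, with n - 2 degrees of freedom.
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Leverage h_ii for simple regression: 1/n + (x_i - x_bar)^2 / Sxx.
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Assumed conventions (see the note above the code).
standardized = [e / s for e in residuals]
studentized = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

for xi, z, r, h in zip(x, standardized, studentized, leverage):
    print(f"x={xi:5.1f}  leverage={h:.3f}  standardized={z:6.2f}  studentized={r:6.2f}")
```

The two measures differ most for high-leverage points, because dividing by sqrt(1 - h_ii) inflates the residual of an observation that pulls the line toward itself.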

Another way to assess normality is to use a normal probability plot, which graphs the distribution of residuals against what we would expect to see from a standard normal distribution. The normal score is calculated using the following procedure:
1. Order the observations in increasing order of their residual errors.
2. Calculate a quantile, which basically measures what proportion of the data lie below each observation.
3. Calculate the normal score, which is a measure of where we would expect the quantiles to be if we drew a sample of this size from a perfect standard normal distribution.
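The three steps can be sketched as follows. The quantile formula used on the slides is not in this transcript, so the code assumes the plotting position (rank - 0.5) / n, which is one of several conventions in use. The function name `normal_scores` and the residual values are hypothetical.

```python
from statistics import NormalDist


def normal_scores(residuals):
    """Return (sorted residual, quantile, normal score) triples."""
    ordered = sorted(residuals)  # step 1: increasing order
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    rows = []
    for rank, e in enumerate(ordered, start=1):
        quantile = (rank - 0.5) / n  # step 2: assumed plotting position
        score = standard_normal.inv_cdf(quantile)  # step 3: normal score
        rows.append((e, quantile, score))
    return rows


rows = normal_scores([0.8, -1.2, 0.1, 2.3, -0.4])
for e, q, z in rows:
    print(f"residual={e:5.2f}  quantile={q:.2f}  normal score={z:6.3f}")
```

Plotting the sorted residuals against the normal scores gives the normal probability plot; points close to a straight line are consistent with normality.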

Trouble!
- Excel gives us a normal probability plot for the dependent variable, not the residuals.
- We have never assumed that Y is normally distributed.
- Another reason to switch to Minitab, SAS, SPSS, etc.

Assumption: The residuals have the same variance.
One way to check this is to plot the actual values of Y against the predicted values. In the case of simple regression, this is a lot like plotting them against the X variable.

Another method is to plot the residuals against the predicted value of Y (or the actual observed value of Y, or in simple regression against the X variable).
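Where a plot is not handy, a crude numeric stand-in is to sort the residuals by predicted value and compare their spread in the lower and upper halves. This is only a rough sketch, not a formal test, and the data are hypothetical (built so that the noise grows with x).

```python
# Hypothetical data: the noise grows with x, so the spread is not constant.
x = list(range(1, 11))
y = [2 + 3 * xi + 0.2 * xi * (-1) ** xi for xi in x]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Order the residuals by fitted value, then split them into two halves.
by_fitted = [e for _, e in sorted(zip(fitted, residuals))]
lower, upper = by_fitted[: n // 2], by_fitted[n // 2:]


def mean_square(values):
    """Average squared residual, used here as a simple measure of spread."""
    return sum(v ** 2 for v in values) / len(values)


spread_ratio = mean_square(upper) / mean_square(lower)
print(f"upper-half / lower-half residual spread: {spread_ratio:.2f}")
```

A ratio far from 1 suggests the residual variance changes with the predicted value, which is the pattern a fan-shaped residual plot would show.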

Collinearity
Collinearity (also called multicollinearity) is the situation in which one or more of the predictor variables are nearly a linear combination of other predictors. (The opposite condition, in which all independent variables are more or less independent, is called orthogonality.)

In the extreme case of exact dependence, the $X^T X$ matrix cannot be inverted and the regression procedure will fail. In less extreme cases, we suffer from several possible problems:
- The independent variables are not “independent”. We can’t talk about the slope coefficients in terms of effects of one variable on Y “all other things held constant”, because changes in one of the X variables are associated with expected changes in other X variables.
- The slope coefficient values can be very sensitive to changes in the data, and/or which other independent variables are included in the model.
- Forecasting problems:
  - Large standard errors for all parameters
  - Uncertainty about whether true relationships have been detected
  - Uncertainty about the stability of the correlation structure
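The extreme case can be seen in a tiny hypothetical example with two predictors and no intercept (to keep the matrix 2 by 2). When the second predictor is exactly twice the first, the determinant of X^T X is zero, so no inverse exists. A slightly perturbed second predictor gives a nonzero determinant.

```python
def xtx_determinant(x1, x2):
    """Determinant of the 2x2 matrix X^T X for the columns x1 and x2."""
    a = sum(v * v for v in x1)
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2)
    return a * d - b * b


x1 = [1, 2, 3, 4]
x2_exact = [2, 4, 6, 8]  # exactly 2 * x1: exact dependence
x2_near = [2, 4, 6, 9]   # almost 2 * x1: near dependence

det_exact = xtx_determinant(x1, x2_exact)
det_near = xtx_determinant(x1, x2_near)
print(det_exact, det_near)
```

With the nearly dependent column the matrix can be inverted, but the determinant is small relative to the size of its entries, which is where the large standard errors listed above come from.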

Sources of Collinearity
Data collection method. Some combination of X variable values does not exist in the data.
Example: Say that we did the tool wear case without ever trying the Type A machine at low speed or the Type B machine at high speed. Collinearity here is the result of the experimental design.

Sources of Collinearity
Constraints on the Model or in the Population. Some combination of X variable values does not exist in the population.
Example: In the cigarette data, imagine if the states with a high proportion of high school graduates also had a high proportion of black citizens. Collinearity here is the result of attributes of the population.

Sources of Collinearity
Model Specification. Adding or including variables that are tightly correlated with other variables already in the model.
Example: In a study to predict the profitability of TV programs, we might include both the Nielsen rating and the Nielsen share. Collinearity here is the result of including multiple variables that contain more or less the same information.

Sources of Collinearity
Over-definition. We may have a relatively small number of observations, but a large number of independent variables for each. Collinearity here is the result of too few degrees of freedom. In other words, n – p – 1 is small (or in the extreme, negative), because p is large compared with n.

Detecting Collinearity
First, be aware of the potential problem, and be vigilant.
Second, check the various combinations of independent variables for obvious evidence of collinearity. This might include pairwise correlation analysis, or even regressing each independent variable against all of the others. A high R-square coefficient would be a sign of trouble.

Detecting Collinearity
Third, after a regression model has been estimated, watch for these clues:
- Large changes in a coefficient as an independent variable is added or removed.
- Large changes in a coefficient as an observation is added or removed.
- Inappropriate signs or magnitudes of an estimated coefficient as compared to common sense or prior expectations.
The Variance Inflation Factor (VIF) is one measure of collinearity’s impact.

Variance Inflation Factor
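The formula on this slide is not in the transcript. The usual definition is VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-square from regressing predictor j on all of the other predictors. The sketch below assumes that definition and uses hypothetical data with only two predictors, in which case R_j^2 is simply the squared pairwise correlation. The helper names are illustrative.

```python
def pearson_r(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def vif_from_r_squared(r_squared):
    """VIF under the usual definition 1 / (1 - R_j^2)."""
    return 1.0 / (1.0 - r_squared)


# Hypothetical predictors that carry almost the same information.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]

r_squared = pearson_r(x1, x2) ** 2
vif = vif_from_r_squared(r_squared)
print(f"R^2 = {r_squared:.4f}, VIF = {vif:.1f}")
```

A VIF of 1 means a predictor is unrelated to the others. A frequently quoted rule of thumb treats values above roughly 10 as a warning sign, which corresponds to an R_j^2 above 0.9.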

Countermeasures
- Design as much orthogonality into the data as you can.
- You may improve a pre-existing situation by collecting additional data, as orthogonally as possible.
- Exclude variables from the model that you know are correlated.
- Principal Components Analysis: Basically creating a small set of new independent variables, each of which is a linear combination of the larger set of original independent variables (Ch. 9.5, RABE).

Countermeasures
In some cases, rescaling and centering the data can diminish the collinearity. For example, we can translate each observation into a z-stat (by subtracting the mean and dividing by the standard deviation).
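One case where centering is known to help is a model that contains both a variable and its square. The sketch below uses a hypothetical predictor x = 1, ..., 10 and compares the correlation between x and x squared before and after subtracting the mean. Rescaling alone would not change a correlation between two distinct variables, so the effect shown here comes from the centering step.

```python
def pearson_r(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


# Hypothetical predictor and its square, as in a quadratic model.
x = [float(v) for v in range(1, 11)]
x_squared = [v ** 2 for v in x]

# Center x first, then square the centered values.
x_mean = sum(x) / len(x)
x_centered = [v - x_mean for v in x]
x_centered_squared = [v ** 2 for v in x_centered]

r_raw = pearson_r(x, x_squared)
r_centered = pearson_r(x_centered, x_centered_squared)
print(f"raw: r = {r_raw:.3f}; centered: r = {r_centered:.3f}")
```

In this symmetric example the centered variable and its square are uncorrelated, while the raw pair is almost perfectly correlated.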

Collinearity in the Supervisor Data

Cars

New model: Dependent variable is Volkswagen.

Reduced Volkswagen model:

42 cars in the data set, 29 represented by the dummy variables above. 13 remaining, of which 5 are VW.

Some Minitab Output

Regression Analysis: MSRP versus MPG City, HP, Trunk, Warranty, Audi, Chevrolet, ...

The following terms cannot be estimated and were removed:
Saturn, Volkswagen, AWD

Analysis of Variance

Source       DF       Adj SS     Adj MS  F-Value  P-Value
Regression   15  10146131594  676408773    34.38    0.000
  MPG City    1    176114556  176114556     8.95    0.006
  HP          1    837954013  837954013    42.59    0.000
  Trunk       1           58         58     0.00    0.999
  Warranty    1       100549     100549     0.01    0.944
  Audi        1     92960483   92960483     4.73    0.039
  Chevrolet   1      4684125    4684125     0.24    0.630
  Chrysler    1        13631      13631     0.00    0.979
  Ford        1     27504490   27504490     1.40    0.248
  Honda       1       479148     479148     0.02    0.877
  Lexus       1      5133880    5133880     0.26    0.614
  Mazda       1       523598     523598     0.03    0.872
  Nissan      1      2520242    2520242     0.13    0.723
  Toyota      1      6940256    6940256     0.35    0.558
  FWD         1    113111527  113111527     5.75    0.024
  RWD         1      8669257    8669257     0.44    0.513
Error        26    511504646   19673256
Total        41  10657636240

Adjusted sums of squares are the additional sums of squares determined by adding each particular term to the model given the other terms are already in the model.

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
4435.45  95.20%     92.43%      88.33%

Coefficients

Term         Coef  SE Coef  T-Value  P-Value    VIF
Constant    -6018    13336    -0.45    0.656
MPG City      666      223     2.99    0.006   3.04
HP          140.7     21.6     6.53    0.000   7.24
Trunk          -1      430    -0.00    0.999   1.72
Warranty     -6.7     94.4    -0.07    0.944   5.06
Audi         9109     4190     2.17    0.039   3.93
Chevrolet   -1899     3891    -0.49    0.630   2.79
Chrysler     -128     4878    -0.03    0.979   2.30
Ford        -3912     3308    -1.18    0.248   1.55
Honda        -594     3804    -0.16    0.877   1.40
Lexus        2377     4653     0.51    0.614   7.13
Mazda         621     3808     0.16    0.872   1.40
Nissan      -1071     2992    -0.36    0.723   1.65
Toyota      -1814     3054    -0.59    0.558   2.09
FWD        -11443     4772    -2.40    0.024   9.40
RWD         -3880     5845    -0.66    0.513  11.25
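The summary figures can be reproduced from the Analysis of Variance numbers in the Minitab output. The sketch below is a worked check using the standard textbook relationships, with the sums of squares, degrees of freedom, and the MPG City coefficient copied from the output; the variable names are illustrative.

```python
import math

# Values copied from the Minitab output.
n_obs = 42               # Total DF is 41, so n = 42
n_terms = 15             # Regression DF
ss_error = 511504646     # Error Adj SS
ss_total = 10657636240   # Total Adj SS

df_error = n_obs - n_terms - 1      # n - p - 1
ms_error = ss_error / df_error      # Error Adj MS
s = math.sqrt(ms_error)             # S in the Model Summary
r_sq = 1 - ss_error / ss_total      # R-sq
r_sq_adj = 1 - ms_error / (ss_total / (n_obs - 1))  # R-sq(adj)

# T-value for MPG City: coefficient divided by its standard error.
t_mpg_city = 666 / 223

print(f"DF error = {df_error}, S = {s:.2f}")
print(f"R-sq = {100 * r_sq:.2f}%, R-sq(adj) = {100 * r_sq_adj:.2f}%")
print(f"T-value for MPG City = {t_mpg_city:.2f}")
```

Ratios computed from the printed coefficients can differ from the printed T-values in the last digit, because the coefficients and standard errors in the table are themselves rounded.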

Serial Correlation (a.k.a. Autocorrelation)
Here we are concerned with the assumption that the residuals are independent of each other. In particular, we are suspicious that the sequential residuals have a positive correlation. In other words, some information about an observed value of the dependent variable is contained in the previous observation.
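Two common numeric checks for this are the lag-1 autocorrelation of the residuals and the Durbin-Watson statistic. Whether the slides use these exact measures is an assumption, since the relevant slides are not in this transcript. The residuals below are hypothetical, listed in time order, and built to drift slowly from positive to negative.

```python
# Hypothetical residuals in time order (they sum to zero).
residuals = [1.0, 1.2, 0.8, 0.5, -0.2, -0.9, -1.1, -0.7, -0.4, -0.2]

sum_sq = sum(e ** 2 for e in residuals)

# Lag-1 autocorrelation: each residual paired with the one before it.
lag1 = sum(a * b for a, b in zip(residuals[1:], residuals[:-1])) / sum_sq

# Durbin-Watson statistic: squared successive differences over squared residuals.
durbin_watson = sum(
    (a - b) ** 2 for a, b in zip(residuals[1:], residuals[:-1])
) / sum_sq

print(f"lag-1 autocorrelation = {lag1:.3f}")
print(f"Durbin-Watson statistic = {durbin_watson:.3f}")
```

The Durbin-Watson statistic is near 2 when successive residuals are uncorrelated and falls toward 0 under positive serial correlation, so a small value here agrees with the large positive lag-1 autocorrelation.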

Consider the following historical data set, in which the dependent variable is Consumer Expenditure and the independent variable is Money Stock. (Economists are interested in the effect of Money Stock on Expenditure, because if it is significant it presents an opportunity to influence the economy through public policy.)

Summary
- Residual Analysis
  - Are they normal?
  - Do they have a common variance?
- Multicollinearity
- Autocorrelation, serial correlation

For Session 7
- Practice the Excel array functions
- Do a full multiple regression model of the cigarette data: www.ilr.cornell.edu/~hadi/RABE/Data/P081.txt
  - Replicate the regression results using matrix algebra
  - OK to e-mail this one to TAs
- Artsy case