Lecture 25 Multiple Regression Diagnostics (Sections 19.4-19.5) Polynomial Models (Section 20.2)

19.4 Regression Diagnostics - II
The conditions required for the model assessment to apply must be checked:
- Is the error variable normally distributed? Draw a histogram of the residuals.
- Is the regression function correctly specified as a linear function of x1, …, xk? Plot the residuals versus each x and versus ŷ.
- Is the error variance constant? Plot the residuals versus ŷ.
- Are the errors independent? Plot the residuals versus the time periods.
- Can we identify outliers and influential observations?
- Is multicollinearity a problem?
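These checks are graphical. As an illustration that is not part of the original lecture, here is a minimal Python sketch (statsmodels and matplotlib, synthetic data, invented names) of the residual plots listed above:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (80, 2))                      # two predictors
y = 1 + 2 * X[:, 0] - X[:, 1] + rng.normal(0, 1, 80)

res = sm.OLS(y, sm.add_constant(X)).fit()

fig, ax = plt.subplots(1, 3, figsize=(12, 3))
ax[0].hist(res.resid, bins=15)                       # normality check
ax[1].scatter(res.fittedvalues, res.resid)           # linearity / constant variance
ax[2].plot(res.resid, marker="o")                    # independence over "time" (row order)
plt.show()
```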

Effects of Violated Assumptions
- Curvature (the regression function is not linear in the x's): the slopes are no longer meaningful. Potential remedy: transformations of the response and predictors.
- Violations of the other assumptions: tests, p-values, and confidence intervals are no longer accurate. That is, inference is invalidated. Remedies may be difficult.

Influential Observation
An observation is influential if removing it would markedly change the results of the analysis. To be influential, a point must either (i) be an outlier in terms of the relationship between its y and x's, or (ii) have unusually distant x's (high leverage) and not fall exactly on the relationship between y and the x's that the rest of the data follows.

Simple Linear Regression Example
Data in salary.jmp: Y = weekly salary, X = years of experience.

Identification of Influential Observations
Cook's distance is a measure of the influence of a point: the effect that omitting the observation has on the estimated regression coefficients. In JMP, use Save Columns, Cook's D Influence to obtain Cook's distances. To plot them, use Graph, Overlay Plot, put Cook's D Influence in Y and leave X blank (this plots Cook's distances against row numbers).

Cook's Distance
Rule of thumb: an observation with Cook's distance Di > 1 has high influence. You should also be concerned about any observation with Di < 1 whose Di is much larger than that of every other observation. Ex. 19.2:
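Outside JMP, Cook's distances can be computed directly; below is a minimal Python sketch using statsmodels on invented salary-style data (not the salary.jmp file), with one high-leverage outlier planted so the Di > 1 rule fires:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 25, 50)                        # years of experience (invented)
y = 400 + 50 * x + rng.normal(0, 60, 50)          # weekly salary (invented)
x[0], y[0] = 60.0, 500.0                          # plant a high-leverage outlier

fit = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d, _ = fit.get_influence().cooks_distance

print(np.where(cooks_d > 1)[0])                   # rule of thumb: D_i > 1 flags obs 0
print(cooks_d.argmax(), cooks_d.max())            # the planted point dwarfs the rest
```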

Strategy for Dealing with Influential Observations/Outliers
Do the conclusions change when the observation is deleted?
- If no: proceed with the observation included. Study the observation to see if anything can be learned.
- If yes: is there reason to believe the case belongs to a population other than the one under investigation?
  - If yes: omit the case and proceed.
  - If no: does the case have unusually "distant" independent variables?
    - If yes: omit the case and proceed. Report conclusions for the reduced range of explanatory variables.
    - If no: not much can be said. More data are needed to resolve the question.

Multicollinearity
Multicollinearity: a condition in which the independent variables are highly correlated. Exact collinearity: Y = weight, X1 = height in inches, X2 = height in feet. Since X2 is an exact linear function of X1, different combinations of coefficients on X1 and X2 provide the same predictions. Multicollinearity causes two kinds of difficulties:
- The t statistics appear to be too small.
- The b coefficients cannot be interpreted as "slopes".

Multicollinearity Diagnostics
- High correlation between independent variables.
- Counterintuitive signs on regression coefficients.
- Low values for t-statistics despite a significant overall fit, as measured by the F statistic (see the sketch below).
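A hedged illustration of these symptoms, not taken from the textbook data: the Python fragment below builds two nearly collinear predictors (invented house-price-style names and numbers) and shows a significant overall F test next to weak individual t tests.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
h_size = rng.normal(2000, 200, n)                 # house size (invented)
lotsize = 3 * h_size + rng.normal(0, 100, n)      # nearly collinear with h_size
price = 50 + 0.05 * h_size + 0.02 * lotsize + rng.normal(0, 25, n)

X = pd.DataFrame({"h_size": h_size, "lotsize": lotsize})
print(X.corr())                                   # correlation close to 1

fit = sm.OLS(price, sm.add_constant(X)).fit()
print(fit.f_pvalue)                               # overall fit: highly significant
print(fit.pvalues)                                # individual t tests: much weaker
```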

Diagnostics: Multicollinearity
Example 19.2: predicting house price (Xm19-02). A real estate agent believes that a house's selling price can be predicted using the house size, number of bedrooms, and lot size. A random sample of 100 houses was drawn and the data recorded. Analyze the relationship among the four variables.

Diagnostics: Multicollinearity
The proposed model is PRICE = b0 + b1·BEDROOMS + b2·H-SIZE + b3·LOTSIZE + e. The model as a whole is valid (significant F test), but no individual variable is significantly related to the selling price?!

Diagnostics: Multicollinearity
Multicollinearity is found to be a problem. As before, it causes two kinds of difficulties:
- The t statistics appear to be too small.
- The b coefficients cannot be interpreted as "slopes".

Remedying Violations of the Required Conditions
Nonnormality or heteroscedasticity can be remedied using transformations on the y variable. The transformations can also improve the linear relationship between the dependent variable and the independent variables. Many software systems allow us to make the transformations easily.

Reducing Nonnormality by Transformations
A brief list of transformations:
- y' = log y (for y > 0): use when σe increases with y, or when the error distribution is positively skewed.
- y' = y²: use when σ²e is proportional to E(y), or when the error distribution is negatively skewed.
- y' = y^(1/2) (for y > 0): use when σ²e is proportional to E(y).
- y' = 1/y: use when σ²e increases significantly when y increases beyond some critical value.
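As one hedged illustration (synthetic data, not from the text), the first transformation in the list, y' = log y, stabilizes a residual spread that grows with the level of y:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 200)
y = np.exp(0.5 + 0.3 * x + rng.normal(0, 0.4, 200))   # multiplicative errors: spread grows with y

raw = sm.OLS(y, sm.add_constant(x)).fit()
logged = sm.OLS(np.log(y), sm.add_constant(x)).fit()

# Residual spread for small vs large fitted values: it fans out for the raw
# fit and is roughly constant after the log transformation.
for fit in (raw, logged):
    med = np.median(fit.fittedvalues)
    print(fit.resid[fit.fittedvalues < med].std(),
          fit.resid[fit.fittedvalues >= med].std())
```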

Durbin-Watson Test: Are the Errors Autocorrelated?
This test detects first-order autocorrelation between consecutive residuals in a time series. If autocorrelation exists, the error variables are not independent.

Positive First-Order Autocorrelation
[Figure: residuals plotted against time, with runs of consecutive residuals of the same sign.]
Positive first-order autocorrelation occurs when consecutive residuals tend to be similar. Then the value of d is small (less than 2).

Negative First-Order Autocorrelation
[Figure: residuals plotted against time, alternating in sign from one period to the next.]
Negative first-order autocorrelation occurs when consecutive residuals tend to differ markedly. Then the value of d is large (greater than 2).

Durbin-Watson Test in JMP
H0: no first-order autocorrelation. H1: first-order autocorrelation. Use Row Diagnostics, Durbin-Watson Test in JMP after fitting the model. The reported autocorrelation is an estimate of the correlation between consecutive errors.
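Outside JMP, the same statistic is available elsewhere; here is a minimal sketch using statsmodels on synthetic data with AR(1) errors (all names and numbers invented):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
t = np.arange(40.0)
e = np.zeros(40)
for i in range(1, 40):            # AR(1) errors with positive autocorrelation
    e[i] = 0.7 * e[i - 1] + rng.normal(0, 1)
y = 5 + 0.2 * t + e

res = sm.OLS(y, sm.add_constant(t)).fit()
print(durbin_watson(res.resid))   # d well below 2 flags positive autocorrelation
```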

Testing the Existence of Autocorrelation: Example
Example 19.3 (Xm19-03): how does the weather affect the sales of lift tickets at a ski resort? Ticket sales for each of the past 20 years, along with the total snowfall and the average temperature during Christmas week in each year, were collected. The hypothesized model was TICKETS = b0 + b1·SNOWFALL + b2·TEMPERATURE + e. Regression analysis yielded the following results:

20.1 Introduction
Regression analysis is one of the most commonly used techniques in statistics. It is considered powerful for several reasons:
- It can cover a variety of mathematical models:
  - linear relationships
  - non-linear relationships
  - nominal independent variables
- It provides efficient methods for model building.

Curvature: Midterm Problem 10

Remedy I: Transformations
Use Tukey's Bulging Rule to choose a transformation.

Remedy II: Polynomial Models
Multiple regression model: y = b0 + b1x1 + b2x2 + … + bpxp + e
Polynomial model: y = b0 + b1x + b2x² + … + bpx^p + e

Quadratic Regression

Polynomial Models with One Predictor Variable
First-order model (p = 1): y = b0 + b1x + e
Second-order model (p = 2): y = b0 + b1x + b2x² + e
[Figure: parabolas opening downward (b2 < 0) and upward (b2 > 0).]
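A second-order model is still linear in its coefficients, so it can be fit by ordinary least squares with x² supplied as an extra column. A minimal sketch on invented data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(-3, 3, 100)
y = 2 + 1.5 * x - 0.8 * x**2 + rng.normal(0, 1, 100)   # true curve: b2 < 0

X = sm.add_constant(np.column_stack([x, x**2]))         # columns: 1, x, x^2
fit = sm.OLS(y, X).fit()
print(fit.params)                                       # estimates of b0, b1, b2
```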

Polynomial Models with One Predictor Variable
Third-order model (p = 3): y = b0 + b1x + b2x² + b3x³ + e
[Figure: cubic curves for b3 < 0 and b3 > 0.]

Interaction
Two independent variables x1 and x2 interact if the effect of x1 on y is influenced by the value of x2. Interaction can be brought into the multiple linear regression model by including the product x1·x2 as an additional independent variable. Example:

Interaction (cont.)
For the model y = b0 + b1x1 + b2x2 + b3x1x2 + e, the "slope" for x1 is E(y | x1 + 1, x2) − E(y | x1, x2) = b1 + b3x2. Example: is the expected income increase from an extra year of education higher for people with IQ 100 or with IQ 130 (or is it the same)?
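A hedged sketch of how one might fit such a model and compare the two slopes; the education/income/IQ numbers below are invented, not from the lecture:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
educ = rng.uniform(10, 20, 300)                 # x1: years of education (invented)
iq = rng.normal(110, 15, 300)                   # x2: IQ (invented)
income = 5 + 2 * educ + 0.1 * iq + 0.05 * educ * iq + rng.normal(0, 5, 300)

X = sm.add_constant(np.column_stack([educ, iq, educ * iq]))
b0, b1, b2, b3 = sm.OLS(income, X).fit().params

# The "slope" for education depends on IQ: b1 + b3 * x2.
print(b1 + b3 * 100)                            # expected gain per year at IQ 100
print(b1 + b3 * 130)                            # expected gain per year at IQ 130
```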

Polynomial Models with Two Predictor Variables
First-order model: y = b0 + b1x1 + b2x2 + e. The effect of one predictor variable on y is independent of the effect of the other: holding x2 = k fixed gives the line [b0 + b2k] + b1x1, so the lines for x2 = 1, 2, 3 are parallel.
First-order model, two predictors, and interaction: y = b0 + b1x1 + b2x2 + b3x1x2 + e. The two variables interact to affect the value of y: holding x2 = k fixed gives the line [b0 + b2k] + [b1 + b3k]x1, so the lines for x2 = 1, 2, 3 have different slopes.
[Figure: two panels of y versus x1 for x2 = 1, 2, 3 — parallel lines without interaction, lines with different slopes with interaction.]

Polynomial Models with Two Predictor Variables
Second-order model: y = b0 + b1x1 + b2x2 + b3x1² + b4x2² + e
Second-order model with interaction: y = b0 + b1x1 + b2x2 + b3x1² + b4x2² + b5x1x2 + e
Without interaction, holding x2 = k fixed gives the curve y = [b0 + b2k + b4k²] + b1x1 + b3x1² + e, so the curves for x2 = 1, 2, 3 are vertical shifts of one another.
[Figure: two panels of y versus x1 for x2 = 1, 2, 3, for the second-order model without and with the interaction term.]