STA302/1001 - Week 11: Multicollinearity


Multicollinearity

Multicollinearity occurs when explanatory variables are highly correlated, in which case it is difficult or impossible to measure their individual influence on the response. The fitted regression equation is unstable: the estimated regression coefficients vary widely from data set to data set (even when the data sets are very similar) and depend on which predictor variables are included in the model. The estimated regression coefficients may even have the opposite sign from what is expected (e.g., the bedroom coefficient in the house price example).
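The instability can be seen in a small simulation (a sketch on made-up data, not part of the course materials): when two predictors are nearly identical, refitting the same model on very similar data sets gives wildly different individual slopes, even though the predictors jointly explain the response well.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

def fit_ols(rng):
    # Two highly correlated predictors: x2 is x1 plus a little noise.
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.05, size=n)
    y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Refit on several very similar simulated data sets: the individual
# slopes swing widely from fit to fit, while their sum stays stable.
for _ in range(3):
    print(np.round(fit_ols(rng), 2))
```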

The regression coefficients may not be statistically significantly different from 0 even when the corresponding explanatory variable is known to have a relationship with the response. When some of the X's are perfectly correlated, we cannot estimate β because X'X is singular. Even if X'X is only close to singular, its determinant will be close to 0 and the standard errors of the estimated coefficients will be large.
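A minimal NumPy sketch of this point on simulated data: with a nearly collinear pair of predictors, the determinant of X'X is tiny compared with the scale of the matrix, and the standard errors of the collinear slopes blow up.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + x1 + x2 + rng.normal(size=n)

XtX = X.T @ X
beta = np.linalg.solve(XtX, X.T @ y)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])           # estimate of error variance
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(XtX)))  # SEs of the coefficients

# det(X'X) is minuscule relative to the product of its diagonal
# entries -- a sign that X'X is close to singular.
print("det(X'X):", np.linalg.det(XtX))
print("standard errors:", np.round(se, 2))  # huge SEs on the two collinear slopes
```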

Quantitative Assessment of Multicollinearity

To assess multicollinearity we calculate the variance inflation factor (VIF) for each of the predictor variables in the model. The variance inflation factor for the i-th predictor variable is defined as

VIF_i = 1 / (1 - R_i^2),

where R_i^2 is the coefficient of multiple determination obtained when the i-th predictor variable is regressed on the p - 1 other predictor variables. A large value of VIF_i is a sign of multicollinearity; a common rule of thumb treats VIF_i > 10 as cause for concern.
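The definition can be computed directly with NumPy (a sketch on simulated data; the helper vif below is ours, not from any library): each predictor column is regressed on the others, and VIF_i = 1/(1 - R_i^2) is reported.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (an n-by-p array of predictors, no intercept
    column), via the auxiliary regression of each column on the others."""
    n, p = X.shape
    out = []
    for i in range(p):
        others = np.delete(X, i, axis=1)
        A = np.column_stack([np.ones(n), others])   # intercept + other predictors
        fitted = A @ np.linalg.lstsq(A, X[:, i], rcond=None)[0]
        r2 = 1 - np.sum((X[:, i] - fitted) ** 2) / np.sum((X[:, i] - X[:, i].mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)   # strongly correlated with x1
x3 = rng.normal(size=100)                   # unrelated to the others
print(np.round(vif(np.column_stack([x1, x2, x3])), 1))
```

The correlated pair gets very large VIFs, while the unrelated predictor's VIF stays near 1.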

Rainfall Example

The data set contains corn yield (bushels per acre) and rainfall (inches) in six US corn-producing states (Iowa, Nebraska, Illinois, Indiana, Missouri and Ohio). A straight-line model is not adequate: yield increases with rainfall up to about 12 inches and then starts to decrease. A better model for these data is a quadratic model:

Yield = β0 + β1·rain + β2·rain² + ε.

This is still a multiple linear regression model since it is linear in the β's. However, we cannot interpret the individual coefficients, since we cannot change one variable while holding the other constant: rain² changes whenever rain does.
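The quadratic fit can be sketched on simulated data shaped like the example (the actual corn data set is not reproduced here; a peak near 12 inches is built into the simulation):

```python
import numpy as np

rng = np.random.default_rng(3)
rain = rng.uniform(6, 18, size=60)   # hypothetical rainfall values (inches)
# Simulated yields that rise to a peak near 12 inches and then fall off.
yld = 20 + 6 * rain - 0.25 * rain**2 + rng.normal(scale=1.0, size=60)

# Fit Yield = b0 + b1*rain + b2*rain^2 by least squares; the model is
# still linear in the coefficients even though it is quadratic in rain.
X = np.column_stack([np.ones_like(rain), rain, rain**2])
b, *_ = np.linalg.lstsq(X, yld, rcond=None)
print("b0, b1, b2:", np.round(b, 2))
# Vertex of the fitted parabola: the rainfall level where yield peaks.
print("estimated peak rainfall:", round(-b[1] / (2 * b[2]), 1))
```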

More on the Rainfall Example

Examination of the residuals from the quadratic model plotted against year showed what may be a pattern of increase over time, so we fit a model that also includes year. To assess whether yield's relationship with rainfall depends on year, we include an interaction term (rain × year) in the model.

Interaction

Two predictor variables are said to interact if the effect that one of them has on the response depends on the value of the other. To include an interaction term in a model, we simply take the product of the two predictor variables and include the resulting variable in the model as an additional predictor.

Interaction terms should not routinely be added to the model. Why? Each extra term uses up a degree of freedom and makes the fitted model harder to interpret. We should add interaction terms when the question of interest has to do with interaction, or when we suspect an interaction exists (e.g., from a plot of residuals versus the interaction term). If an interaction term for two predictor variables is in the model, we should also include the individual terms for those predictor variables, even if their coefficients are not statistically significantly different from 0.
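Fitting an interaction model can be sketched on simulated data (the variable names and coefficient values below are made up for illustration): the product column rain*year is simply appended to the design matrix, and its coefficient measures how the rainfall slope changes per year.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
rain = rng.uniform(6, 18, size=n)
year = rng.integers(0, 40, size=n).astype(float)   # years since start of record
# Simulated data in which the rainfall slope itself changes with year:
# the true effect of rain in year t is 2.0 + 0.05*t.
y = 10 + 2.0 * rain + 0.3 * year + 0.05 * rain * year + rng.normal(size=n)

# Interaction model: include rain, year, AND their product as predictors.
X = np.column_stack([np.ones(n), rain, year, rain * year])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("b0, b1, b2, b3:", np.round(b, 2))
# The fitted rainfall effect at a given year is b1 + b3*year, so the
# interaction coefficient b3 is the change in that slope per year.
```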