Multicollinearity in Regression: Principal Components Analysis

Multicollinearity in Regression: Principal Components Analysis
Standing Heights and Physical Stature Attributes Among Female Police Officer Applicants
Source: S.Q. Lafi and J.B. Kaneene (1992). "An Explanation of the Use of Principal Components Analysis to Detect and Correct for Multicollinearity," Preventive Veterinary Medicine, Vol. 13, pp. 261-275.

Data Description
Subjects: 33 females applying for police officer positions
Dependent variable: Y ≡ Standing Height (cm)
Independent variables:
X1 ≡ Sitting Height (cm)
X2 ≡ Upper Arm Length (cm)
X3 ≡ Forearm Length (cm)
X4 ≡ Hand Length (cm)
X5 ≡ Upper Leg Length (cm)
X6 ≡ Lower Leg Length (cm)
X7 ≡ Foot Length (inches)
X8 ≡ BRACH = 100·X3/X2
X9 ≡ TIBIO = 100·X6/X5
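Note that X8 and X9 are exact ratios of other predictors, which is what ultimately drives the multicollinearity. A minimal sketch of constructing them (the raw 33 observations are not reproduced in this transcript, so the values below are hypothetical stand-ins):

```python
import numpy as np

# Hypothetical limb-length measurements (cm); the actual data
# appear only on the original "Data" slide.
x2 = np.array([33.1, 34.7, 32.9])   # upper arm length
x3 = np.array([25.6, 26.8, 25.1])   # forearm length
x5 = np.array([41.2, 43.0, 40.5])   # upper leg length
x6 = np.array([36.8, 38.1, 36.0])   # lower leg length

brach = 100 * x3 / x2   # X8: forearm as a percentage of upper arm
tibio = 100 * x6 / x5   # X9: lower leg as a percentage of upper leg
```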

Data

Standardizing the Predictors
Each predictor is centered and scaled so that the cross-product matrix of the standardized predictors is the correlation matrix: X*ij = (Xij − X̄j)/(√(n−1)·sj), so that X*'X* = R.
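A sketch of this correlation-form standardization; the √(n−1) scaling is an assumption consistent with the VIF slide below, which reads the VIFs off the diagonal of R⁻¹. Synthetic data stand in for the 33×9 measurement matrix:

```python
import numpy as np

def standardize_correlation_form(X):
    """Center and scale each column so that Xs.T @ Xs is the
    correlation matrix of the columns of X."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    s = X.std(axis=0, ddof=1)          # sample standard deviations
    return Xc / (np.sqrt(n - 1) * s)

# Synthetic correlated predictors (hypothetical, not the police data).
rng = np.random.default_rng(0)
X = rng.normal(size=(33, 4))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=33)   # induce collinearity
Xs = standardize_correlation_form(X)
R = Xs.T @ Xs                                    # correlation matrix
```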

Correlation Matrix of Predictors and Its Inverse

Variance Inflation Factors (VIFs)
The VIF measures the extent to which a regression coefficient's variance is inflated by correlations among the predictors.
VIFj = 1/(1 − Rj²), where Rj² is the coefficient of multiple determination when Xj is regressed on the remaining predictors. Values above 10 are often considered problematic.
The VIFs can also be obtained as the diagonal elements of R⁻¹.
Not surprisingly, X2, X3, X5, X6, X8, and X9 are problems (see the definitions of X8 and X9, which are exact ratios of other predictors).
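A sketch of both routes to the VIFs described above, again with synthetic data in place of the applicant measurements:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 33, 4
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)    # near-duplicate column

# Route 1: VIFs as the diagonal of the inverse correlation matrix.
R = np.corrcoef(X, rowvar=False)
vif_from_R = np.diag(np.linalg.inv(R))

# Route 2: VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing
# X_j on the remaining predictors (with an intercept).
def vif_by_regression(X, j):
    y = X[:, j]
    Z = np.delete(X, j, axis=1)
    Z1 = np.column_stack([np.ones(len(y)), Z])   # add intercept
    coef, *_ = np.linalg.lstsq(Z1, y, rcond=None)
    resid = y - Z1 @ coef
    r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return 1 / (1 - r2)

vif_from_reg = [vif_by_regression(X, j) for j in range(p)]
# The two routes agree; values above 10 flag problem predictors.
```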

Regression of Y on [1|X*]
Note the surprising negative coefficients for X3*, X5*, and X9*.
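A minimal least-squares sketch of this fit (synthetic data; the point is that with a near-singular X*'X*, individual coefficients can take counterintuitive signs even when the overall fit is good):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 33, 4
Xs = rng.normal(size=(n, p))                 # stand-in for X*
Xs[:, 2] = Xs[:, 1] + 0.05 * rng.normal(size=n)
y = 165 + Xs @ np.array([3.0, 2.0, 2.0, 1.0]) + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), Xs])       # design matrix [1 | X*]
b, *_ = np.linalg.lstsq(X1, y, rcond=None)   # least-squares coefficients
# Inspect b: the collinear pair (columns 1 and 2) can show inflated,
# sign-flipped coefficients from run to run.
```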

Principal Components Analysis
Decompose the correlation matrix as R = VΛV', where V is the orthogonal matrix of eigenvectors and Λ = diag(λ1, …, λp), and set W = X*V.
While the columns of X* are highly correlated, the columns of W are uncorrelated.
The λj (eigenvalues) give the variance corresponding to each principal component.
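A sketch of the decomposition, assuming the correlation-form standardization above (synthetic data):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 33
X = rng.normal(size=(n, 4))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)

# Correlation-form standardization so that Xs.T @ Xs = R.
Xc = X - X.mean(axis=0)
Xs = Xc / (np.sqrt(n - 1) * X.std(axis=0, ddof=1))
R = Xs.T @ Xs

# Eigendecomposition R = V diag(lam) V'; eigh returns ascending order.
lam, V = np.linalg.eigh(R)
lam, V = lam[::-1], V[:, ::-1]               # sort descending

W = Xs @ V                                   # principal component scores
# Columns of W are uncorrelated, and the variance captured by each
# component is its eigenvalue: W'W = V'RV = diag(lam).
print(np.allclose(W.T @ W, np.diag(lam)))    # -> True
```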

Police Applicants Height Data - I

Police Applicants Height Data - II

Regression of Y on [1|W]
Note that W8 and W9 have very small eigenvalues and very small t-statistics. Their condition indices, CIj = √(λmax/λj), are 63.5 and 85.2, both well above 10.
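A sketch of the condition-index computation; the eigenvalues below are hypothetical, chosen only to roughly reproduce the reported indices:

```python
import numpy as np

# Hypothetical eigenvalues of the 9x9 R, largest to smallest (they sum
# to p = 9); the transcript reports condition indices of 63.5 and 85.2
# for the last two components.
lam = np.array([5.1, 1.9, 0.9, 0.6, 0.3, 0.13, 0.068, 0.0013, 0.0007])

cond_index = np.sqrt(lam[0] / lam)   # CI_j = sqrt(lambda_max / lambda_j)
# The last two entries are large, flagging the near-singular
# directions of X*'X*.
```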

Reduced Model
Remove the last two principal components due to their small, insignificant t-statistics and high condition indices.
Let V(g) be the p×g matrix of eigenvectors for the g retained principal components (p = 9, g = 7).
Let W(g) = X*V(g).
Then regress Y on [1|W(g)].
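A sketch of the reduced fit (smaller synthetic dimensions for brevity; the slides use p = 9, g = 7):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, g = 33, 4, 3                            # keep g of p components
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)
y = 165 + X @ np.array([1.0, 1.0, 0.5, 0.5]) + rng.normal(size=n)

Xs = (X - X.mean(axis=0)) / (np.sqrt(n - 1) * X.std(axis=0, ddof=1))
lam, V = np.linalg.eigh(Xs.T @ Xs)
lam, V = lam[::-1], V[:, ::-1]                # descending eigenvalues

Vg = V[:, :g]                                 # p x g retained eigenvectors
Wg = Xs @ Vg                                  # W(g) = X* V(g)
design = np.column_stack([np.ones(n), Wg])    # [1 | W(g)]
gamma, *_ = np.linalg.lstsq(design, y, rcond=None)
```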

Reduced Regression Fit

Transforming Back to the X-scale
Since W(g) = X*V(g), the fitted values satisfy Ŷ = b0 + W(g)γ̂(g) = b0 + X*[V(g)γ̂(g)], so the implied coefficients for the standardized predictors are β̂* = V(g)γ̂(g); these are then rescaled to the original units of the Xs.
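A sketch of the back-transformation under the same correlation-form scaling assumed above (synthetic data, same setup as the reduced-model sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, g = 33, 4, 3
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)
y = 165 + X @ np.array([1.0, 1.0, 0.5, 0.5]) + rng.normal(size=n)

xbar = X.mean(axis=0)
s = X.std(axis=0, ddof=1)
Xs = (X - xbar) / (np.sqrt(n - 1) * s)

lam, V = np.linalg.eigh(Xs.T @ Xs)
lam, V = lam[::-1], V[:, ::-1]
Vg = V[:, :g]
design = np.column_stack([np.ones(n), Xs @ Vg])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
g0, gamma = coef[0], coef[1:]

# Back to the standardized X*-scale: since W(g) = X* V(g),
# yhat = g0 + X* V(g) gamma, so beta* = V(g) @ gamma.
beta_star = Vg @ gamma

# Back to the original units: undo the correlation-form scaling,
# beta_j = beta*_j / (sqrt(n-1) * s_j), with a matching intercept.
beta = beta_star / (np.sqrt(n - 1) * s)
beta0 = g0 - xbar @ beta
```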

Comparison of Coefficients and SEs: Original Model vs. Principal Components Regression