DATA ANALYSIS AND MODEL BUILDING LECTURE 9 Prof. Roland Craigwell, Department of Economics, University of the West Indies, Cave Hill Campus, and Rebecca Gookool, Department of Economics, University of the West Indies, St. Augustine Campus

Multiple Linear Regression

Linear Regression

Properties of the Hat matrix

Variance-Covariance Matrices

Confidence and Prediction Intervals

Sums of Squares

Overall Significance Test

Scatterplot Matrix of the Air Data Set in S-Plus: pairs(air)

Polynomial Models
y = β0 + β1 x + β2 x^2 + … + βk x^k
Problems: powers of x tend to be large in magnitude; powers of x tend to be highly correlated.
Solutions: centering and scaling of the x variables; orthogonal polynomials.
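The course software is S-Plus; as an illustrative stand-in, this small NumPy sketch (with made-up x values) shows the correlation problem the slide describes and how centering reduces it.

```python
import numpy as np

# Illustrative predictor values (hypothetical; the lecture's own examples
# use datasets such as the 74-auto data, not reproduced here).
x = np.linspace(90, 110, 50)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# Raw powers of x are nearly collinear...
print(corr(x, x**2))    # extremely close to 1

# ...while centering x first greatly reduces the correlation,
# since for symmetric x the odd moments of (x - mean) vanish.
xc = x - x.mean()
print(corr(xc, xc**2))  # near 0
```

This is why centering alone often cures much of the numerical trouble in polynomial fits, even before moving to orthogonal polynomials.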

Plot of mpg vs. weight for 74 autos

Orthogonal Polynomials
Generation is similar to Gram-Schmidt orthogonalization (see Strang, Linear Algebra).
The resulting vectors are orthonormal: X'X = I.
Hence (X'X)^-1 = I and the coefficients b = (X'X)^-1 X'y = X'y.
Adding a higher-degree term does not affect the coefficients of the lower-degree terms.
Correlation matrix of the coefficients = I.
SE of each coefficient = s = sqrt(MSE).
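As a sketch of these properties (made-up x and y; QR factorization used as a convenient equivalent of Gram-Schmidt):

```python
import numpy as np

x = np.linspace(-1, 1, 21)
y = 2 + 3 * x - 1.5 * x**2 + np.sin(5 * x)   # arbitrary response

V = np.vander(x, 4, increasing=True)   # columns 1, x, x^2, x^3
Q, R = np.linalg.qr(V)                 # Q has orthonormal columns

# Orthonormality: Q'Q = I, so the least-squares coefficients
# reduce to b = Q'y -- no matrix inversion needed.
print(np.allclose(Q.T @ Q, np.eye(4)))
b = Q.T @ y
b_ls, *_ = np.linalg.lstsq(Q, y, rcond=None)
print(np.allclose(b, b_ls))
```

Because each coefficient is just the inner product of its own column with y, adding a higher-degree column leaves the earlier coefficients unchanged, exactly as the slide states.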

Plot of mpg by weight with fitted regression

Indicator Variables Sometimes we want to fit a model with a categorical variable as a predictor, for instance automobile price as a function of where the car is made (Germany, Japan, USA). If there are c categories, we need c-1 indicator (0,1) variables as predictors: for instance, j = 1 if the car is made in Japan and 0 otherwise, and u = 1 if the car is made in the USA and 0 otherwise. If there are just 2 categories and no other predictors, we could simply do a t-test for the difference in means.
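A minimal sketch of this c-1 coding in NumPy (the country labels are illustrative; Germany is the baseline category, coded j = u = 0, so its mean is absorbed into the intercept):

```python
import numpy as np

country = ["Germany", "Japan", "USA", "Japan", "USA", "Germany"]
j = np.array([1 if c == "Japan" else 0 for c in country])  # Japan indicator
u = np.array([1 if c == "USA" else 0 for c in country])    # USA indicator

# Design matrix: intercept plus the two indicators.
X = np.column_stack([np.ones(len(country)), j, u])
print(X.astype(int))
```

Using all c indicators plus an intercept would make the columns sum to the intercept column, an exact collinearity, which is why one category must be dropped.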

Box plots of price by country

Histogram of automobile prices

Histogram of log of automobile prices

Regression Diagnostics
Goal: identify remarkable observations and unremarkable predictors.
Problems with observations: outliers; influential observations.
Problems with predictors: a predictor may not add much to the model; a predictor may be too similar to another predictor (collinearity); predictors may have been left out.
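Two of the diagnostics used later in the lecture, hat-matrix diagonals (leverage) and standardized residuals, can be sketched directly from the definitions (made-up data, with one deliberately high-leverage point):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.append(rng.uniform(0, 10, 19), 30.0)   # last point has extreme x
X = np.column_stack([np.ones(20), x])
y = 1 + 2 * x + rng.normal(0, 1, 20)

# Hat matrix H = X (X'X)^-1 X'; its diagonal h_ii measures leverage.
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
print(h.sum())        # trace(H) = number of parameters, here 2

# Standardized residuals: residual scaled by its estimated SD.
e = y - H @ y
mse = e @ e / (20 - 2)
r = e / np.sqrt(mse * (1 - h))
print(np.argmax(h))   # the x = 30 point has the largest leverage
```

High h_ii flags observations far out in predictor space; large |r| flags outliers in y. An observation that scores high on both is a candidate influential point.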

Plot of standardized residuals vs. fitted values for air dataset

Plot of residual vs. fit for air data set with all interaction terms

Plot of residual vs. fit for air model with x3*x4 interaction

Remarkable Observations

Plot of standardized residual vs. observation number for air dataset

Hat matrix diagonals

Plot of wind vs. ozone

Cook’s Distance

Plot of ozone vs. wind including fitted regression lines with and without observation 30 (simple linear regression)

Remedies for Outliers
Nothing? Data transformation? Remove outliers?
Robust regression: weighted least squares, b = (X'WX)^-1 X'Wy, or minimizing the median absolute deviation.
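A sketch of the weighted least squares formula from the slide, b = (X'WX)^-1 X'Wy, on made-up data with one gross outlier that is heavily downweighted (the weights here are chosen by hand purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(10, dtype=float)
y = 1 + 2 * x + rng.normal(0, 0.1, 10)
y[5] += 20.0                      # gross outlier

X = np.column_stack([np.ones(10), x])
w = np.ones(10)
w[5] = 0.01                       # nearly ignore the outlier
W = np.diag(w)

b_ols = np.linalg.solve(X.T @ X, X.T @ y)           # ordinary LS
b_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # weighted LS
print(b_ols)   # pulled away from the true (1, 2) by the outlier
print(b_wls)   # close to the true (1, 2)
```

In practice robust-regression software chooses the weights iteratively from the residuals rather than by hand, but the normal equations solved at each step have exactly this form.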

Collinearity High correlation among the predictors can cause problems with least squares estimates (wrong signs, low t-values, unexpected results). If the predictors are centered and scaled to unit length, then X'X is the correlation matrix. The diagonal elements of the inverse of the correlation matrix are called VIFs (variance inflation factors): VIF_j = 1/(1 - R_j^2), where R_j^2 is the coefficient of determination for the regression of the jth predictor on the remaining predictors.
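A sketch of the VIF computation (made-up predictors, with x3 deliberately built from x1 to create collinearity):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + rng.normal(scale=0.1, size=n)   # nearly collinear with x1

X = np.column_stack([x1, x2, x3])
R = np.corrcoef(X, rowvar=False)          # correlation matrix of predictors
vif = np.diag(np.linalg.inv(R))           # VIFs = diagonal of R^-1
print(vif)   # large for x1 and x3, near 1 for x2
```

A common rule of thumb treats VIF > 10 as a sign of serious collinearity; here x1 and x3 are far past that while the independent x2 sits near the minimum value of 1.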

Remedies for Collinearity
1. Identify and eliminate redundant variables (there is a large literature on this).
2. Modified regression techniques, e.g. ridge regression: b = (X'X + cI)^-1 X'y.
3. Regress on orthogonal linear combinations of the explanatory variables, e.g. principal components regression.
4. Careful variable selection.
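A sketch of the ridge formula from the slide, b = (X'X + cI)^-1 X'y, on made-up, nearly collinear data (the value c = 10 is arbitrary; in practice c is chosen by a ridge trace or cross-validation):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly collinear pair
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.5, size=n)

def ridge(X, y, c):
    p = X.shape[1]
    # Adding cI stabilizes the near-singular X'X before solving.
    return np.linalg.solve(X.T @ X + c * np.eye(p), X.T @ y)

print(ridge(X, y, 0.0))   # c = 0: ordinary LS, typically unstable here
print(ridge(X, y, 10.0))  # shrunk, more stable estimates near (1, 1)
```

The price of the stabilization is bias: ridge shrinks the coefficients toward zero, trading a little bias for a large reduction in variance.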

Correlation and inverse of correlation matrix for air data set.

Correlation and inverse of correlation matrix for mpg data set

Variable Selection We want a parsimonious model: as few variables as possible that still provide reasonable accuracy in predicting y. Some variables may not contribute much to the model. SSE never increases when variables are added to the model, but MSE = SSE/(n-k-1) may. Minimum MSE is one possible optimality criterion; however, it requires fitting all possible subsets (2^k of them) and finding the one with minimum MSE.
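An all-subsets sketch of the minimum-MSE criterion (made-up data with three candidate predictors, only two of which actually drive y):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
n = 50
X = rng.normal(size=(n, 3))            # candidates x1, x2, x3
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=n)

def mse(cols):
    # Fit the subset with an intercept and return MSE = SSE/(n - k - 1).
    Z = np.column_stack([np.ones(n), X[:, list(cols)]])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    sse = np.sum((y - Z @ b) ** 2)
    return sse / (n - len(cols) - 1)

subsets = [c for k in range(1, 4) for c in combinations(range(3), k)]
best = min(subsets, key=mse)
print(best)   # often (0, 1), the two true predictors
```

With only 3 candidates there are 2^3 - 1 = 7 nonempty subsets to fit; with k = 20 candidates there would be over a million, which is why stepwise shortcuts such as backward elimination exist.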

Backward Elimination
1. Fit the full model (with all candidate predictors).
2. If the P-values for all coefficients are < α, stop.
3. Delete the predictor with the highest P-value.
4. Refit the model.
5. Go to Step 2.
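The steps above can be sketched as follows. To stay dependency-free, the check |t| < 2 stands in for "P-value > α" (roughly α = 0.05 for moderate n); real software would use the exact t-distribution. Data are made up, with x3 pure noise.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 80
X = rng.normal(size=(n, 3))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=n)

active = [0, 1, 2]
while active:
    # Step 1/4: fit the model with the currently active predictors.
    Z = np.column_stack([np.ones(n), X[:, active]])
    XtX_inv = np.linalg.inv(Z.T @ Z)
    b = XtX_inv @ Z.T @ y
    resid = y - Z @ b
    s2 = resid @ resid / (n - Z.shape[1])
    t = b / np.sqrt(s2 * np.diag(XtX_inv))   # t-statistics
    tvals = np.abs(t[1:])                    # skip the intercept
    # Step 2: stop once every remaining predictor is significant.
    if tvals.min() >= 2.0:
        break
    # Step 3: delete the least significant predictor, then refit.
    active.pop(int(np.argmin(tvals)))
print(active)   # the noise predictor (index 2) should be eliminated
```

Backward elimination fits at most k models instead of 2^k, at the cost of possibly missing the overall-best subset.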

The end