Objectives of Multiple Regression

Objectives of Multiple Regression. (1) Establish the linear equation that best predicts values of a dependent variable Y using more than one explanatory variable from a large set of potential predictors {x1, x2, ..., xk}. (2) Find the subset of all possible predictor variables that explains a significant and appreciable proportion of the variance of Y, trading off adequacy of prediction against the cost of measuring more predictor variables.

Expanding Simple Linear Regression. Adding one or more polynomial terms to the model gives, for example, the quadratic model Y = b0 + b1x1 + b2x1^2 + e, or the general polynomial model Y = b0 + b1x1 + b2x1^2 + b3x1^3 + ... + bkx1^k + e. A term that appears in the polynomial regression model as x1^k is called a kth-degree term.
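
As an illustration, a quadratic fit is just a linear regression of y on x1 and x1^2. A minimal sketch using numpy's least-squares solver, with made-up data:

```python
import numpy as np

# Hypothetical data: y curves upward in x1, so a quadratic term should help.
rng = np.random.default_rng(0)
x1 = np.linspace(0, 10, 50)
y = 2.0 + 1.5 * x1 + 0.8 * x1**2 + rng.normal(0, 3, size=x1.size)

# Design matrix with columns [1, x1, x1^2]; least squares gives b0, b1, b2.
X = np.column_stack([np.ones_like(x1), x1, x1**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("b0, b1, b2 =", b)
```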

Polynomial model shapes. (Figure: a linear fit next to a quadratic fit of the same data; adding one or more terms to the model can significantly improve the fit.)

Incorporating Additional Predictors. Simple additive multiple regression model: y = b0 + b1x1 + b2x2 + b3x3 + ... + bkxk + e. Additive (effect) assumption: the expected change in y per unit increment in xj is constant and does not depend on the value of any other predictor; this change in y is equal to bj.
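
A minimal sketch of fitting the additive model with two predictors using statsmodels (the data are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(0, 1, size=n)

X = sm.add_constant(np.column_stack([x1, x2]))  # columns: const, x1, x2
model = sm.OLS(y, X).fit()
print(model.params)     # estimates b0, b1, b2
print(model.summary())  # full regression output
```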

Additive regression models: For two independent variables, the response is modeled as a surface.

Interpreting Parameter Values (Model Coefficients). b0 ("intercept"): the value of y when all predictors are 0. b1, b2, b3, ..., bk ("partial slopes"): bj describes the expected change in y per unit increment in xj when all other predictors in the model are held at a constant value.

Graphical depiction of bj. (Figure: b1 is the slope of the fitted surface in the direction of x1; b2 is the slope in the direction of x2.)

Multiple Regression with Interaction Terms. Y = b0 + b1x1 + b2x2 + ... + bkxk + b12x1x2 + b13x1x3 + ... + b1kx1xk + ... + b(k-1)k x(k-1)xk + e. The cross-product terms quantify the interaction among predictors. Interactive (effect) assumption: the effect of one predictor, xi, on the response, y, depends on the value of one or more of the other predictors.
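
A sketch of fitting an interaction model for two predictors (hypothetical data; the cross-product column x1*x2 carries the interaction):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
x1 = rng.uniform(0, 5, size=n)
x2 = rng.uniform(0, 5, size=n)
# True model includes an interaction: the effect of x1 depends on x2.
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 1.5 * x1 * x2 + rng.normal(0, 1, size=n)

X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))  # const, x1, x2, x1*x2
fit = sm.OLS(y, X).fit()
print(fit.params)  # b0, b1, b2, b12
```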

Interpreting Interaction. Interaction model (two predictors): y = b0 + b1x1 + b2x2 + b12x1x2 + e. The expected change in y per unit increment in X1 is now b1 + b12X2. Consequently, b1 is no longer the expected change in Y per unit increment in X1 (it is that change only when X2 = 0), and b12 has no easy interpretation on its own: the effect on y of a unit increment in X1 now depends on X2.

(Figure: for the no-interaction model, the lines of y against x1 at x2 = 0, 1, 2 are parallel, with intercept b0, common slope b1, and vertical separation b2; for the interaction model, the lines at different x2 values have different slopes.)

(Figures: multiple regression models with interaction; depending on the sign of the interaction coefficient, the lines move apart or come together as x1 increases.)

Effect of the Interaction Term in Multiple Regression. (Figure: the fitted surface is twisted rather than planar.)

A Protocol for Multiple Regression. (1) Identify all possible predictors. (2) Establish a method for estimating the model parameters and their standard errors. (3) Develop tests to determine whether a parameter is equal to zero (i.e., no evidence of association). (4) Reduce the number of predictors appropriately. (5) Develop predictions and their associated standard errors.

Estimating Model Parameters: Least Squares Estimation. Assume a random sample of n observations (yi, xi1, xi2, ..., xik), i = 1, 2, ..., n. The best predicting equation ŷ = b0 + b1x1 + b2x2 + ... + bkxk is found by choosing the values b0, b1, ..., bk that minimize the error sum of squares SSE = Σi (yi - (b0 + b1xi1 + ... + bkxik))^2.

Normal Equations. Take the partial derivatives of the SSE function with respect to b0, b1, ..., bk and set each derivative equal to 0. Solving this system of k + 1 equations in k + 1 unknowns yields the parameter estimates; in matrix form the normal equations are (X'X)b = X'y.
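
A minimal sketch of solving the normal equations directly with numpy (hypothetical data; in practice a QR- or SVD-based solver such as numpy.linalg.lstsq is numerically safer than forming X'X explicitly):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 3
X_raw = rng.normal(size=(n, k))
y = 1.0 + X_raw @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 1, size=n)

X = np.column_stack([np.ones(n), X_raw])   # add intercept column
b = np.linalg.solve(X.T @ X, X.T @ y)      # normal equations: (X'X)b = X'y
residuals = y - X @ b
sse = residuals @ residuals                # the minimized SSE
print("estimates:", b, " SSE:", sse)
```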

An Overall Measure of How Well the Full Model Performs Coefficient of Multiple Determination Denoted as R2. Defined as the proportion of the variability in the dependent variable y that is accounted for by the independent variables, x1, x2, ..., xk, through the regression model. With only one independent variable (k=1), R2 = r2, the square of the simple correlation coefficient.

Computing the Coefficient of Determination. R2 = 1 - SSE/TSS = SSR/TSS, where TSS = Σ(yi - ȳ)^2 is the total sum of squares, SSE = Σ(yi - ŷi)^2 is the error (residual) sum of squares, and SSR = TSS - SSE is the regression sum of squares.
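
A self-contained sketch of the computation (hypothetical data; R2 simply compares SSE to the total sum of squares):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0, 1, size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b
sse = np.sum((y - y_hat) ** 2)        # error sum of squares
tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
print("R^2 =", 1.0 - sse / tss)
```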

Multicollinearity. A further assumption in multiple regression (absent in SLR) is that the predictors (x1, x2, ..., xk) are statistically uncorrelated; that is, the predictors do not co-vary. When the predictors are substantially correlated (correlation greater than about 0.6), the multiple regression model is said to suffer from problems of multicollinearity. (Figure: scatterplots of predictor pairs with r = 0, r = 0.6, and r = 0.8.)

Effect of Multicollinearity on the Fitted Surface. (Figure: an example of extreme collinearity, with the observations lying along a line in the (x1, x2) plane, so the fitted surface for y is poorly determined.)

Multicollinearity leads to: (a) numerical instability in the estimates of the regression parameters, with wild fluctuations in these estimates if a few observations are added or removed; (b) the loss of simple interpretations for the regression coefficients in the additive model. Ways to detect multicollinearity: scatterplots of the predictor variables; the correlation matrix for the predictor variables (the higher these correlations, the worse the problem); variance inflation factors (VIFs) reported by software packages, where values larger than 10 usually signal a substantial amount of collinearity. What can be done about multicollinearity: predictions from the fitted model are still OK, but the resulting confidence/prediction intervals are very wide, so choose explanatory variables wisely (e.g., consider omitting one of two highly correlated variables). More advanced solutions: principal components analysis; ridge regression.
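
A sketch of computing VIFs from scratch, regressing each predictor on the others and setting VIFj = 1/(1 - Rj^2); the data and function name are my own for illustration (statsmodels also provides a variance_inflation_factor helper in statsmodels.stats.outliers_influence):

```python
import numpy as np

def vif(X):
    """VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing
    column j of X on the remaining columns (plus an intercept)."""
    n, k = X.shape
    out = []
    for j in range(k):
        xj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, xj, rcond=None)
        resid = xj - others @ b
        r2 = 1.0 - (resid @ resid) / np.sum((xj - xj.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(5)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(0, 0.3, size=200)     # highly correlated with x1
x3 = rng.normal(size=200)                  # independent of the others
print(vif(np.column_stack([x1, x2, x3])))  # large VIFs flag x1 and x2
```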

Testing in Multiple Regression. Tasks: testing individual parameters in the model, and computing predicted values with their associated standard errors. Overall AOV F-test: H0: β1 = β2 = ... = βk = 0 (none of the explanatory variables is a significant predictor of Y) versus Ha: at least one βj ≠ 0. Test statistic: F = MSR/MSE = (SSR/k) / (SSE/(n - k - 1)). Reject H0 if F ≥ F(α; k, n - k - 1).
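
This F statistic is part of standard regression output; a minimal sketch with statsmodels on hypothetical data (fvalue and f_pvalue are attributes of the fitted-model results):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 80
X = sm.add_constant(rng.normal(size=(n, 3)))
y = X @ np.array([1.0, 0.8, 0.0, -0.6]) + rng.normal(0, 1, size=n)

fit = sm.OLS(y, X).fit()
print("F =", fit.fvalue, " p-value =", fit.f_pvalue)  # overall AOV F-test
```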

Standard Error for a Partial Slope Estimate. The estimated standard error of bj is SE(bj) = s_e / sqrt( (1 - Rj^2) Σi (xij - x̄j)^2 ), where s_e^2 = SSE/(n - k - 1) and Rj^2 is the coefficient of determination for the model with xj as the dependent variable and all other x variables as predictors. What happens if all the predictors are truly independent of each other? Then Rj^2 = 0 and SE(bj) is as small as possible. If there is high dependency, Rj^2 approaches 1 and SE(bj) is badly inflated: this is multicollinearity at work (note that 1/(1 - Rj^2) is exactly the VIF).

Confidence Interval. A 100(1 - α)% confidence interval for βj is bj ± t(α/2, n - k - 1) · SE(bj). The df for SSE, n - k - 1, reflects the number of data points minus the number of parameters that have to be estimated (the intercept plus k slopes).

Testing whether a partial slope coefficient is equal to zero. Hypotheses: H0: βj = 0 versus the alternative Ha: βj ≠ 0 (or a one-sided alternative). Test statistic: t = bj / SE(bj). Rejection region: |t| ≥ t(α/2, n - k - 1) for the two-sided alternative; use t(α, n - k - 1) with the appropriate sign for a one-sided alternative.
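
These per-coefficient quantities are exactly what regression software prints; a sketch with statsmodels on hypothetical data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 60
X = sm.add_constant(rng.normal(size=(n, 2)))
y = X @ np.array([0.5, 1.2, 0.0]) + rng.normal(0, 1, size=n)

fit = sm.OLS(y, X).fit()
print(fit.bse)             # SE(b_j) for each coefficient
print(fit.tvalues)         # t = b_j / SE(b_j)
print(fit.pvalues)         # two-sided p-values
print(fit.conf_int(0.05))  # 95% confidence intervals for each beta_j
```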

Predicting Y. We use the least squares fitted value, ŷ, as our predictor of a single value of y at a particular value of the explanatory variables (x1, x2, ..., xk); the corresponding interval about the predicted value of y is called a prediction interval. The least squares fitted value also provides the best predictor of E(y), the mean value of y, at a particular value of (x1, x2, ..., xk); the corresponding interval for the mean prediction is called a confidence interval. Formulas for these intervals are much more complicated than in the case of SLR and cannot realistically be calculated by hand (see the book).
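
In practice software does this bookkeeping; for instance, statsmodels' get_prediction returns both kinds of interval. A sketch with hypothetical data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 50
X = sm.add_constant(rng.normal(size=(n, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1, size=n)

fit = sm.OLS(y, X).fit()
x_new = np.array([[1.0, 0.5, -0.2]])  # const, x1, x2 for the new point
frame = fit.get_prediction(x_new).summary_frame(alpha=0.05)
print(frame[["mean", "mean_ci_lower", "mean_ci_upper"]])  # CI for E(y)
print(frame[["obs_ci_lower", "obs_ci_upper"]])            # prediction interval
```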

Minimum R2 for a "Significant" Regression. Since we have formulas for R2 and F in terms of n, k, SSE and TSS, we can relate the two quantities: F = (R2/k) / ((1 - R2)/(n - k - 1)). We can then ask: what is the minimum R2 that ensures the regression model will be declared significant, as measured by the appropriate quantile from the F distribution? Solving F ≥ F(α; k, n - k - 1) for R2 gives R2 ≥ k·F(α; k, n - k - 1) / (k·F(α; k, n - k - 1) + n - k - 1), so the threshold depends on n, k, and α.

Minimum R2 for Simple Linear Regression (k = 1). Setting k = 1 gives R2 ≥ F(α; 1, n - 2) / (F(α; 1, n - 2) + n - 2). For example, with n = 20 and α = 0.05, F(0.05; 1, 18) ≈ 4.41, so any R2 above roughly 4.41/22.41 ≈ 0.20 is declared significant; with larger n the threshold is smaller still. A significant regression therefore need not have much predictive power.
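
A small sketch evaluating this bound with scipy's F quantile function (the min_r2 helper name is my own):

```python
from scipy import stats

def min_r2(n, k, alpha=0.05):
    """Smallest R^2 that makes the overall F-test significant at level alpha."""
    f_crit = stats.f.ppf(1 - alpha, k, n - k - 1)
    return k * f_crit / (k * f_crit + n - k - 1)

for n in (10, 20, 50, 100):
    print(n, round(min_r2(n, k=1), 3))  # the threshold shrinks as n grows
```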