Multiple Regression and Model Building


Chapter 14: Multiple Regression and Model Building

Multiple Regression and Model Building
14.1 The Multiple Regression Model and the Least Squares Point Estimate
14.2 Model Assumptions and the Standard Error
14.3 R² and Adjusted R²
14.4 The Overall F Test
14.5 Testing the Significance of an Independent Variable

Multiple Regression and Model Building Continued
14.6 Confidence and Prediction Intervals
14.7 Using Dummy Variables to Model Qualitative Independent Variables
14.8 Model Building and the Effects of Multicollinearity
14.9 Residual Analysis in Multiple Regression

14.1 The Multiple Regression Model and the Least Squares Point Estimate
Simple linear regression uses one independent variable to explain the dependent variable
Multiple regression uses two or more independent variables to explain the dependent variable
This allows multiple regression models to handle more complex situations
There is no limit to the number of independent variables a model can use
A multiple regression model still has only one dependent variable

The Multiple Regression Model
The linear regression model relating y to x1, x2, …, xk is y = β0 + β1x1 + β2x2 + … + βkxk + ε
µy = β0 + β1x1 + β2x2 + … + βkxk is the mean value of the dependent variable y
β0, β1, β2, …, βk are the unknown regression parameters relating the mean value of y to x1, x2, …, xk
ε is an error term that describes the effects on y of all factors other than the independent variables x1, x2, …, xk

The Least Squares Estimates and Point Estimation and Prediction
The estimation/prediction equation ŷ = b0 + b1x01 + b2x02 + … + bkx0k is the point estimate of the mean value of the dependent variable when the values of the independent variables are x01, x02, …, x0k
It is also the point prediction of an individual value of the dependent variable when the values of the independent variables are x01, x02, …, x0k
b0, b1, b2, …, bk are the least squares point estimates of the parameters β0, β1, β2, …, βk
x01, x02, …, x0k are specified values of the independent predictor variables x1, x2, …, xk
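A minimal sketch of computing the least squares point estimates and a point prediction with numpy; the data values below are hypothetical illustrative numbers, not the chapter's actual dataset.

```python
import numpy as np

# Hypothetical data: n = 8 observations of k = 2 independent variables
x1 = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
x2 = np.array([18.0, 14.0, 24.0, 22.0, 8.0, 16.0, 1.0, 0.0])
y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])

# Design matrix with a leading column of ones for the intercept b0
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares point estimates b0, b1, b2 of beta0, beta1, beta2
b, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

# Point estimate / point prediction at specified values x01 = 40.0, x02 = 10.0
x0 = np.array([1.0, 40.0, 10.0])
y_hat0 = x0 @ b
print("b =", b, " point prediction =", y_hat0)
```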

[Figure 14.4(a): MINITAB output for the fuel consumption case]

14.2 Model Assumptions and the Standard Error
The model is y = β0 + β1x1 + β2x2 + … + βkxk + ε
The assumptions for multiple regression are stated about the model error terms, the ε's

The Regression Model Assumptions
Mean of Zero Assumption
Constant Variance Assumption
Normality Assumption
Independence Assumption

Sum of Squares
The sum of squared errors is SSE = Σ(yi − ŷi)²
The mean square error, the point estimate of σ², is s² = SSE / (n − (k + 1))
The standard error, the point estimate of σ, is s = √s²

14.3 R² and Adjusted R²
Total variation is given by the formula Σ(yi − ȳ)²
Explained variation is given by the formula Σ(ŷi − ȳ)²
Unexplained variation is given by the formula Σ(yi − ŷi)²
Total variation is the sum of explained and unexplained variation
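As a sketch, the three variations (and the standard error s from Section 14.2) can be computed directly from the fitted values; this continues the hypothetical X, y, and b arrays from the earlier sketch.

```python
y_hat = X @ b                                          # fitted values ŷi
total_variation = np.sum((y - y.mean()) ** 2)          # Σ(yi − ȳ)²
explained_variation = np.sum((y_hat - y.mean()) ** 2)  # Σ(ŷi − ȳ)²
unexplained_variation = np.sum((y - y_hat) ** 2)       # Σ(yi − ŷi)² = SSE

n, k = len(y), X.shape[1] - 1                # k independent variables
s2 = unexplained_variation / (n - (k + 1))   # mean square error s²
s = np.sqrt(s2)                              # standard error s

R2 = explained_variation / total_variation   # multiple coefficient of determination
```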

R² and Adjusted R² Continued
The multiple coefficient of determination R² is the ratio of explained variation to total variation
R² is the proportion of the total variation that is explained by the overall regression model
The multiple correlation coefficient R is the square root of R²

Multiple Correlation Coefficient R
The multiple correlation coefficient R is just the square root of R²
With simple linear regression, r would take on the sign of b1
With multiple regression there are multiple bi's, so no single sign applies; for this reason, R is always positive
To interpret the direction of the relationship between the x's and y, you must look at the sign of the appropriate bi coefficient

The Adjusted R²
Adding an independent variable to a multiple regression model will always raise R²
R² will rise slightly even if the new variable has no real relationship to y
The adjusted R² corrects this tendency by penalizing the number of independent variables
As a result, it gives a better estimate of the importance of the independent variables
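A one-line sketch of the adjusted R², assuming the common 1 − (1 − R²)(n − 1)/(n − (k + 1)) form; it reuses n, k, and R2 from the sketch above.

```python
# Adjusted R²: shrinks toward 0 as unhelpful predictors are added
adj_R2 = 1 - (1 - R2) * (n - 1) / (n - (k + 1))
```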

14.4 The Overall F Test
H0: β1 = β2 = … = βk = 0 versus Ha: At least one of β1, β2, …, βk ≠ 0
The test statistic is F(model) = (Explained variation / k) / (Unexplained variation / (n − (k + 1)))
Reject H0 in favor of Ha if F(model) > Fα or p-value < α
Fα is based on k numerator and n − (k + 1) denominator degrees of freedom
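A sketch of the overall F test using the variations computed earlier; scipy is assumed to be available for the p-value.

```python
from scipy import stats

F_model = (explained_variation / k) / (unexplained_variation / (n - (k + 1)))
p_value = stats.f.sf(F_model, k, n - (k + 1))  # upper-tail area of the F distribution
print("F(model) =", F_model, " p-value =", p_value)
```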

14.5 Testing the Significance of an Independent Variable
A variable in a multiple regression model is not likely to be useful unless there is a significant relationship between it and y
To test significance, we use the null hypothesis H0: βj = 0 versus the alternative hypothesis Ha: βj ≠ 0

Testing Significance of an Independent Variable #2
Alternative     Reject H0 if     p-Value
Ha: βj > 0      t > tα           Area under t distribution to the right of t
Ha: βj < 0      t < −tα          Area under t distribution to the left of t
Ha: βj ≠ 0      |t| > tα/2       Twice the area under t distribution to the right of |t|
(|t| > tα/2 means t > tα/2 or t < −tα/2)

Testing Significance of an Independent Variable #3
Test statistic: t = bj / sbj
100(1 − α)% confidence interval for βj: [bj ± tα/2 sbj]
t, tα/2, and the p-values are based on n − (k + 1) degrees of freedom

Testing Significance of an Independent Variable #4
It is customary to test the significance of every independent variable
If we can reject H0: βj = 0 at the 0.05 level of significance, we have strong evidence that the independent variable xj is significantly related to y
At the 0.01 level of significance, we have very strong evidence
The smaller the significance level α at which H0 can be rejected, the stronger the evidence that xj is significantly related to y

A Confidence Interval for the Regression Parameter βj
If the regression assumptions hold, a 100(1 − α)% confidence interval for βj is [bj ± tα/2 sbj]
tα/2 is based on n − (k + 1) degrees of freedom
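A sketch of the t statistics, two-sided p-values, and confidence intervals for the βj, using the standard OLS formula sbj = s √[(X'X)⁻¹]jj; it reuses X, b, s, n, k, and stats from the earlier sketches.

```python
XtX_inv = np.linalg.inv(X.T @ X)
s_b = s * np.sqrt(np.diag(XtX_inv))      # standard errors s_bj
t_stats = b / s_b                        # t = bj / s_bj
p_values = 2 * stats.t.sf(np.abs(t_stats), n - (k + 1))

alpha = 0.05
t_half = stats.t.ppf(1 - alpha / 2, n - (k + 1))  # t_{α/2}
ci_lower = b - t_half * s_b
ci_upper = b + t_half * s_b
```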

14.6 Confidence and Prediction Intervals
The point estimate corresponding to particular values x01, x02, …, x0k of the independent variables is ŷ = b0 + b1x01 + b2x02 + … + bkx0k
It is unlikely that this value will exactly equal the mean value of y for these x values
We need bounds on how far the predicted value might be from the actual value
We can do this by calculating a confidence interval for the mean value of y and a prediction interval for an individual value of y

A Confidence Interval and a Prediction Interval
If the regression assumptions hold, a 100(1 − α)% confidence interval for the mean value of y is [ŷ ± tα/2 s √(distance value)]
A 100(1 − α)% prediction interval for an individual value of y is [ŷ ± tα/2 s √(1 + distance value)]
tα/2 is based on n − (k + 1) degrees of freedom
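A sketch of both intervals at the specified point x0, writing the distance value as x0'(X'X)⁻¹x0; it reuses x0, b, s, t_half, and XtX_inv from the sketches above.

```python
distance_value = x0 @ XtX_inv @ x0
y_hat0 = x0 @ b

half_ci = t_half * s * np.sqrt(distance_value)      # confidence interval half-width
half_pi = t_half * s * np.sqrt(1 + distance_value)  # prediction interval half-width

conf_interval = (y_hat0 - half_ci, y_hat0 + half_ci)  # for the mean value of y
pred_interval = (y_hat0 - half_pi, y_hat0 + half_pi)  # for an individual value of y
```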

14.7 Using Dummy Variables to Model Qualitative Independent Variables
So far, we have only looked at including quantitative data in a regression model
However, we may wish to include descriptive qualitative data as well
For example, we might want to include the gender of respondents
We can model the effects of different levels of a qualitative variable by using what are called dummy variables, also known as indicator variables

How to Construct Dummy Variables
A dummy variable always has a value of either 0 or 1
For example, to model sales at two locations, we would code the first location as 0 and the second as 1
Operationally, it does not matter which location is coded 0 and which is coded 1

What If We Have More Than Two Categories?
Consider having three categories, say A, B, and C
We cannot code this using one dummy variable
A = 0, B = 1, and C = 2 would be invalid because it assumes the difference between A and B is the same as the difference between B and C
We must use multiple dummy variables: k categories require k − 1 dummy variables

What If We Have More Than Two Categories? Continued
For A, B, and C, we would need two dummy variables
x1 is 1 for A, 0 otherwise
x2 is 1 for B, 0 otherwise
If x1 and x2 are both 0, the category must be C
This is why a third dummy variable is not needed, as the sketch below shows
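A minimal sketch of coding a three-level qualitative variable with k − 1 = 2 dummy variables; the observation labels are hypothetical.

```python
categories = np.array(["A", "B", "C", "B", "A", "C"])  # hypothetical observations

d1 = (categories == "A").astype(float)  # 1 for A, 0 otherwise
d2 = (categories == "B").astype(float)  # 1 for B, 0 otherwise
# No third dummy is needed: rows where d1 == 0 and d2 == 0 must be category C
```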

Interaction Models
So far, we have only considered dummy variables as stand-alone variables
The model so far is y = β0 + β1x + β2D + ε, where D is the dummy variable
However, we can also look at the interaction between a dummy variable and other variables
That model takes the form y = β0 + β1x + β2D + β3xD + ε
With an interaction term, both the intercept and the slope are shifted
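A sketch of building the design matrix for the interaction model y = β0 + β1x + β2D + β3xD + ε, with hypothetical x and D values.

```python
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
D = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])  # hypothetical dummy variable

X_int = np.column_stack([np.ones_like(x), x, D, x * D])
# When D = 1 the intercept becomes β0 + β2 and the slope becomes β1 + β3,
# so both the intercept and the slope shift relative to the D = 0 group
```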

14.8 Model Building and the Effects of Multicollinearity
Multicollinearity, correlation among the independent variables, causes problems in evaluating the p-values of the model
Therefore, we need to evaluate more than the additional importance of each independent variable
We also need to evaluate how the variables work together
One way to do this is to determine whether the overall model gives a high R² and adjusted R², a small s, and short prediction intervals

Effect of Adding an Independent Variable
Adding any independent variable will increase R²
This is true even when the added variable is unimportant
Thus, R² alone cannot tell us that adding an independent variable is undesirable

A Better Criterion
A better criterion is the size of the standard error s
If s increases when an independent variable is added, that variable should not be added
However, a decrease in s alone is not enough
An independent variable should be included only if it reduces s enough to offset the increase in tα/2 (which grows as the degrees of freedom fall) and thus reduces the length of the desired prediction interval for y

C Statistic
Another quantity for comparing regression models is the C (also known as Cp) statistic
First, calculate the mean square error for the model containing all p potential independent variables, denoted s²p
Next, calculate the SSE for a reduced model with k independent variables
Then C = SSE / s²p − [n − 2(k + 1)]

C Statistic Continued
We want the value of C to be small
Adding unimportant independent variables will raise the value of C
While we want C to be small, we also wish to find a model for which C roughly equals k + 1
A model with C substantially greater than k + 1 has substantial bias and is undesirable
If a model has a small value of C that is less than k + 1, it is not biased and should be considered desirable
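A sketch of the C statistic under the form given above, C = SSE / s²p − [n − 2(k + 1)]; the sample size, mean square error, and SSE values here are hypothetical placeholders.

```python
n_obs = 30           # hypothetical sample size
s2_full = 4.0        # hypothetical mean square error with all p potential variables
sse_reduced = 140.0  # hypothetical SSE for a reduced model with k variables
k_reduced = 3

C = sse_reduced / s2_full - (n_obs - 2 * (k_reduced + 1))
print("C =", C, " compare with k + 1 =", k_reduced + 1)
```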

14.9 Residual Analysis in Multiple Regression
For an observed value of yi, the residual is ei = yi − ŷi = yi − (b0 + b1xi1 + … + bkxik)
If the regression assumptions hold, the residuals should look like a random sample from a normal distribution with mean 0 and variance σ²

Residual Plots
Residuals versus each independent variable
Residuals versus the predicted values of y
Residuals in time order (if the response is a time series)
A plotting sketch follows this list
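A sketch of the three kinds of residual plots with matplotlib, reusing the hypothetical x1, x2, X, y, and b arrays from the earlier sketches.

```python
import matplotlib.pyplot as plt

fitted = X @ b
residuals = y - fitted

fig, axes = plt.subplots(1, 4, figsize=(14, 3))
axes[0].scatter(x1, residuals)
axes[0].set_title("Residuals vs x1")
axes[1].scatter(x2, residuals)
axes[1].set_title("Residuals vs x2")
axes[2].scatter(fitted, residuals)
axes[2].set_title("Residuals vs predicted y")
axes[3].plot(residuals, marker="o")
axes[3].set_title("Residuals in time order")
for ax in axes:
    ax.axhline(0.0, linestyle="--")  # residuals should scatter randomly about 0
plt.tight_layout()
plt.show()
```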