Copyright © 2014, 2011 Pearson Education, Inc. 1 Chapter 23 Multiple Regression

Copyright © 2014, 2011 Pearson Education, Inc. 23.1 The Multiple Regression Model A chain is considering where to locate a new restaurant. Is it better to locate it far from the competition or in a more affluent area?  Use multiple regression to describe the relationship between several explanatory variables and the response.  Multiple regression separates the effects of each explanatory variable on the response and reveals which ones really matter.

Copyright © 2014, 2011 Pearson Education, Inc. 23.1 The Multiple Regression Model  Multiple regression model (MRM): model for the association in the population between multiple explanatory variables and a response.  k: the number of explanatory variables in the multiple regression (k = 1 in simple regression).

Copyright © 2014, 2011 Pearson Education, Inc. 23.1 The Multiple Regression Model The response Y is linearly related to the k explanatory variables X1, X2, …, Xk by the equation Y = β0 + β1X1 + β2X2 + … + βkXk + ε. The unobserved errors ε in the model 1. are independent of one another, 2. have equal variance, and 3. are normally distributed around the regression equation.
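
A minimal sketch, not from the textbook, of what this model looks like in code: data are simulated from an assumed MRM with k = 2 explanatory variables and the coefficients are then recovered by least squares. All names and coefficient values below are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 200, 2                                # n observations, k explanatory variables
    X = rng.normal(size=(n, k))                  # explanatory variables X1, X2
    beta = np.array([10.0, 3.0, -2.0])           # assumed true b0, b1, b2 for the demo
    errors = rng.normal(scale=1.5, size=n)       # independent, equal-variance, normal errors
    y = beta[0] + X @ beta[1:] + errors          # response generated by the MRM

    design = np.column_stack([np.ones(n), X])            # add the intercept column
    b_hat, *_ = np.linalg.lstsq(design, y, rcond=None)   # least-squares estimates
    print(b_hat)                                          # close to [10, 3, -2]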

Copyright © 2014, 2011 Pearson Education, Inc. 23.1 The Multiple Regression Model  Whereas the SRM bundles all but one explanatory variable into the error term, multiple regression allows for the inclusion of several explanatory variables in the model.  In the MRM, residuals that depart from normality may suggest that an important explanatory variable has been omitted.

Copyright © 2014, 2011 Pearson Education, Inc. 23.2 Interpreting Multiple Regression Example: Women’s Apparel Stores  Response variable: annual sales at stores in a chain of women’s apparel, measured in dollars per square foot of retail space.  Two explanatory variables: median household income in the area (in thousands of dollars) and the number of competing apparel stores in the same mall.

Copyright © 2014, 2011 Pearson Education, Inc. 23.2 Interpreting Multiple Regression Example: Women’s Apparel Stores  Begin with a scatterplot matrix, a table of scatterplots arranged as in a correlation matrix.  Using a scatterplot matrix to understand the data can save considerable time later when interpreting the multiple regression results.
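
A hedged sketch of how one might build such a scatterplot matrix and the accompanying correlation matrix in Python. The file name apparel_stores.csv and the column names Sales, Income, and Competitors are assumptions for illustration, not the textbook's actual data file.

    import pandas as pd
    from pandas.plotting import scatter_matrix
    import matplotlib.pyplot as plt

    df = pd.read_csv("apparel_stores.csv")        # hypothetical data file
    cols = ["Sales", "Income", "Competitors"]     # assumed column names
    print(df[cols].corr())                        # correlation matrix
    scatter_matrix(df[cols], figsize=(6, 6))      # table of pairwise scatterplots
    plt.show()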

Copyright © 2014, 2011 Pearson Education, Inc. 23.2 Interpreting Multiple Regression Scatterplot Matrix: Women’s Apparel Stores

Copyright © 2014, 2011 Pearson Education, Inc. 23.2 Interpreting Multiple Regression Example: Women’s Apparel Stores The scatterplot matrix for this example  Confirms a positive linear association between sales and median household income.  Shows a weak association between sales and the number of competitors.

Copyright © 2014, 2011 Pearson Education, Inc. 23.2 Interpreting Multiple Regression Correlation Matrix: Women’s Apparel Stores

Copyright © 2014, 2011 Pearson Education, Inc. 23.2 Interpreting Multiple Regression R-squared and s_e  The fitted model for estimating sales in the women’s apparel stores example has the form Estimated Sales = b0 + b1 Income + b2 Competitors, with the estimated coefficients given in the regression output (see the partial-slopes table below).

Copyright © 2014, 2011 Pearson Education, Inc. 23.2 Interpreting Multiple Regression R-squared and s_e  R² indicates that the fitted equation explains 59.47% of the store-to-store variation in sales.  For this example, R² is larger than the r² values for separate SRMs fitted to each explanatory variable; it is also larger than their sum.  For this example, s_e = $68.03.

Copyright © 2014, 2011 Pearson Education, Inc. 23.2 Interpreting Multiple Regression R-squared and s_e  Adjusted R-squared (adjusted R²) adjusts for both the sample size n and the model size k. It is always smaller than R².  The residual degrees of freedom, n - k - 1, is the divisor of s_e.  Adjusted R² and s_e move in opposite directions when an explanatory variable is added to the model (adjusted R² goes up when s_e goes down).
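
A small sketch of how these summary statistics relate to the residuals. The function below assumes y and y_hat are NumPy arrays holding the observed responses and fitted values from a multiple regression with k explanatory variables.

    import numpy as np

    def fit_summary(y, y_hat, k):
        n = len(y)
        resid = y - y_hat
        ss_res = np.sum(resid ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        r2 = 1 - ss_res / ss_tot                        # fraction of variation explained
        adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # penalizes model size k
        s_e = np.sqrt(ss_res / (n - k - 1))             # residual SD, divisor n - k - 1
        return r2, adj_r2, s_e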

Copyright © 2014, 2011 Pearson Education, Inc. 14 Calibration Plot  Calibration plot: scatterplot of the response y on the fitted values ŷ.  R² is the square of the correlation between y and ŷ; the tighter the data cluster along the diagonal line in the calibration plot, the larger the R² value. 23.2 Interpreting Multiple Regression

Copyright © 2014, 2011 Pearson Education, Inc. 15 Calibration Plot: Women’s Apparel Stores 23.2 Interpreting Multiple Regression

Copyright © 2014, 2011 Pearson Education, Inc. 23.2 Interpreting Multiple Regression Marginal and Partial Slopes  Partial slope: the slope of an explanatory variable in a multiple regression, which statistically excludes the effects of the other explanatory variables.  Marginal slope: the slope of an explanatory variable in a simple regression.

Copyright © 2014, 2011 Pearson Education, Inc. 23.2 Interpreting Multiple Regression Partial Slopes: Women’s Apparel Stores

Copyright © 2014, 2011 Pearson Education, Inc. 23.2 Interpreting Multiple Regression Partial Slopes: Women’s Apparel Stores  The slope b1 for Income implies that a store in a location where median household income is $10,000 higher sells, on average, $79.66 more per square foot than a store in a less affluent location with the same number of competitors.  The negative slope b2 for Competitors implies that, among stores in equally affluent locations, each additional competitor lowers average sales per square foot.

Copyright © 2014, 2011 Pearson Education, Inc. 23.2 Interpreting Multiple Regression Marginal and Partial Slopes  Partial and marginal slopes agree only when the explanatory variables are uncorrelated.  In this example they do not agree: the marginal slope for Competitors is positive, because more affluent locations tend to draw more competitors. The MRM separates these effects; the SRM does not.
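
A simulated sketch (not the apparel data; every number below is invented) of how a marginal slope can come out positive while the partial slope is negative when the explanatory variables are correlated.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    income = rng.normal(70, 10, size=n)                         # in thousands of dollars
    competitors = 0.1 * income + rng.normal(0, 0.5, size=n)     # correlated with income
    sales = 60 + 8 * income - 24 * competitors + rng.normal(0, 20, size=n)

    # Marginal slope: simple regression of sales on competitors alone
    marginal = np.polyfit(competitors, sales, 1)[0]

    # Partial slope: multiple regression of sales on income and competitors
    design = np.column_stack([np.ones(n), income, competitors])
    partial = np.linalg.lstsq(design, sales, rcond=None)[0][2]

    print(marginal, partial)   # marginal comes out positive; partial is near -24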

Copyright © 2014, 2011 Pearson Education, Inc. 23.2 Interpreting Multiple Regression Path Diagram  Path diagram: schematic drawing of the relationships among the explanatory variables and the response.  Collinearity: very high correlations among the explanatory variables that make the estimates in multiple regression uninterpretable.

Copyright © 2014, 2011 Pearson Education, Inc. 23.2 Interpreting Multiple Regression Path Diagram: Women’s Apparel Stores Income has a direct positive effect on sales and an indirect negative effect on sales via the number of competitors.

Copyright © 2014, 2011 Pearson Education, Inc. 23.3 Checking Conditions Conditions for Inference Use the residuals from the fitted MRM to check that the errors in the model  are independent;  have equal variance; and  follow a normal distribution.

Copyright © 2014, 2011 Pearson Education, Inc. 23.3 Checking Conditions Residual Plots  The plot of residuals versus fitted y values is used not only to identify outliers and check the similar-variances condition, but also to check for dependence that can arise from omitting an important variable from the model.  Plots of residuals versus each explanatory variable are used to verify that the relationships are linear.
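
A minimal plotting sketch of the residual plots just described. It assumes resid, y_hat, and the explanatory-variable arrays come from a fit like the ones sketched earlier, and that matplotlib is available.

    import matplotlib.pyplot as plt

    def residual_plots(resid, y_hat, explanatory):
        """explanatory: dict mapping a variable name to its array of values."""
        panels = [("fitted values", y_hat)] + list(explanatory.items())
        fig, axes = plt.subplots(1, len(panels), figsize=(4 * len(panels), 3))
        for ax, (name, x) in zip(axes, panels):
            ax.scatter(x, resid, s=10)
            ax.axhline(0, color="gray")      # residuals should scatter evenly about zero
            ax.set_xlabel(name)
            ax.set_ylabel("residual")
        plt.tight_layout()
        plt.show()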

Copyright © 2014, 2011 Pearson Education, Inc. 23.3 Checking Conditions Residual Plot: Women’s Apparel Stores Visually estimate that s_e is less than $75 per square foot (all residuals lie within $150 of the horizontal line at zero). No evident pattern.

Copyright © 2014, 2011 Pearson Education, Inc. 23.3 Checking Conditions Residual Plot: Women’s Apparel Stores This plot of residuals versus Income does not indicate a problem.

Copyright © 2014, 2011 Pearson Education, Inc. 23.3 Checking Conditions Residual Plot: Women’s Apparel Stores This plot of residuals versus Competitors does not indicate a problem.

Copyright © 2014, 2011 Pearson Education, Inc. 23.3 Checking Conditions Check Normality: Women’s Apparel Stores The quantile plot indicates that the nearly normal condition is satisfied (although a slight skew is evident in the histogram).

Copyright © 2014, 2011 Pearson Education, Inc. 23.4 Inference in Multiple Regression Inference for the Model: F-test  F-test: a test of the explanatory power of the MRM as a whole.  F-statistic: the ratio of the sample variance of the fitted values to the variance of the residuals.

Copyright © 2014, 2011 Pearson Education, Inc. 23.4 Inference in Multiple Regression Inference for the Model: F-test The F-statistic is used to test the null hypothesis that all slopes are equal to zero, i.e., H0: β1 = β2 = … = βk = 0.
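
A sketch of how the overall F-statistic can be computed from R², n, and k. SciPy is assumed to be available for the F distribution's upper-tail probability, and the sample size in the example call is a placeholder, not the textbook's.

    import scipy.stats as stats

    def overall_f_test(r2, n, k):
        f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))   # explained vs. residual variation
        p_value = stats.f.sf(f_stat, k, n - k - 1)     # upper-tail probability
        return f_stat, p_value

    # e.g. with the R-squared reported earlier and a placeholder sample size
    print(overall_f_test(0.5947, n=100, k=2))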

Copyright © 2014, 2011 Pearson Education, Inc. 23.4 Inference in Multiple Regression F-test Results in the Analysis of Variance Table The F-statistic has a p-value < 0.0001; reject H0. Income and Competitors together explain statistically significant variation in sales.

Copyright © 2014, 2011 Pearson Education, Inc. 23.4 Inference in Multiple Regression Inference for One Coefficient  The t-statistic is used to test each slope using the null hypothesis H0: βj = 0.  The t-statistic is calculated as t = (bj - 0) / se(bj).
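
A small sketch of this t-test in code; bj and se(bj) would come from the regression output, and the degrees of freedom are n - k - 1 as above.

    import scipy.stats as stats

    def slope_t_test(b_j, se_b_j, n, k):
        t_stat = (b_j - 0) / se_b_j                           # distance of the estimate from 0
        p_value = 2 * stats.t.sf(abs(t_stat), df=n - k - 1)   # two-sided p-value
        return t_stat, p_value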

Copyright © 2014, 2011 Pearson Education, Inc. 23.4 Inference in Multiple Regression t-test Results for Women’s Apparel Stores The t-statistics and associated p-values indicate that both slopes are significantly different from zero.

Copyright © 2014, 2011 Pearson Education, Inc. 23.4 Inference in Multiple Regression Confidence Interval Results The confidence intervals are consistent with the t-statistic results.

Copyright © 2014, 2011 Pearson Education, Inc. 23.4 Inference in Multiple Regression Prediction Intervals  An approximate 95% prediction interval is given by ŷ ± 2 s_e.  For example, the 95% prediction interval for sales per square foot at a location with median income of $70,000 and 3 competitors is approximately the predicted value ± $136 per square foot (2 × s_e with s_e = $68.03).
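
A sketch of this approximate interval in code; b_hat and s_e are assumed to come from the fitted model, and the commented example call uses the income and competitor values mentioned above.

    import numpy as np

    def approx_prediction_interval(b_hat, s_e, x_new):
        y_hat = b_hat[0] + np.dot(b_hat[1:], x_new)   # predicted response at x_new
        return y_hat - 2 * s_e, y_hat + 2 * s_e       # approximate 95% prediction interval

    # e.g. lo, hi = approx_prediction_interval(b_hat, 68.03, np.array([70, 3]))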

Copyright © 2014, 2011 Pearson Education, Inc. 23.5 Steps in Fitting a Multiple Regression 1. What is the problem to be solved? Do these data help in solving it? 2. Check the scatterplots of the response versus each explanatory variable (scatterplot matrix). 3. If the scatterplots appear straight enough, fit the multiple regression model; otherwise, find a transformation.

Copyright © 2014, 2011 Pearson Education, Inc. 23.5 Steps in Fitting a Multiple Regression (Continued) 4. Obtain the residuals and fitted values from the regression. 5. Make scatterplots that show the overall model; use the plot of e versus ŷ to check for similar variances. 6. Check the residuals for dependence. 7. Scatterplot the residuals versus the individual explanatory variables and look for patterns.

Copyright © 2014, 2011 Pearson Education, Inc. 23.5 Steps in Fitting a Multiple Regression (Continued) 8. Check whether the residuals are nearly normal. 9. Use the F-statistic to test the null hypothesis that the collection of explanatory variables has no effect on the response. 10. If the F-statistic is statistically significant, test and interpret the individual partial slopes.

Copyright © 2014, 2011 Pearson Education, Inc. 38 4M Example 23.1: SUBPRIME MORTGAGES Motivation A banking regulator would like to verify how lenders use credit scores to determine the interest rate paid by subprime borrowers. The regulator would like to isolate the effect of credit score from other variables that might affect interest rate such as loan-to-value (LTV) ratio, income of the borrower and value of the home.

Copyright © 2014, 2011 Pearson Education, Inc. 39 4M Example 23.1: SUBPRIME MORTGAGES Method Use multiple regression on data obtained for 372 mortgages from a credit bureau. The explanatory variables are the LTV, credit score, income of the borrower, and home value. The response is the annual percentage rate of interest on the loan (APR).

Copyright © 2014, 2011 Pearson Education, Inc. 40 4M Example 23.1: SUBPRIME MORTGAGES Method Find correlations among variables:

Copyright © 2014, 2011 Pearson Education, Inc. 41 4M Example 23.1: SUBPRIME MORTGAGES Method Check the scatterplot matrix (e.g., APR vs. LTV). The linearity and no-obvious-lurking-variables conditions are satisfied.

Copyright © 2014, 2011 Pearson Education, Inc. 42 4M Example 23.1: SUBPRIME MORTGAGES Mechanics Fit model and check conditions.

Copyright © 2014, 2011 Pearson Education, Inc. 43 4M Example 23.1: SUBPRIME MORTGAGES Mechanics Residuals versus fitted values. Similar variances condition is satisfied.

Copyright © 2014, 2011 Pearson Education, Inc. 44 4M Example 23.1: SUBPRIME MORTGAGES Mechanics The nearly normal condition is not satisfied; the data are skewed. Because the sample is large (372 mortgages), using the Central Limit Theorem is justified.

Copyright © 2014, 2011 Pearson Education, Inc. 45 4M Example 23.1: SUBPRIME MORTGAGES Message Regression analysis shows that the characteristics of the borrower (credit score) and loan LTV affect interest rates in the market. These two factors together explain almost half of the variation in interest rates. Neither income of the borrower nor the home value improves a model with these two variables.

Copyright © 2014, 2011 Pearson Education, Inc. 46 Best Practices  Know the context of your model.  Examine plots of the overall model and individual explanatory variables before interpreting the output.  Check the overall F-statistic before looking at the t-statistics.

Copyright © 2014, 2011 Pearson Education, Inc. 47 Best Practices (Continued)  Distinguish marginal from partial slopes.  Let your software compute prediction intervals in multiple regression.

Copyright © 2014, 2011 Pearson Education, Inc. 48 Pitfalls  Don’t confuse a multiple regression with several simple regressions.  Do not become impatient.  Don’t believe that you have all of the important variables.

Copyright © 2014, 2011 Pearson Education, Inc. 49 Pitfalls (Continued)  Do not think that you have found causal effects.  Do not interpret an insignificant t-statistic to mean that an explanatory variable has no effect.  Don’t think that the order of the explanatory variables in a regression matters.