Multiple Regression Analysis


Multiple Regression Analysis Chapter 14 McGraw-Hill/Irwin Copyright © 2012 by The McGraw-Hill Companies, Inc. All rights reserved.

Topics
Multiple Regression Estimation
Global Test
Individual Coefficient Test
Regression Assumptions and Regression Diagnostics: Error Term Distribution, Multicollinearity, Heteroscedasticity, Autocorrelation
Dummy Variables
Stepwise Regression
14-2

Multiple Regression Analysis
Multiple Linear Regression Model: Y = α + β1X1 + β2X2 + ··· + βkXk + ε
Y is the dependent variable and X1, X2, …, Xk are the independent variables. α, β1, β2, …, βk are population coefficients that must be estimated from sample data. ε is the error term. The model represents the linear relationship between the dependent variable and the independent variables in the population.
Estimated Regression Equation: Ŷ = a + b1X1 + b2X2 + ··· + bkXk
a and b1, b2, …, bk are coefficients estimated from the sample. bi is the net change in Y for each unit change in Xi, holding the other X's constant. The least squares criterion is used to develop this equation. 14-3

Multiple Linear Regression - Example Salsberry Realty sells homes along the east coast of the United States. One of the questions most frequently asked by prospective buyers is: If we purchase this home, how much can we expect to pay to heat it during the winter? The research department at Salsberry has been asked to develop some guidelines regarding heating costs for single-family homes. Three variables are thought to relate to the heating costs: (1) the mean daily outside temperature, (2) the number of inches of insulation in the attic, and (3) the age in years of the furnace. To investigate, Salsberry's research department selected a random sample of 20 recently sold homes. It determined the cost to heat each home last January, as well as the January outside temperature in the region, the number of inches of insulation in the attic, and the age of the furnace. These three variables are denoted X1, X2, and X3 in the Salsberry data file. 14-4
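For readers following along in Python rather than Excel, here is a minimal sketch of fitting the same model by least squares (statsmodels is an assumption, as is a DataFrame df holding the 20-home sample with columns Cost, Temp, Insul, and Age):

import statsmodels.api as sm

# Assumes df holds the Salsberry sample (columns Cost, Temp, Insul, Age).
X = sm.add_constant(df[["Temp", "Insul", "Age"]])  # add the intercept term
model = sm.OLS(df["Cost"], X).fit()                # least squares estimates
print(model.summary())  # coefficients, R-square, ANOVA table, t tests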

Multiple Linear Regression – Excel Output

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.896755
R Square            0.80417
Adjusted R Square   0.767452
Standard Error      51.04855
Observations        20

ANOVA
            df    SS         MS         F          Significance F
Regression   3    171220.5   57073.49   21.90118   6.56E-06
Residual    16    41695.28   2605.955
Total       19    212915.8

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept   427.1938       59.60143         7.167509   2.24E-06   300.8444    553.5432
Temp        -4.58266       0.772319         -5.93364   2.1E-05    -6.21991    -2.94542
Insul       -14.8309       4.754412         -3.11939   0.006606   -24.9098    -4.75196
Age         6.101032       4.01212          1.52065    0.147862   -2.40428    14.60635

See the Excel instructions in the textbook, p. 566, #2. The Coefficients column gives a, b1, b2, and b3. 14-5

Estimating the Multiple Regression Equation
Interpreting the Regression Coefficients
The regression coefficient for mean outside temperature, X1, is -4.583. For every one-degree increase in temperature, holding the other two independent variables constant, monthly heating cost is expected to decrease by $4.583.
The attic insulation variable, X2, also shows a negative relationship. For each additional inch of insulation, the cost to heat the home is expected to decline by $14.83 per month, holding the other variables constant.
The age of the furnace variable shows a positive relationship. For each additional year of furnace age, the cost is expected to increase by $6.10 per month. 14-6

Using the Multiple Regression Equation Applying the Model for Estimation What is the estimated heating cost for a home if the mean outside temperature is 30 degrees, there are 5 inches of insulation in the attic, and the furnace is 10 years old?
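Using the coefficients from the Excel output above: Ŷ = 427.194 - 4.583(30) - 14.831(5) + 6.101(10) = 427.194 - 137.480 - 74.155 + 61.010 ≈ $276.57 per month.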

Fitness of the Model – Adjusted R²
R² ranges from 0 to 1, but it is inflated by the number of independent variables: adding a variable never lowers R², even when the variable adds no real explanatory power. In multiple regression analysis, the adjusted R² is therefore a better measure of model fit.
The adjusted R² corrects R² for the number of independent variables and the sample size. It measures the percentage of the total variation in Y that is explained by all the independent variables together, that is, by the regression model.

Regression Statistics
Multiple R          0.896755
R Square            0.80417
Adjusted R Square   0.767452
Standard Error      51.04855
Observations        20

About 76.7% of the variation in heating cost is explained by the mean outside temperature, attic insulation, and age of furnace. 14-8
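The standard adjustment formula, consistent with the output above: Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) = 1 - (1 - 0.80417)(19/16) ≈ 0.7675.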

Global Test: Testing the Multiple Regression Model
The global test is used to investigate whether any of the independent variables have coefficients that are significantly different from zero. It is also a test of the validity of the model. The hypotheses are:
H0: β1 = β2 = ··· = βk = 0
H1: Not all the βi are 0
Decision Rules: (1) Reject H0 if F > Fα,k,n-k-1, or (2) Reject H0 if p-value < α 14-9
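The test statistic comes directly from the ANOVA table: F = MSR/MSE = (SSR/k) / (SSE/(n - k - 1)) = 57073.49/2605.955 ≈ 21.90.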

F-distribution
The distribution takes nonnegative values only. It is asymmetric, skewed to the right. The shape of the distribution is controlled by two degrees of freedom, denoted v1 and v2. The degrees of freedom are usually reported in the ANOVA table of the output.
Excel function: =FINV(α, k, n-k-1)

ANOVA
            df    SS         MS         F          Significance F
Regression   3    171220.5   57073.49   21.90118   6.56E-06
Residual    16    41695.28   2605.955
Total       19    212915.8
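A quick cross-check of the FINV critical value and the ANOVA p-value in Python (a sketch; scipy is an assumption, since the slides use Excel):

from scipy import stats

print(stats.f.ppf(0.95, 3, 16))     # critical value, about 3.24 (same as =FINV(.05, 3, 16))
print(stats.f.sf(21.90118, 3, 16))  # p-value, about 6.6E-06, matching Significance F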

Global Test – Example
1. Hypotheses: H0: β1 = β2 = β3 = 0; H1: Not all the βi are 0
2. Significance level: α = 0.05
3. Test statistic: F = 21.90

ANOVA
            df    SS         MS         F          Significance F
Regression   3    171220.5   57073.49   21.90118   6.56E-06
Residual    16    41695.28   2605.955
Total       19    212915.8
14-11

Global Test – Example
4. Rule (1) Rejection region: Reject H0 if F > 3.24, where 3.24 = FINV(.05, 3, 16). From step 3, F = 21.90, which falls in the rejection region.
Rule (2) Reject H0 if p-value < α: the p-value = 0.00, which is less than 0.05.
5. Decision: reject the null hypothesis.
14-12

Interpretation The null hypothesis that all the multiple regression coefficients are zero is rejected. Interpretation: Some of the independent variables are useful in predicting the dependent variable (heating cost). Some of the independent variables are linearly related to the dependent variable. The model is valid. Logical question – which ones? 14-13

Evaluating Individual Regression Coefficients (βi)
This test is used to determine which independent variables have nonzero regression coefficients. The variables with nonzero regression coefficients are said to have significant coefficients (significantly different from zero). The variables with zero regression coefficients can be dropped from the analysis. The test statistic follows the t distribution. The hypotheses for each test are:
H0: βi = 0
H1: βi ≠ 0
Instead of comparing the test statistic with the rejection region for each independent variable (which is tedious), we rely on the p-values. If p-value < α, we reject the null hypothesis. 14-14
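For example, the Age line of the Excel output can be reproduced by hand (a sketch; scipy is an assumption):

from scipy import stats

b, se, df = 6.101032, 4.01212, 16  # Age coefficient, its standard error, n - k - 1
t = b / se                         # 1.5207, matching the t Stat column
p = 2 * stats.t.sf(abs(t), df)     # 0.1479, matching the P-value column
print(t, p)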

P-values for the Slopes
For temperature: H0: β1 = 0, H1: β1 ≠ 0; p-value = .00 < .05
For insulation: H0: β2 = 0, H1: β2 ≠ 0; p-value = .007 < .05
For furnace age: H0: β3 = 0, H1: β3 ≠ 0; p-value = .148 > .05
Conclusions:
For temperature and insulation, reject the null hypothesis. The coefficients are significant (significantly different from zero); the variables are linearly related to heating cost and are useful in predicting heating cost.
For furnace age, do not reject the null hypothesis. The coefficient is insignificant and the variable can thus be dropped from the model; it is not linearly related to heating cost and is not useful in predicting heating cost.

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept   427.1938       59.60143         7.167509   2.24E-06   300.8444    553.5432
Temp        -4.58266       0.772319         -5.93364   2.1E-05    -6.21991    -2.94542
Insul       -14.8309       4.754412         -3.11939   0.006606   -24.9098    -4.75196
Age         6.101032       4.01212          1.52065    0.147862   -2.40428    14.60635
14-15

New Regression without Variable “Age”

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.880834
R Square            0.775868
Adjusted R Square   0.7495
Standard Error      52.98237
Observations        20

ANOVA
            df    SS         MS         F          Significance F
Regression   2    165194.5   82597.26   29.42408   3.01E-06
Residual    17    47721.23   2807.131
Total       19    212915.8

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept   490.2859       44.40984         11.04003   3.56E-09   396.5893    583.9825
Temp        -5.14988       0.701887         -7.3372    1.16E-06   -6.63074    -3.66903
Insul       -14.7181       4.933918         -2.98305   0.008351   -25.1278    -4.30849
14-16

New Regression Model without Variable “Age” – Global Test
1. Hypotheses: H0: β1 = β2 = 0; H1: Not all the βi are 0
2. Significance level: α = 0.05
3. Test statistic: F = 29.42
4. Rejection region: Reject H0 if F > 3.59, where 3.59 = FINV(.05, 2, 17) with d.f. (2, 17). The test statistic falls in the rejection region; the p-value = 0.00 is also less than 0.05.
5. Decision: reject the null hypothesis.

ANOVA
            df    SS         MS         F          Significance F
Regression   2    165194.5   82597.26   29.42408   3.01E-06
Residual    17    47721.23   2807.131
Total       19    212915.8

Individual t-tests on the New Coefficients
For temperature: H0: β1 = 0, H1: β1 ≠ 0; p-value = .00 < .05
For insulation: H0: β2 = 0, H1: β2 ≠ 0; p-value = .008 < .05
Conclusions: For temperature and insulation, reject the null hypothesis. The coefficients are significant (significantly different from zero); the variables are linearly related to heating cost and are useful in predicting heating cost.

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept   490.2859       44.40984         11.04003   3.56E-09   396.5893    583.9825
Temp        -5.14988       0.701887         -7.3372    1.16E-06   -6.63074    -3.66903
Insul       -14.7181       4.933918         -2.98305   0.008351   -25.1278    -4.30849
14-18

Multiple Regression Assumptions
1. Each of the independent variables has a linear relationship with the dependent variable.
2. The independent variables are not correlated. When this assumption is violated, the condition is called multicollinearity.
3. The probability distribution of ε is normal.
4. The variance of ε is constant regardless of the value of Ŷ. This condition is called homoscedasticity. When the requirement is violated, we say heteroscedasticity is observed in the regression.
5. The error terms are independent of each other. This assumption is often violated when time is involved, and the condition is called autocorrelation.
14-19

Evaluating the Assumptions of Multiple Regression
There is a linear relationship: we use scatter plots to examine this assumption.
The independent variables are not correlated: we examine the correlation coefficients among the independent variables.
The error term follows the normal probability distribution: we use a histogram of the residuals or a normal probability plot to examine normality.
The variance of ε is constant regardless of the value of Ŷ, and the error terms are independent of each other: we plot the residuals against the predicted Y to examine these last two assumptions.
14-20

Assumption I: Linear Relationship A scatter plot of each independent variable against the dependent variable is used. In practice, we can skip this check since the tests on the individual coefficients serve the same purpose. 14-21

Assumption II: Multicollinearity
Multicollinearity exists when independent variables (X's) are correlated.
Effects of multicollinearity on the model:
1. An independent variable known to be an important predictor ends up having an insignificant coefficient.
2. A regression coefficient that should have a positive sign turns out to be negative, or vice versa.
3. Multicollinearity adds difficulty to the interpretation of the coefficients. When one variable changes by 1 unit, the correlated variables change as well (but we require them to be held constant in order to correctly interpret the coefficient).
However, correlated independent variables do not affect a multiple regression equation's ability to predict the dependent variable (Y).
Minimizing the effect of multicollinearity is often easier than correcting it: try to include explanatory variables that are independent of each other, and remove variables that cause multicollinearity in the model.
14-22

Multicollinearity: Detection
A general rule is that if the correlation between two independent variables is between -0.70 and 0.70, there likely is not a problem using both of the independent variables. A more precise test is to use the variance inflation factor (VIF). A VIF > 10 is unsatisfactory; remove that independent variable from the analysis. The value of the VIF is found as follows:
VIFj = 1 / (1 - R²j)
The term R²j refers to the coefficient of determination from the regression in which the selected independent variable is used as the dependent variable and the remaining independent variables are used as the independent variables.
14-23
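A sketch of the VIF calculation in Python (statsmodels is an assumption, and X is assumed to be a DataFrame holding Temp, Insul, and Age):

import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

Xc = sm.add_constant(X)  # the VIF helper expects an intercept column
for i, name in enumerate(Xc.columns):
    if name != "const":
        print(name, variance_inflation_factor(Xc.values, i))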

Multicollinearity – Example
Refer to the heating cost example, in which heating cost is related to the independent variables outside temperature, amount of insulation, and age of furnace. Develop a correlation matrix for all the independent variables. Does it appear there is a problem with multicollinearity?

Correlation Matrix
        Temp    Insul   Age
Temp    1.00
Insul   -0.10   1.00
Age     -0.49   0.06    1.00

None of the correlations among the independent variables exceed -.70 or .70, so we do not suspect problems with multicollinearity.
Excel: Data -> Data Analysis -> Correlation
14-24

VIF – Example

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.491328
R Square            0.241403
Adjusted R Square   0.152157
Standard Error      16.03105
Observations        20

            Coefficients   Standard Error   t Stat     P-value
Intercept   57.99449       12.34827         4.696567   0.000208
Insul       -0.50888       1.487944         -0.342     0.736541
Age         -2.50902       1.103252         -2.2742    0.036201

Find and interpret the variance inflation factor for each of the independent variables. We consider the variable temperature first. We run a multiple regression with temperature as the dependent variable and the other two as the independent variables. The coefficient of determination is R² = 0.2414, so VIF = 1/(1 - 0.2414) ≈ 1.32. The VIF value of 1.32 is well below the upper limit of 10. This indicates that the independent variable temperature is not strongly correlated with the other independent variables.
14-25

VIF – Example
Calculating the VIF for each variable using Excel can be tedious. Minitab generates the VIF values for each independent variable in its regression output. None of the VIFs are higher than 10; hence, we conclude there is not a problem with multicollinearity in this example.
Note: for your project, first obtain the correlation matrix. For variables associated with correlation coefficients exceeding -.70 or .70, calculate the corresponding VIFs to further determine whether multicollinearity is an issue.
14-26

Assumption III: Normality of the Error Term
A histogram of the residuals (discussed in the review) is used to visually determine whether the assumption of normality is satisfied. Excel offers another graph, the normal probability plot, that helps to evaluate this assumption: if the plotted points are fairly close to a straight line running from the lower left to the upper right, the normality assumption is satisfied.
14-27
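The same two checks in Python (a sketch; matplotlib and scipy are assumptions, and model is the fitted statsmodels result from earlier):

import matplotlib.pyplot as plt
from scipy import stats

resid = model.resid
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.hist(resid, bins=8)          # a roughly bell-shaped histogram suggests normality
ax1.set_title("Histogram of residuals")
stats.probplot(resid, plot=ax2)  # points near the straight line suggest normality
ax2.set_title("Normal probability plot")
plt.show()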

Assumptions IV & V
As we can see from the scatter plot of residuals against fitted values, the residuals are randomly distributed across the horizontal axis and there is no obvious pattern. Therefore, there is no sign of heteroscedasticity or autocorrelation.
14-28

Residual Plot versus Fitted Values: Testing the Homoscedasticity Assumption
When the variance of the error term changes across different values of Ŷ, we refer to this condition as heteroscedasticity. In the plot of the residuals against the predicted values of Y, we look for a change in the spread of the plotted points. If the spread of the points increases as the predicted value of Y increases, the scatter plot indicates possible heteroscedasticity.
14-29
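A sketch of this plot in Python (same assumed model as above; matplotlib is an assumption):

import matplotlib.pyplot as plt

plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted Y")
plt.ylabel("Residual")
plt.show()  # a widening funnel shape suggests heteroscedasticity;
            # long runs above or below zero suggest autocorrelation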

Residual Plot versus Fitted Values: Testing the Independence Assumption
When successive residuals are correlated, we refer to this condition as autocorrelation, which frequently occurs when the data are collected over a period of time. A run of residuals above the mean of the residuals, followed by a run below the mean, would indicate possible autocorrelation.
14-30

Dummy Variable
Usually categorical or nominal data cannot be included in the analysis directly. Instead, we use dummy variables to denote the categories.
Dummy variable: a variable that can assume only one of two values (usually 1 and 0), where 1 represents the existence of a certain condition and 0 indicates that the condition does not hold.
Notation: X = 1 if the condition holds, X = 0 otherwise.

Dummy Variable - Example
Suppose in the Salsberry Realty example that the independent variable “garage” is added, which indicates whether a house comes with an attached garage. To include this variable in our analysis, we define a dummy variable as follows: for homes without an attached garage, 0 is used; for homes with an attached garage, a 1 is used.
14-32

Dummy Variable - Example

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.932651
R Square            0.869838
Adjusted R Square   0.845433
Standard Error      41.61842
Observations        20

ANOVA
            df    SS         MS         F          Significance F
Regression   3    185202.3   61734.09   35.64133   2.59E-07
Residual    16    27713.48   1732.093
Total       19    212915.8

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept   393.6657       45.00128         8.747876   1.71E-07   298.2672    489.0641
Temp        -3.96285       0.652657         -6.07186   1.62E-05   -5.34642    -2.57928
Insul       -11.334        4.001531         -2.8324    0.01201    -19.8168    -2.85109
Garage      77.4321        22.78282         3.398706   0.00367    29.13468    125.7295

New estimated regression equation: Ŷ = 393.67 - 3.963X1 - 11.334X2 + 77.432X3, where X3 is the garage dummy.
14-33

Dummy Variable - Example
Interpretation: b3 = 77.4: the heating cost for homes with an attached garage is on average $77.40 higher than for homes without an attached garage, with other conditions being the same.
14-34

Dummy Variable – Another Example What determines the value of a used car? To examine this issue, a used-car dealer randomly selected 100 3-year-old Toyota Camrys that were sold at auction during the past month. Each car was in top condition and equipped with all the features that come standard with this car. The dealer recorded the price ($1,000), the number of miles (thousands) on the odometer and the color of the car. When recording the color, the dealer uses 1 to denote white, 2 to denote silver and 3 to denote other colors. 14-35

Dummy Variable – Another Example
Although the variable color contains the numbers 1, 2, and 3, it cannot be included in the analysis directly. Instead we need to generate dummy variables to denote the different categories.
Rule for assigning dummy variables: if there are m different categories in the data, generate m-1 dummy variables. The last category is represented by I1 = I2 = … = Im-1 = 0 and is called the omitted category.
Since there are three categories in the variable color, we generate two dummy variables defined as follows: I1 = 1 if the car is white and 0 otherwise; I2 = 1 if the car is silver and 0 otherwise. “Other colors” is the omitted category and is represented by I1 = I2 = 0.
14-36

Dummy Variable – Excel
Open the data file Toyota Camry. In the column next to “Color”, type “I1” to generate the dummy variable for “white.” In the cell below it, type =IF(C2=1, 1, 0) and hit Enter. (Excel function: IF(logical_test, [value_if_true], [value_if_false]).) Copy the cell and paste it to the rest of the cells in the column until the cell in the previous column is empty. Similarly, generate the dummy variable for “silver” in the next column by typing =IF(C2=2, 1, 0) and following the same procedure. To run the regression, we need to put the explanatory variables together: copy the Odometer column and paste it into the column next to the second dummy variable. Then run a multiple regression using the two dummy variables and Odometer.
14-37
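The same preparation in Python (a sketch; pandas and statsmodels are assumptions, and df is assumed to hold columns Price, Odometer, and Color coded 1/2/3):

import pandas as pd
import statsmodels.api as sm

df["I1"] = (df["Color"] == 1).astype(int)  # 1 = white
df["I2"] = (df["Color"] == 2).astype(int)  # 2 = silver; other colors keep I1 = I2 = 0
X = sm.add_constant(df[["I1", "I2", "Odometer"]])
model = sm.OLS(df["Price"], X).fit()
print(model.summary())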

Dummy Variable – Excel
[Screenshot of the worksheet showing the formulas =IF(C2=1, 1, 0) and =IF(C2=2, 1, 0) applied to the Color column.]
14-38

Dummy Variable – Excel Output

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.837135
R Square            0.700794
Adjusted R Square   0.691444
Standard Error      0.304258
Observations        100

ANOVA
            df    SS         MS         F         Significance F
Regression   3    20.81492   6.938306   74.9498   4.65E-25
Residual    96    8.886981   0.092573
Total       99    29.7019

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept   16.83725       0.197105         85.42255   2.28E-92   16.446      17.2285
I1          0.091131       0.072892         1.250224   0.214257   -0.05356    0.235819
I2          0.330368       0.08165          4.046157   0.000105   0.168294    0.492442
Odometer    -0.05912       0.005065         -11.6722   4.04E-20   -0.06918    -0.04907

Estimated regression equation: Price = 16.837 + 0.0911 I1 + 0.3304 I2 - 0.0591 Odometer
14-39

Dummy Variable – Interpretation
The coefficient of I1: b1 = 0.0911: a white Camry sells on average for 0.0911 thousand, or $91.10, more than other colors (nonwhite, nonsilver) with the same odometer reading.
The coefficient of I2: b2 = 0.3304: a silver Camry sells on average for 0.3304 thousand, or $330.40, more than other colors (nonwhite, nonsilver) with the same odometer reading.
14-40

Stepwise Regression
The advantages of the stepwise method are listed here; a rough sketch of the idea appears after this list.
1. Only independent variables with significant regression coefficients are entered into the equation.
2. The steps involved in building the regression equation are clear.
3. It is efficient in finding the regression equation with only significant regression coefficients.
4. The changes in the multiple standard error of estimate and the coefficient of determination are shown.
14-41
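A rough sketch of forward stepwise selection in Python (this illustrates the idea only, not Minitab's exact procedure; statsmodels is an assumption and df is the assumed heating-cost DataFrame):

import statsmodels.api as sm

def forward_stepwise(df, target, candidates, alpha=0.05):
    selected = []
    remaining = list(candidates)
    while remaining:
        # p-value of each candidate when added to the current model
        pvals = {}
        for var in remaining:
            X = sm.add_constant(df[selected + [var]])
            pvals[var] = sm.OLS(df[target], X).fit().pvalues[var]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:  # stop: no remaining variable is significant
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# e.g. forward_stepwise(df, "Cost", ["Temp", "Insul", "Garage", "Age"])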

Stepwise Regression – Minitab Example
In the stepwise Minitab output for the heating cost problem, Temperature is selected first: this variable explains more of the variation in heating cost than any of the other proposed independent variables. Garage is selected next, followed by Insulation. Variable Age is not selected.
14-42