Example 12.3 Explaining Spending Amounts at HyTex Include/Exclude Decisions
Objective
To see which potential explanatory variables are useful for explaining current year spending amounts at HyTex with multiple regression.
CATALOGS.XLS
• This file contains data on 1000 customers who purchased mail-order products from the HyTex Company in the current year.
• Recall from Example 3.11 that HyTex is a direct marketer of stereo equipment, personal computers, and other electronic products.
• HyTex advertises entirely by mailing catalogs to its customers, and all of its orders are taken over the telephone.
• We want to estimate and interpret a regression equation for AmountSpent based on all of the explanatory variables described below.
The Data
• The company spends a great deal of money on its catalog mailings, and it wants to be sure that this is paying off in sales.
• For each customer there are data on the following variables:
  – Age: age of the customer at the end of the current year
  – Gender: coded as 1 for males, 0 for females
  – OwnHome: coded as 1 if customer owns a home, 0 otherwise
  – Married: coded as 1 if customer is currently married, 0 otherwise
The Data -- continued
  – Close: coded as 1 if customer lives reasonably close to a shopping area that sells similar merchandise, 0 otherwise
  – Salary: combined annual salary of customer and spouse (if any)
  – Children: number of children living with customer
  – PrevCust: coded as 1 if customer purchased from HyTex during the previous year, 0 otherwise
  – PrevSpent: total amount of purchases made from HyTex during the previous year
  – Catalogs: number of catalogs sent to the customer this year
  – AmountSpent: total amount of purchases made from HyTex during this year
The Data -- continued
• With this much data, 1000 observations, we can certainly afford to set aside part of the data set for validation.
• Although any split could be used, let’s base the regression on the first 250 observations and use the other 750 for validation.
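The split described above can be sketched in a few lines. This is a minimal illustration, assuming the 1000 rows have already been read from CATALOGS.XLS into a list in file order; the `split_rows` helper and the stand-in rows are hypothetical and not part of the original example.

```python
def split_rows(rows, n_train=250):
    """Return (training, validation): the first n_train rows and the rest."""
    if not 0 < n_train < len(rows):
        raise ValueError("n_train must leave rows in both subsets")
    return rows[:n_train], rows[n_train:]


# Stand-in for the 1000 customer records (hypothetical: row indices only).
rows = list(range(1000))
training, validation = split_rows(rows)
print(len(training), len(validation))  # 250 750
```

Splitting by position keeps the sketch faithful to the slides; a random split would serve equally well, since any split could be used.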
The Regression
• We begin by entering all of the potential explanatory variables.
• Our goal then is to exclude variables that aren’t necessary, based on their t-values and p-values. To do this we follow the Guidelines for Including / Excluding Variables in a Regression Equation.
• The regression output with all explanatory variables included is provided on the following slide.
Analysis
• This output indicates a fairly good fit. The R² value is 79.1% and s_e is about $424.
• From the p-value column, we see that there are three variables, Age, OwnHome, and Married, that have p-values well above 0.05.
• These are the obvious candidates for exclusion. It is often best to exclude one variable at a time, starting with the variable with the highest p-value.
• The regression output with all insignificant variables excluded appears on the next slide.
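The one-at-a-time exclusion procedure can be sketched as a small loop. This is only an illustration under stated assumptions: `fit_pvalues` is a hypothetical callable that refits the regression on the given variables and returns their p-values, the 0.05 cutoff is taken from the slide, and the p-values in the demo are invented for illustration, not taken from the HyTex output.

```python
def backward_eliminate(fit_pvalues, variables, alpha=0.05):
    """Drop the least significant variable one at a time, refitting after each drop.

    fit_pvalues: hypothetical callable taking a list of variable names and
    returning {name: p-value} for a regression on exactly those variables.
    """
    kept = list(variables)
    while kept:
        pvalues = fit_pvalues(kept)
        worst = max(kept, key=lambda name: pvalues[name])
        if pvalues[worst] <= alpha:
            break  # every remaining variable is significant at level alpha
        kept.remove(worst)  # exclude only the single worst variable, then refit
    return kept


# Invented p-values for illustration only; they are NOT the HyTex output.
def fake_fit(names):
    table = {"x1": 0.001, "x2": 0.62, "x3": 0.31, "x4": 0.02}
    return {name: table[name] for name in names}


selected = backward_eliminate(fake_fit, ["x1", "x2", "x3", "x4"])
print(selected)  # ['x1', 'x4']
```

In a real analysis the p-values change after each refit, which is why the loop refits before deciding on the next exclusion rather than dropping every high p-value at once.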
Interpretation of Final Regression Equation
• The coefficient of Gender implies that an average male customer spent about $130 less than the average female customer. Similarly, an average customer living close to stores with this type of merchandise spent about $288 less than those customers living far from stores.
• The coefficient of Salary implies that, on average, about 1.5 cents of every salary dollar was spent on HyTex merchandise.
Interpretation of Final Regression Equation -- continued
• The coefficient of Children implies that $158 less was spent for every extra child living at home.
• The PrevCust and PrevSpent terms are somewhat more difficult to interpret.
  – First, both of these terms are 0 for customers who didn’t purchase from HyTex in the previous year.
  – For those who did, the two terms combine to -724 + 0.47PrevSpent.
  – The coefficient 0.47 implies that each extra dollar spent in the previous year can be expected to contribute an extra 47 cents in the current year.
Interpretation of Final Regression Equation -- continued
  – The median spender last year spent about $900. Substituting this for PrevSpent gives -724 + 0.47(900) = -301.
  – Therefore, this “median” spender from last year can be expected to spend about $301 less this year than a customer who made no purchase in the previous year.
• The coefficient of Catalogs implies that each extra catalog can be expected to generate about $43 in extra spending.
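The arithmetic behind the PrevCust and PrevSpent interpretation can be checked directly. The sketch below uses the rounded coefficients quoted on the slides (-724 and 0.47); the function name and the break-even calculation are illustrative additions, and the break-even figure is only as precise as those rounded coefficients.

```python
# Rounded coefficients as quoted on the slides.
PREV_CUST_COEF = -724.0
PREV_SPENT_COEF = 0.47


def previous_customer_effect(prev_spent):
    """Combined PrevCust + PrevSpent contribution for a previous-year customer."""
    return PREV_CUST_COEF + PREV_SPENT_COEF * prev_spent


median_effect = previous_customer_effect(900)
break_even = -PREV_CUST_COEF / PREV_SPENT_COEF  # PrevSpent where the effect is zero

print(round(median_effect))  # -301
print(round(break_even))     # 1540
```

Under these rounded coefficients, a previous-year customer would need to have spent roughly $1540 last year before the combined terms predict more spending than for a previous-year nonspender.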
Cautionary Notes
• When we validate this final regression equation with the other 750 customers, using the procedure from Section 11.7, we find R² and s_e values of 71.8% and $522.
• These aren’t bad. They show little deterioration from the values based on the original 250 customers (79.1% and $424).
• We haven’t tried all possibilities yet. We haven’t tried nonlinear or interaction variables, nor have we looked at different coding schemes; we haven’t checked for nonconstant error variance or looked at potential effects of outliers.
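A validation check of this kind can be sketched as follows. The slides do not reproduce the Section 11.7 procedure, so this sketch assumes that the validation R² is the squared correlation between actual and predicted values and that s_e is the sample standard deviation of the prediction errors; the function name and the tiny data set are invented for illustration and are not the HyTex figures.

```python
import math


def validation_metrics(actual, predicted):
    """Return (R^2, s_e) for a validation set.

    Assumes R^2 is the squared correlation between actual and predicted values
    and s_e is the sample standard deviation of the prediction errors.
    """
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r_squared = cov ** 2 / (var_a * var_p)

    errors = [a - p for a, p in zip(actual, predicted)]
    mean_e = sum(errors) / n
    s_e = math.sqrt(sum((e - mean_e) ** 2 for e in errors) / (n - 1))
    return r_squared, s_e


# Invented numbers for illustration only.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]
r_squared, s_e = validation_metrics(actual, predicted)
print(r_squared, s_e)
```

The predictions would come from applying the equation estimated on the 250 training customers to the 750 held-out customers, so that the metrics reflect data the regression has not seen.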