 # Class 15: Tuesday, Nov. 2 Multiple Regression (Chapter 11, Moore and McCabe).


Example: Predicting Emergency Calls to the AAA Club
The AAA club of New York provides Emergency Road Service (ERS) to its members. This service is especially useful in the winter months, when people can be stranded by frozen locks, dead batteries, weather-induced accidents, and spinning tires. If the weather is very bad, the club can be overwhelmed with calls. By tracking weather conditions, the club can divert resources from other club activities to the ERS for projected peak days. To allocate its resources efficiently, the club would like to be able to predict ERS calls from the weather forecast on the previous day.

Data
The club has available, for 28 past days in the winter:
- the number of ERS calls to New York AAA offices
- the forecasted average temperature ((forecast high + forecast low)/2)
- the range of the forecasted temperatures (forecast high − forecast low)
- whether rain is forecast (0 if no rain in forecast, 1 if rain in forecast)
- whether snow is forecast (0 if no snow in forecast, 1 if snow in forecast)
- whether the day is a weekday (1 if Mon, Tue, Wed, Thu, or Fri; 0 if Sat or Sun)
- whether the day is a Sunday (1 if Sunday, 0 if not)
- whether a subzero temperature is forecast (1 if a subzero temperature is forecast, 0 if not)

Source: New York Motorist, March 1994.
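The indicator coding above can be sketched in Python as a cross-check; the function name `encode_day`, the day-of-week convention, and the rule `low < 0` for "subzero" are assumptions for illustration, not taken from the source:

```python
def encode_day(high, low, rain, snow, day_of_week):
    """Encode one day's forecast into the predictors listed above.
    day_of_week: 0 = Monday ... 6 = Sunday (an assumed convention)."""
    avg_temp = (high + low) / 2             # forecasted average temperature
    temp_range = high - low                 # range of forecasted temperatures
    weekday = 1 if day_of_week <= 4 else 0  # 1 if Mon-Fri, 0 if Sat or Sun
    sunday = 1 if day_of_week == 6 else 0
    subzero = 1 if low < 0 else 0           # assumes "subzero" means forecast low < 0
    return [avg_temp, temp_range, int(rain), int(snow), weekday, sunday, subzero]

print(encode_day(high=25, low=15, rain=False, snow=True, day_of_week=1))
# [20.0, 10, 0, 1, 1, 0, 0]
```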

Simple Linear Regression using Average Forecast Temperature
Root mean square error = 2081.34 calls. Can we do better by using more variables than just average forecast temperature to predict the calls?

Multiple Linear Regression Model
Model for the distribution of Y for the subpopulation of units with explanatory variables $x_1, \dots, x_p$. Multiple linear regression model:
- $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \epsilon$
- The distribution of $\epsilon$ is normal with mean 0 and SD $\sigma$
- Observations are independent.
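The course fits this model in JMP; as an illustration of the same least-squares fit, here is a minimal numpy sketch on made-up data (all numbers are invented; this is not the AAA data set):

```python
import numpy as np

# Least-squares sketch of y = b0 + b1*x1 + b2*x2 + noise on invented data.
rng = np.random.default_rng(0)
n = 28
x1 = rng.uniform(-10, 40, n)            # e.g., average forecast temperature
x2 = rng.uniform(0, 20, n)              # e.g., forecast temperature range
y = 4000 - 30 * x1 + 50 * x2 + rng.normal(0, 200, n)

X = np.column_stack([np.ones(n), x1, x2])     # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # estimated b0, b1, b2
print(beta)
```

With 28 observations and modest noise, the estimated slopes land close to the true values −30 and 50.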

Multiple Linear Regression in JMP
- Analyze, Fit Model.
- Put the response variable in Y.
- Click on the explanatory variables, then click Add under Construct Model Effects.
- Click Run Model.

Root mean square error = 1735.15 calls for multiple regression, compared to 2081.34 calls for simple linear regression on average temperature.

Making Predictions
Suppose we want to estimate the average number of calls for New York AAA offices for a day when:
- The average temperature is predicted to be 20.
- The temperature range is predicted to be 10 degrees.
- No rain is in the forecast.
- Snow is in the forecast.
- It is a weekday (so weekday = 1, Sunday = 0).
- The temperature is not predicted to be subzero.

The estimated mean number of calls for days with these properties is obtained by substituting these values into the fitted equation: $\hat{y} = b_0 + b_1(20) + b_2(10) + b_3(0) + b_4(1) + b_5(1) + b_6(0) + b_7(0)$.
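As a sketch of how the plug-in prediction works: the estimated mean response is the dot product of the predictor vector (with a leading 1 for the intercept) and the estimated coefficients. The coefficient values below are invented for illustration; only the average temperature coefficient, −35.63, comes from the slides.

```python
import numpy as np

# Order: intercept, avg temp, range, rain, snow, weekday, Sunday, subzero.
# All values except -35.63 are invented, not the fitted AAA coefficients.
b = np.array([3600.0, -35.63, 40.0, 400.0, 1500.0, -300.0, -200.0, 800.0])
x = np.array([1, 20, 10, 0, 1, 1, 0, 0])  # the day described above, leading 1 for intercept
yhat = x @ b                              # estimated mean response
print(yhat)
```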

Residuals and Root Mean Square Errors
Residual for observation i = prediction error for observation i = $y_i - \hat{y}_i$. Root mean square error (RMSE) = typical size of the absolute value of the prediction error. As with the simple linear regression model, if the multiple linear regression model holds:
- About 68% of the observations will be within one RMSE of their predicted value.
- About 95% of the observations will be within two RMSEs of their predicted value.
- About 99% of the observations will be within three RMSEs of their predicted value.

For the New York AAA data, about 95% of the time, the actual number of ERS calls will be within 2 × 1735.15 = 3470.30 of the number predicted by the multiple linear regression of calls on average forecast temperature, forecasted range of temperature, rain forecast, snow forecast, weekday, Sunday, and subzero.
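The residual and RMSE computations can be sketched on toy numbers, assuming the usual regression convention $\text{RMSE} = \sqrt{\text{SSE}/(n - p - 1)}$ with $p$ explanatory variables:

```python
import numpy as np

y    = np.array([10.0, 12.0, 9.0, 15.0, 11.0])   # observed values (invented)
yhat = np.array([11.0, 11.5, 10.0, 14.0, 11.5])  # predicted values (invented)
p = 2                              # number of explanatory variables (illustrative)
residuals = y - yhat               # prediction errors y_i - yhat_i
sse = np.sum(residuals**2)         # sum of squared prediction errors
rmse = np.sqrt(sse / (len(y) - p - 1))
print(round(float(rmse), 4))
# 1.3229
```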

Regression Coefficients
Interpretation of the average forecast temperature coefficient ($b_1 = -35.63$): an increase of one degree in the average temperature is associated with a decrease of 35.63 AAA calls, on average, assuming that all other variables are held constant (i.e., nothing else changes).

Inferences About Regression Coefficients
The t-test for regression coefficient $\beta_j$ tests $H_0: \beta_j = 0$ versus $H_a: \beta_j \neq 0$. This answers the question: is variable $X_j$ useful for predicting Y when the other variables (the other X's) are already included in the model? If $H_0$ is not rejected (p-value > 0.05), it means that we don't need to include $X_j$ in the model if we have all the other X's in the model (either $X_j$ is not useful in predicting Y or it is redundant). Range of plausible values for $\beta_j$ = 95% confidence interval for $\beta_j$ = $b_j \pm t^* \times SE(b_j)$. For the New York AAA data, the p-value for the t-test that the average temperature coefficient equals 0 is 0.4972: average temperature is not a useful predictor once we have already taken into account range, rain, snow, weekday, Sunday, and subzero.
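The confidence interval formula can be sketched numerically. Here $t^* = 2.086$ is the tabled 0.975 quantile of the t distribution on $n - p - 1 = 28 - 7 - 1 = 20$ degrees of freedom; the standard error is invented (chosen to be roughly consistent with the reported p-value of 0.4972), and only $b_j = -35.63$ comes from the slides:

```python
# 95% CI for a coefficient: b_j +/- t* x SE(b_j).
b_j, se = -35.63, 51.52        # SE is an assumed, illustrative value
t_star = 2.086                 # t_{0.975, 20}, tabled value
ci = (b_j - t_star * se, b_j + t_star * se)
print(tuple(round(v, 2) for v in ci))
# (-143.1, 71.84)
```

The interval comfortably covers 0, matching the conclusion that average temperature is not a useful predictor given the other variables.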

R-Squared
R-squared ($R^2$): as in simple linear regression, it measures the proportion of variability in Y explained by the regression of Y on these X's. It is between 0 and 1; values nearer 1 indicate more variability explained.
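Using the standard definition $R^2 = 1 - \text{SSE}/\text{SST}$, a toy computation (invented numbers) looks like:

```python
import numpy as np

y    = np.array([3.0, 5.0, 7.0, 9.0])   # observed (invented)
yhat = np.array([3.5, 4.5, 7.5, 8.5])   # predicted (invented)
sse = np.sum((y - yhat)**2)             # residual sum of squares = 1.0
sst = np.sum((y - y.mean())**2)         # total sum of squares about the mean = 20.0
r2 = 1 - sse / sst                      # fraction of variability explained
print(r2)
# 0.95
```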

Overall F-test
Test of whether any of the predictors are useful: $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$ vs. $H_a$: at least one of $\beta_1, \dots, \beta_p$ does not equal zero. Tests whether the model provides better predictions than the sample mean of Y. The p-value for the test is Prob > F in the Analysis of Variance table. Here p-value = 0.005: strong evidence that at least one of the predictors is useful for predicting ERS calls for the New York AAA club.
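The F statistic behind that p-value can be sketched as $F = \frac{(\text{SST} - \text{SSE})/p}{\text{SSE}/(n - p - 1)}$. Here $n = 28$ days and $p = 7$ predictors match the AAA model, but the sums of squares are invented, not the real values from the output:

```python
# Overall F statistic comparing the full model to predicting every Y
# by the sample mean of Y.
n, p = 28, 7
sst, sse = 2.0e8, 6.0e7    # illustrative sums of squares (assumed values)
f_stat = ((sst - sse) / p) / (sse / (n - p - 1))
print(round(f_stat, 2))
# 6.67
```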

Prediction Intervals and CIs for Mean Response
Approximate 95% prediction interval for an observation with explanatory variables $x_1, \dots, x_p$: $\hat{y} \pm 2 \times \text{RMSE}$. An exact 95% prediction interval from JMP takes into account the uncertainty in the estimates of the regression coefficients:
- Create a row with the desired $x_1, \dots, x_p$ but no Y.
- After Fit Model, click the red triangle next to Response, click Save Columns, and click Indiv Confid Intervals. This saves columns with the lower and upper bounds of the 95% intervals.

For a 95% confidence interval for the mean response in JMP, follow the same procedure as for the 95% prediction interval, but when you click Save Columns, click Mean Confid Intervals instead.
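The approximate interval is simple to compute by hand. RMSE = 1735.15 is from the slides, but the predicted value $\hat{y}$ below is invented for illustration; because this approximation ignores the uncertainty in the estimated coefficients, JMP's exact interval is wider:

```python
# Approximate 95% prediction interval: yhat +/- 2 * RMSE.
yhat = 3194.90             # illustrative predicted number of calls (assumed)
rmse = 1735.15             # RMSE from the multiple regression above
lower, upper = yhat - 2 * rmse, yhat + 2 * rmse
print((round(lower, 2), round(upper, 2)))
# (-275.4, 6665.2)
```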

CIs for Mean Response and Prediction Intervals for AAA Data
For a day where:
- The average temperature is predicted to be 20.
- The temperature range is predicted to be 10 degrees.
- No rain is in the forecast.
- Snow is in the forecast.
- It is a weekday.
- The temperature is not predicted to be subzero.

95% confidence interval for the mean response: (−29.46, 6419.26). 95% prediction interval: (−1652.47, 8042.27).

Next class:
- Checking the simple linear regression model.
- More on the interpretation of multiple regression coefficients; multiple regression as a method for controlling for known lurking variables.
- Hand out on final project.
- Hand out HW6, due next Thursday.
