# Stat 112: Lecture 8 Notes Homework 2: Due on Thursday Assessing Quality of Prediction (Chapter 3.5.3) Comparing Two Regression Models (Chapter 4.4) Prediction.

Homework 2: Due on Thursday.

Topics: Assessing Quality of Prediction (Chapter 3.5.3); Comparing Two Regression Models (Chapter 4.4); Prediction Intervals for Multiple Regression (Chapter 4.5).

Assessing Quality of Prediction (Chapter 3.5.3)

$R^2$ measures how well the regression fits the sample data. It is not generally considered an adequate measure of the regression's ability to predict the responses for new observations. One method of assessing that ability is data splitting. We split the data into two groups: a training sample and a holdout sample (also called a validation sample). We fit the regression model to the training sample and then assess the quality of its predictions on the holdout sample.

Measuring Quality of Predictions
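
The data-splitting idea can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's JMP procedure: the data are synthetic, the 70/30 split is arbitrary, the function names (`solve`, `fit_ols`, `predict`, `rmse`) are hypothetical, and root mean squared prediction error is assumed as the quality measure.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_ols(X, y):
    """Least squares coefficients [b0, b1, ..., bK] from the normal equations."""
    Z = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(XtX, Xty)

def predict(b, row):
    """Predicted response for one observation."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))

def rmse(b, X, y):
    """Root mean squared prediction error of coefficients b on (X, y)."""
    return math.sqrt(sum((yi - predict(b, r)) ** 2 for r, yi in zip(X, y)) / len(y))

# Synthetic data (an assumption for illustration): y = 3 + 2*x1 - x2 + noise.
rng = random.Random(0)
X = [[rng.uniform(0, 10), rng.uniform(0, 5)] for _ in range(100)]
y = [3.0 + 2.0 * x1 - 1.0 * x2 + rng.gauss(0, 1) for x1, x2 in X]

# Random split: 70 training observations, 30 holdout observations.
idx = list(range(len(y)))
rng.shuffle(idx)
train, hold = idx[:70], idx[70:]

coef = fit_ols([X[i] for i in train], [y[i] for i in train])
train_rmse = rmse(coef, [X[i] for i in train], [y[i] for i in train])
holdout_rmse = rmse(coef, [X[i] for i in hold], [y[i] for i in hold])
print(f"training RMSE = {train_rmse:.3f}, holdout RMSE = {holdout_rmse:.3f}")
```

A holdout error much larger than the training error would suggest that the fit to the sample overstates how well the model predicts new observations.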

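Comparing Two Regression Models

Multiple regression model for the automobile data:

$$E(\text{GPM1000}) = \beta_0 + \beta_1\,\text{weight} + \beta_2\,\text{hp} + \beta_3\,\text{cargo} + \beta_4\,\text{seating}$$

A t test checks whether one variable, for example cargo, is useful once the other three variables are in the model. How do we test whether cargo and/or seating are useful predictors once weight and hp are taken into account, i.e., test

$$H_0: \beta_3 = \beta_4 = 0 \quad \text{vs.} \quad H_a: \text{at least one of } \beta_3, \beta_4 \text{ is nonzero?}$$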

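Full vs. Reduced Model

General setup for testing whether any of the variables $x_{L+1},\dots,x_K$ are useful for predicting y after taking into account the variables $x_1,\dots,x_L$.

Full model: $y = \beta_0 + \beta_1 x_1 + \dots + \beta_K x_K + e$

Reduced model: $y = \beta_0 + \beta_1 x_1 + \dots + \beta_L x_L + e$

Is the full model better than the reduced model? Equivalently, test $H_0: \beta_{L+1} = \dots = \beta_K = 0$ against $H_a$: at least one of $\beta_{L+1},\dots,\beta_K$ is nonzero.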

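Partial F test

Test statistic:

$$F = \frac{(SSE_R - SSE_F)/(K-L)}{SSE_F/(n-K-1)}$$

where $SSE_R$ and $SSE_F$ are the error sums of squares of the reduced and full models. Under $H_0$, $F$ has an $F(K-L, n-K-1)$ distribution. Round both degrees of freedom down when using Table B.4. Decision rule for a test with significance level $\alpha$:
– Reject $H_0$ if $F > F(\alpha; K-L, n-K-1)$
– Accept $H_0$ if $F \le F(\alpha; K-L, n-K-1)$

p-value = Prob($F(K-L, n-K-1) > F$)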
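
The arithmetic of the partial F test can be sketched as follows. The function names and the sums of squares in the example are hypothetical, chosen only so the numbers are easy to follow; the critical value 3.15 is the Table B.4 entry F(.05; 2, 60) quoted in the automobile example.

```python
def partial_f(sse_reduced, sse_full, K, L, n):
    """Partial F statistic for comparing a full model (K predictors)
    with a reduced model (L < K predictors), both fit to n observations."""
    if not 0 <= L < K:
        raise ValueError("need 0 <= L < K")
    if n - K - 1 <= 0:
        raise ValueError("need n > K + 1")
    numerator = (sse_reduced - sse_full) / (K - L)  # drop in SSE per added variable
    denominator = sse_full / (n - K - 1)            # mean squared error of the full model
    return numerator / denominator

def decide(f_stat, f_crit):
    """Decision rule: reject H0 when the statistic exceeds the critical value."""
    return "reject H0" if f_stat > f_crit else "do not reject H0"

# Hypothetical sums of squares: K = 4 predictors in the full model,
# L = 2 in the reduced model, n = 65 observations (so n - K - 1 = 60).
f_stat = partial_f(sse_reduced=150.0, sse_full=120.0, K=4, L=2, n=65)
print(f"F = {f_stat:.2f}")   # F = 7.50
print(decide(f_stat, 3.15))  # reject H0
```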

Reduced model, in which cargo and seating are assumed not useful

Automobile Example

Test whether cargo and seating are useful predictors once hp and weight are taken into account. From Table B.4, $F(.05; 2, 60) = 3.15$. Because $10.49 > 3.15$, we reject $H_0$. There is evidence that cargo and/or seating are useful predictors once hp and weight are taken into account.

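Test of Usefulness of Model

Are any of the variables useful for predicting y? Multiple linear regression model:

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_K x_K + e$$

Test $H_0: \beta_1 = \dots = \beta_K = 0$ against $H_a$: at least one of $\beta_1,\dots,\beta_K$ is nonzero.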

F Test of Usefulness of Model

Under $H_0$ (none of the variables is useful), $F$ has an $F(K, n-K-1)$ distribution. Decision rule: reject $H_0$ if $F > F(\alpha; K, n-K-1)$ [see Appendix B.3-B.5]. In JMP, the F test appears in the Analysis of Variance table; Prob>F is the p-value for the F test.
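
One way to see where this statistic comes from is to treat the overall F test as a partial F test whose reduced model contains only the intercept ($L = 0$). The sketch below uses the usual ANOVA sums of squares $SST = SSR + SSE$; these symbols are assumed standard notation rather than taken from the slide.

```latex
\begin{align*}
F &= \frac{(SSE_R - SSE_F)/(K-L)}{SSE_F/(n-K-1)}
   && \text{partial F statistic} \\
  &= \frac{(SST - SSE)/K}{SSE/(n-K-1)}
   && L = 0,\ SSE_R = SST,\ SSE_F = SSE \\
  &= \frac{SSR/K}{SSE/(n-K-1)}
   = \frac{MSR}{MSE}
   && SST = SSR + SSE
\end{align*}
```

The degrees of freedom $K - L$ and $n - K - 1$ reduce to $K$ and $n - K - 1$, matching the $F(K, n-K-1)$ reference distribution.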

Prediction in Automobile Example

The design team is planning a new car with the following characteristics: horsepower = 200, weight = 4000 lb, cargo = 18 ft³, seating = 5 adults. What is a 95% prediction interval for the GPM1000 of this car?

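Prediction with Multiple Regression Equation

Point prediction for an individual with $x_1,\dots,x_K$: $\hat{y} = b_0 + b_1 x_1 + \dots + b_K x_K$, where $b_0,\dots,b_K$ are the least squares estimates.

Prediction interval for an individual with $x_1,\dots,x_K$: $\hat{y} \pm t_{\alpha/2,\,n-K-1} \times (\text{standard error of prediction})$.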
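
As a rough sketch of what software computes for this interval, the code below uses the standard formula for the standard error of prediction, $s\sqrt{1 + x_0'(X'X)^{-1}x_0}$, where $x_0$ is the new individual's predictor vector with a leading 1 and $s^2 = SSE/(n-K-1)$. The function names and the small data set are hypothetical, and the t critical value is passed in from a t table rather than computed.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def prediction_interval(X, y, x_new, t_crit):
    """Return (y_hat, (lower, upper)) for a new individual with predictors x_new.
    t_crit is the t critical value with n - K - 1 degrees of freedom."""
    Z = [[1.0] + list(row) for row in X]   # design matrix with intercept column
    n, p = len(Z), len(Z[0])               # p = K + 1
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    b = solve(XtX, Xty)                    # least squares estimates
    sse = sum((yi - dot(b, z)) ** 2 for z, yi in zip(Z, y))
    s2 = sse / (n - p)                     # s^2 = SSE / (n - K - 1)
    z0 = [1.0] + list(x_new)
    y_hat = dot(b, z0)
    leverage = dot(z0, solve(XtX, z0))     # x0' (X'X)^{-1} x0
    se_pred = math.sqrt(s2 * (1.0 + leverage))
    return y_hat, (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)

# Hypothetical data: two predictors, six observations, so n - K - 1 = 3.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [4.1, 4.9, 9.2, 9.8, 14.1, 15.2]
y_hat, (lower, upper) = prediction_interval(X, y, x_new=[3, 3], t_crit=3.182)
print(f"prediction = {y_hat:.2f}, 95% PI = ({lower:.2f}, {upper:.2f})")
```

The value 3.182 is the upper 2.5% point of the t distribution with 3 degrees of freedom; with a different sample size or number of predictors, the critical value would need to be looked up again.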

Finding Prediction Interval in JMP

Enter a row with the values of the independent variables $x_1,\dots,x_K$ for the new individual. Do not enter a y for the new individual. Fit the model. Because the new individual does not have a y, JMP will not include the new individual when calculating the least squares fit. Click the red triangle next to the response, then click Save Columns:
– To find $\hat{y}$, click Predicted Values. This creates a column with $\hat{y}$.
– To find the 95% PI, click Indiv Confid Interval. This creates columns with the lower and upper endpoints of the 95% PI.

Prediction in Automobile Example

The design team is planning a new car with the following characteristics: horsepower = 200, weight = 4000 lb, cargo = 18 ft³, seating = 5 adults. From JMP, the 95% prediction interval for the GPM1000 of this car is (37.86, 52.31).
