
1 Cross-Validation vs. Bootstrap Estimates of Prediction Error in Statistical Modeling
Kaniz Rashid Lubana Mamun, MS Student, CSU Hayward
Dr. Eric A. Suess, Assistant Professor of Statistics

2 Regression Analysis
To find the regression line for data (xi, yi), minimize the sum of squared residuals: Σi (yi − b0 − b1 xi)².
Estimates linear relationships between dependent and independent variables.
Applications: prediction and forecasting.
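The minimization above can be sketched in a few lines of Python. The data here are simulated purely for illustration; numpy's least-squares solver carries out the minimization of Σi (yi − b0 − b1 xi)².

```python
import numpy as np

# Hypothetical data: x and y are simulated, not from the talk.
rng = np.random.default_rng(0)
x = np.arange(10, dtype=float)
y = 2.0 + 0.5 * x + rng.normal(0, 0.1, size=10)

# Minimize sum_i (y_i - b0 - b1*x_i)^2 via least squares.
X = np.column_stack([np.ones_like(x), x])      # design matrix [1, x]
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares estimates

print(b0, b1)  # estimates should be close to the true (2.0, 0.5)
```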

6 Classical Regression Procedure
Choose a model: y = b0 + b1 x1 + b2 x2 + e.
Verify assumptions: normality of the data.
Fit the model, checking for significance of parameters.
Check the model's predictive capability.

7 Mean Squared Error of Prediction
MSEP measures how well a model predicts the response value of a future observation.
For our regression model, the MSEP of a new observation y_{n+1} is MSEP = E[(y_{n+1} − ŷ_{n+1})²].
Small values of MSEP indicate good predictive capability.
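A minimal sketch of estimating MSEP, assuming a simulated data set in which the last few observations are held back to play the role of future observations:

```python
import numpy as np

# Simulated data for illustration only: y = 1 + 2x + noise.
rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, n)

# Fit on the first 40 points; treat the last 10 as "future" observations.
X_fit = np.column_stack([np.ones(40), x[:40]])
b = np.linalg.lstsq(X_fit, y[:40], rcond=None)[0]

y_hat = b[0] + b[1] * x[40:]              # predictions for the new y values
msep = np.mean((y[40:] - y_hat) ** 2)     # estimate of MSEP
print(msep)
```

With noise variance 1, the MSEP estimate should land near 1; a small value like this indicates good predictive capability.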

11 What is Cross-Validation?
Divide the data into two sub-samples: a treatment set (to fit the model) and a validation set (to assess predictive value).
Non-parametric approach: mainly used when the normality assumption is not met.
Criterion for the model's prediction ability: usually the MSEP statistic.

12 CV For Linear Regression: The “Withhold-1” Algorithm
Use the model: y = b0 + b1 x1 + b2 x2 + e.
Withhold one observation (x1i, x2i, yi).
Fit the regression model to the remaining n − 1 observations.
For each i, calculate the prediction ŷ(i) of the withheld yi and its squared error (yi − ŷ(i))².
Finally, calculate MSEP_CV = (1/n) Σi (yi − ŷ(i))².
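The withhold-1 steps above can be sketched as follows. The data and true coefficients are simulated for illustration (this is not the heart data from the talk):

```python
import numpy as np

# Simulated data: y depends on two predictors x1 and x2, as in the slide's model.
rng = np.random.default_rng(2)
n = 30
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 3.0 + 1.5 * x1 + 0.5 * x2 + rng.normal(0, 1.0, n)
X = np.column_stack([np.ones(n), x1, x2])

sq_errors = []
for i in range(n):
    keep = np.arange(n) != i                             # withhold observation i
    b = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0] # fit on the other n - 1
    y_hat_i = X[i] @ b                                   # predict the withheld y_i
    sq_errors.append((y[i] - y_hat_i) ** 2)

msep_cv = np.mean(sq_errors)   # withhold-1 CV estimate of MSEP
print(msep_cv)
```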

17 What is the Bootstrap?
The Bootstrap is a computationally intensive technique that involves simulation and resampling.
It is used here to assess the accuracy of statistical estimates for a model: confidence intervals, standard errors, and the estimate of MSEP.

18 Algorithm For a Bootstrap
From a data set of size n, randomly draw B samples with replacement, each of size n.
Find the estimate of MSEP for each of the B samples: θ̂*(b), for b = 1, …, B.
Average these B estimates of θ to obtain the overall bootstrap estimate: θ̂* = (1/B) Σb θ̂*(b).
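The bootstrap steps above can be sketched as follows. The slides do not spell out the per-sample MSEP estimator, so this sketch uses one common variant: fit the model on each bootstrap resample and measure squared prediction error against the full original sample. The data are simulated for illustration.

```python
import numpy as np

# Simulated data: simple linear model y = 3 + 1.5x + noise.
rng = np.random.default_rng(3)
n, B = 30, 200
x = rng.uniform(0, 10, n)
y = 3.0 + 1.5 * x + rng.normal(0, 1.0, n)
X = np.column_stack([np.ones(n), x])

theta = np.empty(B)
for j in range(B):
    idx = rng.integers(0, n, n)                      # draw n rows with replacement
    b = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    theta[j] = np.mean((y - X @ b) ** 2)             # MSEP estimate for resample j

msep_boot = theta.mean()   # average of the B estimates: the bootstrap MSEP
print(msep_boot)
```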

22 Schematic Diagram of Bootstrap
[Diagram] Population F → (sampling variability) → data X = (x1, x2, …, xn) → (resampling variability) → bootstrap samples X1*, X2*, …, XB* → estimates θ̂(X1*), θ̂(X2*), …, θ̂(XB*).

23 Application: Heart Measurements on Children
Study: catheterize 12 children with heart defects and take measurements.
Variables measured:
y: observed catheter length in cm
w: patient’s weight in pounds
h: patient’s height in inches
Goal: to predict y from w and h.
Difficulties: small n, non-normal data.

24 Model and Fitted Model
Model: y = b0 + b1 w + b2 h + e.
Parameter estimates for the heart data: b0 estimated as 25.6, b1 estimated as 0.277; the b2 (height) term was eliminated from the model (not useful).
Fitted model: ŷ = 25.6 + 0.277 w.

27 Regression Results
Both parameters b0 and b1 are significantly different from 0 (small p-values), so both are important to the model.
R² = 80% of the variation in y is explained.
Once weight is known, height does not provide additional useful information.
Example: for a child weighing 50 lbs., the estimated catheter length is 25.6 + 0.277(50) ≈ 39.5 cm.

28 Comparison of CV and Bootstrap
MSEP estimates (smaller = better): CV: MSEP = 18.05; Bootstrap: a smaller MSEP.
For this example, the Bootstrap has the better prediction capability.
In general: CV methods work well for large samples; the Bootstrap is effective even for small samples.

29 Cross-Validation vs. Bootstrap Estimates of Prediction Error in Statistical Modeling
Kaniz Rashid Lubana Mamun, MS Student, CSU Hayward
Dr. Eric A. Suess, Assistant Professor of Statistics

