Cross-Validation vs. Bootstrap Estimates of Prediction Error in Statistical Modeling
Kaniz Rashid Lubana Mamun, MS Student, CSU Hayward
Dr. Eric A. Suess, Assistant Professor of Statistics
Regression Analysis

To find the regression line for data (xi, yi), minimize the sum of squared errors

  SSE = Σi (yi − b0 − b1 xi)²

Estimates linear relationships between dependent and independent variables.
Applications: prediction and forecasting.
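The least-squares criterion above can be sketched numerically. The following is a minimal illustration with made-up data (not from the presentation), using NumPy's least-squares solver to find the b0 and b1 that minimize the SSE.

```python
import numpy as np

# Illustrative data (hypothetical, not from the presentation).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# Design matrix [1, x]; least squares minimizes sum((y - b0 - b1*x)**2).
X = np.column_stack([np.ones_like(x), x])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

# The minimized criterion from the slide.
sse = np.sum((y - b0 - b1 * x) ** 2)
print(b0, b1, sse)
```

Any other choice of (b0, b1) would give a larger SSE; that is what "minimize" means on the slide.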
Classical Regression Procedure

Choose a model: y = b0 + b1 x1 + b2 x2 + e.
Verify assumptions: normality of the data.
Fit the model, checking for significance of the parameters.
Check the model's predictive capability.
Mean Squared Error of Prediction

MSEP measures how well a model predicts the response value of a future observation.
For our regression model, the MSEP of a new observation yn+1 is

  MSEP = E[(yn+1 − ŷn+1)²]

Small values of MSEP indicate good predictive capability.
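A sketch of what MSEP measures: fit a regression on one sample, then average the squared prediction errors on genuinely new observations. The data-generating model here (y = 1 + 2x + e) is an assumption for illustration, not from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data-generating model y = 1 + 2*x + e, with noise variance 1.
def draw(n):
    x = rng.uniform(0, 10, n)
    y = 1 + 2 * x + rng.normal(0, 1, n)
    return x, y

# Fit the regression on one sample.
x_fit, y_fit = draw(30)
X = np.column_stack([np.ones_like(x_fit), x_fit])
b0, b1 = np.linalg.lstsq(X, y_fit, rcond=None)[0]

# MSEP: mean squared error when predicting fresh observations.
x_new, y_new = draw(1000)
msep = np.mean((y_new - (b0 + b1 * x_new)) ** 2)
print(msep)
```

For a well-fit model, this estimate lands near the noise variance of the data-generating process; a much larger value signals poor predictive capability.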
What is Cross-Validation?

Divide the data into two sub-samples: a treatment (training) set to fit the model, and a validation set to assess predictive value.
Non-parametric approach: mainly used when the normality assumption is not met.
Criterion for the model's prediction ability: usually the MSEP statistic.
CV For Linear Regression: The “Withhold-1” Algorithm

Use the model: y = b0 + b1 x1 + b2 x2 + e.
Withhold one observation (x1i, x2i, yi).
Fit the regression model to the remaining n − 1 observations.
For each i, calculate the deleted prediction ŷ(i) and its squared error (yi − ŷ(i))².
Finally, calculate MSEP = (1/n) Σi (yi − ŷ(i))².
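The withhold-1 steps above can be sketched as follows. The demo data are hypothetical (not the heart data from the talk).

```python
import numpy as np

def loocv_msep(X, y):
    """Withhold-1 (leave-one-out) CV estimate of MSEP for linear regression.

    X: (n, p) design matrix including an intercept column; y: (n,) response.
    """
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # withhold observation i
        b = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        errors[i] = (y[i] - X[i] @ b) ** 2  # squared deleted prediction error
    return errors.mean()  # MSEP = (1/n) * sum of squared errors

# Hypothetical demo data with two predictors, as in the slide's model.
rng = np.random.default_rng(1)
x1, x2 = rng.uniform(0, 10, 12), rng.uniform(0, 10, 12)
y = 2 + 0.5 * x1 + 0.3 * x2 + rng.normal(0, 1, 12)
X = np.column_stack([np.ones(12), x1, x2])
print(loocv_msep(X, y))
```

Each observation serves once as the validation set, so all n data points contribute to the MSEP estimate without ever being predicted by a model that saw them.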
What is the Bootstrap?

The Bootstrap is a computationally intensive technique involving simulation and resampling.
It is used here to assess the accuracy of statistical estimates for a model: confidence intervals, standard errors, and the estimate of MSEP.
Algorithm For a Bootstrap

From a data set of size n, randomly draw B samples with replacement, each of size n.
Find the estimate of MSEP, θ̂*(b), for each of the B samples.
Average these B estimates of θ to obtain the overall bootstrap estimate: θ̂* = (1/B) Σb θ̂*(b).
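One way to sketch the algorithm above: fit the regression on each bootstrap resample and score it on the original data, then average the B estimates. The exact per-sample MSEP estimator used in the talk is not specified, so this particular variant is an assumption.

```python
import numpy as np

def bootstrap_msep(X, y, B=200, seed=0):
    """Bootstrap estimate of MSEP: fit on each of B resamples drawn with
    replacement, score on the original data, and average the B estimates."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, n)  # draw n rows with replacement
        coef = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        estimates[b] = np.mean((y - X @ coef) ** 2)  # MSEP estimate for sample b
    return estimates.mean()  # average of the B estimates

# Hypothetical demo data (not the heart data from the talk).
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 12)
y = 1 + 2 * x + rng.normal(0, 1, 12)
X = np.column_stack([np.ones(12), x])
print(bootstrap_msep(X, y))
```

Because each resample is the same size n as the original data, the method remains usable even for small samples such as the n = 12 heart-data example.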
Schematic Diagram of Bootstrap

[Diagram: Population F → (sampling variability) → data X = (x1, x2, …, xn) → (resampling variability) → bootstrap samples X1*, X2*, …, XB*, each yielding an estimate Θ(Xb*).]
Application: Heart Measurements on Children
Study: catheterize 12 children with heart defects and take measurements.
Variables measured:
y: observed catheter length in cm
w: patient's weight in pounds
h: patient's height in inches
Goal: predict y from w and h.
Difficulties: small n, non-normal data.
Model and Fitted Model

Model: y = b0 + b1 w + b2 h + e.
Fitted Model: ŷ = 25.6 + 0.277 w.
Parameter estimates for the heart data: b0 estimated as 25.6, b1 estimated as 0.277; the b2 (height) term was eliminated from the model (not useful).
Regression Results

Both parameters b0 and b1 are significantly different from 0 (important to the model); p-values: (for b0) and (for b1).
R² = 80% (of the variation in y explained).
Once weight is known, height does not provide additional useful information.
Example: for a child weighing 50 lbs., the estimated catheter length is 25.6 + 0.277(50) ≈ 39.5 cm.
Comparison of CV and Bootstrap

MSEP estimates: CV: MSEP = 18.05; Bootstrap: MSEP = (smaller = better).
For this example, the Bootstrap has the better prediction capability.
In general: CV methods work well for large samples; the Bootstrap is effective even for small samples.