Cross-Validation vs. Bootstrap Estimates of Prediction Error in Statistical Modeling
Kaniz Rashid Lubana Mamun, MS Student, CSU Hayward
Dr. Eric A. Suess, Assistant Professor of Statistics
Regression Analysis

To find the regression line for data (xi, yi), minimize the sum of squared errors

  SSE = Σi (yi − b0 − b1 xi)²

Estimates linear relationships between dependent and independent variables.
Applications: prediction and forecasting.
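The least-squares criterion above can be sketched numerically. The following is a minimal illustration with made-up data (not from the presentation), using NumPy's least-squares solver to find the b0 and b1 that minimize the SSE.

```python
import numpy as np

# Illustrative data (hypothetical, not from the presentation).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# Design matrix [1, x]; least squares minimizes sum((y - b0 - b1*x)**2).
X = np.column_stack([np.ones_like(x), x])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

# The minimized criterion from the slide.
sse = np.sum((y - b0 - b1 * x) ** 2)
print(b0, b1, sse)
```

Any other choice of (b0, b1) would give a larger SSE; that is what "minimize" means on the slide.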
Classical Regression Procedure

Choose a model: y = b0 + b1 x1 + b2 x2 + e.
Verify assumptions: normality of the data.
Fit the model, checking for significance of the parameters.
Check the model's predictive capability.
Mean Squared Error of Prediction

MSEP measures how well a model predicts the response value of a future observation.
For our regression model, the MSEP of a new observation yn+1 is

  MSEP = E[(yn+1 − ŷn+1)²]

Small values of MSEP indicate good predictive capability.
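A sketch of what MSEP measures: fit a regression on one sample, then average the squared prediction errors on genuinely new observations. The data-generating model here (y = 1 + 2x + e) is an assumption for illustration, not from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data-generating model y = 1 + 2*x + e, with noise variance 1.
def draw(n):
    x = rng.uniform(0, 10, n)
    y = 1 + 2 * x + rng.normal(0, 1, n)
    return x, y

# Fit the regression on one sample.
x_fit, y_fit = draw(30)
X = np.column_stack([np.ones_like(x_fit), x_fit])
b0, b1 = np.linalg.lstsq(X, y_fit, rcond=None)[0]

# MSEP: mean squared error when predicting fresh observations.
x_new, y_new = draw(1000)
msep = np.mean((y_new - (b0 + b1 * x_new)) ** 2)
print(msep)
```

For a well-fit model, this estimate lands near the noise variance of the data-generating process; a much larger value signals poor predictive capability.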
What is Cross-Validation?

Divide the data into two sub-samples: a treatment (training) set to fit the model, and a validation set to assess predictive value.
Non-parametric approach: mainly used when the normality assumption is not met.
Criterion for the model's prediction ability: usually the MSEP statistic.
CV For Linear Regression: The “Withhold-1” Algorithm

Use the model: y = b0 + b1 x1 + b2 x2 + e.
Withhold one observation (x1i, x2i, yi).
Fit the regression model to the remaining n − 1 observations.
For each i, calculate the deleted prediction ŷ(i) and its squared error (yi − ŷ(i))².
Finally, calculate MSEP = (1/n) Σi (yi − ŷ(i))².
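The withhold-1 steps above can be sketched as follows. The demo data are hypothetical (not the heart data from the talk).

```python
import numpy as np

def loocv_msep(X, y):
    """Withhold-1 (leave-one-out) CV estimate of MSEP for linear regression.

    X: (n, p) design matrix including an intercept column; y: (n,) response.
    """
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # withhold observation i
        b = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        errors[i] = (y[i] - X[i] @ b) ** 2  # squared deleted prediction error
    return errors.mean()  # MSEP = (1/n) * sum of squared errors

# Hypothetical demo data with two predictors, as in the slide's model.
rng = np.random.default_rng(1)
x1, x2 = rng.uniform(0, 10, 12), rng.uniform(0, 10, 12)
y = 2 + 0.5 * x1 + 0.3 * x2 + rng.normal(0, 1, 12)
X = np.column_stack([np.ones(12), x1, x2])
print(loocv_msep(X, y))
```

Each observation serves once as the validation set, so all n data points contribute to the MSEP estimate without ever being predicted by a model that saw them.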
What is the Bootstrap?

The Bootstrap is a computationally intensive technique involving simulation and resampling.
It is used here to assess the accuracy of statistical estimates for a model: confidence intervals, standard errors, and the estimate of MSEP.
Algorithm For a Bootstrap

From a data set of size n, randomly draw B samples with replacement, each of size n.
Find the estimate of MSEP, θ̂*(b), for each of the B samples.
Average these B estimates of θ to obtain the overall bootstrap estimate: θ̂* = (1/B) Σb θ̂*(b).
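One way to sketch the algorithm above: fit the regression on each bootstrap resample and score it on the original data, then average the B estimates. The exact per-sample MSEP estimator used in the talk is not specified, so this particular variant is an assumption.

```python
import numpy as np

def bootstrap_msep(X, y, B=200, seed=0):
    """Bootstrap estimate of MSEP: fit on each of B resamples drawn with
    replacement, score on the original data, and average the B estimates."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, n)  # draw n rows with replacement
        coef = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        estimates[b] = np.mean((y - X @ coef) ** 2)  # MSEP estimate for sample b
    return estimates.mean()  # average of the B estimates

# Hypothetical demo data (not the heart data from the talk).
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 12)
y = 1 + 2 * x + rng.normal(0, 1, 12)
X = np.column_stack([np.ones(12), x])
print(bootstrap_msep(X, y))
```

Because each resample is the same size n as the original data, the method remains usable even for small samples such as the n = 12 heart-data example.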
Schematic Diagram of Bootstrap

[Diagram: Population F → (sampling variability) → data X = (x1, x2, …, xn) → (resampling variability) → bootstrap samples X1*, X2*, …, XB*, each yielding an estimate Θ(Xb*).]
Application: Heart Measurements on Children
Study: catheterize 12 children with heart defects and take measurements.
Variables measured:
y: observed catheter length in cm
w: patient's weight in pounds
h: patient's height in inches
Goal: predict y from w and h.
Difficulties: small n, non-normal data.
Model and Fitted Model

Model: y = b0 + b1 w + b2 h + e.
Fitted Model: ŷ = 25.6 + 0.277 w.
Parameter estimates for the heart data: b0 estimated as 25.6, b1 estimated as 0.277; the b2 (height) term was eliminated from the model (not useful).
Regression Results

Both parameters b0 and b1 are significantly different from 0 (important to the model); p-values: (for b0) and (for b1).
R² = 80% (of the variation in y explained).
Once weight is known, height does not provide additional useful information.
Example: for a child weighing 50 lbs., the estimated catheter length is 25.6 + 0.277(50) ≈ 39.5 cm.
Comparison of CV and Bootstrap

MSEP estimates: CV: MSEP = 18.05; Bootstrap: MSEP = (smaller = better).
For this example, the Bootstrap has the better prediction capability.
In general: CV methods work well for large samples; the Bootstrap is effective even for small samples.