1 Stat 6601 Presentation Presented by: Xiao Li (Winnie) Wenlai Wang Ke Xu Nov. 17, 2004 V & R 6.6

2 Preview of the Presentation 11/17/2004 Bootstrapping Linear Models  Introduction to Bootstrap  Data and Modeling  Methods on Bootstrapping LM  Results  Issues and Discussion  Summary

3 What is Bootstrapping? 11/17/2004 Bootstrapping Linear Models  Invented by Bradley Efron, and further developed by Efron and Tibshirani  A method for estimating the sampling distribution of an estimator by resampling with replacement from the original sample  A general-purpose way to gauge the reliability of a statistic (a generalization of the usual standard-error calculation)
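A minimal sketch of the core idea (not from the original slides; the simulated sample and constants here are purely illustrative): estimate the standard error of a sample mean by resampling with replacement and recomputing the mean each time.

# Illustrative sketch: bootstrap standard error of a sample mean
set.seed(1)
x <- rnorm(30, mean = 5, sd = 2)    # an arbitrary original sample
R <- 999                            # number of bootstrap resamples
boot.means <- replicate(R, mean(sample(x, replace = TRUE)))
sd(boot.means)                      # bootstrap estimate of se(mean(x))
sd(x) / sqrt(length(x))             # classical formula, for comparison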

4 Why use Bootstrapping? 11/17/2004 Bootstrapping Linear Models  Start with two questions:  What estimator should be used?  Having chosen an estimator, how accurate is it?  For a linear model with normal random errors of constant variance: least squares  For non-normal errors or non-constant variance: ???

5 The Mammals Data 11/17/2004 Bootstrapping Linear Models  A data frame with average brain and body weights for 62 species of land mammals.  “body”: body weight in kg  “brain”: brain weight in g  “name”: common name of species
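A quick way to inspect the data (an illustrative aside, not from the original slides; in the MASS version of mammals the species names are the row names rather than a separate column):

library(MASS)
data(mammals)
str(mammals)    # 62 observations of body and brain weight
head(mammals)   # first few species; names appear as row names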

6 Data and Model 11/17/2004 Bootstrapping Linear Models Linear regression model: $y_j = \beta_0 + \beta_1 x_j + \varepsilon_j$, where j = 1, …, n, and $\varepsilon_j$ is considered random; y = log(brain weight), x = log(body weight)

7 Summary of Original Fit 11/17/2004 Bootstrapping Linear Models

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                                 <2e-16 ***
log(body)                                   <2e-16 ***

Residual standard error: on 60 DF
Multiple R-Squared:    Adjusted R-squared:
F-statistic: on 1 and 60 DF    p-value: < 2.2e-16

8 R Code for Original Modeling 11/17/2004 Bootstrapping Linear Models

library(MASS)
library(boot)
op <- par(mfrow = c(1, 2))
data(mammals)
# plot of raw data
plot(mammals$body, mammals$brain, main = "Original Data",
     xlab = "body weight", ylab = "brain weight", col = "brown")
# plot of log-transformed data
plot(log(mammals$body), log(mammals$brain), main = "Log-Transformed Data",
     xlab = "log body weight", ylab = "log brain weight", col = "brown")
par(op)
# data frame of log weights, used in all later fits
mammal <- data.frame(body = log(mammals$body), brain = log(mammals$brain))
attach(mammal)
log.fit <- lm(brain ~ body, data = mammal)
summary(log.fit)

9 Two Methods 11/17/2004 Bootstrapping Linear Models  Case-based resampling: randomly sample pairs $(x_j, y_j)$ with replacement  No assumption of variance homogeneity  The design fixes the information content of a sample  Model-based resampling: resample the residuals  Assumes the model is correct, with homoscedastic errors  Resampling model has the same “design” as the data

10 Case-Based Resample Algorithm 11/17/2004 Bootstrapping Linear Models For r = 1, …, R:
1. Sample $i_1^*, \dots, i_n^*$ randomly with replacement from {1, 2, …, n}.
2. For j = 1, …, n, set $(x_j^*, y_j^*) = (x_{i_j^*}, y_{i_j^*})$.
3. Fit the least-squares regression to $(x_1^*, y_1^*), \dots, (x_n^*, y_n^*)$, giving estimates $\hat\beta_0^*$, $\hat\beta_1^*$, $\hat\sigma^{*2}$.
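The same steps can be written as an explicit loop (an illustrative sketch, not the code the deck itself uses; it assumes the mammal data frame of log weights built on the earlier code slide):

# Hand-rolled case-based bootstrap, following steps 1-3 above
R <- 999
n <- nrow(mammal)
case.coefs <- matrix(NA, nrow = R, ncol = 2)
for (r in 1:R) {
  idx <- sample(n, replace = TRUE)            # step 1: resample case indices
  case.coefs[r, ] <- coef(lm(brain ~ body,    # steps 2-3: refit on the pairs
                             data = mammal[idx, ]))
}
apply(case.coefs, 2, sd)   # bootstrap standard errors of intercept and slope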

11 Model-Based Resample Algorithm 11/17/2004 Bootstrapping Linear Models For r = 1, …, R:
1. For j = 1, …, n,
a) set $x_j^* = x_j$;
b) randomly sample $\varepsilon_j^*$ from the residuals $e_1, \dots, e_n$; then
c) set $y_j^* = \hat\beta_0 + \hat\beta_1 x_j^* + \varepsilon_j^*$.
2. Fit the least-squares regression to $(x_1^*, y_1^*), \dots, (x_n^*, y_n^*)$, giving estimates $\hat\beta_0^*$, $\hat\beta_1^*$, $\hat\sigma^{*2}$.
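Again as an explicit loop (an illustrative sketch; it assumes log.fit and mammal from the earlier code slide, and resamples raw rather than centered residuals, matching the boot() code shown later):

# Hand-rolled model-based (residual) bootstrap, following the steps above
R <- 999
fit.vals <- fitted(log.fit)
res <- resid(log.fit)
res.coefs <- matrix(NA, nrow = R, ncol = 2)
for (r in 1:R) {
  y.star <- fit.vals + sample(res, replace = TRUE)    # steps (a)-(c)
  res.coefs[r, ] <- coef(lm(y.star ~ body, data = mammal))  # step 2: refit
}
apply(res.coefs, 2, sd)   # bootstrap standard errors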

12 Case-Based Bootstrap 11/17/2004 Bootstrapping Linear Models

ORDINARY NONPARAMETRIC BOOTSTRAP

Bootstrap Statistics :
      original  bias  std. error
t1*
t2*

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Intervals :
Level    Normal         Percentile     BCa
95%   ( 1.966,  )    ( 1.963,  )    ( 1.974,  )
95%   (  ,  )        (  ,  )        (  ,  )
Calculations and Intervals on Original Scale

13 Case-Based Bootstrap 11/17/2004 Bootstrapping Linear Models Bootstrap Distribution Plots for Intercept and Slope

14 Case-Based Bootstrap 11/17/2004 Bootstrapping Linear Models Standardized Jackknife-after-Bootstrap Plots for Intercept and Slope

15 R Code for Case-Based Resampling 11/17/2004 Bootstrapping Linear Models

# Case-based resampling
fit.case <- function(data) coef(lm(log(data$brain) ~ log(data$body)))
mam.case <- function(data, i) fit.case(data[i, ])   # refit on resampled rows
mam.case.boot <- boot(mammals, mam.case, R = 999)
mam.case.boot
boot.ci(mam.case.boot, type = c("norm", "perc", "bca"))             # intercept
boot.ci(mam.case.boot, index = 2, type = c("norm", "perc", "bca"))  # slope
plot(mam.case.boot)
plot(mam.case.boot, index = 2)
jack.after.boot(mam.case.boot)
jack.after.boot(mam.case.boot, index = 2)

16 Model-Based Bootstrap 11/17/2004 Bootstrapping Linear Models

ORDINARY NONPARAMETRIC BOOTSTRAP

Bootstrap Statistics :
      original  bias  std. error
t1*
t2*

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Intervals :
Level    Normal         Percentile     BCa
95%   ( 1.945,  )    ( 1.948,  )    ( 1.941,  )
95%   (  ,  )        (  ,  )        (  ,  )
Calculations and Intervals on Original Scale

17 Model-Based Bootstrap 11/17/2004 Bootstrapping Linear Models Bootstrap Distribution Plots for Intercept and Slope

18 Model-Based Bootstrap 11/17/2004 Bootstrapping Linear Models Standardized Jackknife-after-Bootstrap Plots for Intercept and Slope

19 R Code for Model-Based Resampling 11/17/2004 Bootstrapping Linear Models

# Model-based resampling (resample the residuals)
fit.res <- lm(brain ~ body, data = mammal)
mam.res.data <- data.frame(mammal, res = resid(fit.res), fitted = fitted(fit.res))
mam.res <- function(data, i) {
  d <- data
  d$brain <- d$fitted + d$res[i]   # new response = fitted values + resampled residuals
  coef(update(fit.res, data = d))
}
fit.res.boot <- boot(mam.res.data, mam.res, R = 999)
fit.res.boot
boot.ci(fit.res.boot, type = c("norm", "perc", "bca"))             # intercept
boot.ci(fit.res.boot, index = 2, type = c("norm", "perc", "bca"))  # slope
plot(fit.res.boot)
plot(fit.res.boot, index = 2)
jack.after.boot(fit.res.boot)
jack.after.boot(fit.res.boot, index = 2)

20 Comparisons and Discussion 11/17/2004 Bootstrapping Linear Models

Comparing          Original Model   Case-Based (Fixed)   Model-Based (Random)
Intercept (t1*)
  Std. Error
Slope (t2*)
  Std. Error

21 Case-Based vs. Model-Based 11/17/2004 Bootstrapping Linear Models  Model-based resampling enforces the assumption that the errors are identically distributed by resampling the residuals from a common distribution  If the model is not specified correctly – i.e., there is unmodeled nonlinearity, non-constant error variance, or outliers – these attributes do not carry over to the bootstrap samples  The effect of outliers is clear in the case-based resampling, but not in the model-based

22 When Might Bootstrapping Fail? 11/17/2004 Bootstrapping Linear Models  Incomplete data  Bootstrapping assumes missing data are not problematic, e.g. because multiple imputation has been applied beforehand  Dependent data  Resampling cases independently imposes mutual independence on the $Y_j$, misrepresenting their joint distribution (a block bootstrap, sketched below, is one remedy)  Outliers and influential cases  Remove or correct obvious outliers  Avoid letting the simulations depend on particular observations
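For the dependent-data case, one standard remedy is the block bootstrap, which resamples contiguous blocks of the series so that short-range dependence is preserved. A hedged sketch using boot::tsboot on artificial data (not part of the original presentation; the AR(1) series and the block length of 20 are arbitrary choices):

# Block bootstrap for a dependent series
library(boot)
set.seed(1)
y <- arima.sim(model = list(ar = 0.6), n = 200)   # artificial AR(1) series
mean.stat <- function(ts) mean(ts)                # statistic of interest
tsboot(y, mean.stat, R = 999, l = 20, sim = "fixed")  # fixed-block resampling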

23 Review & More Resampling 11/17/2004 Bootstrapping Linear Models Resampling techniques are powerful tools for: -- estimating SDs from small samples -- handling statistics whose SDs are not easily determined Bootstrapping involves: -- taking ‘new’ random samples with replacement from the original data -- calculating the bootstrap SD and statistical tests from the distribution of the statistic across the bootstrap samples More resampling techniques: -- jackknife resampling -- cross-validation

24 SUMMARY 11/17/2004 Bootstrapping Linear Models  Introduction to Bootstrap  Data and Modeling  Methods on Bootstrapping LM  Results and Comparisons  Issues and Discussion

25 References 11/17/2004 Bootstrapping Linear Models  Anderson, B. “Resampling and Regression.” McMaster University.  Davison, A.C. and Hinkley, D.V. (1997), Bootstrap Methods and Their Application, Cambridge University Press.  Efron, B. and Gong, G. (February 1983), “A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation,” The American Statistician.  Holmes, S. “Introduction to the Bootstrap.” Stanford University.  Venables, W.N. and Ripley, B.D. (2002), Modern Applied Statistics with S, 4th ed., Springer.

27 Extra Stuff… 11/17/2004 Bootstrapping Linear Models  Jackknife resampling takes new samples of the data by omitting each case individually and recalculating the statistic each time  Resamples the data by leaving a single observation out at a time  The number of jackknife samples used equals the number of cases in the original sample  Works well for robust estimators of location, but not for the SD  Cross-validation randomly splits the sample into two groups, comparing the model results from one sample to the results from the other  The 1st subset is used to estimate a statistical model (screening/training sample)  We then test our findings on the second subset (confirmatory/test sample)  (both are sketched below)
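Both ideas fit in a few lines of R. The following sketch is illustrative, not from the original slides: the jackknife part uses an arbitrary simulated sample, and the cross-validation part assumes the mammal data frame built on the earlier code slide.

# Jackknife: recompute the statistic with each case omitted in turn
set.seed(1)
x <- rexp(25)                                        # an arbitrary sample
n <- length(x)
theta.jack <- sapply(1:n, function(i) mean(x[-i]))   # leave-one-out estimates
sqrt((n - 1) / n * sum((theta.jack - mean(theta.jack))^2))  # jackknife SE

# Split-sample cross-validation of the mammal regression
train <- sample(nrow(mammal), nrow(mammal) %/% 2)    # random half as training set
fit.train <- lm(brain ~ body, data = mammal[train, ])
pred <- predict(fit.train, newdata = mammal[-train, ])
mean((mammal$brain[-train] - pred)^2)                # test-set mean squared error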