Bootstrapping LING 572 Fei Xia 1/31/06.

Slides:



Advertisements
Similar presentations
Chap 8-1 Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chapter 8 Estimation: Single Population Statistics for Business and Economics.
Advertisements

Sampling Distributions (§ )
Today: Quizz 11: review. Last quizz! Wednesday: Guest lecture – Multivariate Analysis Friday: last lecture: review – Bring questions DEC 8 – 9am FINAL.
Resampling techniques Why resampling? Jacknife Cross-validation Bootstrap Examples of application of bootstrap.
Resampling techniques
2008 Chingchun 1 Bootstrap Chingchun Huang ( 黃敬群 ) Vision Lab, NCTU.
Chapter 8 Estimation: Single Population
Bagging LING 572 Fei Xia 1/24/06. Ensemble methods So far, we have covered several learning methods: FSA, HMM, DT, DL, TBL. Question: how to improve results?
Fall 2006 – Fundamentals of Business Statistics 1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter 7 Estimating Population Values.
Chapter 7 Estimation: Single Population
AP Statistics Section 10.2 A CI for Population Mean When is Unknown.
Chapter 11: Inference for Distributions
STAT 572: Bootstrap Project Group Members: Cindy Bothwell Erik Barry Erhardt Nina Greenberg Casey Richardson Zachary Taylor.
Bootstrap spatobotp ttaoospbr Hesterberger & Moore, chapter 16 1.
Chapter 7 Estimation: Single Population
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 8-1 Confidence Interval Estimation.
Population All members of a set which have a given characteristic. Population Data Data associated with a certain population. Population Parameter A measure.
Bootstrapping – the neglected approach to uncertainty European Real Estate Society Conference Eindhoven, Nederlands, June 2011 Paul Kershaw University.
PROBABILITY (6MTCOAE205) Chapter 6 Estimation. Confidence Intervals Contents of this chapter: Confidence Intervals for the Population Mean, μ when Population.
Statistical Fundamentals: Using Microsoft Excel for Univariate and Bivariate Analysis Alfred P. Rovai Estimation PowerPoint Prepared by Alfred P. Rovai.
Biostatistics IV An introduction to bootstrap. 2 Getting something from nothing? In Rudolph Erich Raspe's tale, Baron Munchausen had, in one of his many.
PARAMETRIC STATISTICAL INFERENCE
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 10 Comparing Two Populations or Groups 10.2.
Slide 1 © 2002 McGraw-Hill Australia, PPTs t/a Introductory Mathematics & Statistics for Business 4e by John S. Croucher 1 n Learning Objectives –Identify.
Active Learning Lecture Slides For use with Classroom Response Systems Statistical Inference: Confidence Intervals.
CHAPTER 9 Estimation from Sample Data
Resampling techniques
Sampling And Resampling Risk Analysis for Water Resources Planning and Management Institute for Water Resources May 2007.
Statistical Fundamentals: Using Microsoft Excel for Univariate and Bivariate Analysis Alfred P. Rovai Estimation PowerPoint Prepared by Alfred P. Rovai.
Limits to Statistical Theory Bootstrap analysis ESM April 2006.
Bootstraps and Jackknives Hal Whitehead BIOL4062/5062.
Section 6-3 Estimating a Population Mean: σ Known.
: An alternative representation of level of significance. - normal distribution applies. - α level of significance (e.g. 5% in two tails) determines the.
Chapter 5 Sampling Distributions. The Concept of Sampling Distributions Parameter – numerical descriptive measure of a population. It is usually unknown.
Psychology 202a Advanced Psychological Statistics October 1, 2015.
Introduction to Inference Sampling Distributions.
1 Probability and Statistics Confidence Intervals.
ESTIMATION OF THE MEAN. 2 INTRO :: ESTIMATION Definition The assignment of plausible value(s) to a population parameter based on a value of a sample statistic.
Statistics for Business and Economics 8 th Edition Chapter 7 Estimation: Single Population Copyright © 2013 Pearson Education, Inc. Publishing as Prentice.
Sampling Distributions Chapter 18. Sampling Distributions A parameter is a number that describes the population. In statistical practice, the value of.
Chapter 9 Sampling Distributions 9.1 Sampling Distributions.
Design and Data Analysis in Psychology I English group (A) Salvador Chacón Moscoso Susana Sanduvete Chaves Milagrosa Sánchez Martín School of Psychology.
Statistics for Business and Economics 7 th Edition Chapter 7 Estimation: Single Population Copyright © 2010 Pearson Education, Inc. Publishing as Prentice.
CHAPTER 10 Comparing Two Populations or Groups
Introduction to Inference
Estimation and Confidence Intervals
Introduction For inference on the difference between the means of two populations, we need samples from both populations. The basic assumptions.
Confidence Intervals and Sample Size
Application of the Bootstrap Estimating a Population Mean
Confidence Interval Estimation
Inference: Conclusion with Confidence
CHAPTER 10 Comparing Two Populations or Groups
Data Analysis and Statistical Software I ( ) Quarter: Autumn 02/03
CONCEPTS OF ESTIMATION
Chapter 25: Paired Samples and Blocks
CHAPTER 22: Inference about a Population Proportion
Sampling Distribution
Sampling Distribution
Chapter 6 Confidence Intervals.
Ch13 Empirical Methods.
Section 7.7 Introduction to Inference
CHAPTER 10 Comparing Two Populations or Groups
Bootstrapping Jackknifing
CHAPTER 10 Comparing Two Populations or Groups
CHAPTER 10 Comparing Two Populations or Groups
CHAPTER 10 Comparing Two Populations or Groups
CHAPTER 2: Basic Summary Statistics
CHAPTER 10 Comparing Two Populations or Groups
Techniques for the Computing-Capable Statistician
Bootstrapping and Bootstrapping Regression Models
Presentation transcript:

Bootstrapping LING 572 Fei Xia 1/31/06

Outline Basic concepts Case study

Motivation What’s the average price of house prices? From F, get a sample x=(x1, x2, …, xn), and calculate the average u. Question: how reliable is u? What’s the standard error of u? what’s the confidence interval?

Solutions One possibility: get several samples from F. Problem: it is impossible (or too expensive) to get multiple samples. Solution: bootstrapping

Procedure for bootstrapping Let the original sample be x=(x1,x2,…,xn) Repeat B times Generate a sample x* of size n from x by sampling with replacement. Compute for x*.  Now we end up with bootstrap values Use these values for calculating alll the quantities of interest (e.g., standard deviation, confidence intervals)

An example X1=(1.57,0.22,19.67, 0,0,2.2,3.12) Mean=4.13 X=(3.12, 0, 1.57, 19.67, 0.22, 2.20) Mean=4.46 X2=(0, 2.20, 2.20, 2.20, 19.67, 1.57) Mean=4.64 X3=(0.22, 3.12,1.57, 3.12, 2.20, 0.22) Mean=1.74

A quick view of bootstrapping Introduced by Bradley Efron in 1979 Named from the phrase “to pull oneself up by one’s bootstraps”, which is widely believed to come from “the Adventures of Baron Munchausen”. Popularized in 1980s due to the introduction of computers in statistical practice. It has a strong mathematical background. While it is a method for improving estimators, it is well known as a method for estimating standard errors, bias, and constructing confidence intervals for parameters.

A quick view of bootstrap (cont) It has minimum assumptions. It is merely based on the assumption that the sample is a good representation of the unknown population. In practice, it is computationally demanding, but the progress on computer speed makes it easily available in everyday practice.

The population  population distribution (unknown) Original sample  sampling distribution ? Resamples  bootstrap distribution

Bootstrap distribution The bootstrap does not replace or add to the original data. We use bootstrap distribution as a way to estimate the variation in a statistic based on the original data.

Bootstrap distributions usually approximate the shape, spread, and bias of the actual sampling distribution. Bootstrap distributions are centered at the value of the statistic from the original data plus any bias, while the sampling distribution is centered at the value of the parameter in the population, plus any bias.

Cases where bootstrap does not apply Small data sets: the original sample is not a good approximation of the population Dirty data: outliers add variability in our estimates. Dependence structures (e.g., time series, spatial problems): Bootstrap is based on the assumption of independence.

How many bootstrap samples are needed? Choice of B depends on Computer availability Type of the problem: standard errors, confidence intervals, … Complexity of the problem

Further reading SPlus: http://elms03.e-academy.com/splus

Case study

Additional slides

Resampling methods Boostrap Permutation tests Jackknife: we ignore one observation at each time …

Why resampling? Fewer assumptions Ex: resampling methods do not require that distributions be Normal or that sample sizes be large Greater accuracy: Permutation tests and come bootstrap methods are more accurate in practice than classical methods Generality: Resampling methods are remarkably similar for a wide range of statistics and do not require new formulas for every statististic. Promote understanding: Boostrap procedures build intuition by providing concrete analogies to theoretical concepts.