Biostatistics IV An introduction to bootstrap

2 Getting something from nothing? In Rudolf Erich Raspe's tale, Baron Munchausen had, in one of his many adventurous travels, fallen to the bottom of a deep lake, and just as he was about to succumb to his fate he thought to pull himself up by his own bootstraps. In the German version of the tale, Munchausen actually pulls himself up by the hair, not the bootstraps. The figure on the left refers to the German version of the story. Efron, who introduced the method in 1979, named it the bootstrap in honour of the unbelievable stories that the Baron told of his travels.

3 The general problem “We have a set of real-valued observations $x_1, \ldots, x_n$ independently sampled from an unknown probability distribution $F$. We are interested in estimating some parameter $\theta$ by using the information in the sample data with an estimator $\hat{\theta} = t(x)$. Some measure of the estimate’s accuracy is as important as the estimate itself; we want a standard error of $\hat{\theta}$ and, even better a confidence interval on the true value $\theta$.” Efron and LePage (1992)

4 Our sample mean From the 4 sampled values we calculate a mean and a standard deviation. Using these for inference in the usual way assumes that the population is normally distributed. But what if we don’t know the distribution?

5 The solution: Bootstrap Generate a large number of “bootstrap” data sets, x1, x2, x3, ..., xb, from the original data. The observations in each data set are generated by random draws, with replacement, from the observations in the original data set. Each data set has the same number of observations (n) as the original data set. Refit the model to each bootstrap data set. Compute the statistics of interest (probability profile, standard deviations, confidence intervals) from the results of the model fits.
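The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the helper name `bootstrap_estimates`, the four data values and the choice of the mean as the "model" are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_estimates(data, statistic, n_boot=1000, seed=0):
    """Return n_boot values of `statistic`, each computed on one resample of `data`."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Draw n observations, with replacement, from the original sample.
        resample = rng.choices(data, k=n)
        # "Refit the model": here the model is simply the sample mean.
        estimates.append(statistic(resample))
    return estimates


# Hypothetical sample of four observations (illustrative values only).
sample = [2.1, 3.4, 1.9, 4.2]

boot_means = bootstrap_estimates(sample, statistics.mean, n_boot=1000, seed=1)

# The spread of the bootstrap estimates approximates the standard error.
boot_se = statistics.stdev(boot_means)
print(f"sample mean = {statistics.mean(sample):.2f}, bootstrap SE = {boot_se:.2f}")
```

Any other statistic (median, regression slope, and so on) can be passed in place of `statistics.mean`; the resampling loop stays the same.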

6 Resampling the sample Within a bootstrap sample the values are drawn randomly, with replacement, from the original sample. A particular bootstrap sample can therefore contain some measurements more than once and others not at all.
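A small sketch makes the repetition visible. The ten labels below are hypothetical stand-ins for measurements; the code only counts which of them appear more than once, and which are left out, in a single bootstrap sample.

```python
import random
from collections import Counter

rng = random.Random(42)

# Hypothetical original sample: ten labelled measurements.
original = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]

# One bootstrap sample: same size as the original, drawn with replacement.
resample = rng.choices(original, k=len(original))

counts = Counter(resample)
repeated = {value: c for value, c in counts.items() if c > 1}
missing = [value for value in original if value not in counts]

print("bootstrap sample:", resample)
print("drawn more than once:", repeated)
print("not drawn at all:", missing)
```

For large samples, roughly 63% of the original observations are expected to appear at least once in any one bootstrap sample; the rest of the slots are filled by repeats.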

7 Bootstrap: Why? When the sample contains all the available information about the population one can act as if the sample is really the population for the purpose of estimating the sampling distribution. Sampling with replacement is consistent with sampling a population that is effectively infinite - treat the sample as the total population. Elegant, powerful and easy :-)

8 Bootstrap: When? When the sample cannot be described by a known distribution, especially if the underlying population distribution is not known. If one knows the distribution there is little advantage in using the bootstrap. There is, however, no harm in bootstrapping such data sets!

9 Bootstrapping the residuals

10 Resampling residuals In a time series model one must maintain the time order of the data. Thus, resample the residuals from the optimum fit, with replacement. The randomly sampled residuals are added to the optimum fitted values to generate new bootstrap samples. The process is repeated n times (n here being the number of bootstrap samples, not the sample size) to obtain a probability profile of the value of interest. A catch curve analysis is used as an illustration.

11 Original data set and statistics Data set: of one cohort. Analysis: slope estimate for fully recruited fish; Z = -slope = total mortality. Task: obtain some information about the confidence interval of Z using bootstrapping techniques.

12 The residuals to resample

13 One bootstrap sample Bootstrap value = Predicted value + random residual value

14 More formally stated … The * represents a random draw of the residuals from the set available
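The equation from this slide did not survive in the transcript. One plausible way to write it, assuming additive residuals as in "Bootstrap value = Predicted value + random residual value", is given below; the symbols $y_i$ (observed value), $\hat{y}_i$ (fitted value) and $\hat{\varepsilon}_i$ (residual) are notation chosen here, not taken from the slide.

```latex
\hat{\varepsilon}_i = y_i - \hat{y}_i, \qquad
y_i^{*} = \hat{y}_i + \varepsilon_i^{*}, \qquad
\varepsilon_i^{*} \ \text{drawn at random, with replacement, from}\ \{\hat{\varepsilon}_1, \ldots, \hat{\varepsilon}_n\}
```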

15 Original and 1 bootstrap sample

16 n bootstrap samples ….
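The residual bootstrap for a catch curve can be sketched as follows. This is an illustration under stated assumptions, not the analysis behind the slides: the catch-at-age numbers are invented, the catch curve is taken to be an ordinary least-squares line through log catch against age, and the function names `fit_line` and `residual_bootstrap_z` are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_bootstrap_z(ages, log_catch, n_boot=1000, seed=0):
    """Bootstrap Z = -slope by resampling residuals from the optimum fit."""
    rng = random.Random(seed)
    intercept, slope = fit_line(ages, log_catch)
    fitted = [intercept + slope * a for a in ages]
    residuals = [y - f for y, f in zip(log_catch, fitted)]
    z_boot = []
    for _ in range(n_boot):
        # Bootstrap value = predicted value + randomly drawn residual.
        # Ages keep their order; only the residuals are shuffled around.
        y_star = [f + rng.choice(residuals) for f in fitted]
        _, slope_star = fit_line(ages, y_star)
        z_boot.append(-slope_star)
    return -slope, z_boot


# Hypothetical catches at age for the fully recruited ages of one cohort.
ages = [4, 5, 6, 7, 8, 9, 10]
catch = [1200, 560, 230, 120, 50, 24, 10]
log_catch = [math.log(c) for c in catch]

z_hat, z_boot = residual_bootstrap_z(ages, log_catch, n_boot=1000, seed=1)
print(f"Z estimate = {z_hat:.3f}")
print(f"bootstrap Z range = {min(z_boot):.3f} to {max(z_boot):.3f}")
```

Because the fitted values are kept fixed and only the residuals are resampled, every bootstrap data set has the same ages in the same order as the original.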

17 Bootstrap samples: Distribution

18 Bootstrap confidence intervals With b bootstrap estimates of the parameter of interest $\theta_b$: Obtain the confidence interval simply by finding the percentiles of the bootstrap estimates that enclose the desired confidence level. More formally stated: An estimate of the $100(1-\alpha)\%$ CI around the sample estimate of $\theta$ is obtained from the two bootstrap estimates that enclose the central $100(1-\alpha)\%$ of all b bootstrap estimates.
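The percentile rule described above can be sketched as below. The function name `percentile_ci`, the data values and the index convention (floor for the lower bound, ceiling for the upper bound on the sorted estimates) are assumptions for illustration; software packages differ in how they interpolate percentiles.

```python
import math
import random
import statistics


def percentile_ci(boot_estimates, alpha=0.05):
    """Bounds enclosing the central 100(1 - alpha)% of the bootstrap estimates."""
    ordered = sorted(boot_estimates)
    b = len(ordered)
    lower = ordered[math.floor((alpha / 2) * (b - 1))]
    upper = ordered[math.ceil((1 - alpha / 2) * (b - 1))]
    return lower, upper


# Hypothetical sample; the statistic of interest here is the mean.
rng = random.Random(1)
sample = [2.1, 3.4, 1.9, 4.2, 2.8, 3.9, 2.5, 3.1]
boot_means = [
    statistics.fmean(rng.choices(sample, k=len(sample))) for _ in range(1000)
]

ci80 = percentile_ci(boot_means, alpha=0.20)
ci95 = percentile_ci(boot_means, alpha=0.05)
print(f"80% CI: {ci80[0]:.2f} to {ci80[1]:.2f}")
print(f"95% CI: {ci95[0]:.2f} to {ci95[1]:.2f}")
```

With this convention the 80% interval always lies inside the 95% interval, since both are read off the same sorted list of estimates.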

19 80% & 95% confidence interval

20 How many bootstraps? This was more of an issue before powerful computers became commonly available. However, in complicated models with many parameters the issue is still valid, and the number of bootstraps needed is a question of efficiency. One can simply test the sensitivity of the parameters in question by running different numbers of bootstraps.
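Such a sensitivity check might look like the sketch below, which recomputes a 95% percentile interval for several numbers of bootstraps and prints the results side by side. The data, the set of bootstrap counts and the helper `bootstrap_ci` are hypothetical choices for the example.

```python
import math
import random
import statistics


def bootstrap_ci(data, n_boot, alpha=0.05, seed=0):
    """Percentile CI for the mean of `data` from n_boot bootstrap resamples."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    lower = means[math.floor((alpha / 2) * (n_boot - 1))]
    upper = means[math.ceil((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


# Hypothetical data set (illustrative values only).
data = [2.1, 3.4, 1.9, 4.2, 2.8, 3.9, 2.5, 3.1, 3.6, 2.2, 4.0, 2.9]

# Repeat the analysis with increasing numbers of bootstraps.
results = {b: bootstrap_ci(data, n_boot=b, seed=b) for b in (50, 200, 1000, 5000)}

for b, (lower, upper) in results.items():
    print(f"{b:>5} bootstraps: 95% CI = {lower:.3f} to {upper:.3f}")
```

If the interval bounds stop changing appreciably beyond some number of bootstraps, running more mainly costs computing time.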

21 95% CI of Z and bootstrap numbers In this simple case one does not need much more than around 200 bootstraps to get the CI.

22 Different CV in catches: 1000 bootstraps [Figure panels: CV = 0.1, CV = 0.2, CV = 0.3, CV = 0.4; true Z = 0.8]

23 Final point The bootstrap seems like magic and looks very impressive. It is a helpful exploratory tool. It represents the noise in the data given the model assumptions, and is thus not a true confidence profile of the total error. One should therefore talk of “pseudo-confidence intervals” or “bootstrap confidence intervals”. The same rule applies to the bootstrap as to any other tool: understand how it is implemented in the particular software package that you are using, and ask whether that is appropriate for the data set that you have.