Resampling techniques


Introduction to resampling techniques for generating confidence measures

Resampling techniques
Randomization: resampling without replacement (re-ordering, permutations).
Jackknife: leaving one data point out at a time (not good for small sample sizes); in paleobiology usually used for phylogenetic analyses.
Sampling standardization: used when comparing samples of different sizes.
Bootstrap:
  Parametric: generate datasets from a parametrized model and compare these with the empirical data.
  Non-parametric: most common in paleobiology.

Randomization (diagram): empirical data give randomized samples 1, 2, …, N.
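A randomization (permutation) test can be sketched as follows. This is a minimal illustration in Python using only the standard library, not code from the slides: the function name `permutation_test`, the two example groups, and the choice of a difference in means as the test statistic are all assumptions made for the example.

```python
import random
from statistics import mean

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided randomization test for a difference in means (illustrative)."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        # Re-order the pooled data: resampling without replacement.
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)

# Hypothetical measurements for two groups.
group_a = [4.1, 5.0, 5.6, 6.2, 4.8, 5.9]
group_b = [3.2, 4.0, 3.9, 4.4, 3.1, 4.6]
obs_diff, p_value = permutation_test(group_a, group_b)
print(f"observed difference = {obs_diff:.3f}, p = {p_value:.4f}")
```

The p-value is the share of re-orderings whose difference is at least as extreme as the observed one, so it depends on the random seed and the number of permutations.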

Jackknife (diagram): empirical data give jackknife samples 1, …, N.
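The leave-one-out idea can be sketched as below, in Python with the standard library only. The standard jackknife standard-error formula used here is not given in the slides and is included as an assumption; `jackknife_se` and the data values are hypothetical.

```python
from math import sqrt
from statistics import mean

def jackknife_se(data, statistic=mean):
    """Leave-one-out replicates and the usual jackknife standard error."""
    n = len(data)
    # Each replicate drops exactly one data point.
    replicates = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    center = sum(replicates) / n
    # Standard jackknife variance: (n - 1) / n * sum of squared deviations.
    variance = (n - 1) / n * sum((r - center) ** 2 for r in replicates)
    return replicates, sqrt(variance)

# Hypothetical observations.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7]
replicates, se = jackknife_se(data)
print(f"{len(replicates)} jackknife samples, SE of the mean = {se:.4f}")
```

For the mean, this estimate coincides with the familiar s / sqrt(n), which makes the mean a convenient check before applying the function to other statistics.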

Sampling standardization (diagram): empirical data sets 1, 2 and 3 give standardized samples 1, 2, …, N.
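One simple way to standardize samples of different sizes is to subsample each one repeatedly down to a common quota and average the statistic of interest. The sketch below does this for taxon richness in Python (standard library only). It is a generic fixed-quota subsampling illustration, not the specific method of any paper cited in these slides; `standardized_richness`, the interval names, and the taxon labels are hypothetical.

```python
import random

def standardized_richness(samples, quota, n_trials=500, seed=0):
    """Mean number of distinct taxa in subsamples of a fixed size."""
    rng = random.Random(seed)
    result = {}
    for name, occurrences in samples.items():
        if len(occurrences) < quota:
            raise ValueError(f"{name} has fewer than {quota} occurrences")
        total = 0
        for _ in range(n_trials):
            # Draw `quota` occurrences without replacement.
            subsample = rng.sample(occurrences, quota)
            total += len(set(subsample))
        result[name] = total / n_trials
    return result

# Hypothetical occurrence lists: one taxon label per occurrence.
samples = {
    "interval_1": ["a"] * 10 + ["b"] * 5 + ["c"] * 3 + ["d", "e"],
    "interval_2": ["a"] * 4 + ["b"] * 3 + ["c"],
}
quota = min(len(occ) for occ in samples.values())
richness = standardized_richness(samples, quota)
print(richness)
```

Here the quota is set to the size of the smallest sample, so the smallest sample is always used in full and only the larger one is thinned.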

Non-parametric bootstrap (diagram): empirical data give bootstrapped samples 1, 2, 3, …, N.

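Non-parametric vs. parametric bootstrap (diagram, two panels).
Non-parametric bootstrap: empirical data, bootstrap samples; parameters are estimated for the empirical data and for the bootstrap samples.
Parametric bootstrap: empirical data, simulated samples; parameters (model) are estimated for the empirical data and for the simulated samples.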
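The parametric variant can be sketched as below in Python (standard library only). The normal model is an assumption chosen purely for illustration, as are the function name `parametric_bootstrap_means` and the data; in practice the model would be whatever parametrized model is fitted to the empirical data.

```python
import random
from statistics import mean, stdev

def parametric_bootstrap_means(data, n_boot=2000, seed=0):
    """Simulate datasets from a fitted normal model and re-estimate the mean."""
    rng = random.Random(seed)
    # Estimate the model parameters from the empirical data.
    mu_hat, sigma_hat = mean(data), stdev(data)
    n = len(data)
    boot_means = []
    for _ in range(n_boot):
        # Generate a simulated dataset of the same size from the fitted model.
        simulated = [rng.gauss(mu_hat, sigma_hat) for _ in range(n)]
        # Re-estimate the parameter of interest on the simulated data.
        boot_means.append(mean(simulated))
    return boot_means

# Hypothetical observations.
data = [12.3, 9.8, 11.1, 10.4, 13.0, 9.5, 10.9, 11.7]
boot_means = parametric_bootstrap_means(data)
print(f"parametric bootstrap SE of the mean = {stdev(boot_means):.4f}")
```

The spread of the re-estimated values can then be compared with the estimate from the empirical data, as in the right-hand panel of the diagram.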

Why resampling (now)
The underlying distribution of the data is not well understood and/or is complex.
Resampling is a convenient way to generate uncertainty measures.
It is computer intensive (possible only with faster computers).

Bootstrapping
Construct an estimate of the frequency distributions expected from a “generative process”.
Equivalent to generating replicate outcomes of an experiment (doing something many times to see the range of results).
Assumption: the data are a representative sample of independent observations drawn randomly from the statistical population under study.

Bootstrap error estimates
Estimate the standard error by resampling from the single sample we have. This approach uses sampling with replacement from the observed sample to simulate sampling without replacement from the underlying distribution.
Procedure
1. Start with an observed sample of size n and the observed sample statistic; call it Z.
2. Randomly pick a sample of size n, with replacement, from the observed sample.
3. Calculate the sample statistic of interest on this random sample; call it Zboot.
4. Repeat steps 2 and 3 many times (generally hundreds to thousands, ideally until the estimate of SE stabilizes).
5. Calculate the standard deviation of the Zboot values. This is an estimate of the standard error of the observed sample statistic Z: SD(Zboot) ≈ SE(Z)
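The procedure above can be sketched as follows in Python (standard library only). The function name `bootstrap_se`, the default of 2000 replicates, and the example sample are assumptions for illustration; any statistic that takes a list of values can be passed in.

```python
import random
from statistics import mean, median, stdev

def bootstrap_se(sample, statistic, n_boot=2000, seed=0):
    """Non-parametric bootstrap estimate of the standard error of a statistic."""
    rng = random.Random(seed)
    n = len(sample)
    z_boot = []
    for _ in range(n_boot):
        # Pick a sample of size n, with replacement, from the observed sample.
        resample = rng.choices(sample, k=n)
        # Calculate the statistic of interest on the resample (Zboot).
        z_boot.append(statistic(resample))
    # SD(Zboot) approximates SE(Z).
    return stdev(z_boot), z_boot

# Hypothetical observed sample.
sample = [3.1, 4.7, 2.9, 5.5, 4.0, 3.6, 6.2, 4.4, 3.9, 5.1]
se_mean, z_boot = bootstrap_se(sample, mean)
se_median, _ = bootstrap_se(sample, median)
print(f"SE(mean) = {se_mean:.4f}, SE(median) = {se_median:.4f}")
```

To check whether the estimate has stabilized, the function can be rerun with a larger `n_boot` or a different `seed` and the results compared.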

Example (sampling standardization) Alroy et al. 2008. Phanerozoic trends in the global diversity of marine invertebrates. Science 321:97-100

Example (non-parametric bootstrap) Foote, M. 2006. Substrate affinity and diversity dynamics of Paleozoic marine animals. Paleobiology 32:345-366.

Example (non-parametric bootstrap) Liow et al. 2009. Lower extinction risk in Sleep-or-Hide Mammals. Am Nat 173:264–272.

R demo
Packages (e.g. boot, bootstrap)
Write your own: use the function sample
Nice help: http://www.ats.ucla.edu/stat/r/library/bootstrap.htm

Links
http://www.paleo.geos.vt.edu/MK/Kowalewski_PNG_2010.pdf
http://www.stat.cmu.edu/~cshalizi/402/lectures/08-bootstrap/lecture-08.pdf