Limits to Statistical Theory Bootstrap analysis ESM 206 11 April 2006.



2 Assumption of t-test
The t statistic built from the sample mean is a t-distributed random variable
  – Exactly true if the observations are independent, normally distributed random variables; approximately true if the sample size is very large
  – In practice, OK if the observations are not too skewed and the sample size is reasonably large
This assumption also applies when using the standard formula for the 95% CI of the mean
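For reference, the statistic and the interval this slide refers to are the standard ones. Written out for a sample of size n with mean \bar{x}, standard deviation s and hypothesized mean \mu_0, both depend on the t-distribution assumption above:

```latex
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \sim t_{n-1} \quad \text{under } H_0,
\qquad
95\%\ \text{CI} = \bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
```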

3 Resampling for a confidence interval of the mean
IN AN IDEAL WORLD
  – Take a sample
  – Calculate the sample mean
  – Take a new sample
  – Calculate the new mean
  – Repeat many times
  – Look at the distribution of sample means
  – The 95% CI ranges from the 2.5th percentile to the 97.5th percentile
IN THE REAL WORLD
  – Find some way to simulate taking a sample
  – Calculate the sample mean
  – Repeat many times
  – Look at the distribution of sample means
  – The 95% CI ranges from the 2.5th percentile to the 97.5th percentile
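The ideal-world procedure on this slide can be sketched in Python as below. This is a minimal illustration, not the course's own code: the helper names `percentile_ci` and `resampled_mean_ci` are hypothetical, and the "true population" Normal(10, 2) with samples of size 30 is an assumption chosen only so that the loop can actually draw fresh samples. The percentile rule (ranks (R + 1) × 0.025 and (R + 1) × 0.975) gives the 25th and 975th values when R = 999.

```python
import random
import statistics


def percentile_ci(values, level=0.95):
    """Percentile interval: with R sorted values, take the values at ranks
    (R + 1) * alpha/2 and (R + 1) * (1 - alpha/2), clamped to 1..R."""
    ordered = sorted(values)
    r = len(ordered)
    alpha = 1.0 - level
    lo_rank = min(r, max(1, round((r + 1) * alpha / 2)))
    hi_rank = min(r, max(1, round((r + 1) * (1 - alpha / 2))))
    return ordered[lo_rank - 1], ordered[hi_rank - 1]


def resampled_mean_ci(take_sample, n_reps=999, level=0.95):
    """Call take_sample() n_reps times, record each sample mean, and
    return the percentile interval of those means."""
    means = [statistics.fmean(take_sample()) for _ in range(n_reps)]
    return percentile_ci(means, level)


# "Ideal world" illustration: fresh samples of size 30 can be drawn at will
# from the true population, ASSUMED here to be Normal(mean=10, sd=2).
rng = random.Random(1)
lo, hi = resampled_mean_ci(lambda: [rng.gauss(10, 2) for _ in range(30)])
print(f"95% interval of sample means: [{lo:.2f}, {hi:.2f}]")
```

In the real world, `take_sample` has to be replaced by something that only simulates taking a sample; the parametric and nonparametric bootstraps are two ways to build it.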

4 Bootstrap resampling
PARAMETRIC BOOTSTRAP
Assume the data are random variables from a particular distribution
  – E.g., log-normal
Use the data to estimate the parameters of the distribution
  – E.g., mean, variance
Use a random number generator to create a sample
  – Same size as the original
  – Calculate the sample mean
Allows us to ask: What if the data were a random sample from the specified distribution with the specified parameters?
NONPARAMETRIC BOOTSTRAP
Assume the underlying distribution from which the data come is unknown
  – The best estimate of this distribution is the data themselves: the empirical distribution function
Create a new dataset by sampling with replacement from the data
  – Same size as the original
  – Calculate the sample mean
WHICH IS BETTER?
If the underlying distribution is correctly chosen, the parametric bootstrap has more precision
If the underlying distribution is incorrectly chosen, the parametric bootstrap has more bias
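A minimal sketch of the nonparametric bootstrap described above, assuming only Python's standard `random` and `statistics` modules. The helper `nonparametric_bootstrap_mean_ci` is a hypothetical name, and the data are made up for illustration (they are not the TcCB measurements). Each replicate is drawn with replacement from the data and has the same size as the original.

```python
import random
import statistics


def percentile_ci(values, level=0.95):
    """Values at ranks (R + 1) * alpha/2 and (R + 1) * (1 - alpha/2) of the sorted list."""
    ordered = sorted(values)
    r = len(ordered)
    alpha = 1.0 - level
    lo_rank = min(r, max(1, round((r + 1) * alpha / 2)))
    hi_rank = min(r, max(1, round((r + 1) * (1 - alpha / 2))))
    return ordered[lo_rank - 1], ordered[hi_rank - 1]


def nonparametric_bootstrap_mean_ci(data, n_reps=999, level=0.95, seed=None):
    """Resample the data with replacement (same size as the original),
    record each replicate's mean, and return the percentile interval."""
    rng = random.Random(seed)
    n = len(data)
    means = [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_reps)]
    return percentile_ci(means, level)


# Hypothetical right-skewed data, for illustration only.
data = [0.22, 0.35, 0.48, 0.61, 0.80, 1.05, 1.30, 1.90, 2.60, 4.10, 9.70]
ci = nonparametric_bootstrap_mean_ci(data, seed=42)
print(f"Nonparametric bootstrap 95% CI: [{ci[0]:.3f}, {ci[1]:.3f}]")
```

A parametric version would keep the same loop and replace `rng.choices(data, k=n)` with draws from the fitted distribution.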

5 TcCB in the cleanup site
Parametric bootstrap
  – If Y is log-normal, its distribution is specified by the mean and standard deviation of X = log(Y)
  – Mean = -0.547
  – SD = 1.360
  – Use Monte Carlo simulation to generate 999 replicate simulated datasets from the log-normal distribution
  – Calculate the mean of each replicate and sort the means
  – The 25th value is the lower end of the 95% CI (0.025 × (999 + 1) = 25)
  – The 975th value is the upper end of the 95% CI (0.975 × (999 + 1) = 975)
95% CI: [-0.678, 8.458]
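The parametric procedure on this slide can be sketched as follows, using the log-scale mean and SD given above and Python's `random.lognormvariate`. Assumptions: the slide does not state the sample size, so `N_OBS = 77` is a placeholder; the seed is arbitrary; and because the replicates are random, the printed interval need not match the slide's numbers.

```python
import math
import random
import statistics

MU_LOG = -0.547   # mean of X = log(Y), from the slide
SD_LOG = 1.360    # SD of X = log(Y), from the slide
N_OBS = 77        # ASSUMPTION: placeholder sample size (not stated on the slide)
N_REPS = 999      # number of simulated datasets, as on the slide


def parametric_bootstrap_lognormal_ci(mu, sigma, n, n_reps=N_REPS, seed=None):
    """Simulate n_reps log-normal datasets of size n, sort their means,
    and return the values at ranks 0.025*(n_reps+1) and 0.975*(n_reps+1)."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean([rng.lognormvariate(mu, sigma) for _ in range(n)])
        for _ in range(n_reps)
    )
    lo_rank = min(n_reps, max(1, round(0.025 * (n_reps + 1))))  # 25 when n_reps = 999
    hi_rank = min(n_reps, max(1, round(0.975 * (n_reps + 1))))  # 975 when n_reps = 999
    return means[lo_rank - 1], means[hi_rank - 1]


lo, hi = parametric_bootstrap_lognormal_ci(MU_LOG, SD_LOG, N_OBS, seed=2006)
print(f"Parametric bootstrap 95% CI: [{lo:.3f}, {hi:.3f}]")
# Mean of a log-normal variable: exp(mu + sigma^2 / 2).
print(f"Log-normal mean implied by the parameters: {math.exp(MU_LOG + SD_LOG ** 2 / 2):.3f}")
```

Because every simulated value is positive, this interval can never extend below zero, unlike a t-based interval on strongly skewed data.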

6 Parametric bootstrap: results
95% CI: [0.917, 2.293]

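7 Normal QQ plot
Sort the data
Index the values (i = 1, 2, …, n)
Calculate q = i/(n + 1)
  – This is the quantile
Plot the quantiles against the data values
  – This is the empirical cumulative distribution function (CDF)
Find the standard normal values at the same quantiles (the inverse CDF of the standard normal)
Compare the two distributions at the same quantiles: plot the data values against the standard normal values; points near a straight line indicate approximate normality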
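The QQ-plot recipe above amounts to a few lines of Python. This sketch assumes Python 3.8+ for `statistics.NormalDist` and computes the plotting pairs rather than drawing the plot; the helper name `normal_qq_points` and the sample values are hypothetical. To check log-normality instead, pass the logged data (`[math.log(y) for y in data]`).

```python
from statistics import NormalDist


def normal_qq_points(data):
    """Return (standard-normal value, data value) pairs for a normal QQ plot."""
    ordered = sorted(data)                          # sort the data
    n = len(ordered)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    points = []
    for i, value in enumerate(ordered, start=1):    # index i = 1..n
        q = i / (n + 1)                             # quantile of the i-th value
        z = std_normal.inv_cdf(q)                   # standard-normal value at the same quantile
        points.append((z, value))
    return points


# Hypothetical sample, for illustration only.
sample = [0.9, 0.3, 2.2, 1.4, 0.6, 4.8, 1.1]
for z, y in normal_qq_points(sample):
    print(f"{z:+.3f}  {y:.2f}")
```

Plotting y against z with any plotting library gives the QQ plot itself.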

8 Nonparametric bootstrap: results
95% CI: [0.851, 9.248]

9 Bootstrap and hypothesis tests
One-sample t-test
  – Calculate the bootstrap CI of the mean
  – Does it include the test value? (If not, reject H0 at the 5% level)
Paired t-test
  – Calculate the differences: D_i = x_i - y_i
  – Find the bootstrap CI of the mean difference
  – Does it include zero?
Two-sample t-test
  – Want to create simulated data where H0 is true (same mean) but allow the variance and shape of the distribution to differ between populations
  – Easiest with the nonparametric bootstrap:
      Subtract the mean from each sample; now both samples have mean zero
      Resample these residuals, creating simulated group A from the residuals of group A and simulated group B from the residuals of group B
  – Generate the distribution of t values
  – P is the fraction of simulated t's that exceed the t calculated from the data (for a two-sided test, compare absolute values)
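A sketch of the two-sample recipe above, assuming Welch's unequal-variance t statistic (the slide does not say which t is used) and Python's standard library only. The function names and both data groups are hypothetical; they are not the TcCB cleanup and reference data. It reports the one-sided P described on the slide and, for comparison, a two-sided version based on absolute values.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch two-sample t statistic; None if both samples are constant."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        return None
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def bootstrap_two_sample_test(a, b, n_reps=999, seed=None):
    """Bootstrap test of H0: equal means, resampling each group's residuals."""
    rng = random.Random(seed)
    t_obs = welch_t(a, b)
    if t_obs is None:
        raise ValueError("both samples are constant; t is undefined")
    # Subtract each group's own mean: both groups now have mean zero (H0 true),
    # but each keeps its own spread and shape.
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    res_a = [x - mean_a for x in a]
    res_b = [y - mean_b for y in b]
    t_sim = []
    for _ in range(n_reps):
        sim_a = rng.choices(res_a, k=len(a))   # simulated group A from A's residuals
        sim_b = rng.choices(res_b, k=len(b))   # simulated group B from B's residuals
        t = welch_t(sim_a, sim_b)
        if t is not None:                      # skip the rare all-constant replicate
            t_sim.append(t)
    p_one_sided = sum(t >= t_obs for t in t_sim) / len(t_sim)
    p_two_sided = sum(abs(t) >= abs(t_obs) for t in t_sim) / len(t_sim)
    return t_obs, p_one_sided, p_two_sided


# Hypothetical data (not the TcCB measurements), for illustration only.
group_a = [0.3, 0.5, 0.6, 0.9, 1.2, 1.4, 2.0, 2.8, 3.5, 6.9, 12.4]
group_b = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.3, 1.6]
t_obs, p_one, p_two = bootstrap_two_sample_test(group_a, group_b, seed=11)
print(f"t = {t_obs:.2f}, one-sided P = {p_one:.3f}, two-sided P = {p_two:.3f}")
```

Comparing the observed t with this simulated distribution, rather than with a t table, is what lets the test cope with skewed data.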

10 TcCB: H0: cleanup mean = reference mean
t = 1.45
Bootstrapped 't' values do not follow a t distribution!
P = 0.02

