
Slide 1: Introduction to the Bootstrap (and Permutation Tests)
Tim Hesterberg, Ph.D.
Association of General Clinical Research Center Statisticians
August 2004, Toronto

Slide 2: Outline of Talk
– Why resample?
– Introduction to bootstrapping
– More examples, sampling methods
– Two-sample bootstrap
– Two-sample permutation test
– Other statistics
– Other permutation tests

Slide 3: Why Resample?
– Fewer assumptions: normality, equal variances
– Greater accuracy (in practice)
– Generality: the same basic procedure works for a wide range of statistics and sampling methods
– Promotes understanding: concrete analogies to theoretical concepts

Slide 4: Good Books
– Hesterberg et al., Bootstrap Methods and Permutation Tests (W. H. Freeman, 2003)
– B. Efron and R. Tibshirani, An Introduction to the Bootstrap (Chapman & Hall, 1993)
– A. C. Davison and D. V. Hinkley, Bootstrap Methods and Their Application (Cambridge University Press, 1997)

Slide 5: Example - Verizon
                        Number of Observations   Average Repair Time
ILEC (Verizon)          1664                     8.4
CLEC (other carrier)    23                       16.5
Is the difference statistically significant?

Slide 6: Example Data (figure)

Slide 7: Start Simple
We'll start simple: the single-sample mean.
Later:
– other statistics
– two samples
– permutation tests

Slide 8: Bootstrap Procedure
Repeat 1000 times:
– Draw a sample of size n with replacement from the original data (a "bootstrap sample", or "resample")
– Calculate the sample mean for the resample
The 1000 bootstrap sample means comprise the bootstrap distribution.
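
An illustrative sketch of this procedure in S-style code (the talk's demos used S-PLUS with the S+Resample library; this sketch uses only base functions, and the data vector below is a made-up placeholder, not the Verizon repair times):

# Bootstrap the mean of a single sample.
x <- c(8.4, 1.2, 3.7, 0.5, 14.1, 6.3, 2.2, 9.8)       # hypothetical repair times
B <- 1000
boot.means <- numeric(B)
for (i in 1:B) {
  xstar <- sample(x, size = length(x), replace = T)   # resample n values with replacement
  boot.means[i] <- mean(xstar)                        # statistic for this resample
}
# boot.means is the (Monte Carlo) bootstrap distribution of the mean;
# its standard deviation is the bootstrap standard error.
sqrt(var(boot.means))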

Slide 9: Bootstrap Distribution for ILEC Mean (figure)

Slide 10: Bootstrap Standard Error
Bootstrap standard error (SE) = standard deviation of the bootstrap distribution
> ILEC.boot.mean
Call: bootstrap(data = ILEC, statistic = mean, seed = 36)
Number of Replications: 1000
Summary Statistics:
      Observed   Mean    Bias       SE
mean  8.412      8.395   -0.01698   0.3672

Slide 11: Bootstrap Distribution for CLEC Mean (figure)

Slide 12: Take Another Look
Take another look at the previous two figures. Is the amount of non-normality/asymmetry there a cause for concern?
Note: we're looking at a sampling distribution, not the underlying distribution. This is after the CLT effect!

Slide 13: Idea Behind Bootstrapping
Plug-in principle:
– The underlying distribution is unknown
– Substitute your best guess

Slide 14: Ideal World (figure)

Slide 15: Bootstrap World (figure)

Slide 16: Fundamental Bootstrap Principle
Plug-in principle:
– The underlying distribution is unknown
– Substitute your best guess
Fundamental bootstrap principle:
– This substitution works
– Not always
– The bootstrap distribution is centered at the statistic, not the parameter

Slide 17: Secondary Principle
Implement the fundamental principle by Monte Carlo sampling. This is just an implementation detail!
– Exact: n^n samples
– Monte Carlo: typically 1000 samples, i.e. 1000 realizations from the theoretical bootstrap distribution; more for higher accuracy (e.g. 500,000)

Slide 18: Not Creating Data from Nothing
Some are uncomfortable with the bootstrap because they think it is creating data out of nothing. (The name doesn't help!)
– Not creating data. No better parameter estimates. (Exception: bagging, boosting.)
– Use the original data to estimate the SE or other aspects of the sampling distribution, using sampling rather than a formula.

Slide 19: Formulaic and Bootstrap SE (figure)

Slide 20: What to Substitute?
Plug-in principle:
– The underlying distribution is unknown
– Substitute your best guess
What to substitute?
– Empirical distribution: ordinary bootstrap
– Smoothed distribution: smoothed bootstrap
– Parametric distribution: parametric bootstrap
– A distribution that satisfies assumptions, e.g. the null hypothesis

Slide 21: Another Example: Kyphosis
Variables: Kyphosis (present or absent), Age of child, Number of vertebrae in operation, Start of range of vertebrae
Logistic regression

Slide 22: Kyphosis - Logistic Regression
             Value         Std. Error   t value
(Intercept)  -2.03693225   1.44918287   -1.405573
Age           0.01093048   0.00644419    1.696175
Start        -0.20651000   0.06768504   -3.051043
Number        0.41060098   0.22478659    1.826626
Null Deviance: 83.23447 on 80 df
Residual Deviance: 61.37993 on 77 df

Slide 23: Kyphosis vs. Start (figure)

Slide 24: Kyphosis Example
Pseudo-code:
Repeat 1000 times {
  Draw a sample with replacement from the original rows
  Fit the logistic regression
  Save the coefficients
}
Use the bootstrap distribution.
Live demo (kyphosis.ssc)
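
A sketch of that pseudo-code in S-style code, assuming a data frame kyphosis with columns Kyphosis, Age, Number, and Start (it ships with S-PLUS; in R the same data set is in the rpart package):

# Bootstrap logistic-regression coefficients by resampling whole rows.
B <- 1000
n <- nrow(kyphosis)
coef.boot <- matrix(NA, nrow = B, ncol = 4)
for (i in 1:B) {
  rows <- sample(1:n, size = n, replace = T)          # resample rows with replacement
  fit <- glm(Kyphosis ~ Age + Start + Number,
             family = binomial, data = kyphosis[rows, ])
  coef.boot[i, ] <- coef(fit)                         # save the four coefficients
}
# Bootstrap SE of each coefficient:
sqrt(apply(coef.boot, 2, var))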

Slide 25: Bootstrap SE and Bias
Bootstrap SE (standard error) = standard deviation of the bootstrap distribution
Bootstrap bias = mean of the bootstrap distribution - original statistic

Slide 26: t Confidence Interval
Statistic +- t* SE(bootstrap)
A reasonable interval if the bootstrap distribution is approximately normal, with little bias.
Compare to bootstrap percentiles.
Return to the Kyphosis example.
(In the literature, "bootstrap t" means something else.)
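
A sketch of the two intervals, continuing the one-sample example after slide 8 (the 95% level and the df = n - 1 choice for t* are assumptions for illustration):

# t interval using the bootstrap SE, and the bootstrap percentile interval.
obs <- mean(x)                            # original statistic
SE.boot <- sqrt(var(boot.means))          # bootstrap SE from the earlier sketch
tstar <- qt(0.975, df = length(x) - 1)
c(obs - tstar * SE.boot, obs + tstar * SE.boot)   # t interval
quantile(boot.means, c(0.025, 0.975))             # percentile interval, for comparison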

Slide 27: Percentiles to Check the Bootstrap t
If the bootstrap distribution is approximately normal and unbiased, then the bootstrap t interval and the corresponding percentiles should be similar.
Compare these:
– if they are similar, use either;
– else, use a more accurate interval.

Slide 28: More Accurate Intervals
BCa, tilting, others (the real bootstrap-t)
Percentile and "bootstrap-t":
– first-order correct
– consistent, coverage error O(1/sqrt(n))
BCa and tilting:
– second-order correct
– coverage error O(1/n)

Slide 29: Different Sampling Procedures
– Two-sample applications
– Other sampling situations

Slide 30: Two-Sample Bootstrap Procedure
Given independent SRSs from two populations, repeat 1000 times:
– Draw a sample of size m, with replacement, from sample 1
– Draw a sample of size n, with replacement, from sample 2, independently
– Compute a statistic that compares the two groups, e.g. the difference in means
The 1000 bootstrap statistics comprise the bootstrap distribution.
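
A sketch of the two-sample procedure for a difference in means (the ilec and clec vectors below are randomly generated stand-ins with roughly the sample sizes and means from slide 5, not the real Verizon data):

# Two-sample bootstrap for a difference in means.
ilec <- rexp(1664, rate = 1/8.4)   # placeholder data for group 1
clec <- rexp(23, rate = 1/16.5)    # placeholder data for group 2
B <- 1000
boot.diff <- numeric(B)
for (i in 1:B) {
  s1 <- sample(ilec, size = length(ilec), replace = T)   # resample group 1
  s2 <- sample(clec, size = length(clec), replace = T)   # resample group 2, independently
  boot.diff[i] <- mean(s1) - mean(s2)                    # comparison statistic
}
sqrt(var(boot.diff))   # bootstrap SE of the difference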

Slide 31: Example - Relative Risk
Blood Pressure   Cardiovascular Disease
High             55/3338 = 0.0165
Low              21/2676 = 0.0078
Estimated Relative Risk = 2.12
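
A sketch of the corresponding bootstrap, reconstructing 0/1 disease indicators from the counts on the slide and resampling each blood-pressure group separately:

# Bootstrap the relative risk.
high <- rep(c(1, 0), c(55, 3338 - 55))   # CVD indicator, high blood pressure group
low  <- rep(c(1, 0), c(21, 2676 - 21))   # CVD indicator, low blood pressure group
B <- 1000
boot.rr <- numeric(B)
for (i in 1:B) {
  h <- sample(high, size = length(high), replace = T)
  l <- sample(low, size = length(low), replace = T)
  boot.rr[i] <- mean(h) / mean(l)        # relative risk for this resample
}
quantile(boot.rr, c(0.025, 0.975))       # rough percentile interval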

Slide 32: ...Bootstrap Relative Risk (figure)

Slide 33: Example: Verizon (figure)

Slide 34: ...Difference in Means (figure)

Slide 35: ...Difference in Trimmed Means (figure)

Slide 36: ...Comparison
Difference in means:
       Observed   Mean     Bias     SE
mean   -8.098     -7.931   0.1663   3.893
Difference in 25% trimmed means:
       Observed   Mean     Bias     SE
Param  -10.34     -10.19   0.1452   2.737

Slide 37: Other Sampling Situations
– Stratified sampling: resample within strata
– Small samples or strata: correct for narrowness bias
– Finite population: create a finite population, then resample without replacement
– Regression

Slide 38: Bootstrap SE Too Small
The usual SE for the mean is s/sqrt(n), where s^2 = sum((x_i - xbar)^2) / (n - 1).
The bootstrap corresponds to using a divisor of n instead of n - 1.
This bias factor applies for each sample, and for each stratum.

Slide 39: Remedies for Small SE
– Multiply the SE by sqrt(n/(n-1)) (equal strata sizes only; no effect on CIs)
– Sample with reduced size, n - 1
– Bootknife sampling: omit a random observation, then sample size n from the remaining n - 1
– Smoothed bootstrap: choose the smoothing parameter to match the variance (continuous data only)
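
A sketch of bootknife sampling for a single sample, continuing with the placeholder vector x from the sketch after slide 8:

# Bootknife sampling: for each resample, omit one randomly chosen observation,
# then draw n values with replacement from the remaining n - 1.
n <- length(x)
B <- 1000
bootknife.means <- numeric(B)
for (i in 1:B) {
  keep <- sample(1:n, size = n - 1, replace = F)   # drop one random observation
  bootknife.means[i] <- mean(sample(x[keep], size = n, replace = T))
}
sqrt(var(bootknife.means))   # SE without the usual narrowness bias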

Slide 40: Smoothed Bootstrap
Kernel density estimate = nonparametric bootstrap + random noise

Slide 41: Finite Population
Sample of size n from a population of size N.
– If N is a multiple of n: repeat each observation N/n times, then draw bootstrap samples without replacement.
– If N is not a multiple of n: repeat observations roughly the same number of times, rounding N/n up for some observations and down for others.

Slide 42: Resampling for Regression
Resample observations (random effects):
– problem with factors: a random amount of information
Resample residuals (fixed effects):
– fit the model
– resample the residuals, with replacement
– add them to the fitted values
– problems with heteroskedasticity and lack of fit
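
A sketch of the residual (fixed-effects) scheme for a simple linear regression, with made-up data just to keep it self-contained:

# Residual bootstrap for a regression slope.
xr <- 1:20
yr <- 2 + 3 * xr + rnorm(20)             # hypothetical data
fit <- lm(yr ~ xr)
res <- resid(fit)
fits <- fitted(fit)
B <- 1000
slope.boot <- numeric(B)
for (i in 1:B) {
  ystar <- fits + sample(res, size = length(res), replace = T)   # fitted values + resampled residuals
  slope.boot[i] <- coef(lm(ystar ~ xr))[2]                       # refit and save the slope
}
sqrt(var(slope.boot))   # bootstrap SE of the slope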

Slide 43: Basic Rule for Sampling
Sample in a way consistent with how the data were produced, including any additional information:
– continuous distribution (if it matters, e.g. for medians)
– null hypothesis

Slide 44: Resampling for Hypothesis Tests
Sample in a manner consistent with H0.
P-value = P0(random value exceeds the observed value), where P0 denotes probability computed under H0.

Slide 45: Permutation Test for Two Samples
H0: no real difference between the groups; an observation could come from one group as well as the other.
Resample: randomly choose n1 observations for group 1, and the rest for group 2.
Equivalent to permuting all n observations and putting the first n1 into group 1.
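
A sketch of this test for a difference in means, continuing with the placeholder ilec and clec vectors from the sketch after slide 30 (a one-sided "less" alternative to match the Verizon analysis; counting the observed statistic in both numerator and denominator is one common convention):

# Permutation test: pool the data, repeatedly reassign group labels at random.
pooled <- c(ilec, clec)
n1 <- length(ilec)
obs.diff <- mean(ilec) - mean(clec)
B <- 9999
perm.diff <- numeric(B)
for (i in 1:B) {
  idx <- sample(1:length(pooled), size = n1, replace = F)   # choose group 1 without replacement
  perm.diff[i] <- mean(pooled[idx]) - mean(pooled[-idx])
}
(sum(perm.diff <= obs.diff) + 1) / (B + 1)   # one-sided p-value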

Slide 46: Verizon Permutation Test (figure)

Slide 47: Test Results
Pooled-variance t-test: t = -2.6125, df = 1685, p-value = 0.0045
Non-pooled-variance t-test: t = -1.9834, df = 22.3463548265907, p-value = 0.0299
> permVerizon3
Call: permutationTestMeans(data = Verizon$Time, treatment = Verizon$Group, B = 499999, alternative = "less", seed = 99)
Number of Replications: 499999
Summary Statistics:
     Observed   Mean        SE      alternative   p.value
Var  -8.098     -0.001288   3.105   less          0.01825

Slide 48: Permutation vs. Pooled Bootstrap
Pooled bootstrap test:
– pool all n observations
– choose n1 with replacement for group 1
– choose n2 with replacement for group 2
The permutation test is preferred:
– it conditions on the observed data
– each resample has the same number of outliers as the observed data

Slide 49: Assumptions
Permutation test:
– Same distribution for the two populations when H0 is true
  – population variances must be the same; sample variances may differ
– Does not require normality
– Does not require that the data be a random sample from a larger population

Slide 50: Other Statistics
The procedure works for a variety of statistics:
– difference in means
– t-statistic
– difference in trimmed means
Work directly with the statistic of interest:
– the p-value is the same for the difference in means and the pooled-variance t-statistic

Slide 51: Difference in Trimmed Means
P-value = 0.0002

Slide 52: General Permutation Tests
– Compute the statistic for the data
– Resample in a way consistent with H0 and the study design
– Construct the permutation distribution
– P-value = percentage of resampled statistics that exceed the original statistic

Slide 53: Permutation Test for Matched Pairs or Stratified Sampling
– Permute within each pair
– Permute within each stratum
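
A small helper that permutes a treatment variable within strata, usable for either case (the function name is made up for this sketch; for the Puromycin example on slide 56, the call would be something like permute.within(Puromycin$state, Puromycin$conc)):

# Shuffle treatment labels separately inside each stratum (or pair).
permute.within <- function(treatment, stratum) {
  idx <- 1:length(treatment)
  for (s in unique(stratum)) {
    here <- which(stratum == s)
    idx[here] <- here[sample(length(here))]   # permute positions within this stratum only
  }
  treatment[idx]
}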

Slide 54: Example: Puromycin
The data are from a biochemical experiment in which the initial velocity of a reaction was measured for different concentrations of the substrate. The data come from two runs, one on cells treated with the drug Puromycin and the other on untreated cells.
Variables: concentration, velocity, treatment

Slide 55: Puromycin Data (figure)

Slide 56: Permutation Test for Puromycin
Statistic: ratio of smooths, at each original concentration
Stratify by the original concentration
Permute only the treatment variable
permutationTest(data = Puromycin, statistic = f, alternative = "less", combine = T, seed = 42, group = Puromycin$conc, resampleColumns = "state")

Slide 57: Puromycin - Permutation Graphs (figure)

Slide 58: Puromycin - P-values
Summary Statistics:
       Observed   Mean    SE        alternative   p-value
0.02   0.9085     1.016   0.14932   less          0.259
0.06   0.8509     1.005   0.08191   less          0.024
0.11   0.8254     1.002   0.07011   less          0.003
0.22   0.8034     1.001   0.07657   less          0.002
0.56   0.7850     1.007   0.09675   less          0.002
1.1    0.7937     1.025   0.13384   less          0.053
Combined p-value (across concentrations 0.02, 0.06, 0.11, 0.22, 0.56, 1.1): 0.002

Slide 59: Permutation Test Curves (figure)

Slide 60: Permutation Test of Relationship
To test H0: X and Y are independent:
– permute either X or Y (permuting both is just extra work)
– the test statistic may be a correlation, a regression slope, a chi-square statistic (Fisher's exact test), ...
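
A sketch using the correlation as the test statistic, with made-up paired data (two-sided p-value; the +1 convention again counts the observed value):

# Permutation test of independence via the correlation coefficient.
xv <- rnorm(30)                    # hypothetical paired data
yv <- 0.5 * xv + rnorm(30)
obs.cor <- cor(xv, yv)
B <- 9999
perm.cor <- numeric(B)
for (i in 1:B) {
  perm.cor[i] <- cor(xv, sample(yv))   # permute y only; x stays fixed
}
(sum(abs(perm.cor) >= abs(obs.cor)) + 1) / (B + 1)   # two-sided p-value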

Slide 61: Permutation Tests in Regression
Simple regression: permute X or Y.
Multiple regression:
– Permute Y to test H0: no X contributes
– To test the incremental contribution of X1: cannot permute X1, since that loses the joint relationship of the Xs

Slide 62: Example: Kyphosis
Variables: Kyphosis (present or absent), Age of child, Number of vertebrae in operation, Start of range of vertebrae
Logistic regression

Slide 63: Kyphosis - Logistic Regression
             Value         Std. Error   t value
(Intercept)  -2.03693225   1.44918287   -1.405573
Age           0.01093048   0.00644419    1.696175
Start        -0.20651000   0.06768504   -3.051043
Number        0.41060098   0.22478659    1.826626
Null Deviance: 83.23447 on 80 df
Residual Deviance: 61.37993 on 77 df

Slide 64: Kyphosis vs. Start (figure)

Slide 65: Kyphosis Permutation Test
Permute Kyphosis (the response variable), leaving the other variables fixed. The test statistic is the residual deviance.
Summary Statistics:
       Observed   Mean    SE      alternative   p-value
Param  61.38      79.95   2.828   less          0.001
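
A sketch of that test, again assuming the kyphosis data frame described after slide 24 (B = 999 is an arbitrary choice to keep the refitting cheap):

# Permute the response, refit the logistic regression, record the residual deviance.
obs.dev <- deviance(glm(Kyphosis ~ Age + Start + Number, family = binomial, data = kyphosis))
B <- 999
perm.dev <- numeric(B)
kyph.perm <- kyphosis
for (i in 1:B) {
  kyph.perm$Kyphosis <- kyphosis$Kyphosis[sample(nrow(kyphosis))]   # permute the response only
  perm.dev[i] <- deviance(glm(Kyphosis ~ Age + Start + Number,
                              family = binomial, data = kyph.perm))
}
(sum(perm.dev <= obs.dev) + 1) / (B + 1)   # one-sided p-value ("less")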

Slide 66: Kyphosis Permutation Distribution (figure)

Slide 67: When Permutation Testing Fails
Permutation testing is not universal:
– cannot test H0: (parameter) = 0
– cannot test H0: (parameter) = 1
Use confidence intervals instead.
Bootstrap tilting: find the maximum-likelihood weighted distribution that satisfies H0, then use a weighted bootstrap.

Slide 68: If Time Permits
– Bias: portfolio optimization example, in section3.ppt
– More about confidence intervals, from section5.ppt

Slide 69: Summary
Basic bootstrap idea:
– Substitute the best estimate for the population(s)
– For testing, match the null hypothesis
– Sample consistently with how the data were produced
– Inspect the bootstrap distribution: is it normal?
– Compare t and percentile intervals, BCa and tilting

Slide 70: Summary (Testing)
– Sample in a manner consistent with H0
– Use permutation tests to compare groups or to test relationships
– Some situations have no permutation test; use a bootstrap confidence interval or test instead

Slide 71: Resources
www.insightful.com/Hesterberg/bootstrap
S+Resample: www.insightful.com/downloads/libraries
TimH@insightful.com

Slide 72: Supplement for Slides 24-27
This document is a supplement to the presentation at the AGS. It includes material that was shown in a live demo using S-PLUS, corresponding to slides 24-27 of the original presentation.

Slide 73: Another Example: Kyphosis
Variables: Kyphosis (present or absent), Age of child, Number of vertebrae in operation, Start of range of vertebrae
Logistic regression

Slide 74: Kyphosis - Logistic Regression
             Value         Std. Error   t value
(Intercept)  -2.03693225   1.44918287   -1.405573
Age           0.01093048   0.00644419    1.696175
Start        -0.20651000   0.06768504   -3.051043
Number        0.41060098   0.22478659    1.826626
Null Deviance: 83.23447 on 80 df
Residual Deviance: 61.37993 on 77 df

Slide 75: Kyphosis Example
Pseudo-code:
Repeat 1000 times {
  Draw a sample with replacement from the original rows
  Fit the logistic regression
  Save the coefficients
}
Use the bootstrap distribution.
Live demo (kyphosis.ssc)

Slide 76: Kyphosis vs. Start (figure)

Slide 77: Graphical Bootstrap of Predictions (figure)

Slide 78: Bootstrap Coefficients (figure)

Slide 79: Bootstrap Scatterplots (figure)

Slide 80: t Confidence Interval
Statistic +- t* SE(bootstrap)
A reasonable interval if the bootstrap distribution is approximately normal, with little bias.
Compare to bootstrap percentiles.
Return to the Kyphosis example.
(In the literature, "bootstrap t" means something else.)

Slide 81: Are t-limits Reasonable Here? (figure)

Slide 82: Are t-limits Reasonable Here? (figure)

Slide 83: Are t-limits Reasonable Here?
Remember, the previous two plots show the bootstrap distribution, an estimate of the sampling distribution, after the central limit theorem has had its chance to work.

Slide 84: Percentiles to Check the Bootstrap t
If the bootstrap distribution is approximately normal and unbiased, then the bootstrap t interval and the corresponding percentiles should be similar.
Compare these:
– if they are similar, use either;
– else, use a more accurate interval.

Slide 85: Compare t and Percentile CIs
> signif(limits.t(boot.kyphosis), 2)
              2.5%       5%       95%     97.5%
(Intercept)  -6.1000    -5.4000   1.400   2.000
Age          -0.0054    -0.0027   0.025   0.027
Start        -0.3800    -0.3500  -0.063  -0.034
Number       -0.2900    -0.1800   1.000   1.100
> signif(limits.percentile(boot.kyphosis), 2)
              2.5%       5%       95%     97.5%
(Intercept)  -6.80000   -5.8000   0.560   1.400
Age           0.00077    0.0021   0.028   0.033
Start        -0.44000   -0.3900  -0.120  -0.095
Number       -0.09400    0.0078   1.100   1.300

Slide 86: Compare Asymmetry of CIs
> signif(limits.t(boot.kyphosis) - boot.kyphosis$observed, 2)
              2.5%     5%        95%     97.5%
(Intercept)  -4.100   -3.400     3.400   4.100
Age          -0.016   -0.014     0.014   0.016
Start        -0.170   -0.140     0.140   0.170
Number       -0.710   -0.590     0.590   0.710
> signif(limits.percentile(boot.kyphosis) - boot.kyphosis$observed, 2)
              2.5%     5%        95%     97.5%
(Intercept)  -4.80    -3.8000    2.600   3.500
Age          -0.01    -0.0088    0.018   0.022
Start        -0.23    -0.1800    0.088   0.110
Number       -0.51    -0.4000    0.710   0.850

