Modern Approaches: The Bootstrap, with Inferential Example

Non-parametric?
Non-parametric or distribution-free tests have laxer and/or different assumptions. Properties:
– No assumption that the underlying distribution is normal
– More sensitive to medians than means (which is good if you're interested in the median)
– Some may not be much affected by outliers (e.g. rank tests)

Parametric vs. Non-parametric
Parametric tests will typically be used when their assumptions are met, as they will usually have more power
– Though non-parametric tests may be able to match that power under certain conditions
Non-parametric tests are often used with small samples or when assumptions are violated

Common Nonparametric Techniques
For purely categorical situations:
– Chi-square and loglinear analysis
Rank-based approaches, to help with non-normality and outliers:
– Wilcoxon rank-sum and signed-rank tests for independent and dependent samples; Mann-Whitney U
– Kruskal-Wallis and Friedman tests for more than 2 groups
Transformations of the data:
– Logarithmic, reciprocal, etc.
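A quick sketch of two of the rank-based tests using base R's stats functions; g1, g2, and g3 are hypothetical samples, not data from the slides.

## Rank-based tests in base R; data are made up for illustration
g1 <- c(3, 5, 6, 9, 12)
g2 <- c(7, 8, 11, 14, 20)
g3 <- c(2, 4, 10, 13, 16)
wilcox.test(g1, g2)              # Wilcoxon rank-sum (equivalent to Mann-Whitney U)
kruskal.test(list(g1, g2, g3))   # Kruskal-Wallis for more than 2 groups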

More Recent Developments: The Bootstrap
The basic idea involves sampling with replacement from the sample data to produce random samples of size n
– Each of these samples provides an estimate of the parameter of interest
– Repeating the sampling a large number of times provides information on the variability of the estimate, i.e. its standard error, which any inferential test requires
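A minimal sketch of that idea, assuming a hypothetical sample x; the standard deviation of the bootstrap means serves as the estimated standard error of the mean.

## Bootstrap the mean: resample with replacement, recompute, repeat
set.seed(1)
x <- rnorm(25, mean = 50, sd = 10)   # hypothetical sample
boot_means <- replicate(1000, mean(sample(x, length(x), replace = TRUE)))
sd(boot_means)                       # bootstrap estimate of the SE of the mean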

TV Example: how many hours of TV did you watch yesterday?

Bootstrap: 1,000 samples; the distribution of the means of those samples has mean = 3.951
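The transcript does not include the slide's data, so here is a sketch with made-up TV-hours values showing how such a distribution of bootstrap means would be produced and summarized.

## Hypothetical TV-hours data; the slide's actual data are not in the transcript
tv <- c(0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 8, 12)
boot_means <- replicate(1000, mean(sample(tv, length(tv), replace = TRUE)))
hist(boot_means, main = "Bootstrap distribution of the mean")
mean(boot_means)                 # cf. the slide's 3.951 (value depends on the data)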

Inferential Use: bootstrapping applied to t-tests

Problems with t
Wilcox notes that when we sample from a non-normal population, assuming normality of the sampling distribution may be optimistic without large samples
Furthermore, outliers influence both the mean and the sd used to calculate t
– They actually have a larger effect on the variance, increasing type II error because the standard error increases more than the mean does
This is not to say we throw the t distribution out the window
– If we meet our assumptions and have 'pretty' data, it is appropriate
However, if we cannot meet the normality assumption, we may have to try a different approach
– E.g. bootstrapping

More Issues with the t-test
In the two-sample case we have an additional assumption (along with normality and independent observations): equal variances across the groups
– Recall our homoscedasticity discussion
This assumption is often untenable, and, as with other violations, the result is that the calculated probabilities are inaccurate
– A correction can be used, e.g. Welch's t

More Issues with the t-test
It is one thing to say the variances are unequal, but what might that mean?
Consider a control group and a treatment group where the treatment group's variance is significantly greater
– While we can apply a correction, the unequal variances may suggest that those in the treatment group vary widely in how they respond to the treatment
Another source of heterogeneity of variance may be an unreliable measure
No version of the t-test takes either possibility into consideration
Other techniques, assuming enough information has been gathered, may be more appropriate (e.g. hierarchical models), and more reliable measures may be attainable

The Good and the Bad Regarding t-tests
The good:
– If assumptions are met, the t-test is fine
– When assumptions aren't met, the t-test may still be robust with regard to type I error in some situations
– With equal n and normal populations, violations of homogeneity of variance won't increase type I error much
– With non-normal distributions but equal variances, the type I error rate is also maintained
The bad:
– Even small departures from the assumptions can make power take a noticeable hit (type II error is not maintained)
– The t statistic and CIs will be biased

Bootstrap
Recall the notion of a sampling distribution
– We never have the population available in practice, so we take a sample (one of an infinite number of possible samples)
– The sampling distribution is a theoretical distribution whose shape we assume

Bootstrap
Here, as before, the basic idea involves sampling with replacement from the sample data (essentially treating it as the population) to produce random samples of size n
– We create an empirical sampling distribution rather than assuming a theoretical one
Each of these samples provides an estimate of the parameter of interest
Repeating the sampling a large number of times provides information on the variability of the estimator

Bootstrap
Hypothetical situation:
– If we cannot assume normality, how would we go about getting a confidence interval?
Wilcox suggests that appealing to normality via the central limit theorem doesn't hold for small samples, and that maintaining type I error can sometimes require an n as large as 200 when the population is not normally distributed
– If we do not maintain type I error, confidence intervals and inferences based on them will be suspect
– And how might you get a confidence interval for something besides a mean? (See the sketch below)
Solution:
– Resample (with replacement) from our own data based on its distribution
– Treat our sample as a population distribution and take random samples from it
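One answer to the "something besides a mean" question: a percentile bootstrap CI for a median, sketched here with a hypothetical skewed sample.

## Percentile bootstrap CI for the median; data are made up for illustration
set.seed(2)
x <- rexp(30, rate = 0.2)                # hypothetical right-skewed sample
boot_med <- replicate(2000, median(sample(x, length(x), replace = TRUE)))
quantile(boot_med, c(.025, .975))        # 95% percentile bootstrap CI for the median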

The Percentile Bootstrap
We will start by considering a mean
We can bootstrap many sample means based on the original data
One method would be simply to create this distribution of means and note the percentiles associated with certain values

The Percentile Bootstrap
Here are some values (from the Wilcox text), mental health ratings of college students:
2, 4, 6, 6, 7, 11, 13, 13, 14, 15, 19, 23, 24, 27, 28, 28, 28, 30, 31, 43
– Mean = 18.6
– Bootstrap mean (k = 1000) = …
– The bootstrapped 95% CI is 13.85, …
– Assuming normality: 13.39, …
The coverage differs (non-symmetric for the bootstrap), and the classical interval is noticeably wider
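A sketch reproducing this example with the data given on the slide; exact bounds will vary slightly with the random seed, and the slide's missing values are left as-is above.

## Percentile bootstrap CI for the mean of the Wilcox ratings data
ratings <- c(2,4,6,6,7,11,13,13,14,15,19,23,24,27,28,28,28,30,31,43)
boot_means <- replicate(1000, mean(sample(ratings, length(ratings), replace = TRUE)))
mean(boot_means)                      # bootstrap mean (k = 1000)
quantile(boot_means, c(.025, .975))   # percentile 95% CI
t.test(ratings)$conf.int              # classical CI assuming normality, for comparison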

The Percentile t Bootstrap
Another approach would be to create an empirical t distribution
Recall the formula for a one-sample t: t = (x̄ − μ) / (s / √n)
For our purposes here, we will calculate a t 1,000 times, as follows: with the mean x̄* and standard deviation s* of each of those 1,000 bootstrap samples, calculate
t* = (x̄* − x̄) / (s* / √n)
where x̄ is the original sample mean

The Percentile t Bootstrap
This would give us a t distribution with 1,000 t scores
For a confidence interval, we find the exact t values corresponding to the appropriate quantiles (e.g. .025, .975) and use those with the original sample statistics to calculate the CI
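A sketch of the percentile t bootstrap on the same ratings data; the interval combines the empirical t* quantiles with the original sample mean and sd.

## Percentile t (bootstrap-t) CI for a one-sample mean
x <- c(2,4,6,6,7,11,13,13,14,15,19,23,24,27,28,28,28,30,31,43)
n <- length(x)
tstar <- replicate(1000, {
  xs <- sample(x, n, replace = TRUE)
  (mean(xs) - mean(x)) / (sd(xs) / sqrt(n))   # t* for this resample
})
q <- quantile(tstar, c(.025, .975))
c(mean(x) - q[2] * sd(x) / sqrt(n),           # lower bound
  mean(x) - q[1] * sd(x) / sqrt(n))           # upper bound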

Confidence Intervals
So, instead of assuming a sampling distribution of a particular shape and size, we have created one ourselves and derived our interval estimate from it
Simulations have shown this approach is preferable for maintaining type I error with larger samples in which the normality assumption may be untenable

Independent Groups
Comparing independent groups:
– Step 1: compute the bootstrap mean and bootstrap sd as before, but for each group
– Each time you do so, calculate T* = [(x̄1* − x̄2*) − (x̄1 − x̄2)] / √(s1*²/n1 + s2*²/n2)
– This again creates your own t distribution (see the sketch below)
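A sketch of the two-sample bootstrap t using the y and z data from the final slide; the T* formula above is the standard two-sample bootstrap-t statistic, reconstructed here because the slide's formula did not survive the transcript.

## Two-sample bootstrap t: resample within each group, center T* at the observed difference
y <- c(1,1,2,2,3,3,4,4,5,7,9)
z <- c(1,3,2,3,4,4,5,5,7,10,22)
diff_obs <- mean(y) - mean(z)
Tstar <- replicate(1000, {
  ys <- sample(y, length(y), replace = TRUE)
  zs <- sample(z, length(z), replace = TRUE)
  ((mean(ys) - mean(zs)) - diff_obs) /
    sqrt(var(ys)/length(ys) + var(zs)/length(zs))
})
quantile(Tstar, c(.025, .975))   # empirical quantiles used for the CI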

Hypothesis Testing
Use the quantile points of your bootstrap t distribution corresponding to your confidence level when computing the confidence interval on the difference between means, rather than the critical t from the usual tables
Note, however, that your T* will not be the same for the upper and lower bounds
– Unless your bootstrap distribution happens to be perfectly symmetrical
– That's not likely to happen, so…

Hypothesis Testing
One can obtain 'symmetric' intervals
– Instead of using the value obtained in the numerator, (x̄ − μ) or ((x̄1 − x̄2) − (μ1 − μ2)), use its absolute value
– Then apply the standard ± formula
This may in fact be the best approach for most situations (see the sketch below)
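Continuing the previous sketch (it assumes Tstar, y, z, and diff_obs are still in the workspace): a single quantile of |T*| yields a symmetric interval.

## Symmetric bootstrap-t CI: one cutoff from |T*|, applied with the usual ± form
se <- sqrt(var(y)/length(y) + var(z)/length(z))
tcrit <- quantile(abs(Tstar), .95)   # single critical value for a 95% CI
diff_obs + c(-1, 1) * tcrit * se     # symmetric 95% CI for the mean difference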

Extension
We can incorporate robust measures of location rather than means
– E.g. trimmed means
With a program like R it is very easy to do bootstrapping with robust measures using Wilcox's libraries
– Put the Rallfun files (most recent) in your R main folder and 'source' them; then you're ready to start using such functionality
– E.g. source("C:/R/Rallfunv1.v5")
– Example code on the last slide
The general approach can also be extended to more than 2 groups, correlation, and regression
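A base-R sketch of the trimmed-mean idea, without Wilcox's library (which automates such comparisons, e.g. yuenbt on the final slide); the data are hypothetical.

## Percentile bootstrap CI for a 20% trimmed mean; note how the outlier is absorbed
x <- c(2, 3, 3, 4, 5, 5, 6, 7, 9, 40)   # hypothetical sample with an outlier
boot_tm <- replicate(2000, mean(sample(x, length(x), replace = TRUE), trim = 0.2))
quantile(boot_tm, c(.025, .975))        # 95% CI for the 20% trimmed mean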

So Why Use It?
Accuracy and control of the type I error rate
– As opposed to just assuming that it'll be OK
Most of the problems associated with both accuracy and maintenance of the type I error rate are reduced using bootstrap methods compared to Student's t
Wilcox goes further, suggesting that there may in fact be very few situations, if any, in which the traditional approach offers any advantage over the bootstrap approach
The problems posed by outliers and by the basic statistical properties of means and variances remain, however

Example: Independent-Samples t-test in R

source("Rallfunv1-v7")
source("Rallfunv2-v7")
y <- c(1,1,2,2,3,3,4,4,5,7,9)
z <- c(1,3,2,3,4,4,5,5,7,10,22)
t.test(y, z, conf.level = .95)                                  # classical t-test (alpha is not a t.test argument)
yuenbt(y, z, tr = .20, alpha = .05, nboot = 600, side = TRUE)   # bootstrap-t on 20% trimmed means, symmetric CI