9 Mar 2007 EMBnet Course – Introduction to Statistics for Biologists Nonparametric tests, Bootstrapping

Slides:



Advertisements
Similar presentations
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 18 Sampling Distribution Models.
Advertisements

Copyright © 2010, 2007, 2004 Pearson Education, Inc. Chapter 18 Sampling Distribution Models.
CHAPTER 21 Inferential Statistical Analysis. Understanding probability The idea of probability is central to inferential statistics. It means the chance.
Economics 105: Statistics Go over GH 11 & 12 GH 13 & 14 due Thursday.
Introduction to Statistics
Evaluating Hypotheses
Chapter 25 Asking and Answering Questions About the Difference Between Two Population Means: Paired Samples.
Chapter 11: Inference for Distributions
Lehrstuhl für Informatik 2 Gabriella Kókai: Maschine Learning 1 Evaluating Hypotheses.
15-1 Introduction Most of the hypothesis-testing and confidence interval procedures discussed in previous chapters are based on the assumption that.
5-3 Inference on the Means of Two Populations, Variances Unknown
The Sampling Distribution Introduction to Hypothesis Testing and Interval Estimation.
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 14: Non-parametric tests Marshall University Genomics.
Bootstrapping applied to t-tests
7.1 Lecture 10/29.
Nonparametric or Distribution-free Tests
Chapter Ten Introduction to Hypothesis Testing. Copyright © Houghton Mifflin Company. All rights reserved.Chapter New Statistical Notation The.
AP Statistics Section 13.1 A. Which of two popular drugs, Lipitor or Pravachol, helps lower bad cholesterol more? 4000 people with heart disease were.
 We cannot use a two-sample t-test for paired data because paired data come from samples that are not independently chosen. If we know the data are paired,
Hypothesis Testing:.
Section #4 October 30 th Old: Review the Midterm & old concepts 1.New: Case II t-Tests (Chapter 11)
+ DO NOW What conditions do you need to check before constructing a confidence interval for the population proportion? (hint: there are three)
The paired sample experiment The paired t test. Frequently one is interested in comparing the effects of two treatments (drugs, etc…) on a response variable.
More About Significance Tests
1 CSI5388: Functional Elements of Statistics for Machine Learning Part I.
Bootstrapping (And other statistical trickery). Reminder Of What We Do In Statistics Null Hypothesis Statistical Test Logic – Assume that the “no effect”
PARAMETRIC STATISTICAL INFERENCE
CHAPTER 18: Inference about a Population Mean
Biostat 200 Lecture 7 1. Hypothesis tests so far T-test of one mean: Null hypothesis µ=µ 0 Test of one proportion: Null hypothesis p=p 0 Paired t-test:
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 10 Comparing Two Populations or Groups 10.2.
A Broad Overview of Key Statistical Concepts. An Overview of Our Review Populations and samples Parameters and statistics Confidence intervals Hypothesis.
Copyright © 2009 Pearson Education, Inc. Chapter 18 Sampling Distribution Models.
Biostatistics, statistical software VII. Non-parametric tests: Wilcoxon’s signed rank test, Mann-Whitney U-test, Kruskal- Wallis test, Spearman’ rank correlation.
AP Statistics Section 13.1 A. Which of two popular drugs, Lipitor or Pravachol, helps lower bad cholesterol more? 4000 people with heart disease were.
+ The Practice of Statistics, 4 th edition – For AP* STARNES, YATES, MOORE Chapter 8: Estimating with Confidence Section 8.3 Estimating a Population Mean.
1 Chapter 10: Introduction to Inference. 2 Inference Inference is the statistical process by which we use information collected from a sample to infer.
Nonparametric Tests IPS Chapter 15 © 2009 W.H. Freeman and Company.
Sampling Distribution Models Chapter 18. Toss a penny 20 times and record the number of heads. Calculate the proportion of heads & mark it on the dot.
Lecture 9 Chap 9-1 Chapter 2b Fundamentals of Hypothesis Testing: One-Sample Tests.
Lesson 15 - R Chapter 15 Review. Objectives Summarize the chapter Define the vocabulary used Complete all objectives Successfully answer any of the review.
1 Chapter 8 Introduction to Hypothesis Testing. 2 Name of the game… Hypothesis testing Statistical method that uses sample data to evaluate a hypothesis.
Chapter 15 – Analysis of Variance Math 22 Introductory Statistics.
Limits to Statistical Theory Bootstrap analysis ESM April 2006.
: An alternative representation of level of significance. - normal distribution applies. - α level of significance (e.g. 5% in two tails) determines the.
9.3/9.4 Hypothesis tests concerning a population mean when  is known- Goals Be able to state the test statistic. Be able to define, interpret and calculate.
+ The Practice of Statistics, 4 th edition – For AP* STARNES, YATES, MOORE Unit 5: Estimating with Confidence Section 11.1 Estimating a Population Mean.
DTC Quantitative Methods Bivariate Analysis: t-tests and Analysis of Variance (ANOVA) Thursday 14 th February 2013.
Biostatistics Nonparametric Statistics Class 8 March 14, 2000.
Statistical Inference Statistical inference is concerned with the use of sample data to make inferences about unknown population parameters. For example,
Copyright © Cengage Learning. All rights reserved. 9 Inferences Based on Two Samples.
+ Unit 6: Comparing Two Populations or Groups Section 10.2 Comparing Two Means.
Learning Objectives After this section, you should be able to: The Practice of Statistics, 5 th Edition1 DESCRIBE the shape, center, and spread of the.
Copyright © Cengage Learning. All rights reserved. 15 Distribution-Free Procedures.
Non-parametric Approaches The Bootstrap. Non-parametric? Non-parametric or distribution-free tests have more lax and/or different assumptions Properties:
Modern Approaches The Bootstrap with Inferential Example.
HYPOTHESIS TESTING FOR DIFFERENCES BETWEEN MEANS AND BETWEEN PROPORTIONS.
Uncertainty and confidence Although the sample mean,, is a unique number for any particular sample, if you pick a different sample you will probably get.
BIOL 582 Lecture Set 2 Inferential Statistics, Hypotheses, and Resampling.
+ Chapter 8 Estimating with Confidence 8.1Confidence Intervals: The Basics 8.2Estimating a Population Proportion 8.3Estimating a Population Mean.
The accuracy of averages We learned how to make inference from the sample to the population: Counting the percentages. Here we begin to learn how to make.
Tests of Significance We use test to determine whether a “prediction” is “true” or “false”. More precisely, a test of significance gets at the question.
Nonparametric statistics. Four levels of measurement Nominal Ordinal Interval Ratio  Nominal: the lowest level  Ordinal  Interval  Ratio: the highest.
Review Statistical inference and test of significance.
Statistical principles: the normal distribution and methods of testing Or, “Explaining the arrangement of things”
Non-parametric Tests Research II MSW PT Class 8. Key Terms Power of a test refers to the probability of rejecting a false null hypothesis (or detect a.
STA248 week 121 Bootstrap Test for Pairs of Means of a Non-Normal Population – small samples Suppose X 1, …, X n are iid from some distribution independent.
Class Six Turn In: Chapter 15: 30, 32, 38, 44, 48, 50 Chapter 17: 28, 38, 44 For Class Seven: Chapter 18: 32, 34, 36 Chapter 19: 26, 34, 44 Quiz 3 Read.
Estimating standard error using bootstrap
When we free ourselves of desire,
Some Nonparametric Methods
Presentation transcript:

9 Mar 2007 EMBnet Course – Introduction to Statistics for Biologists Nonparametric tests, Bootstrapping

9 Mar 2007 Lec 5b EMBnet Course – Introduction to Statistics for Biologists Hypothesis testing review  2 ‘competing theories’ regarding a population parameter: –NULL hypothesis H (‘straw man’) –ALTERNATIVE hypothesis A (‘claim’, or theory you wish to test)  H: NO DIFFERENCE –any observed deviation from what we expect to see is due to chance variability  A: THE DIFFERENCE IS REAL

9 Mar 2007 Lec 5b EMBnet Course – Introduction to Statistics for Biologists Test statistic  Measure how far the observed data are from what is expected assuming the NULL H by computing the value of a test statistic (TS) from the data  The particular TS computed depends on the parameter  For example, to test the population mean, the TS is the sample mean (or standardized sample mean)

9 Mar 2007 Lec 5b EMBnet Course – Introduction to Statistics for Biologists Testing a population mean  We have already learned how to test the mean of a population for a variable with a normal distribution when the sample size is small and the population SD is unknown  What test is this??

9 Mar 2007 Lec 5b EMBnet Course – Introduction to Statistics for Biologists t-test assumption of normality  The t-test was developed for samples that have normally distributed values  This is an example of a parametric test – a (parametric) form of the distribution is assumed (here, a normal distribution)  The t-test is fairly robust against departures from normality if the sample size is not too small  BUT if the values are extremely non-normal, it might be better to use a procedure which does not make this assumption

9 Mar 2007 Lec 5b EMBnet Course – Introduction to Statistics for Biologists Nonparametric hypothesis tests  Nonparametric (or distribution-free) hypothesis tests do not make assumptions about the form of the distribution of the data values  These tests are usually based on the ranks of the values, rather than the actual values themselves  There are nonparametric analogues of many parametric test procedures

9 Mar 2007 Lec 5b EMBnet Course – Introduction to Statistics for Biologists One-sample Wilcoxon test  Nonparametric alternative to the t-test  Tests value of the center of a distribution  Based on sum of the (positive or negative) ranks of the differences between observed and expected center  Test statistic corresponds to selecting each number from 1 to n with probability ½ and calculating the sum  In R: wilcox.test()

9 Mar 2007 Lec 5b EMBnet Course – Introduction to Statistics for Biologists Two-sample Wilcoxon test  Nonparametric alternative to the 2-sample t-test  Tests for differences in location (center) of 2 distributions  Based on replacing the data values by their ranks (without regard to grouping) and calculating the sum of the ranks in a group  Corresponds to sampling n 1 values without replacement from 1 to n 1 + n 2  In R: wilcox.test()

9 Mar 2007 Lec 5b EMBnet Course – Introduction to Statistics for Biologists Matched-pairs Wilcoxon  Nonparametric alternative to the paired t-test  Analogous to paired t-test, same as one- sample Wilcoxon but on the differences between paired values  In R: wilcox.test()

9 Mar 2007 Lec 5b EMBnet Course – Introduction to Statistics for Biologists ANOVA and the Kruskal-Wallis test  Nonparametric alternative to one-way ANOVA  Mechanics similar to 2-sample Wilcoxon test  Based on between group sum of squares calculated from the average ranks  In R: kruskal.test()

9 Mar 2007 Lec 5b EMBnet Course – Introduction to Statistics for Biologists Issues in nonparametric testing  Some (mistakenly) assume that using a nonparametric test means that you don’t make any assumptions at all  THIS IS NOT TRUE!!  In fact, there is really only one assumption that you are relaxing, and that is of the form that the distribution of sample values takes  A major reason that nonparametric tests are avoided if possible is their relative lack of power compared to (appropriate) parametric tests

9 Mar 2007 Lec 5b EMBnet Course – Introduction to Statistics for Biologists Parameter estimation  Have an unknown population parameter of interest  Want to use a sample to make a guess (estimate) for the value of the parameter  Point estimation : Choose a single value (a ‘point’) to estimate the parameter value  Methods of point estimation include: ML, MOM, Least squares, Bayesian methods...  (Confidence) Interval estimation : Use the data to find a range of values (an interval) that seems likely to contain the true parameter value

9 Mar 2007 Lec 5b EMBnet Course – Introduction to Statistics for Biologists CI mechanics  When the CLT applies, a CI for the population mean looks like sample mean +/- z*  /  n, where z is a number from the standard normal chosen so the confidence level is a specified size (e.g. 95%, 90%, etc.)  For small samples from a normal distribution, use CI based on t-distribution sample mean +/- t* s/  n

9 Mar 2007 Lec 5b EMBnet Course – Introduction to Statistics for Biologists Example  To set a standard for what is to be considered a ‘normal’ calcium reading, a random sample of 100 apparently healthy adults is obtained. A blood sample is drawn from each adult. The variable studied is X = number of mg of calcium per dl of blood. –sample mean = 9.5 –sample SD = 0.5  Find an approximate 95% CI for the (population) average number of mg of calcium per dl of blood...

9 Mar 2007 Lec 5b EMBnet Course – Introduction to Statistics for Biologists Russian dolls analogy*  Père Noël dolls... Outermost is ‘doll 0’, next is ‘doll 1’, etc.  We are not allowed to observe doll 0, which represents the population in a sampling scheme)  Want to estimate some characteristic of doll 0 (e.g. number of points on the beard)  Key assumption : the relationship (e.g. ratio) between dolls 1 and 2 is the same as that between dolls 0 and 1 * from The Bootstrap and Edgeworth Expansion, by Peter Hall, Springer 1992

9 Mar 2007 Lec 5b EMBnet Course – Introduction to Statistics for Biologists From dolls to statistics  Say you want to estimate some function of a population distribution – e.g. the population mean  It makes sense, when possible, to use the same function of the sample distribution  We can do this same thing for many other types of functions  A common example is that we might wish to obtain the sampling distribution of an estimator in order to make a CI, say, in cases where large sample approximations might not hold

9 Mar 2007 Lec 5b EMBnet Course – Introduction to Statistics for Biologists An idea  Where exact calculations are difficult to obtain, they may be approximated by resampling from the observed distribution of sample values  That is, pretend that the sample is the ‘population’  The bootstrap procedure is to draw some number (R ) of samples with replacement from the ‘bootstrap population’ (i.e. the original sample values)  You need a computer to do this!

9 Mar 2007 Lec 5b EMBnet Course – Introduction to Statistics for Biologists Bootstrap procedure  For each bootstrap sample, compute the value of the desired statistic  At the end, you will have R values of the statistic  You can use standard data summary procedures to summarize or explore the distribution of the statistic (histogram, QQ plot, compute the mean, SD, etc.)  For example, to make a bootstrap CI for the sample mean based on the normal distribution, you could use the bootstrap SD (instead of the sample SD)...

9 Mar 2007 Lec 5b EMBnet Course – Introduction to Statistics for Biologists Versions of the bootstrap  Nonparametric Bootstrap : as just described, draw bootstrap samples from the original data  Parametric Bootstrap : assume that your original data came from some particular distribution (for example, a normal distribution, or exponential, etc.)  In this case, samples are simulated from that assumed distribution  Distribution parameters (for example, the mean and SD for the normal) are estimated from the original sample

9 Mar 2007 Lec 5b EMBnet Course – Introduction to Statistics for Biologists R: bootstrap demo  You will have some practice with this in the TP  Let’s go to the demo...demo