Randomization/Permutation Tests

Body Mass Indices Among NBA & WNBA Players
Home Field Advantage in China Soccer League
Opponent Effects for 1927 New York Yankees

Background

Goal: Compare 2 (or more) treatment effects or means based on sample measurements.
o Independent Samples: Units in different treatment conditions are independent of one another. In controlled experiments they have been randomized to treatments. Observed data are $Y_{11},\dots,Y_{1n_1}$ and $Y_{21},\dots,Y_{2n_2}$.
o Paired Samples: Units are observed under each condition (treatment), and the resulting difference has been obtained: $d_j = Y_{1j} - Y_{2j}$, $j=1,\dots,n$.

Procedure: Working under the null hypothesis of no differences in treatment effects, ask how extreme the observed treatment difference is relative to many (in theory all) possible randomizations/permutations of the observed data to the treatment labels.

Independent Samples – 2 Treatments

Algorithm:
o Compute the Test Statistic for the observed data and save it
o Obtain a large number (N) of permutations of the observed values to the treatment labels
o For each permutation, compute the Test Statistic and save it
o P-value = ((# Permuted TS ≥ Observed TS) + 1)/(N+1), where the added 1 counts the observed sample itself
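The algorithm above can be sketched as a small R function. This is a minimal illustration, not the slides' own program: `perm_test_two_sample()` is a hypothetical helper name, the test statistic is assumed to be the difference in sample means, and the data are simulated stand-ins rather than the NBA/WNBA sample.

```r
# Two-sample permutation test for a difference in means (illustrative sketch).
# perm_test_two_sample() is a hypothetical helper, not part of the slides.
perm_test_two_sample <- function(y, group, N = 9999) {
  n1 <- sum(group == 1)
  n.tot <- length(y)
  ts.obs <- mean(y[group == 1]) - mean(y[group == 2])   # observed statistic
  ts <- numeric(N)
  for (i in seq_len(N)) {
    perm <- sample(n.tot)                               # random permutation of 1:n.tot
    ts[i] <- mean(y[perm[1:n1]]) - mean(y[perm[(n1 + 1):n.tot]])
  }
  count <- sum(abs(ts) >= abs(ts.obs))                  # two-sided count
  list(ts.obs = ts.obs, p.value = (count + 1) / (N + 1))
}

set.seed(1)
# Simulated stand-in data (assumed means and SDs), NOT the BMI sample
y <- c(rnorm(20, mean = 25, sd = 2), rnorm(20, mean = 23, sd = 2))
group <- rep(1:2, each = 20)
res <- perm_test_two_sample(y, group, N = 999)
print(res)
```

Adding 1 to both the count and N treats the observed labeling as one of the possible permutations, so the reported P-value can never be exactly 0.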

Example – NBA and WNBA Players’ BMI

Groups: Male: NBA ($i=1$) and Female: WNBA ($i=2$)
Samples: Random samples of $n_1 = n_2 = 20$ from the 2013 seasons (2013/2014 for NBA)

Permutation Samples

o Generate permutations of the 40 integers using a random number generator (like pulling 1:40 from a hat, one at a time, without replacement)
o Assign the first 20 players selected (based on id) to Treatment 1, and the last 20 to Treatment 2
o Compute and save the Test Statistic: $TS^* = \bar{Y}_1^* - \bar{Y}_2^*$ (the difference in permuted group means)
o Continue for many (N total) samples
o Count the number as large as or larger than the observed Test Statistic (in absolute value, if 2-sided test)
o P-value obtained as (Count+1)/(N+1)

Permutation Samples (EXCEL)

Comments:
o Column 4 (Ran1) has its smallest number (.01077) corresponding to id=11. Thus player 11 is the first player in group 1 in the permutation sample. The next smallest corresponds to id=34.
o The “sort” columns (5-8) give the first permutation samples for the 2 groups.
o The difference in BMI for groups 1 and 2 in the original sample is
o The difference in BMI for groups 1 and 2 in the permutation sample is
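The spreadsheet approach of attaching a random number to each id and sorting can be mimicked in R. The following is a minimal sketch with freshly generated random draws (not the spreadsheet's values); the names `ran1`, `perm.id`, `group1`, and `group2` are illustrative assumptions.

```r
set.seed(2)
n1 <- 20; n2 <- 20
id <- 1:(n1 + n2)

ran1 <- runif(n1 + n2)          # one U(0,1) draw per player id (like the Ran1 column)
perm.id <- id[order(ran1)]      # ids sorted by their random numbers

group1 <- perm.id[1:n1]                  # first 20 sorted ids -> Treatment 1
group2 <- perm.id[(n1 + 1):(n1 + n2)]    # last 20 sorted ids  -> Treatment 2

perm.id[1] == which.min(ran1)   # the id with the smallest draw leads group 1
```

Sorting by independent uniform draws yields a uniformly random permutation, which is the same result that `sample(1:40)` gives directly.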

R Program

    ### Download dataset
    nba.bmi <- read.csv("…", header=T)
    attach(nba.bmi); names(nba.bmi)

    ### Obtain sample sizes, sample means, and observed Test Statistic
    (n1 <- length(BMI[Gender==1])); (n2 <- length(BMI[Gender==2]))
    (ybar1.obs <- mean(BMI[Gender==1])); (ybar2.obs <- mean(BMI[Gender==2]))
    (TS.obs <- ybar1.obs-ybar2.obs); (n.tot <- n1+n2)

    ### Choose number of permutations and initialize TS vector to save Test Statistics
    ### set seed to be able to reproduce permutation samples
    N <- 9999; TS <- rep(0,N); set.seed(97531)

    ### Loop through N samples, generating Test Stat each time
    for (i in 1:N) {
      perm <- sample(1:n.tot,size=n.tot,replace=F)
      if (i == 1) print(perm)
      ybar1 <- mean(BMI[perm[1:n1]])              ### mean BMI of first n1 elements of perm
      ybar2 <- mean(BMI[perm[(n1+1):(n1+n2)]])    ### mean BMI of next n2 elements of perm
      TS[i] <- ybar1-ybar2
    }

    ### Count # of cases where abs(TS) >= abs(TS.obs) for 2-sided test and obtain p-value
    (num.exceed <- sum(abs(TS) >= abs(TS.obs)))
    (p.val.2sided <- (num.exceed+1)/(N+1))

    ### Draw histogram of distribution of TS, with vertical line at TS.obs
    hist(TS,xlab="Mean1 - Mean2",breaks=seq(-2.5,2.5,0.25),
         main="Randomization Distribution for BMI")
    abline(v=TS.obs)

R Output

    > ### Obtain sample sizes, sample means, and observed Test Statistic
    > (n1 <- length(BMI[Gender==1]))
    [1] 20
    > (n2 <- length(BMI[Gender==2]))
    [1] 20
    > (ybar1.obs <- mean(BMI[Gender==1]))
    [1]
    > (ybar2.obs <- mean(BMI[Gender==2]))
    [1]
    > (TS.obs <- ybar1.obs-ybar2.obs)
    [1]
    > (n.tot <- n1+n2)
    [1] 40
    ### First permutation of 1:40
    [1]
    [26]
    > ### Count # of cases where abs(TS) >= abs(TS.obs) for 2-sided test and obtain p-value
    > (num.exceed <- sum(abs(TS) >= abs(TS.obs)))
    [1] 121
    > (p.val.2sided <- (num.exceed+1)/(N+1))
    [1]

Normal t-test (Equal Variances Assumed)
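For reference, the standard equal-variance (pooled) two-sample t statistic can be written in the notation of the earlier slides as follows. The symbols $S_1^2$, $S_2^2$ (sample variances) and $S_p^2$ (pooled variance) are assumed here for illustration.

```latex
t = \frac{\bar{Y}_1 - \bar{Y}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2},
\qquad
\text{df} = n_1 + n_2 - 2
```

With $n_1 = n_2 = 20$, the reference distribution under the null hypothesis is a t distribution with 38 degrees of freedom.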

t-test for NBA vs WNBA BMI

Note: The permutation test and the t-test give the same P-value to 4 decimal places; the data are approximately normal.
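The close agreement between the two tests on roughly normal data can be explored with a short simulation. This is a hedged sketch on simulated values (the means, SDs, and seed are assumptions, not the BMI sample), using the base R function `t.test()` with `var.equal = TRUE` for the pooled test.

```r
set.seed(3)
# Simulated, approximately normal stand-in data (NOT the BMI sample)
y1 <- rnorm(20, mean = 25, sd = 2)
y2 <- rnorm(20, mean = 24, sd = 2)

# Pooled-variance (equal variances assumed) two-sample t-test
t.res <- t.test(y1, y2, var.equal = TRUE)

# Permutation test on the same data, difference in means as statistic
y <- c(y1, y2); N <- 9999
ts.obs <- mean(y1) - mean(y2)
ts <- replicate(N, {
  p <- sample(40)
  mean(y[p[1:20]]) - mean(y[p[21:40]])
})
p.perm <- (sum(abs(ts) >= abs(ts.obs)) + 1) / (N + 1)

c(t.test = t.res$p.value, permutation = p.perm)
```

The two P-values will generally differ slightly from run to run because the permutation P-value is itself a Monte Carlo estimate.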

Paired Samples

o Data consist of n pairs of observations $(Y_{1j}, Y_{2j})$, $j=1,\dots,n$
o Data are on the same subject (or individuals matched on external criteria) under 2 conditions (often Before/After)
o Construct the differences: $d_j = Y_{1j} - Y_{2j}$
o The true population mean difference is $\mu_d = \mu_1 - \mu_2$
o Wish to test $H_0: \mu_d = 0$ with a 1-sided or 2-sided alternative

Procedure

o Compute an observed Test Statistic that measures the treatment effect in some manner (such as the sample mean of the differences)
o For many randomization samples:
    - Generate a series of n U(0,1) random variables: $U_1,\dots,U_n$
    - If (say) $U_j < 0.5$, set $d_j^* = -d_j$, where $d_j^*$ is the difference for case j in this sample; otherwise, set $d_j^* = d_j$
    - Compute the Test Statistic for this sample and save it
o Compare the observed Test Statistic with the sample Test Statistics in a manner similar to the Independent Samples case: compute the proportion of sample Test Statistics that are as extreme as or more extreme than the observed Test Statistic
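The sign-flipping procedure above can be sketched as a small R function. This is an illustration under assumptions: `sign_flip_test()` is a hypothetical helper name, the test statistic is taken to be the mean difference, and the differences are simulated rather than taken from any dataset in the slides.

```r
# Paired-sample randomization test via random sign flips (illustrative sketch).
# sign_flip_test() is a hypothetical helper, not part of the slides.
sign_flip_test <- function(d, N = 9999) {
  n <- length(d)
  ts.obs <- mean(d)                      # observed mean difference
  ts <- numeric(N)
  for (i in seq_len(N)) {
    u <- runif(n)                        # U_1, ..., U_n ~ U(0,1)
    d.star <- ifelse(u < 0.5, -d, d)     # flip the sign when U_j < 0.5
    ts[i] <- mean(d.star)
  }
  list(ts.obs   = ts.obs,
       p.1sided = (sum(ts >= ts.obs) + 1) / (N + 1),             # upper tail
       p.2sided = (sum(abs(ts) >= abs(ts.obs)) + 1) / (N + 1))
}

set.seed(4)
d <- rnorm(30, mean = 0.5, sd = 1)       # simulated differences (assumed values)
res <- sign_flip_test(d, N = 999)
print(res)
```

Flipping signs is justified because, under the null hypothesis of no treatment effect, each difference is equally likely to be positive or negative.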

Example: English Premier League Football

Interested in determining whether there is a home field effect.
o The league has 20 teams; each plays all 19 opponents Home and Away (190 pairs of teams, each pair playing once on each team’s home field). No overtime.
o Label teams in alphabetical order: 1=Arsenal, ..., 20=Wigan
o Let $Y_{1jk} = H_j - A_k$, $j < k$: differential when j is at Home, k is Away
o Let $Y_{2jk} = A_j - H_k$, $j < k$: differential when j is Away, k is at Home
o $d_{jk} = Y_{1jk} - Y_{2jk} = (H_j + H_k) - (A_j + A_k)$, $j < k$

Note: $d_{jk}$ represents combined Home goals minus combined Away goals for the pair of teams. No home effect should mean $\mu_d = 0$.
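A quick numeric check of the identity for the paired difference may help. The scores below are hypothetical values for a single pair of teams (j, k), not taken from the EPL data, and the variable names are illustrative.

```r
# Hypothetical scores for one pair of teams (j, k); not from the EPL data
H.j <- 2; A.k <- 1    # game at j's ground: j scores 2, k scores 1
H.k <- 3; A.j <- 0    # game at k's ground: k scores 3, j scores 0

Y1.jk <- H.j - A.k                    # differential when j is at Home
Y2.jk <- A.j - H.k                    # differential when j is Away
d.jk  <- Y1.jk - Y2.jk                # paired difference
d.alt <- (H.j + H.k) - (A.j + A.k)    # combined Home goals - combined Away goals

c(d.jk = d.jk, d.alt = d.alt)         # both equal 4
```

Here the home sides scored 5 goals in total and the away sides 1, so both forms give a difference of 4.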

Representative Games from the Sample

Comments (regarding these 9 pairs and these 2 samples; the full analysis is on the next slide):
o For the original sample, the Test Statistic is the Average Difference:
o For the first random sample, games 1, 4, 8 had Ran1 < 0.5, and their $d_{jk}$ switched sign. The new sampled test statistic was
o For the second random sample, games 1, 2, 3, 5, 6, 8 had Ran2 < 0.5, and their $d_{jk}$ switched sign. The new sampled test statistic was
o The P-value for a 1-tailed test ($H_A: \mu_d > 0$) would be p = (1+1)/(2+1) = 2/3, as both the original sample and Ran1 have Test Statistics ≥ the observed value. The 2-sided P-value is also p = 2/3.

R Program

    epl2012 <- read.csv("…", header=T)
    attach(epl2012); names(epl2012)

    ### Obtain Sample Size and Test Statistic (Average of d.jk)
    (n <- length(d.jk))
    (TS.obs <- mean(d.jk))

    ### Choose the number of samples and initialize TS, and set seed
    N <- 9999; TS <- rep(0,N); set.seed(86420)

    ### Loop through samples and compute each TS
    for (i in 1:N) {
      ds.jk <- d.jk                   # Initialize d*.jk = d.jk
      u <- runif(n)-0.5               # Generate n U(-0.5,0.5)'s
      u.s <- ifelse(u < 0, -1, 1)     # Sign multiplier: -1 if u < 0, +1 otherwise
      ds.jk <- u.s * ds.jk
      TS[i] <- mean(ds.jk)            # Compute Test Statistic for this sample
    }

    summary(TS)
    (num.exceed1 <- sum(TS >= TS.obs))              # Count for 1-sided (Upper Tail) P-value
    (num.exceed2 <- sum(abs(TS) >= abs(TS.obs)))    # Count for 2-sided P-value
    (p.val.1sided <- (num.exceed1 + 1)/(N+1))       # 1-sided p-value
    (p.val.2sided <- (num.exceed2 + 1)/(N+1))       # 2-sided p-value

    ### Draw histogram of distribution of TS, with vertical line at TS.obs
    hist(TS,xlab="Mean Home-Away",
         main="Randomization Distribution for EPL 2012 Home Field Advantage")
    abline(v=TS.obs)

R Output

    > ### Obtain Sample Size and Test Statistic (Average of d.jk)
    > (n <- length(d.jk))
    [1] 190
    > (TS.obs <- mean(d.jk))
    [1]
    >
    > summary(TS)
    Min. 1st Qu. Median Mean 3rd Qu. Max
    > (num.exceed1 <- sum(TS >= TS.obs))              # Count for 1-sided (Upper Tail) P-value
    [1] 0
    > (num.exceed2 <- sum(abs(TS) >= abs(TS.obs)))    # Count for 2-sided P-value
    [1] 0
    > (p.val.1sided <- (num.exceed1 + 1)/(N+1))       # 1-sided p-value
    [1] 1e-04
    > (p.val.2sided <- (num.exceed2 + 1)/(N+1))       # 2-sided p-value
    [1] 1e-04

The observed mean difference (0.6368) exceeded all 9999 sampled values: (min = , max = )
Thus, both P-values = (0+1)/(9999+1) = .0001

Normal Paired t-test
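For reference, the standard paired t statistic for testing $H_0: \mu_d = 0$ can be written as follows. The symbols $\bar{d}$ (sample mean of the differences) and $s_d$ (sample standard deviation of the differences) are assumed here for illustration.

```latex
t = \frac{\bar{d}}{s_d / \sqrt{n}},
\qquad
\bar{d} = \frac{1}{n} \sum_{j=1}^{n} d_j,
\qquad
s_d^2 = \frac{1}{n-1} \sum_{j=1}^{n} \left( d_j - \bar{d} \right)^2,
\qquad
\text{df} = n - 1
```

With the 190 pairs of teams in the EPL example, the reference distribution under the null hypothesis has 189 degrees of freedom.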

Paired t-test for EPL 2012 Home vs Away Goals

Note: The t-test gives a smaller P-value, but the permutation test is limited by the number of samples: with N = 9999, its P-value cannot fall below 1/(N+1) = .0001.
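The floor on the permutation P-value can be seen in a short simulation that runs both tests on the same differences. This is a hedged sketch: the differences are simulated with assumed mean and SD (they are not the EPL data), and the paired test is run as a one-sample `t.test()` on the differences.

```r
set.seed(5)
n <- 190; N <- 9999
d <- rnorm(n, mean = 0.6, sd = 1.8)     # simulated differences (assumed values)

# Paired t-test, expressed as a one-sample t-test on the differences
t.res <- t.test(d, mu = 0, alternative = "greater")

# Sign-flip randomization test with the mean difference as statistic
ts.obs <- mean(d)
ts <- replicate(N, mean(d * sample(c(-1, 1), n, replace = TRUE)))
p.perm <- (sum(ts >= ts.obs) + 1) / (N + 1)

c(t.test = t.res$p.value, permutation = p.perm, floor = 1 / (N + 1))
```

Increasing N lowers the floor, so a strong effect can be reported with a smaller permutation P-value at the cost of more computation.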