Copyright (c) Bani Mallick. STAT 651 Lecture 10.

Presentation transcript:

STAT 651 Lecture 10

Topics in Lecture #10: Comparing two population means using rank tests. Comparing two population variances using Levene’s test. The effect of outliers.

Book Sections Covered in Lecture #10: Chapter 6.3 (Wilcoxon test). Page 368 (Levene’s test, although the book calls it Levine’s test); this is slightly different from what SPSS does. The material on outliers is from my own notes.

Lecture 9 Review: Comparing Two Populations. The difference of sample means is $\bar{X}_1 - \bar{X}_2$. The s.d. from repeated sampling is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. You need reasonably large samples from BOTH populations.

Lecture 9 Review: Comparing Two Populations. If you can reasonably believe that the population sd’s are nearly equal, it is customary to pick the equal variance assumption and estimate the common standard deviation by $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$.

Lecture 9 Review: Comparing Two Populations. The standard error of $\bar{X}_1 - \bar{X}_2$ is then the value $s_p\sqrt{1/n_1 + 1/n_2}$. The number of degrees of freedom is $n_1 + n_2 - 2$.

Lecture 9 Review: Comparing Two Populations. A $(1-\alpha)100\%$ CI for $\mu_1 - \mu_2$ is $\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p\sqrt{1/n_1 + 1/n_2}$. Note how the sample sizes determine the CI length.

Lecture 9 Review: Comparing Two Populations. The CI can of course be used to test hypotheses such as $H_0: \mu_1 = \mu_2$. This is the same as $H_0: \mu_1 - \mu_2 = 0$. So we just need to check whether 0 is in the interval, just as we have done.

Lecture 9 Review: Comparing Two Populations. Generally, you should make your sample sizes nearly equal, or at least not wildly unequal. Consider a total sample size of 100: $\sqrt{1/n_1 + 1/n_2} \approx 1$ if $n_1 = 1$, $n_2 = 99$, and $\sqrt{1/n_1 + 1/n_2} = 0.20$ if $n_1 = 50$, $n_2 = 50$. Thus, in the former case, your CI would be 5 times longer!
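
The sample-size arithmetic on this slide can be checked with a few lines of Python. This is only an illustrative sketch; the function name `se_factor` is made up here and is not from the lecture.

```python
import math

def se_factor(n1, n2):
    """Factor sqrt(1/n1 + 1/n2) that multiplies the pooled sd in the standard error."""
    return math.sqrt(1 / n1 + 1 / n2)

unbalanced = se_factor(1, 99)    # wildly unequal split of 100 observations
balanced = se_factor(50, 50)     # equal split of 100 observations
ratio = unbalanced / balanced    # how much longer the CI is in the unbalanced case

print(f"unbalanced: {unbalanced:.3f}, balanced: {balanced:.3f}, ratio: {ratio:.2f}")
```

The ratio comes out at roughly 5, which matches the "5 times longer" statement, under the assumption that the pooled sd and the t quantile are held fixed.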

Lecture 9 Review: NHANES Comparison. Mean(Healthy) – Mean(Cancer): the 95% CI is from [missing] to [missing]. [Figure labels: Hypothesized value, Confidence Interval]

Arsenic and Squamous Cell Skin Cancer: The question is whether arsenic ingestion is related to squamous cell carcinoma. We used the transformation X = log(toe arsenic).

Arsenic and Squamous Cell Skin Cancer [Figure panels: Healthy, Cancer]

Arsenic and Squamous Cell Skin Cancer

Arsenic and Squamous Cell Skin Cancer, Healthy Cases

Arsenic and Squamous Cell Skin Cancer: Cancer Cases

Arsenic and Squamous Cell Skin Cancer: Healthy, s = 0.59, IQR = 0.69. Squamous, s = 0.62, IQR = 0.71. These statistics and box plots indicate that the two populations do not have vastly different variability.

Arsenic and Squamous Cell Skin Cancer: Healthy: mean = -2.33, n = 215, se = [missing]. Squamous: mean = [missing], n = 140, se = [missing]. Mean difference = 0.020, se = [missing]. CI = [-0.109, 0.149] (the confidence level was lost in extraction). p = 0.76: what does this mean?
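
As a rough check, the pooled standard error and interval can be recomputed from the summary statistics quoted on the slides (s = 0.59, n = 215; s = 0.62, n = 140; mean difference 0.020). The sketch below is not the lecture's own computation. It assumes a 95% level and, because the standard library has no t distribution, it uses the normal quantile in place of the t quantile, which is close with 353 degrees of freedom. The function name `pooled_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def pooled_ci(diff, s1, n1, s2, n2, level=0.95):
    """Equal-variance standard error and approximate CI for a difference of means."""
    df = n1 + n2 - 2
    # Pooled standard deviation, as in the Lecture 9 review
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    # Assumption: normal quantile used instead of the t quantile (large df)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return se, (diff - z * se, diff + z * se)

se, (lo, hi) = pooled_ci(0.020, 0.59, 215, 0.62, 140)
print(f"se = {se:.4f}, CI = [{lo:.3f}, {hi:.3f}]")
```

The result lands within rounding distance of the interval [-0.109, 0.149] on the slide. Since 0 is well inside the interval, the data give no evidence of a difference in means.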

Arsenic and Squamous Cell Skin Cancer: Graphs, statistics, CI, p-value, all tell us that not much seems to be going on!

Robust Inference via Rank Tests: Because sample means and standard deviations are sensitive to outliers, so too are comparisons of populations based on them. Rank tests form a robust alternative that can be used to check the results of t-statistic inferences. You are looking for major discrepancies, and then trying to explain them.

Robust Inference via Rank Tests: Rank tests are very easy to compute, and SPSS provides them. The usual one is called the Wilcoxon rank sum test. The algorithm is to assign ranks to each observation in the pooled data set, then apply a t-test to these ranks. The test is robust because ranks can never get wild.

Robust Inference via Rank Tests: Here is how data are ranked. [Table of data values and their ranks not recovered.] Now run a t-test on the ranks.
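
The ranking step can be sketched in Python. The data below are invented for illustration (they are not the values from the lost table), and the helper names `midranks` and `pooled_t` are hypothetical. Tied values receive the average of the ranks they occupy, which is the usual convention.

```python
import math
from statistics import mean, variance

def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pooled_t(x, y):
    """Equal-variance two-sample t statistic."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

group_a = [1.1, 2.3, 2.9, 3.4, 250.0]   # hypothetical data with one wild value
group_b = [4.0, 4.8, 5.5, 6.1, 7.2]

pooled_ranks = midranks(group_a + group_b)       # rank the pooled data set
ranks_a = pooled_ranks[:len(group_a)]
ranks_b = pooled_ranks[len(group_a):]

print("t on raw data:", round(pooled_t(group_a, group_b), 3))
print("t on ranks:   ", round(pooled_t(ranks_a, ranks_b), 3))
```

In this made-up example the single wild value drags the raw mean of the first group above the second, so the two t statistics even have opposite signs. The wild value can get no rank higher than 10, which is the sense in which ranks "can never get wild".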

Robust Inference via Rank Tests: The rank tests give the same answer no matter whether you take the raw data, their logarithms or their square roots. If you have data (raw or transformed) that pass q-q plot tests, then the Wilcoxon test and the t-test should have much the same p-values. In this case, you can use the latter to get CI’s.
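
The reason rank tests ignore such transformations is that the logarithm and the square root are increasing functions, so they leave the ordering of positive data unchanged. A minimal sketch with made-up positive values (no ties assumed; the helper `ranks` is hypothetical):

```python
import math

def ranks(values):
    """Ranks 1..n of the values, assuming no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

data = [0.05, 0.31, 0.12, 2.40, 0.08, 0.77]   # hypothetical positive measurements

raw = ranks(data)
logged = ranks([math.log(v) for v in data])
rooted = ranks([math.sqrt(v) for v in data])

print(raw, logged, rooted)
```

All three rank vectors are identical, so any statistic computed from the ranks alone is identical too.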

Robust Inference via Rank Tests: Differences between rank tests and t-tests generally occur for two reasons: outliers and very non-bell-shaped histograms.

Robust Inference via Rank Tests: In SPSS, you can get Wilcoxon rank sum tests as follows (SPSS calls them Mann-Whitney U): “Analyze”, “Nonparametric Tests”, “2 independent samples”.

Robust Inference via Rank Tests: Toe Arsenic and log(Toe Arsenic). Note how the p-values are the same (= 0.468). [SPSS "Test Statistics" table with columns log(Toe Arsenic) and Toe Arsenic and rows Mann-Whitney U, Wilcoxon W, Z, Asymp. Sig. (2-tailed); values not recovered. Grouping Variable: Squamous Cancer Status.]
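
For readers without SPSS, the four quantities named in that table can be sketched in plain Python. This is an assumption-laden illustration, not a reproduction of SPSS output: it uses invented data, assumes there are no ties, applies no continuity correction, and reports U and W for the first group (SPSS may report them for a different group). The function name `rank_sum_test` is hypothetical.

```python
import math
from statistics import NormalDist

def rank_sum_test(x, y):
    """Mann-Whitney U, Wilcoxon W, Z and asymptotic two-sided p (no ties assumed)."""
    pooled = sorted(x + y)
    rank_of = {value: i + 1 for i, value in enumerate(pooled)}
    n1, n2 = len(x), len(y)
    w = sum(rank_of[value] for value in x)      # rank sum of the first group
    u = w - n1 * (n1 + 1) / 2                   # Mann-Whitney U for the first group
    mu = n1 * n2 / 2                            # mean of U under the null
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))      # normal approximation
    return u, w, z, p

group_a = [1.1, 2.3, 2.9, 3.4, 250.0]   # hypothetical data
group_b = [4.0, 4.8, 5.5, 6.1, 7.2]

u, w, z, p = rank_sum_test(group_a, group_b)
print(f"U = {u}, W = {w}, Z = {z:.3f}, asymptotic p = {p:.3f}")
```

With samples this small an exact p-value would be preferable; the normal approximation is shown only because it corresponds to the "Asymp. Sig." row.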

Robust Inference via Rank Tests, NHANES: Saturated Fat p-values: t-test = 0.057, rank test = [missing]. Log(Saturated Fat): t-test = 0.012, rank test = [missing]. Note how the transform, which is more bell-shaped, agrees more closely with the rank test!

Robust Inference via Rank Tests: An SPSS peculiarity: to do rank tests, you need to have defined a numeric variable that categorizes the groups. The alternative is to convert the data to numbers and then give value labels.

Inference for Equality of Variances: We have described situations in which comparing the variability of populations is desired. Ott and Longnecker (Chapter 7) give methods for comparing population variances. NEVER USE THESE METHODS. They are notoriously unreliable: they are affected by outliers, by histograms that are not perfectly bell-shaped, etc.

Inference for Equality of Variances: SPSS uses what is called Levene’s test. From the SPSS Help file (slightly edited): Levene Test. For each case, it computes the absolute difference between the value of that case and its cell mean and performs a t-test on those absolute differences.

Inference for Equality of Variances: Levene’s test is a reasonable test, although I prefer to use a rank test instead of the t-test on the absolute differences.
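
The description of Levene’s test translates almost directly into code. The sketch below uses invented data and hypothetical helper names; it computes only the t statistic on the absolute deviations from each group mean, and it is not claimed to match SPSS output digit for digit.

```python
import math
from statistics import mean, variance

def abs_deviations(group):
    """Absolute difference between each value and its own group mean."""
    center = mean(group)
    return [abs(value - center) for value in group]

def pooled_t(x, y):
    """Equal-variance two-sample t statistic."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

def levene_t(a, b):
    """t statistic comparing the absolute deviations of two groups."""
    return pooled_t(abs_deviations(a), abs_deviations(b))

group_a = [10, 12, 11, 13, 9]    # hypothetical, tightly clustered
group_b = [5, 15, 10, 20, 0]     # hypothetical, widely spread

t_stat = levene_t(group_a, group_b)
print("absolute deviations:", abs_deviations(group_a), abs_deviations(group_b))
print("Levene t statistic:", round(t_stat, 4))
```

The rank-test variant preferred in the lecture would replace the final t-test with a rank sum test on the same absolute deviations.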

Inference for Equality of Variances: I suggest that you supplement the Levene test with a look at the IQR in boxplots. If you really need to understand scientifically the question of equality of variance, I suggest that you consult a bona fide statistician. I’ll now illustrate Levene’s test using NHANES (and this is the last time for these data).

Inference for Equality of Variances: Note the outlier

Inference for Equality of Variances

Inference for Equality of Variances: P-value of Levene’s test for Saturated Fat = [missing]. Same p-value, but without the outlier = [missing]. P-value for log(Saturated Fat) = 0.667. P-value for Levene’s test for Saturated Fat when using the rank test instead of the t-test = [missing]. P-value for Levene’s test for log(Saturated Fat) when using the rank test instead of the t-test = 0.665.

Inference for Equality of Variances: As you can see, the rank test version of Levene’s test gives answers much more in keeping with the box plots. The problem was clearly the outlier, so you can expect trouble with Levene’s test if there is a massive outlier. Remember, t-tests have trouble with outliers.

Inference for Equality of Variances: The problem is that it’s a pain to compute the rank test version of Levene’s test in SPSS. However, theory says that the rank test version is the better one, so in exams I’ll give it to you.

The Effect of an Outlier: What will happen to the sample mean for the cancer cases if I remove the outlier?

The Effect of an Outlier: The sample mean for the cancer cases will decrease. What will happen to the sample standard error for the cancer cases if I remove the outlier?

The Effect of an Outlier: The sample standard error for the cancer cases will decrease. What will happen to the difference between the sample mean for healthy cases and the sample mean for cancer cases if I delete the outlier?

The Effect of an Outlier: The difference between the sample means will increase. What will happen to the sample standard error of this difference if I remove the outlier?

The Effect of an Outlier: The sample standard error of the difference will decrease. Therefore, what will happen to the p-value if I delete the outlier?

The Effect of an Outlier: With a larger difference and a smaller standard error, the p-value will get smaller.

The Effect of an Outlier: With the outlier, p = [missing]. With the outlier removed, p = [missing]. As predicted, the p-value gets smaller.
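
The chain of reasoning in these slides can be traced numerically. The data below are invented so that one cancer case is a large outlier; they are not the NHANES values. The p-value uses a normal approximation to the unpooled two-sample statistic, which is an assumption made here to stay within the standard library, so treat the p-values as rough.

```python
import math
from statistics import NormalDist, mean, stdev

def summarize(healthy, cancer):
    """Cancer mean and se, difference of means, its se, and an approximate p-value."""
    se_cancer = stdev(cancer) / math.sqrt(len(cancer))
    se_healthy = stdev(healthy) / math.sqrt(len(healthy))
    diff = mean(healthy) - mean(cancer)
    se_diff = math.sqrt(se_healthy**2 + se_cancer**2)
    # Assumption: normal approximation instead of the t distribution
    p = 2 * (1 - NormalDist().cdf(abs(diff / se_diff)))
    return mean(cancer), se_cancer, diff, se_diff, p

healthy = [30, 32, 28, 35, 31, 29, 33, 34]          # hypothetical values
cancer_with = [25, 27, 24, 26, 28, 23, 26, 80]      # 80 is the outlier
cancer_without = cancer_with[:-1]                   # outlier removed

with_outlier = summarize(healthy, cancer_with)
without_outlier = summarize(healthy, cancer_without)

labels = ["cancer mean", "cancer se", "difference", "se of difference", "p-value"]
for label, a, b in zip(labels, with_outlier, without_outlier):
    print(f"{label:>17}: with = {a:.4g}, without = {b:.4g}")
```

In this example each quantity moves in the direction given on the slides: the cancer mean and both standard errors go down, the difference goes up, and the p-value shrinks.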