Mann-Whitney U-test: Testing for a difference. U₁ = n₁n₂ + ½n₁(n₁ + 1) − R₁

Presentation transcript:


What does it do?
- Tests for a difference in averages (medians, the middle value, to be exact).
- Compares two cases (e.g. species diversity in polluted and unpolluted water).
- The data can be of any kind, provided they are numerical, e.g. lengths, percentages, numbers of people.
- The samples do not have to be the same size.

Planning to use it? Make sure that:
- You want to test for a difference.
- You have just two cases to compare.
- You have five or more values from each case.
- If your data are likely to be normally distributed, it may be easier to get a significant result using the t-test.

How does it work?
- You assume (null hypothesis) that there is no difference between the two cases.
- The test involves ranking all the data together, then adding up the ranks for each sample.
- If, for example, the values in the first sample were all much bigger, then the first sample would hold the top places in the ranking.

Doing the test
These are the stages in doing the test:
1. Write down your hypotheses
2. Do the ranking
3. Calculate your U-values
4. Look at the tables
5. Make a decision

Hypotheses
H₀: There is no difference between population 1 and population 2.
For H₁, you have a choice, depending on what alternative you were looking for.
H₁: Population 1 is larger than population 2.
  e.g. Species diversity in unpolluted water is greater than in polluted water.
or
H₁: Population 1 is different to population 2.
  e.g. Species diversity is different in unpolluted water and polluted water.
Unless you have a good scientific reason for expecting one to be larger, you should choose "different" for H₁.

Ranking
We need to put all the data together and rank it, but remember which sample each value is from. One easy way to do this is to write data from different samples in different colours.
Give rank 1 to the highest value, rank 2 to the second highest, and so on.
If there are any ties, give them the average of the ranks they would have had.
e.g. Suppose three pieces of data tie for second place. They would otherwise have been in 2nd, 3rd and 4th place. So give them all the average of 2nd, 3rd and 4th: that's rank 3.
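The ranking rule above can be sketched in a few lines of Python. This is only an illustration: the function name `average_ranks_descending` is made up for this example and is not part of the original slides.

```python
def average_ranks_descending(values):
    """Rank values so the highest gets rank 1; tied values share the average rank."""
    # Positions of the values, ordered from largest value to smallest.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Find the end of the run of tied values that starts at pos.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Places pos+1 .. end+1 are shared, so each tied value gets their average.
        shared = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# Three values tie for 2nd, 3rd and 4th place, so each of them gets rank 3.
print(average_ranks_descending([10, 8, 8, 8, 5]))
```

The ranks come back in the same order as the input, which makes it easy to keep track of which sample each rank belongs to.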

U-values
First work out:
- R₁ = sum of ranks for sample 1
- R₂ = sum of ranks for sample 2
Then work out:
- U₁ = n₁n₂ + ½n₁(n₁ + 1) − R₁
- U₂ = n₁n₂ + ½n₂(n₂ + 1) − R₂
where n₁ and n₂ are the sizes of the two samples.
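As a rough sketch, the two formulae translate directly into Python. The function name `u_values` and the tiny data set are invented for illustration. A useful arithmetic check is that U₁ + U₂ should always equal n₁n₂, because the two rank sums together add up to N(N + 1)/2, where N = n₁ + n₂.

```python
def u_values(r1, r2, n1, n2):
    """Return (U1, U2) from the two rank sums and the two sample sizes."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2


# Made-up case: sample 1 holds ranks 1 and 2 (R1 = 3),
# sample 2 holds ranks 3 and 4 (R2 = 7).
u1, u2 = u_values(r1=3, r2=7, n1=2, n2=2)
print(u1, u2)

# Arithmetic check: the two U-values always add up to n1 * n2.
assert u1 + u2 == 2 * 2
```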

Tables
[Figure: a Mann-Whitney table of critical values, indexed by the sizes of the two samples. The bigger one is n₂.]
You usually have to look in a different table for different significance levels.

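Make a decision
If you used H₁: Population 1 is larger than population 2:
- You are doing a 1-tailed test (only one alternative considered).
- Choose the U-value for the sample you expected to be smaller. With rank 1 given to the highest value, the formulae give the smaller U-value to the sample with the smaller values, so this should be the smaller of the two U-values if the data lean the way you expected.
- If your U-value is smaller than the table value, you reject your null hypothesis.
If you used H₁: Population 1 is different to population 2:
- You are doing a 2-tailed test (both alternatives considered).
- Choose the smaller of the two U-values.
- If your U-value is smaller than the table value, you reject your null hypothesis.
Some published tables also reject when U is equal to the table value, so check the instructions printed with your table.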
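The 2-tailed rule can be written as a small helper. This is a sketch only: the name `two_tailed_decision` is hypothetical, the critical value must still be read from a printed table, and the strict "smaller than" comparison follows the wording of the slides.

```python
def two_tailed_decision(u1, u2, critical_value):
    """Return (U, reject) for the 2-tailed test, where U is the smaller U-value."""
    u = min(u1, u2)
    # Reject the null hypothesis only if U is smaller than the table value
    # (strict comparison, as worded in the slides).
    reject = u < critical_value
    return u, reject


# Values taken from the invertebrates example in these slides:
# U1 = 6.5, U2 = 35.5, critical value at 5% = 6.
print(two_tailed_decision(6.5, 35.5, 6))
```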

Example: Invertebrates in Long & Short Grass
Data were obtained for the number of invertebrates caught in sweep nets at 8 sites in long and short grass.
Hypotheses:
H₀: There is no difference in the number of invertebrates in long and short grass.
H₁: There is a difference in the number of invertebrates in long and short grass.

The data
[Table: columns Site, long grass, short grass. The values are not preserved in this transcript.]

Ranking
We need to put all the data together and rank it, but remember whether each value comes from long or short grass. We'll do this with colours, one for long and one for short.
We have: 41, 43, 34, 37, 15, 22, 27, 47, 38, 98, 27, 72, 65
In order: 98, 72, 65, 47, 43, 41, 38, 37, 34, 27, 27, 22, 15
Ranks: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.5, 10.5, 12, 13
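The ranking of these thirteen values can be reproduced with a short script. One caveat: the colour coding that marked long and short grass did not survive in this transcript, so the split used below (first seven listed values are long grass, last six are short grass) is an assumption. It is chosen because it matches the sample sizes 7 and 6 and the rank sums 63.5 and 27.5 quoted on the next slide.

```python
pooled = [41, 43, 34, 37, 15, 22, 27, 47, 38, 98, 27, 72, 65]
ordered = sorted(pooled, reverse=True)

# Average place (1-based) of each distinct value in the descending list,
# so tied values share the average of the places they occupy.
rank_of = {}
for v in set(ordered):
    places = [i + 1 for i, x in enumerate(ordered) if x == v]
    rank_of[v] = sum(places) / len(places)

ranks_in_order = [rank_of[v] for v in ordered]
print(ranks_in_order)

# ASSUMPTION (not stated in the transcript): the first seven values listed
# are long grass and the last six are short grass.
long_grass, short_grass = pooled[:7], pooled[7:]
r_long = sum(rank_of[v] for v in long_grass)
r_short = sum(rank_of[v] for v in short_grass)
print(r_long, r_short)
```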

U-Values
First find the sum of ranks for long and short grass:
Long: R₁ = 63.5
Short: R₂ = 27.5
Now work out the two U-values, using the formulae:
U₁ = n₁n₂ + ½n₁(n₁ + 1) − R₁
U₂ = n₁n₂ + ½n₂(n₂ + 1) − R₂
So
U₁ = (7)(6) + ½(7)(7 + 1) − 63.5 = 6.5 (long grass)
U₂ = (7)(6) + ½(6)(6 + 1) − 27.5 = 35.5 (short grass)
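One way to cross-check these U-values is to count pairs directly. When rank 1 goes to the highest value, the formula's U for a sample equals the number of (own value, other value) pairs in which its own value is the larger, with a tie counting as half. The sketch below assumes the same split of the data as the rank sums imply (first seven listed values long grass, last six short grass). That split is an inference, not something the transcript states.

```python
# ASSUMED split of the example data (the slide's colour coding is not preserved).
long_grass = [41, 43, 34, 37, 15, 22, 27]
short_grass = [47, 38, 98, 27, 72, 65]


def pair_count(a, b):
    """Count pairs (x from a, y from b) with x > y; a tie counts as 0.5."""
    total = 0.0
    for x in a:
        for y in b:
            if x > y:
                total += 1
            elif x == y:
                total += 0.5
    return total


u_long = pair_count(long_grass, short_grass)
u_short = pair_count(short_grass, long_grass)
print(u_long, u_short)

# The formula from the slides gives the same value for long grass.
u_long_formula = 7 * 6 + 7 * (7 + 1) / 2 - 63.5
print(u_long_formula)
```

Long grass "wins" only 6.5 of the 42 possible pairings, which is why its U-value is the small one.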

The Test
Since our H₁ referred to "a difference", we're doing the 2-tailed test.
U = smaller of U₁ and U₂ = 6.5
Critical value (5%) = 6
Our value is larger than the critical value, so we do not reject H₀: there is no significant difference between the numbers of invertebrates in long grass and in short grass.