# Mann-Whitney U for comparison data



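Using Mann-Whitney U:

- It is non-parametric, i.e. no assumptions are made about the data fitting a normal distribution.
- It is used to compare the medians of two sets of data. (Strictly, it tests whether values in one set tend to be larger than values in the other; this amounts to a comparison of medians when the two data sets have distributions of the same shape.)
- It measures the overlap between the two data sets.
- You must have between 6 and 20 replicates in each data set.
- The data sets can have unequal numbers of replicates.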

[Figure: example of normally distributed data]

When to use the Mann-Whitney U test:

- the data do not fit a normal distribution curve, so a non-parametric test is needed
- you want to compare the overlap between two data sets
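The idea of "overlap" can be made concrete. A minimal sketch, assuming U for one data set is read as the number of cross-set pairs in which its value is the larger one, with tied pairs counted as half; this pairwise count gives the same numbers as the rank-sum equations used in this presentation. The function name `pairwise_u` is hypothetical, and the data are the tree heights from the worked example.

```python
def pairwise_u(a, b):
    """Count pairs (x from a, y from b) where x > y; a tied pair counts as 0.5."""
    score = 0.0
    for x in a:
        for y in b:
            if x > y:
                score += 1.0
            elif x == y:
                score += 0.5
    return score

# Tree heights (m) from the worked example
beech_hill = [23, 21, 23, 20, 24, 25, 22]
rushey_plain = [16, 18, 19, 17, 20, 21]

print(pairwise_u(beech_hill, rushey_plain))  # 40.0
print(pairwise_u(rushey_plain, beech_hill))  # 2.0
```

If the two sets do not overlap at all, one U value drops to 0; if they are thoroughly mixed, both U values approach half of $n_1 \times n_2$. That is why a small U points to a significant difference.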


The equations:

$$U_1 = n_1 \times n_2 + \tfrac{1}{2} n_2 (n_2 + 1) - \sum R_2$$

$$U_2 = n_1 \times n_2 + \tfrac{1}{2} n_1 (n_1 + 1) - \sum R_1$$

Where:

- $U_1$ = Mann-Whitney U for data set 1
- $n_1$ = sample size of data set 1
- $\sum R_1$ = sum of the ranks of data set 1
- $U_2$ = Mann-Whitney U for data set 2
- $n_2$ = sample size of data set 2
- $\sum R_2$ = sum of the ranks of data set 2
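The two equations are linked. Because the ranks 1 to $N = n_1 + n_2$ always add up to $\tfrac{1}{2} N (N + 1)$ (giving tied values an average rank leaves that total unchanged), adding the equations gives a quick arithmetic check for any calculation:

$$
\begin{aligned}
U_1 + U_2 &= 2 n_1 n_2 + \tfrac{1}{2} n_1 (n_1 + 1) + \tfrac{1}{2} n_2 (n_2 + 1) - \left(\sum R_1 + \sum R_2\right) \\
&= 2 n_1 n_2 + \tfrac{1}{2} n_1 (n_1 + 1) + \tfrac{1}{2} n_2 (n_2 + 1) - \tfrac{1}{2} (n_1 + n_2)(n_1 + n_2 + 1) \\
&= 2 n_1 n_2 - n_1 n_2 \\
&= n_1 n_2
\end{aligned}
$$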

Method

1. Establish the null hypothesis, $H_0$, and the alternative hypothesis, $H_1$. The null hypothesis is always the negative form, i.e. there is no significant difference between the data sets.
   - $H_0$: There is no significant difference between the variable at Site 1 and Site 2.
   - $H_1$: There is a significant difference between the variable at Site 1 and Site 2.

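2. Copy your data into the table below as variable x and variable y, and label the data sets.

| Rank 1 ($R_1$) | Data Set 1: Beech Hill (m) | Data Set 2: Rushey Plain (m) | Rank 2 ($R_2$) |
|---|---|---|---|
|  | 23 | 16 |  |
|  | 21 | 18 |  |
|  | 23 | 19 |  |
|  | 20 | 17 |  |
|  | 24 | 20 |  |
|  | 25 | 21 |  |
|  | 22 |  |  |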

3. Treat both sets of data as one data set and rank them in increasing order (the lowest data value gets the lowest rank). Start from the lowest and put the numbers in order: 16, 17, 18, 19, 20, 20, 21, 21, 22, 23, 23, 24, 25.
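In code, this step amounts to joining the two lists and sorting the result. A minimal Python sketch; the variable names are hypothetical.

```python
# Tree heights (m) from the data table
beech_hill = [23, 21, 23, 20, 24, 25, 22]   # Data Set 1
rushey_plain = [16, 18, 19, 17, 20, 21]     # Data Set 2

# Treat both sets as one data set and order it from lowest to highest
pooled = sorted(beech_hill + rushey_plain)
print(pooled)  # [16, 17, 18, 19, 20, 20, 21, 21, 22, 23, 23, 24, 25]
```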

The lowest data value gets a rank of 1. When data values are the same, they must have the same rank: take the ranks you would normally assign (here 5 and 6 for the two values of 20), add them together (5 + 6 = 11) and divide by the number of tied values (11 ÷ 2 = 5.5). The same thing is done for all data values that are the same.

| Data value (m) | 16 | 17 | 18 | 19 | 20 | 20 | 21 | 21 | 22 | 23 | 23 | 24 | 25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Rank | 1 | 2 | 3 | 4 | 5.5 | 5.5 | 7.5 | 7.5 | 9 | 10.5 | 10.5 | 12 | 13 |

The assigned ranks can then be put into the table.
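The tie rule is the usual "average rank" method. A minimal sketch of it; `average_ranks` is a hypothetical helper name, not part of any library.

```python
def average_ranks(values):
    """Map each distinct value to its rank (lowest value = rank 1).

    Tied values share the mean of the ranks they would otherwise occupy.
    """
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Find the last index j of the run of values equal to ordered[i]
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        # Indices i..j (0-based) are ranks i+1..j+1; their mean is (i + j + 2) / 2
        ranks[ordered[i]] = (i + j + 2) / 2
        i = j + 1
    return ranks

beech_hill = [23, 21, 23, 20, 24, 25, 22]
rushey_plain = [16, 18, 19, 17, 20, 21]

ranks = average_ranks(beech_hill + rushey_plain)
print([ranks[v] for v in beech_hill])    # [10.5, 7.5, 10.5, 5.5, 12.0, 13.0, 9.0]
print([ranks[v] for v in rushey_plain])  # [1.0, 3.0, 4.0, 2.0, 5.5, 7.5]
```

For the two values of 20 (0-based positions 4 and 5 in the sorted list), the formula gives (4 + 5 + 2) / 2 = 5.5, matching the hand calculation.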

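| Rank 1 ($R_1$) | Data Set 1: Beech Hill (m) | Data Set 2: Rushey Plain (m) | Rank 2 ($R_2$) |
|---|---|---|---|
| 10.5 | 23 | 16 | 1 |
| 7.5 | 21 | 18 | 3 |
| 10.5 | 23 | 19 | 4 |
| 5.5 | 20 | 17 | 2 |
| 12 | 24 | 20 | 5.5 |
| 13 | 25 | 21 | 7.5 |
| 9 | 22 |  |  |

4. Sum the ranks for each set of data ($\sum R$):

$$\sum R_1 = 10.5 + 7.5 + 10.5 + 5.5 + 12 + 13 + 9 = 68$$

$$\sum R_2 = 1 + 3 + 4 + 2 + 5.5 + 7.5 = 23$$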

5. Calculate the number of samples in each data set ($n$) by counting the values in each data set: $n_1 = 7$ and $n_2 = 6$.
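Before calculating U, it is worth checking the rank sums: ranks 1 to $N$ (where $N = n_1 + n_2$) always add up to $N(N + 1)/2$, and sharing an average rank between tied values leaves that total unchanged. A minimal sketch of the check; the function name `rank_sums_consistent` is hypothetical.

```python
def rank_sums_consistent(sum_r1, sum_r2, n1, n2):
    """True if the two rank sums add up to N(N + 1) / 2, where N = n1 + n2."""
    n = n1 + n2
    return sum_r1 + sum_r2 == n * (n + 1) / 2

sum_r1 = 10.5 + 7.5 + 10.5 + 5.5 + 12 + 13 + 9   # Beech Hill ranks
sum_r2 = 1 + 3 + 4 + 2 + 5.5 + 7.5               # Rushey Plain ranks

print(sum_r1, sum_r2)                              # 68.0 23.0
print(rank_sums_consistent(sum_r1, sum_r2, 7, 6))  # True (68 + 23 = 91 = 13 x 14 / 2)
```

A False result means a rank has been mis-assigned or a value has been missed.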

6. Calculate the values of $U_1$ and $U_2$ using the equations. It is a good idea to break each equation into three bite-size chunks, which then gives you a very easy three-term sum:

$$U_1 = (n_1 \times n_2) + \tfrac{1}{2} n_2 (n_2 + 1) - \sum R_2$$

$$U_2 = (n_1 \times n_2) + \tfrac{1}{2} n_1 (n_1 + 1) - \sum R_1$$

$$U_1 = (7 \times 6) + 3(6 + 1) - 23 = 42 + 21 - 23 = 40$$

$$U_2 = (7 \times 6) + 3.5(7 + 1) - 68 = 42 + 28 - 68 = 2$$
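The same arithmetic can be wrapped in a small function. A minimal sketch in which `u_values` is a hypothetical name; it also shows a handy check that, for correctly ranked data, $U_1 + U_2$ equals $n_1 \times n_2$.

```python
def u_values(n1, n2, sum_r1, sum_r2):
    """Return (U1, U2) using the equations from this presentation."""
    u1 = n1 * n2 + n2 * (n2 + 1) / 2 - sum_r2
    u2 = n1 * n2 + n1 * (n1 + 1) / 2 - sum_r1
    return u1, u2

u1, u2 = u_values(n1=7, n2=6, sum_r1=68, sum_r2=23)
print(u1, u2)            # 40.0 2.0
print(min(u1, u2))       # 2.0 (the smallest U value)
print(u1 + u2 == 7 * 6)  # True: U1 + U2 equals n1 x n2
```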

The smallest U value is $U_2 = 2$.

7. Compare the smallest U value against the table of critical values.

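Critical values of U at the 0.05 significance level (95% confidence level), two-tailed: if the null hypothesis were true, a U value equal to or smaller than the critical value would occur by chance at most 5% of the time. Rows give $n_1$ and columns give $n_2$; a blank cell means the samples are too small for any critical value.

| $n_1$ \ $n_2$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 2 |  |  |  |  |  |  |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 3 |  |  |  |  | 0 | 1 | 1 | 2 | 2 | 3 | 3 | 4 | 4 | 5 | 5 |
| 4 |  |  |  | 0 | 1 | 2 | 3 | 4 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 5 |  |  | 0 | 1 | 2 | 3 | 5 | 6 | 7 | 8 | 9 | 11 | 12 | 13 | 14 |
| 6 |  |  | 1 | 2 | 3 | 5 | 6 | 8 | 10 | 11 | 13 | 14 | 16 | 17 | 19 |
| 7 |  |  | 1 | 3 | 5 | 6 | 8 | 10 | 12 | 14 | 16 | 18 | 20 | 22 | 24 |
| 8 |  | 0 | 2 | 4 | 6 | 8 | 10 | 13 | 15 | 17 | 19 | 22 | 24 | 26 | 29 |
| 9 |  | 0 | 2 | 4 | 7 | 10 | 12 | 15 | 17 | 20 | 23 | 26 | 28 | 31 | 34 |
| 10 |  | 0 | 3 | 5 | 8 | 11 | 14 | 17 | 20 | 23 | 26 | 29 | 33 | 36 | 39 |
| 11 |  | 0 | 3 | 6 | 9 | 13 | 16 | 19 | 23 | 26 | 30 | 33 | 37 | 40 | 44 |
| 12 |  | 1 | 4 | 7 | 11 | 14 | 18 | 22 | 26 | 29 | 33 | 37 | 41 | 45 | 49 |
| 13 |  | 1 | 4 | 8 | 12 | 16 | 20 | 24 | 28 | 33 | 37 | 41 | 45 | 50 | 54 |
| 14 |  | 1 | 5 | 9 | 13 | 17 | 22 | 26 | 31 | 36 | 40 | 45 | 50 | 55 | 59 |
| 15 |  | 1 | 5 | 10 | 14 | 19 | 24 | 29 | 34 | 39 | 44 | 49 | 54 | 59 | 64 |

We use the values of $n_1$ and $n_2$ to find our critical value.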

Is 2 (our smallest U value) smaller or larger than 6 (our critical value from the Mann-Whitney table)? Smaller. The null hypothesis is rejected whenever the smallest U value is equal to or less than the critical value, so here the null hypothesis is rejected. The alternative hypothesis can be accepted: there is a significant difference between the tree heights at Beech Hill and Rushey Plain.
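The decision rule takes only a few lines. A minimal sketch, assuming only the $n_1 = 7$ row of the critical-value table is needed; `CRITICAL_U_N1_7` and `reject_null` are hypothetical names. Because the rule is "equal to or less than", a U value exactly equal to the critical value also counts as significant.

```python
# Critical values of U (two-tailed, 0.05 level) for n1 = 7, keyed by n2,
# copied from the n1 = 7 row of the critical-value table
CRITICAL_U_N1_7 = {3: 1, 4: 3, 5: 5, 6: 6, 7: 8, 8: 10, 9: 12,
                   10: 14, 11: 16, 12: 18, 13: 20, 14: 22, 15: 24}

def reject_null(u1, u2, critical_value):
    """Reject H0 when the smaller U value is equal to or less than the critical value."""
    return min(u1, u2) <= critical_value

critical = CRITICAL_U_N1_7[6]         # n2 = 6
print(critical)                       # 6
print(reject_null(40, 2, critical))   # True: 2 <= 6, so H0 is rejected
```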

Use the following data to calculate U values independently. Enter the ranks for the pools data as Rank 1 ($R_1$) and for the riffles data as Rank 2 ($R_2$).

Velocity (cm s⁻¹):

- Pools: 12 56149208
- Riffles: 5455615631476854

Abundance of *Gammarus pulex* in pools and riffles of an Exmoor stream:

- Pools: 27394320139
- Riffles: 851680183563150

Key questions:

- Is there a significant difference between pools and riffles?
- Which data value(s) would you consider to be anomalous, and why?
- What graph would you use to present this data?

