Copyright © Cengage Learning. All rights reserved. 14 Elements of Nonparametric Statistics.


1 Copyright © Cengage Learning. All rights reserved. 14 Elements of Nonparametric Statistics

2 Copyright © Cengage Learning. All rights reserved. 14.3 The Mann–Whitney U Test

3 The Mann–Whitney U test is a nonparametric alternative to the t-test for the difference between two independent means. The usual two-sample situation occurs when the experimenter wants to see whether the difference between the two samples is sufficient to reject the null hypothesis that the two sampled populations are identical.

4 Hypothesis-Testing Procedure

5 Assumptions for inferences about two populations using the Mann–Whitney U test: The two random samples are independent both within each sample and between samples, and the random variables are ordinal or numerical. This test is often used in situations in which the two samples are drawn from the same population of subjects but different “treatments” are used on each set. We will demonstrate the procedure in the next example.

6 Example 6 – Two-tailed Hypothesis Test: In a large lecture class, when a 1-hour exam is given, the instructor gives two “equivalent” examinations. It is reasonable to ask: Are these two different exams really equivalent? Students in even-numbered seats take exam A, and those in the odd-numbered seats take exam B. To test this “equivalent” hypothesis, two random samples were taken. Table 14.3 lists the exam scores of the two samples. [Table 14.3: Data on Exam Scores (TA14-03)]

7 Example 6 – Two-tailed Hypothesis Test: If we assume that the odd- or even-numbered seats had no effect, does the sample present sufficient evidence to reject the hypothesis “The exam forms yielded scores that had identical distributions”? Test using $\alpha = 0.05$. Solution: Step 1 a. Parameter of interest: The distribution of scores for each version of the exam. b. Statement of hypotheses: $H_o$: Exam A and exam B have test scores with identical distributions.

8 Example 6 – Solution: $H_a$: The two distributions are not the same. Step 2 a. Assumptions: The two samples are independent, and the random variable, exam score, is numerical. b. Test statistic: The Mann–Whitney U statistic. c. Level of significance: $\alpha = 0.05$. Step 3 a. Sample information: The sample data are listed in Table 14.3.

9 Example 6 – Solution: b. Test statistic: The sizes of the individual samples will be called $n_a$ and $n_b$; it makes no difference which way these are assigned. In our example they both have the value 10. The two samples are combined into one sample (all $n_a + n_b$ data values) and ordered from smallest to largest: 49 52 56 62 64 65 71 72 74 78 78 80 81 86 88 90 90 90 91 98

10 Example 6 – Solution: Each value is then assigned a rank number. The smallest (49) is assigned rank 1, the next smallest (52) is assigned rank 2, and so on, up to the largest, which is assigned rank $n_a + n_b$ (20). Ties are handled by assigning to each of the tied observations the mean of the rank positions that they occupy. For example, there are two 78s; they are the 10th and 11th data values. The mean rank for each is then $(10 + 11)/2 = 10.5$.

11 Example 6 – Solution: In the case of the three 90s (the 16th, 17th, and 18th data values), each is assigned rank 17 because $(16 + 17 + 18)/3 = 17$. The rankings are shown in Table 14.4. [Table 14.4: Ranked Exam Score Data]
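
The tie rule described above can be sketched in a few lines of Python. This is only an illustration: the helper name `average_ranks` is made up for this sketch, and the pooled list is the sorted data given in the solution.

```python
def average_ranks(values):
    """Map each distinct value to the mean of the 1-based positions it occupies."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    # Tied values share the mean of their positions; untied values keep their own.
    return {value: sum(p) / len(p) for value, p in positions.items()}


# Pooled, sorted exam scores as listed in the solution.
pooled = [49, 52, 56, 62, 64, 65, 71, 72, 74, 78,
          78, 80, 81, 86, 88, 90, 90, 90, 91, 98]

ranks = average_ranks(pooled)
print(ranks[78], ranks[90])  # the two 78s and the three 90s
```

The two 78s share rank 10.5 and the three 90s share rank 17, matching the hand calculation.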

12 Example 6 – Solution: Figure 14.2 shows the relationship between the two sets of data, first by using the data values and second by comparing the rank numbers for the data. [Figure 14.2: Comparing the Data of Two Samples]

13 Example 6 – Solution: The calculation of the test statistic U is a two-step procedure. We first determine the sum of the ranks for each of the two samples. Then, using the two sums of ranks, we calculate a U score for each sample. The smaller U score is the test statistic. The sum of ranks $R_a$ for sample A is computed as $R_a = 1 + 2 + 3 + 5 + 6 + 10.5 + 10.5 + 14 + 17 + 17 = 86$. The sum of ranks $R_b$ for sample B is $R_b = 4 + 7 + 8 + 9 + 12 + 13 + 15 + 17 + 19 + 20 = 124$.
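
The rank sums can be checked with a short Python sketch. Table 14.3 is not reproduced in this transcript, so the two score lists below are an assumption: they are inferred from the pooled sorted list and from the ranks that appear in the $R_a$ and $R_b$ sums. The function name `rank_sums` is likewise only illustrative.

```python
def rank_sums(sample_a, sample_b):
    """Return (R_a, R_b), using mean ranks for tied values in the pooled sample."""
    pooled = sorted(sample_a + sample_b)
    mean_rank = {}
    for value in set(pooled):
        first = pooled.index(value) + 1       # 1-based position of the first copy
        copies = pooled.count(value)          # number of tied observations
        mean_rank[value] = first + (copies - 1) / 2
    r_a = sum(mean_rank[v] for v in sample_a)
    r_b = sum(mean_rank[v] for v in sample_b)
    return r_a, r_b


# ASSUMPTION: sample membership inferred from the ranks listed in the slides.
exam_a = [49, 52, 56, 64, 65, 78, 78, 86, 90, 90]
exam_b = [62, 71, 72, 74, 80, 81, 88, 90, 91, 98]

r_a, r_b = rank_sums(exam_a, exam_b)
print(r_a, r_b)
```

As a consistency check, the two sums should add up to $20 \cdot 21 / 2 = 210$, the sum of the ranks 1 through 20.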

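14 Example 6 – Solution: The U score for each sample is obtained by using the following pair of formulas: $U_a = n_a n_b + \frac{n_b(n_b + 1)}{2} - R_b$ (14.3) and $U_b = n_a n_b + \frac{n_a(n_a + 1)}{2} - R_a$ (14.4)

15 Example 6 – Solution: $U^\star$, the test statistic, is the smaller of $U_a$ and $U_b$. For our example, we obtain $U_a = (10)(10) + \frac{(10)(11)}{2} - 124 = 31$ and $U_b = (10)(10) + \frac{(10)(11)}{2} - 86 = 69$. Therefore $U^\star = 31$.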
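
As a sketch, assuming formulas (14.3) and (14.4) take the standard form $U_a = n_a n_b + n_b(n_b + 1)/2 - R_b$ and $U_b = n_a n_b + n_a(n_a + 1)/2 - R_a$, the arithmetic for this example can be checked in Python. The function name `u_scores` is hypothetical.

```python
def u_scores(n_a, n_b, r_a, r_b):
    """Return (U_a, U_b) from the sample sizes and rank sums."""
    u_a = n_a * n_b + n_b * (n_b + 1) / 2 - r_b
    u_b = n_a * n_b + n_a * (n_a + 1) / 2 - r_a
    return u_a, u_b


# Example 6: n_a = n_b = 10, R_a = 86, R_b = 124.
u_a, u_b = u_scores(10, 10, 86, 124)
u_star = min(u_a, u_b)  # the test statistic is the smaller U score
print(u_a, u_b, u_star)
```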

16 Example 6 – Solution: Before we carry out the test for this example, let’s try to understand some of the underlying possibilities. We know that the null hypothesis is that the distributions are the same and that we will most likely want to conclude from this that the averages are approximately equal. Suppose for a moment that the distributions are indeed quite different; say, all of one sample comes before the smallest data value in the second sample when they are ranked together.

17 Example 6 – Solution: This would certainly mean that we would want to reject the null hypothesis. What kind of a value can we expect for U in this case? Suppose that the 10 A values had ranks 1 through 10 and that the 10 B values had ranks 11 through 20. Then we would obtain $R_a = 55$ and $R_b = 155$, so that $U_a = (10)(10) + \frac{(10)(11)}{2} - 155 = 0$ and $U_b = (10)(10) + \frac{(10)(11)}{2} - 55 = 100$.

18 Example 6 – Solution: Therefore $U^\star = 0$. If this were the case, we certainly would want to reach the decision: Reject the null hypothesis. Suppose, on the other hand, that both samples were perfectly matched; that is, a score in each set is identical to one in the other.
Data:   54   54   62   62   71   71   72   72  ...
Sample: A    B    A    B    A    B    A    B   ...
Rank:   1.5  1.5  3.5  3.5  5.5  5.5  7.5  7.5 ...

19 Example 6 – Solution: Now what would happen? $R_a = R_b = 105$, so $U_a = U_b = (10)(10) + \frac{(10)(11)}{2} - 105 = 50$. Therefore, $U^\star = 50$. If this were the case, we certainly would want to reach the decision: Fail to reject the null hypothesis. Note: The sum of the two U’s ($U_a + U_b$) will always be equal to the product of the two sample sizes ($n_a \times n_b$). For this reason we need concern ourselves only with the smaller U value.
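
The two extreme cases and the identity $U_a + U_b = n_a n_b$ can be checked numerically. The sketch below assumes the standard form of the U formulas and uses hypothetical sample sizes (7 and 12) with randomly shuffled, untied ranks for the identity check; none of these numbers come from the slides.

```python
import random


def u_scores(n_a, n_b, r_a, r_b):
    """Return (U_a, U_b) from the sample sizes and rank sums."""
    u_a = n_a * n_b + n_b * (n_b + 1) / 2 - r_b
    u_b = n_a * n_b + n_a * (n_a + 1) / 2 - r_a
    return u_a, u_b


separated = u_scores(10, 10, 55, 155)   # A holds ranks 1-10, B holds ranks 11-20
matched = u_scores(10, 10, 105, 105)    # perfectly matched samples
print(separated, matched)

# Identity check on hypothetical sample sizes with randomly assigned ranks.
rng = random.Random(0)
n_a, n_b = 7, 12
ranks = list(range(1, n_a + n_b + 1))
identity_holds = True
for _ in range(1000):
    rng.shuffle(ranks)
    r_a, r_b = sum(ranks[:n_a]), sum(ranks[n_a:])
    u_a, u_b = u_scores(n_a, n_b, r_a, r_b)
    identity_holds = identity_holds and (u_a + u_b == n_a * n_b)
print(identity_holds)
```

The identity follows because $R_a + R_b$ is always the sum of the ranks 1 through $n_a + n_b$, whatever the assignment of ranks to samples.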

20 Example 6 – Solution: Step 4 Probability Distribution: p-Value: a. Since the concern is for values related to “not the same,” the p-value is the probability of both tails. It will be found by finding the probability of the left tail and doubling: $P = 2 \times P(U \le 31$ for $n_1 = 10$ and $n_2 = 10)$

21 Example 6 – Solution: To find the p-value, you have two options: 1. Use Table 13 in Appendix B to place bounds on the p-value: P > 0.10. 2. Use a computer or calculator to find the p-value: P = 0.1612. b. The p-value is not smaller than $\alpha$.

22 Example 6 – Solution: Classical: a. The critical region is two-tailed because $H_a$ expresses concern for values related to “not the same.” Use Table 13A for two-tailed $\alpha = 0.05$. The critical value is at the intersection of column $n_1 = 10$ and row $n_2 = 10$: 23. The critical region is $U \le 23$.

23 Example 6 – Solution: b. $U^\star$ is not in the critical region, as shown in the figure. Step 5 a. Decision: Fail to reject $H_o$. b. Conclusion: We do not have sufficient evidence to reject the “equivalent” hypothesis.

24 Hypothesis-Testing Procedure: Calculating the p-Value when Using the Mann–Whitney Test. Method 1: Use Table 13 in Appendix B to place bounds on the p-value. By inspecting Table 13A and B at the intersection of column $n_1 = 10$ and row $n_2 = 10$, you can determine that the p-value is greater than 0.10; the larger two-tailed value of $\alpha$ is 0.10 in Table 13B. Method 2: If you are doing the hypothesis test with the aid of a computer or graphing calculator, most likely it will calculate the p-value for you.
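
For small samples, the table lookup can be approximated by enumerating the exact null distribution of U. The sketch below is one way to do this, not the method used to build Table 13: it assumes no ties (Example 6 does contain ties, so its result is only approximate for that data), and the function names are made up for illustration. Software that corrects for ties or uses a normal approximation may report a somewhat different p-value.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_arrangements(n_1, n_2, u):
    """Number of orderings of n_1 + n_2 untied values that produce U = u."""
    if u < 0:
        return 0
    if n_1 == 0 or n_2 == 0:
        return 1 if u == 0 else 0
    # The largest value belongs either to sample 1 (adds n_2 to U) or to sample 2 (adds 0).
    return (count_arrangements(n_1 - 1, n_2, u - n_2)
            + count_arrangements(n_1, n_2 - 1, u))


def left_tail(n_1, n_2, u):
    """P(U <= u) under the null hypothesis, assuming no ties."""
    total = comb(n_1 + n_2, n_1)
    return sum(count_arrangements(n_1, n_2, k) for k in range(u + 1)) / total


# Two-tailed p-value for U = 31 with n_1 = n_2 = 10.
p_two_tailed = 2 * left_tail(10, 10, 31)

# Largest U whose doubled left-tail probability does not exceed alpha = 0.05.
critical = max(u for u in range(101) if 2 * left_tail(10, 10, u) <= 0.05)
print(p_two_tailed, critical)
```

The p-value lands above 0.10 and the critical value comes out as 23, in line with the bounds and critical region quoted from Table 13.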

25 Normal Approximation

26 Normal Approximation: If the samples are larger than size 20, we may make the test decision with the aid of the standard normal variable, z. This is possible because the distribution of U is approximately normal with a mean $\mu_U = \frac{n_a n_b}{2}$ (14.5) and a standard deviation $\sigma_U = \sqrt{\frac{n_a n_b (n_a + n_b + 1)}{12}}$ (14.6)

27 Normal Approximation: The hypothesis test is then completed using the test statistic $z^\star$: $z^\star = \frac{U^\star - \mu_U}{\sigma_U}$ (14.7). The standard normal distribution may be used whenever $n_a$ and $n_b$ are both greater than 10.

28 Example 7 – One-tailed Hypothesis Test: A dog-obedience trainer is training 27 dogs to obey a certain command. The trainer is using two different training techniques: (I) the reward-and-encouragement method and (II) the no-reward method. Table 14.5 shows the numbers of obedience sessions that were necessary before the dogs would obey the command. [Table 14.5: Data on Dog Training (TA14-05)]

29 Example 7 – One-tailed Hypothesis Test: Does the trainer have sufficient evidence to claim that the reward method will, on average, require fewer obedience sessions ($\alpha = 0.05$)? Solution: Step 1 a. Parameter of interest: The distribution of needed obedience sessions for each technique. b. Statement of hypotheses: $H_o$: The distributions of the needed obedience sessions are the same for both methods. $H_a$: The reward method, on average, requires fewer sessions.

30 Example 7 – Solution: Step 2 a. Assumptions: The two samples are independent, and the random variable, training time, is numerical. b. Test statistic: The Mann–Whitney U statistic. c. Level of significance: $\alpha = 0.05$. Step 3 a. Sample information: The sample data are listed in Table 14.5.

31 Example 7 – Solution: b. Test statistic: The two sets of data are ranked jointly, and ranks are assigned as shown in Table 14.6. [Table 14.6: Rankings for Training Methods]

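32 Example 7 – Solution: The sums are: $R_I = 1 + 2 + 3 + 4 + 6.5 + \dots + 20.5 + 23 = 151.0$ and $R_{II} = 5 + 11.5 + 14.5 + \dots + 26 + 27 = 227.0$. The U scores are found using formulas (14.3) and (14.4), with $n_I = 15$ and $n_{II} = 12$ (the only split of the 27 dogs consistent with these rank sums and the resulting U score): $U_I = (15)(12) + \frac{(12)(13)}{2} - 227 = 31$ and $U_{II} = (15)(12) + \frac{(15)(16)}{2} - 151 = 149$

33 Example 7 – Solution: Therefore, $U^\star = 31$. Now we use formulas (14.5), (14.6), and (14.7) to determine the z-statistic: $\mu_U = \frac{(15)(12)}{2} = 90$, $\sigma_U = \sqrt{\frac{(15)(12)(15 + 12 + 1)}{12}} = \sqrt{420} = 20.49$, and $z^\star = \frac{31 - 90}{20.49} = -2.88$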
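
The normal-approximation arithmetic for Example 7 can be reproduced with the sketch below. It assumes the standard forms $\mu_U = n_a n_b / 2$, $\sigma_U = \sqrt{n_a n_b (n_a + n_b + 1)/12}$, and $z^\star = (U^\star - \mu_U)/\sigma_U$ for formulas (14.5) to (14.7). The sample sizes 15 and 12 are also an assumption: Table 14.5 is not reproduced here, and they are inferred as the split of 27 dogs consistent with the stated rank sums and $z = -2.88$.

```python
from math import sqrt


def u_scores(n_1, n_2, r_1, r_2):
    """Return (U_1, U_2) from the sample sizes and rank sums."""
    u_1 = n_1 * n_2 + n_2 * (n_2 + 1) / 2 - r_2
    u_2 = n_1 * n_2 + n_1 * (n_1 + 1) / 2 - r_1
    return u_1, u_2


def mann_whitney_z(u_star, n_1, n_2):
    """Standardize U* with the normal approximation (no tie correction)."""
    mu_u = n_1 * n_2 / 2
    sigma_u = sqrt(n_1 * n_2 * (n_1 + n_2 + 1) / 12)
    return (u_star - mu_u) / sigma_u


# ASSUMPTION: n_I = 15 and n_II = 12, inferred from the rank sums in the slides.
n_i, n_ii = 15, 12
u_i, u_ii = u_scores(n_i, n_ii, 151.0, 227.0)
u_star = min(u_i, u_ii)
z_star = mann_whitney_z(u_star, n_i, n_ii)
print(u_i, u_ii, round(z_star, 2))
```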

34 Example 7 – Solution: Step 4 Probability Distribution: p-Value: a. Use the left-hand tail because $H_a$ expresses concern for values related to “fewer than.” $P = P(z < -2.88)$, as shown in the figure.

35 Example 7 – Solution: To find the p-value, you have three options: 1. Use Table 3 (Appendix B) to calculate the p-value: P = 0.0020. 2. Use Table 5 (Appendix B) to place bounds on the p-value: 0.0019 < P < 0.0022. 3. Use a computer or calculator to find the p-value: P = 0.0020. b. The p-value is smaller than $\alpha$.

36 Example 7 – Solution: Classical: a. The critical region is the left-hand tail because $H_a$ expresses concern for values related to “fewer than.” The critical value is obtained from Table 4A: $-z(0.05) = -1.65$. b. $z^\star$ is in the critical region, as shown in red in the figure.

37 Example 7 – Solution: Step 5 a. Decision: Reject $H_o$. b. Conclusion: At the 0.05 level of significance, the data show sufficient evidence to conclude that the reward method does, on average, require fewer training sessions.
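
The left-tail p-value and the decision for Example 7 can be checked without tables. This sketch computes the standard normal CDF from `math.erfc`; the helper name `standard_normal_cdf` is illustrative, and the inputs are the $z^\star = -2.88$ and $\alpha = 0.05$ stated in the solution.

```python
from math import erfc, sqrt


def standard_normal_cdf(z):
    """P(Z < z) for a standard normal variable."""
    return 0.5 * erfc(-z / sqrt(2))


z_star = -2.88
alpha = 0.05

p_value = standard_normal_cdf(z_star)   # left-hand tail, since H_a is "fewer than"
reject_null = p_value < alpha
print(round(p_value, 4), reject_null)
```

The computed p-value agrees with the tabled 0.0020 to four decimal places, so both the p-value approach and the classical approach lead to rejecting $H_o$.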

