NONPARAMETRIC METHODS


1 NONPARAMETRIC METHODS
CHAPTER 14: NONPARAMETRIC METHODS

2 THE SIGN TEST
Tests About Categorical Data
Tests About the Median of a Single Population
Tests About the Median Difference Between Paired Data

3 THE SIGN TEST cont. The sign test can be used to perform the following types of tests:
To determine the preference for one product or item over another, or to determine whether one outcome occurs more often than another outcome in categorical data
To conduct a test for the median of a single population
To perform a test for the median of paired differences using data from two dependent samples

4 THE SIGN TEST cont. Definition
The sign test is used to make hypothesis tests about preferences, a single median, and the median of paired differences for two dependent populations. We use only plus and minus signs to perform these tests.

5 Tests About Categorical Data
Data that are divided into different categories for identification purposes are called categorical data.
The Small-Sample Case A sample is considered small if n ≤ 25.

6 Example 14-1 The Top Taste Water Company produces and distributes Top Taste bottled water. The company wants to determine whether customers have a higher preference for its bottled water than for its main competitor, Spring Hill bottled water. The Top Taste Water Company hired a statistician to conduct this study.

7 Example 14-1 The statistician selected a random sample of 10 people and asked each of them to taste one sample of each of the two brands of water. The customers did not know the brand of each water sample. Also, the order in which each person tasted the two brands of water was determined randomly. Each person was asked to indicate which of the two samples of water he or she preferred. The following table shows the preferences of these 10 individuals.

8 Example 14-1 Table: Person (1 through 10) and Brand Preferred (Spring Hill, Top Taste, or Neither)

9 Example 14-1 Based on these results, can the statistician conclude that people prefer one brand of bottled water over the other? Use the significance level of 5%.

10 Solution 14-1
H0: p = .50 (People do not prefer either of the two brands of water)
H1: p ≠ .50 (People prefer one brand of water over the other)
Here p is the proportion of people who prefer Top Taste bottled water.

11 Solution 14-1 We use the binomial probability distribution to make the test There is only one sample Each member is asked to indicate a preference if he or she has one We drop the members who do not indicate a preference Compare the preferences of the remaining members

12 Solution 14-1 There are three outcomes for each person:
Prefers Top Taste water Prefers Spring Hill water Has no preference We are to compare the two outcomes with preferences Determine whether more people belong to one of these two outcomes

13 Solution 14-1 n = 9 The test is two-tailed α = .05
Let X be the number of people in the sample of 9 who prefer Top Taste bottled water X is the test statistic From Table XI, the critical values of X are 1 and 8

14 Figure 14.1 Rejection region: X = 0 or 1 and X = 8 or 9. Nonrejection region: X = 2 to 7.

15 Critical Value(s) of X In a sign test of a small sample, the critical value of X is obtained from Table XI. If the test is two-tailed, we read both the lower and the upper critical values from that table. However, we read only the lower critical value if the test is left-tailed, and only the upper critical value if the test is right-tailed. Also note that the column we use to obtain this critical value depends on the given significance level and on whether the test is two-tailed or one-tailed.

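16 Table 14.1 Person (1 through 10), Brand Preferred, and Sign. A preference for Top Taste is assigned a plus sign (+), a preference for Spring Hill is assigned a minus sign (-), and Neither is assigned no sign.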

17 Observed Value of X The observed value of X is given by the number of signs that belong to the category whose proportion we are testing for.

18 Solution 14-1 The observed value of X = 6
It falls in the nonrejection region Therefore, we fail to reject H0
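The decision in Solution 14-1 can be cross-checked without Table XI. The sketch below swaps the table lookup for an exact two-tailed p-value from the binomial distribution with p = .50; the function name `binom_two_tailed_p` is hypothetical, and the p-value approach is an alternative to the critical-value approach used on the slides, not the slides' own method.

```python
from math import comb

def binom_two_tailed_p(x: int, n: int) -> float:
    """Two-tailed exact p-value for X ~ Binomial(n, 0.5)."""
    # With p = .5 the distribution is symmetric, so work with the upper tail.
    k = max(x, n - x)
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper)

n, x = 9, 6                      # 9 people with a preference, 6 prefer Top Taste
p_value = binom_two_tailed_p(x, n)
reject = p_value <= 0.05         # alpha = .05 from Example 14-1
print(round(p_value, 4), reject)
```

A p-value well above .05 agrees with the slide's conclusion that X = 6 falls in the nonrejection region.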

19 Tests About Categorical Data cont.
The Large-Sample Case If n > 25, the normal distribution can be used as an approximation to the binomial probability distribution to perform a test of hypothesis about the preference for categorical data. The observed value of the test statistic z, in this case, is calculated as
z = ((X ± .5) – μ)/σ

20 The Large-Sample Case cont.
where X is the number of units in the sample that belong to the outcome referring to p. We either add .5 to X or subtract .5 from X to correct for continuity. We will add .5 to X if the value of X is less than or equal to n/2, and we will subtract .5 from X if the value of X is greater than n/2. The values of the mean and the standard deviation are calculated as
μ = np and σ = √(npq)

21 Example 14-2 A developer is interested in building a shopping mall adjacent to a residential area. Before granting or denying permission to build such a mall, the town council took a random sample of 75 adults from adjacent areas and asked them whether they favor or oppose construction of this mall. Of these 75 adults, 40 opposed construction of the mall, 30 favored it, and 5 had no opinion. Can you conclude that the number of adults in this area who oppose construction of the mall is higher than the number who favor it? Use α = .01.

22 Solution 14-2
Let p and q be the proportions of adults who oppose and favor the mall, respectively.
H0: p = .50 and q = .50 (The two proportions are equal)
H1: p > .50 or p > q (The proportion of adults who oppose the mall is greater than the proportion who favor it)

23 Solution 14-2 We will use the sign test to perform this test n = 70
The sample is large (n > 25) We can use the normal approximation to perform the test

24 Solution 14-2 H1 states that p > .50
The test is right-tailed. α = .01. The critical value of z for an area of .01 in the right tail is approximately 2.33.

25 Figure 14.2 Rejection region: z to the right of 2.33 (α = .01 in the right tail). Nonrejection region: z to the left of 2.33.

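26 Solution 14-2
X = 40 (the number of adults who oppose the mall), n = 70, p = .50, and q = .50
μ = np = 70(.50) = 35
σ = √(npq) = √(70(.50)(.50)) = 4.1833
Because X = 40 is greater than n/2 = 35, we subtract .5 from X:
z = ((X – .5) – μ)/σ = (39.5 – 35)/4.1833 = 1.08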

27 Solution 14-2 The observed value of z = 1.08
It is less than the critical value of z=2.33 It falls in the nonrejection region Hence, we do not reject H0
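The large-sample calculation can be sketched in a few lines. The helper name `sign_test_z` is hypothetical; it applies the continuity correction described above (add .5 when X ≤ n/2, subtract .5 otherwise) and assumes p = .50 under H0 unless told otherwise.

```python
from math import sqrt

def sign_test_z(x: int, n: int, p: float = 0.5) -> float:
    """Normal-approximation z for the sign test, with continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    corrected = x + 0.5 if x <= n / 2 else x - 0.5
    return (corrected - mu) / sigma

# Example 14-2: 40 of the 70 adults with an opinion oppose the mall.
z = sign_test_z(40, 70)
print(round(z, 2))
```

The same helper covers the large-sample median tests, because they use the same z formula with X taken as the relevant count of signs.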

28 Test About The Median of a Single Population
The Small-Sample Case If n ≤ 25, we use the binomial probability distribution to test a hypothesis about the median of a population

29 Example 14-3 A real estate agent claims that the median price of homes in a small town is $137,000. A sample of 10 houses selected by a statistician produced the following data on their prices.

30 Example 14-3
Home   Price ($)
1      147,500
2      123,600
3      139,000
4      168,200
5      129,450
6      132,400
7      156,400
8      188,210
9      198,425
10     215,300
Using the 5% significance level, can you conclude that the median price of homes in this town is different from $137,000?

31 Table 14.2 (a plus sign marks a price above $137,000 and a minus sign a price below it)
Home   1   2   3   4   5   6   7   8   9   10
Sign   +   -   +   +   -   -   +   +   +   +

32 Solution 14-3
H0: Median price = $137,000 (Real estate agent’s claim is true)
H1: Median price ≠ $137,000 (Real estate agent’s claim is false)

33 Solution 14-3 For a test of the median of a population, we employ the sign test procedure The sample is small; n = 10 We use the binomial probability distribution to conduct the test

34 Solution 14-3 n = 10 α = .05 The test is two-tailed
The (lower and upper) critical values of X are 1 and 9

35 Figure 14.3 Rejection region: X = 0 or 1 and X = 9 or 10. Nonrejection region: X = 2 to 8.

36 Observed Value of X When using the sign test to perform a test about a median, we can use either the number of positive signs or the number of negative signs as the observed value of X if the test is two-tailed. However, the observed value of X is equal to the larger of these two numbers (the number of positive and negative signs) if the test is right-tailed, and equal to the smaller of these two numbers if the test is left-tailed.

37 Solution 14-3 The observed value of X = 7 (the number of plus signs)
It falls in the nonrejection region. Hence, we do not reject H0.

38 Test About The Median of a Single Population cont.
The Large-Sample Case For a test of the median of a single population, we can use the normal approximation to the binomial probability distribution when n > 25. The observed value of z, in this case, is calculated as in a test of hypothesis about the preference for categorical data.

39 Example 14-4 A long-distance phone company believes that the median phone bill (for long-distance calls) is at least $70 for all the families in New Haven, Connecticut. A random sample of 90 families selected from New Haven showed that the phone bills of 51 of them were less than $70 and those of 38 of them were more than $70, and one family had a phone bill of exactly $70. Using the 1% significance level, can you conclude that the company’s claim is true?

40 Solution 14-4
H0: Median ≥ $70 (Company’s claim is true)
H1: Median < $70 (Company’s claim is false)

41 Solution 14-4 n > 25 Hence, we can use the normal distribution as an approximation to the binomial probability distribution

42 Solution 14-4 α = .01 The test is left-tailed The critical value of z = -2.33

43 Figure 14.4 Rejection region: z to the left of -2.33 (α = .01 in the left tail). Nonrejection region: z to the right of -2.33.

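44 Solution 14-4
n = 89 (the family whose bill was exactly $70 is dropped)
X = 38 (the number of families with bills above $70)
μ = np = 89(.50) = 44.5
σ = √(npq) = √(89(.50)(.50)) = 4.7170
Because X = 38 is less than n/2 = 44.5, we add .5 to X:
z = ((X + .5) – μ)/σ = (38.5 – 44.5)/4.7170 = -1.27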

45 Solution 14-4 z = -1.27
It is greater than the critical value of z = -2.33. It falls in the nonrejection region. Hence, we do not reject H0.

46 Tests About the Median Difference Between Paired Data
The Small-Sample Case If n ≤ 25, we use the binomial probability distribution to perform a test about the difference between the medians of paired data.

47 Example 14-5 A researcher wanted to find the effects of a special diet on systolic blood pressure in adults. She selected a sample of 12 adults and put them on this dietary plan for three months. The following table gives the systolic blood pressure of each adult before and after the completion of the plan.

48 Example 14-5
Before   210  185  215  198  187  225  234  217  212  191  226  238
After    196  192  204  193  181  233  208  211  190  186  218  236
Using the 2.5% significance level, can we conclude that the dietary plan reduces the median systolic blood pressure of adults?

49 Solution 14-5 Table 14.3
Before                                210  185  215  198  187  225  234  217  212  191  226  238
After                                 196  192  204  193  181  233  208  211  190  186  218  236
Sign of Difference (before – after)    +    -    +    +    +    -    +    +    +    +    +    +

50 Solution 14-5
Let M be the median of the paired differences (before – after).
H0: M = 0 (The dietary plan does not reduce the median blood pressure)
H1: M > 0 (The dietary plan reduces the median blood pressure)

51 Solution 14-5 n = 12 < 25 The shape of the distribution of the population of paired differences is not known Hence, we use the sign test with the binomial probability distribution

52 Solution 14-5 α = .025 The test is right-tailed
The (upper) critical value of X is 10

53 Figure 14.5 Nonrejection region: X = 0 to 9. Rejection region: X = 10 to 12.

54 Solution 14-5 The observed value of X = 10
It falls in the rejection region. Hence, we reject H0.
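As a sketch of how the signs in Table 14.3 lead to X = 10, the snippet below computes the paired differences (before – after) from the data of Example 14-5 and counts the plus signs. The exact upper-tail binomial probability is added as an assumption-labeled alternative to the Table XI lookup; it is not part of the slides.

```python
from math import comb

before = [210, 185, 215, 198, 187, 225, 234, 217, 212, 191, 226, 238]
after = [196, 192, 204, 193, 181, 233, 208, 211, 190, 186, 218, 236]

diffs = [b - a for b, a in zip(before, after)]
nonzero = [d for d in diffs if d != 0]       # zero differences carry no sign
n = len(nonzero)
x = sum(d > 0 for d in nonzero)              # number of plus signs

# Alternative to the table: P(X >= x) for X ~ Binomial(n, 0.5), right-tailed test.
p_upper = sum(comb(n, i) for i in range(x, n + 1)) / 2 ** n
print(n, x, round(p_upper, 4))
```

An upper-tail probability below α = .025 is consistent with X = 10 falling in the rejection region.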

55 Tests About the Median Difference Between Paired Data cont.
The Large-Sample Case If n > 25, we can use the normal distribution as an approximation of the binomial distribution.

56 Example 14-6 Many students suffer from math anxiety. A statistics professor offered a two-hour lecture on math anxiety and ways to overcome it. A total of 42 students attended this lecture. The students were given similar statistics tests before and after the lecture. Thirty-three of the 42 students scored higher on the test after the lecture, 7 scored lower after the lecture, and 2 scored the same on both tests.

57 Example 14-6 Using the 1% significance level, can you conclude that the median score of students increases as a result of attending this lecture? Assume that these 42 students constitute a random sample of all students who suffer from math anxiety.

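58 Solution 14-6
Let M be the median of the paired differences, where a paired difference is the score before the lecture minus the score after the lecture.
H0: M = 0 (The lecture does not increase the median score)
H1: M < 0 (The lecture increases the median score)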

59 Solution 14-6 n = 40 > 25 We can use the normal distribution to test this hypothesis α = .01 The test is left-tailed The critical value of z = -2.33

60 Figure 14.6 Rejection region: z to the left of -2.33 (α = .01 in the left tail). Nonrejection region: z to the right of -2.33.

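61 Solution 14-6
n = 40 (the 2 students who scored the same on both tests are dropped)
X = 7 (the number of students with a positive before – after difference, that is, those who scored lower after the lecture)
μ = np = 40(.50) = 20
σ = √(npq) = √(40(.50)(.50)) = 3.1623
Because X = 7 is less than n/2 = 20, we add .5 to X:
z = ((X + .5) – μ)/σ = (7.5 – 20)/3.1623 = -3.95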

62 Solution 14-6 The observed value of z = -3.95
It is less than the critical value of z It falls in the rejection region Consequently, we reject H0

63 THE WILCOXON SIGNED-RANK TEST FOR TWO DEPENDENT SAMPLES
The Wilcoxon signed-rank test for two dependent (paired) samples is used to test whether or not the two populations from which these samples are drawn are identical.
The Small-Sample Case If the sample size is 15 or smaller, we find the critical value of the test statistic, T, from Table XII. If the sample size is larger than 15, we use the normal distribution to perform the test.

64 Example 14-7 A private agency claims that the crash course it offers significantly increases the writing speed of secretaries. The following table gives the writing speeds of eight secretaries before and after they attended this course.

65 Example 14-7 Using the 2.5% significance level, can you conclude that attending this course increases the writing speed of secretaries? Use the Wilcoxon signed-rank test.
Before   84  75  88   91  65  71   90
After    97  72  93  110  78  69  115

66 Solution 14-7
H0: MA = MB (The crash course does not increase the writing speed of secretaries)
H1: MA > MB (The crash course does increase the writing speed of secretaries)

67 Solution 14-7 The distribution of paired differences is unknown
We use the Wilcoxon signed-rank test procedure for the small sample case

68 Solution 14-7 α = .025 n = 7 The test is right-tailed
The critical value of T = 2

69 Figure 14.7 Rejection region: T = 0, 1, or 2. Nonrejection region: T = 3 or higher.

70 Decision Rule For the Wilcoxon signed-rank test for small samples (n ≤ 15), the critical value of T is obtained from Table XII. Note that in the Wilcoxon signed-rank test, the decision rule is to reject the null hypothesis if the observed value of T is less than or equal to the critical value of T. This rule is true for a two-tailed, a right-tailed, or a left-tailed test.

71 Table 14.4
Before   After   Differences (Before – After)   Absolute Differences   Ranks of Differences   Signed Ranks
84       97      -13                            13                     4.5                    -4.5
75       72      +3                             3                      2                      +2
88       93      -5                             5                      3                      -3
91       110     -19                            19                     6                      -6
65       78      -13                            13                     4.5                    -4.5
71       69      +2                             2                      1                      +1
90       115     -25                            25                     7                      -7

72 Observed Value of the Test Statistic T
If the test is two-tailed with the alternative hypothesis that the two distributions are not the same, then the observed value of T is given by the smaller of the two sums, the sum of the positive ranks and the sum of the absolute values of the negative ranks. We will reject H0 if the observed value of T is less than or equal to the critical value of T.

73 Observed Value of the Test Statistic T cont.
If the test is right-tailed with the alternative hypothesis that the distribution of after values is to the right of the distribution of before values, then the observed value of T is given by the sum of the values of the positive ranks. We will reject H0 if the observed value of T is less than or equal to the critical value of T.

74 Observed Value of the Test Statistic T cont.
If the test is left-tailed with the alternative hypothesis that the distribution of after values is to the left of the distribution of before values, then the observed value of T is given by the sum of the absolute values of the negative ranks. We will reject H0 if the observed value of T is less than or equal to the critical value of T.

75 Observed Value of the Test Statistic T cont.
Remember, for the above to be true, the paired difference is defined as the before value minus the after value. In other words, the differences are obtained by subtracting the after values from the before values.

76 Solution 14-7 Observed value of T = sum of the positive ranks = 2 + 1 = 3
The observed value of T = 3 is greater than the critical value of T = 2. It falls in the nonrejection region. Hence, we do not reject H0.
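The ranking steps behind Table 14.4 can be sketched as follows, using the seven before/after pairs listed in Example 14-7. The helper `average_ranks` is a hypothetical name; it assigns tied absolute differences the mean of the ranks they occupy, which is how the two differences of 13 both receive rank 4.5.

```python
def average_ranks(values):
    """Ranks from smallest to largest; tied values share their mean rank."""
    s = sorted(values)
    # s.index(v) is the 0-based first position of v; ties occupy count(v) slots.
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]

before = [84, 75, 88, 91, 65, 71, 90]
after = [97, 72, 93, 110, 78, 69, 115]

# Paired difference = before - after; zero differences would be dropped.
diffs = [b - a for b, a in zip(before, after) if b != a]
ranks = average_ranks([abs(d) for d in diffs])

t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)    # sum of positive ranks
t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)   # sum of |negative ranks|
print(t_plus, t_minus)
```

For this right-tailed test the observed T is the sum of the positive ranks, which is then compared with the critical value of 2.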

77 THE WILCOXON SIGNED-RANK TEST FOR TWO DEPENDENT SAMPLES cont.
The Large-Sample Case If n > 15, we can use the normal distribution to make a test of hypothesis about the paired differences.

78 Example 14-8 The manufacturer of a gasoline additive claims that the use of its additive increases gasoline mileage. A random sample of 25 cars was selected, and these cars were driven for one week without the gasoline additive and then for one week with the additive. Then, the miles per gallon (mpg) were estimated for these cars without and with the additive. Next, the paired differences were calculated for these 25 cars, where a paired difference is defined as Paired difference = mpg without additive – mpg with additive

79 Example 14-8 The differences were positive for 4 cars, negative for 19 cars, and zero for 2 cars. First, the absolute values of the paired differences were ranked, and then these ranks were assigned the signs of the corresponding paired differences. The sum of the ranks of the positive paired differences was 58, and the sum of the absolute values of the ranks of the negative paired differences was 218. Can you conclude that the use of the additive increases gasoline mileage? Use the 1% significance level.

80 Solution 14-8 H0: MA = MB H1: MA > MB

81 Solution 14-8 The sample size is greater than 15
We use the Wilcoxon signed-rank test procedure with the normal distribution approximation

82 Solution 14-8 α = .01 The test is right-tailed
The critical value of z = 2.33

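83 Figure 14.8 Rejection region: z to the right of 2.33 (α = .01 in the right tail; the area between the mean and 2.33 is .4900). Nonrejection region: z to the left of 2.33.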

84 Observed Value of z In a Wilcoxon signed-rank test for two dependent samples, when the sample size is large (n > 15), the observed value of z for the test statistic T is calculated as
z = (T – μT)/σT
where μT = n(n + 1)/4 and σT = √(n(n + 1)(2n + 1)/24)

85 Observed Value of z cont.
The value of T that is used to calculate the value of z is determined based on the alternative hypothesis, as explained next.

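86 Solution 14-8
n = 23 (the 2 cars with zero differences are dropped)
For this right-tailed test, T = 218, the sum of the absolute values of the ranks of the negative paired differences
μT = n(n + 1)/4 = 23(24)/4 = 138
σT = √(n(n + 1)(2n + 1)/24) = √(23(24)(47)/24) = 32.8786
z = (T – μT)/σT = (218 – 138)/32.8786 = 2.43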

87 Solution 14-8 The observed value of z = 2.43
It falls in the rejection region Hence, we reject the null hypothesis
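A short sketch of the large-sample calculation for Example 14-8 follows. The function name `signed_rank_z` is hypothetical. The choice of T = 218 (the sum of the absolute values of the negative ranks) is inferred from the reported z = 2.43; the other rank sum, 58, gives the same magnitude with the opposite sign.

```python
from math import sqrt

def signed_rank_z(t: float, n: int) -> float:
    """Normal-approximation z for the Wilcoxon signed-rank statistic T."""
    mu_t = n * (n + 1) / 4
    sigma_t = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - mu_t) / sigma_t

n = 25 - 2                      # 25 cars, 2 zero differences dropped
z = signed_rank_z(218, n)       # T assumed to be the negative-rank sum
print(round(z, 2))
```

Because the two rank sums always add to n(n + 1)/2, they sit symmetrically around μT, so the choice of sum changes only the sign of z.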

88 THE WILCOXON RANK SUM TEST FOR TWO INDEPENDENT SAMPLES
The Small-Sample Case If the sizes of both samples are 10 or less, we use the Wilcoxon rank sum test for small samples.

89 Example 14-9 A researcher wants to determine whether the distributions of daily crimes in two cities are identical. The following data give the numbers of violent crimes on eight randomly selected days for City A and on nine days for City B.
City A   12  21  16   8  26  13  19  23
City B   18  25  14  28  20  31

90 Example 14-9 Using the 5% significance level, can you conclude that the distributions of daily crimes in the two cities are different?

91 Solution 14-9 H0: The population distributions of daily crimes in the two cities are identical H1: The population distributions of daily crimes in the two cities are different

92 Solution 14-9 Let the distribution of daily crimes in City A be called population 1 Let the distribution of daily crimes in City B be called population 2 The respective samples are called sample 1 and sample 2 n1 < 10 n2 < 10 We use the Wilcoxon rank sum test for small samples

93 Solution 14-9 The test statistic in Wilcoxon’s rank sum test is T
The test is two-tailed α = .05 n1 =8 and n2 = 9 TL = 51 and TU = 93

94 Figure 14.9 Rejection region: T = 51 or lower and T = 93 or higher. Nonrejection region: T = 52 to 92.

95 Table 14.5
City A   Crimes   12   21   16    8   26   13   19    23
         Rank      2   11   5.5   1   15    3   8.5   12.5    Sum = 58.5
City B   Crimes   18   25   14   28   20   31
         Rank      7   14    4   16   10   17                 Sum = 94.5 (over all nine City B days)

96 Solution 14-9 The observed value of T = 58.5
It is between TL = 51 and TU = 93 Hence, we do not reject H0
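The pooled ranking behind Table 14.5 can be sketched as below. Important assumption: the transcript lists only six of the nine City B values, so the three values marked in the code (16, 19, and 23) are inferred from the tied ranks 5.5, 8.5, and 12.5 and from the rank sum of 94.5; they and their positions in the list are not taken from the source. The helper `average_ranks` is a hypothetical name.

```python
def average_ranks(values):
    """Ranks from smallest to largest; tied values share their mean rank."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]

city_a = [12, 21, 16, 8, 26, 13, 19, 23]
# ASSUMPTION: 16, 23 and 19 are inferred, not listed in the transcript.
city_b = [18, 25, 14, 16, 23, 19, 28, 20, 31]

ranks = average_ranks(city_a + city_b)       # rank the pooled 17 observations
t_a = sum(ranks[:len(city_a)])               # smaller sample gives T
t_b = sum(ranks[len(city_a):])

t_lower, t_upper = 51, 93                    # critical values from the slide
reject = t_a <= t_lower or t_a >= t_upper
print(t_a, t_b, reject)
```

Under that assumption the rank sums reproduce the 58.5 and 94.5 shown in Table 14.5.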

97 Wilcoxon Rank Sum Test for Small Independent Samples
A two-tailed test: The null hypothesis is that the two population distributions are identical, and the alternative hypothesis is that the two population distributions are different. The critical values of T, TL and TU, for this test are obtained from Table XIII for the given significance level and sample sizes. The observed value of T is given by the sum of the ranks for the smaller sample. The null hypothesis is rejected if T ≤ TL or T ≥ TU. Otherwise, the null hypothesis is not rejected. Note that if the two sample sizes are equal, the observed value of T is given by the sum of the ranks for either sample.

98 Wilcoxon Rank Sum Test for Small Independent Samples cont.
A right-tailed test: The null hypothesis is that the two population distributions are identical, and the alternative hypothesis is that the distribution of population 1 (the population that corresponds to the smaller sample) lies to the right of the distribution of population 2. The critical value of T is given by TU in Table XIII for the given α for a one-tailed test and the given sample sizes. The observed value of T is given by the sum of the ranks for the smaller sample. The null hypothesis is rejected if T ≥ TU. Otherwise, the null hypothesis is not rejected. Note that if the two sample sizes are equal, the observed value of T is given by the sum of the ranks for sample 1.

99 Wilcoxon Rank Sum Test for Small Independent Samples cont.
A left-tailed test: The null hypothesis is that the two population distributions are identical, and the alternative hypothesis is that the distribution of population 1 (the population that corresponds to the smaller sample) lies to the left of the distribution of population 2. The critical value of T in this case is given by TL in Table XIII for the given α for a one-tailed test and the given sample sizes. The observed value of T is given by the sum of the ranks for the smaller sample. The null hypothesis is rejected if T ≤ TL. Otherwise, the null hypothesis is not rejected. Note that if the two sample sizes are equal, the observed value of T is given by the sum of the ranks for sample 1.

100 THE WILCOXON RANK SUM TEST FOR TWO INDEPENDENT SAMPLES cont.
The Large-Sample Case If either n1 or n2 or both n1 and n2 are greater than 10, we use the normal distribution as an approximation to the Wilcoxon rank sum test for two independent samples.

101 Observed Value of z In the case of a large sample, the observed value of z is calculated as
z = (T – μT)/σT

102 Observed Value of z cont.
Here, the sampling distribution of the test statistic T is approximately normal with mean μT and standard deviation σT. The values of μT and σT are calculated as
μT = n1(n1 + n2 + 1)/2 and σT = √(n1n2(n1 + n2 + 1)/12)

103 Example 14-10 A researcher wanted to find out whether job-related stress is lower for college and university professors than for physicians. She took random samples of 14 professors and 11 physicians and tested them for job-related stress. The following data give the stress levels for professors and physicians on a scale of 1 to 20, where 1 is the lowest level of stress and 20 is the highest.

104 Example 14-10 Using the 1% significance level, can you conclude that the job-related stress level for professors is lower than that for physicians?
Professors   5  9  4  12  6  15  2  8  10  11  3
Physicians   18  13  14  16

105 Solution 14-10 H0: The two population distributions are identical
H1: The distribution of population 1 is to the right of the distribution of population 2

106 Solution 14-10 n1 > 10 and n2 > 10
We use the normal distribution to make this test The test is right-tailed α = .01 The critical value of z = 2.33

107 Figure 14.10 Rejection region: z to the right of 2.33 (α = .01 in the right tail). Nonrejection region: z to the left of 2.33.

108 Table 14.6 Stress levels and their ranks for physicians and professors. Sum of the ranks for physicians = 188.5. Sum of the ranks for professors = 136.5.

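109 Solution 14-10
n1 = 11 (physicians, the smaller sample), n2 = 14 (professors), and T = 188.5
μT = n1(n1 + n2 + 1)/2 = 11(11 + 14 + 1)/2 = 143
σT = √(n1n2(n1 + n2 + 1)/12) = √(11(14)(26)/12) = 18.2665
z = (T – μT)/σT = (188.5 – 143)/18.2665 = 2.49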

110 Solution 14-10 The observed value of z = 2.49
It is greater than the critical value of z = 2.33. It falls in the rejection region. Hence, we reject H0.
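The z value for Example 14-10 can be reproduced from the rank sum alone. In the sketch below, `rank_sum_z` is a hypothetical helper name, and T = 188.5 is taken as the rank sum of the smaller sample (the 11 physicians), as the slides define T.

```python
from math import sqrt

def rank_sum_z(t: float, n1: int, n2: int) -> float:
    """Normal-approximation z for the Wilcoxon rank sum statistic T.

    n1 is the size of the sample whose ranks were summed to give T.
    """
    mu_t = n1 * (n1 + n2 + 1) / 2
    sigma_t = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (t - mu_t) / sigma_t

z = rank_sum_z(188.5, n1=11, n2=14)
print(round(z, 2))
```

As a consistency check, the two rank sums in Table 14.6 add up to 325, which equals 25(26)/2 for the 25 pooled observations.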

111 Wilcoxon Rank Sum Test for Large Independent Samples
When n1 > 10 or n2 > 10 (or both samples are greater than 10), the distribution of T (the sum of the ranks of the smaller of the two samples) is approximately normal with mean and standard deviation as follows:
μT = n1(n1 + n2 + 1)/2 and σT = √(n1n2(n1 + n2 + 1)/12)

112 Wilcoxon Rank Sum Test for Large Independent Samples cont.
For two-tailed, right-tailed, and left-tailed tests, first calculate T, μT, σT, and the value of the test statistic, z = (T – μT)/σT. If n1 = n2, T can be calculated from either sample 1 or sample 2.

113 Wilcoxon Rank Sum Test for Large Independent Samples cont.
A two-tailed test: The null hypothesis is that the two population distributions are identical, and the alternative hypothesis is that the two population distributions are different. At significance level α, the critical values of z are obtained from Table VII in Appendix C. The null hypothesis is rejected if the observed value of z falls in the rejection region.

114 Wilcoxon Rank Sum Test for Large Independent Samples cont.
A right-tailed test: The null hypothesis is that the two population distributions are identical, and the alternative hypothesis is that the distribution of population 1 (the population with the smaller sample size) lies to the right of the distribution of population 2. At significance level α, the critical value of z is obtained from Table VII in Appendix C. The null hypothesis is rejected if the observed value of z falls in the rejection region.

115 Wilcoxon Rank Sum Test for Large Independent Samples cont.
A left-tailed test: The null hypothesis is that the two population distributions are identical, and the alternative hypothesis is that the distribution of population 1 (the population with the smaller sample size) lies to the left of the distribution of population 2. At significance level α, the critical value of z is found from Table VII of Appendix C. The null hypothesis is rejected if the observed value of z falls in the rejection region.

116 THE KRUSKAL-WALLIS TEST
To perform the Kruskal-Wallis test, we use the chi-square distribution. The test statistic in this test is denoted by H, which follows (approximately) the chi-square distribution. The critical value of H is obtained from Table IX in Appendix C for the given level of significance and df = k – 1, where k is the number of populations under consideration. Note that the Kruskal-Wallis test is always right-tailed.

117 THE KRUSKAL-WALLIS TEST cont.
Observed Value of the Test Statistic H The observed value of the test statistic H is calculated using the following formula:
H = [12/(n(n + 1))][R1²/n1 + R2²/n2 + ... + Rk²/nk] – 3(n + 1)

118 Observed Value of the Test Statistic H cont.
where
R1 = sum of the ranks for sample 1
R2 = sum of the ranks for sample 2
Rk = sum of the ranks for sample k
n1 = sample size for sample 1
n2 = sample size for sample 2
nk = sample size for sample k
n = n1 + n2 + ... + nk
k = number of samples

119 Example 14-11 A researcher wanted to find out whether the population distributions of salaries of computer programmers are identical in three cities, Boston, San Francisco, and Atlanta. Three different samples – one from each city - produced the following data on the annual salaries (in thousands of dollars) of computer programmers.

120 Example 14-11
Boston          43  39  62  73  51  46
San Francisco   54  33  58  38  55  34
Atlanta         57  68  60  44  28  49

121 Example 14-11 Using the 2.5% significance level, can you conclude that the population distributions of salaries for computer programmers in these three cities are all identical?

122 Solution 14-11 H0: The population distributions of salaries of computer programmers in the three cities are all identical H1: The population distributions of salaries of computer programmers in the three cities are not all identical

123 Solution 14-11 The shapes of the population distributions are unknown
Three populations Hence, we apply the Kruskal-Wallis procedure to perform this test We use the chi-square distribution

124 Solution 14-11 α = .025 df = k – 1 = 3 – 1 = 2 The critical value of χ² = 7.378

125 Figure 14.11 Chi-square distribution with df = 2. Rejection region: χ² to the right of 7.378 (α = .025). Nonrejection region: χ² to the left of 7.378.

126 Table 14.7
Boston          Salary   43    39    62   73   51   46      n1 = 6   R1 = 75
                Rank     7.5   5.5   19   21   12   10
San Francisco   Salary   54    33    58   38   55   34      n2 = 7   R2 = 60.5
                Rank     13    2     17   4    14   3
Atlanta         Salary   57    68    60   44   28   49      n3 = 8   R3 = 95.5
                Rank     15.5  20    18   9    1    11

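127 Solution 14-11
n = n1 + n2 + n3 = 6 + 7 + 8 = 21
H = [12/(n(n + 1))][R1²/n1 + R2²/n2 + R3²/n3] – 3(n + 1)
= [12/(21(22))][(75)²/6 + (60.5)²/7 + (95.5)²/8] – 3(22)
= (12/462)(2600.4241) – 66 = 1.543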

128 Solution 14-11 The observed value of H = 1.543
It is less than the critical value of H It falls in the nonrejection region Hence, we do not reject the null hypothesis
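The H statistic depends only on the rank sums and sample sizes, so it can be checked directly from the totals in Table 14.7. The function name `kruskal_wallis_h` is hypothetical, and the sketch applies no correction for ties, matching the formula on the slides.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """Kruskal-Wallis H from per-sample rank sums and sample sizes."""
    n = sum(sizes)
    weighted = sum(r * r / m for r, m in zip(rank_sums, sizes))
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)

rank_sums = [75, 60.5, 95.5]     # R1, R2, R3 from Table 14.7
sizes = [6, 7, 8]                # n1, n2, n3
h = kruskal_wallis_h(rank_sums, sizes)
reject = h > 7.378               # chi-square critical value, df = 2, alpha = .025
print(round(h, 3), reject)
```

The rank sums add up to 231 = 21(22)/2, which is a quick way to confirm that no rank was lost before computing H.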

129 THE SPEARMAN RHO RANK CORRELATION COEFFICIENT TEST
The Spearman rho rank correlation coefficient is denoted by rs for sample data and by ρs for population data. This correlation coefficient is simply the linear correlation coefficient between the ranks of the data. To calculate the value of rs, we rank the data for each variable, x and y, separately and denote those ranks by u and v, respectively. Then we take the difference between each pair of ranks and denote it by d. Thus, Difference between each pair of ranks = d = u – v

130 Spearman Rho Rank Correlation Coefficient cont.
Next, we square each difference d and add these squared differences to find Σd². Finally, we calculate the value of rs using the formula:
rs = 1 – 6Σd²/(n(n² – 1))
In a test of hypothesis about the Spearman rho rank correlation coefficient ρs, the test statistic is rs and its observed value is calculated by using the above formula.

131 Example 14-12 Suppose we want to investigate the relationship between the per capita income (in thousands of dollars) and the infant mortality rate (in percent) for different states. The following table gives data on these two variables for a random sample of eight states.

132 Example 14-12 Based on these data, can you conclude that there is no significant (linear) correlation between the per capita incomes and the infant mortality rates for all states? Use α = .05.
Per capita income (x)   29.85  19.0  19.18  31.78  25.22  16.68  23.98  26.33
Infant mortality (y)    8.3    10.1  10.3   7.1    9.9    11.5   8.7    9.8

133 Solution 14-12 H0: ρs = 0 There is no correlation between per capita incomes and infant mortality rates in all states H1: ρs ≠ 0 There is a correlation between per capita incomes and infant mortality rates in all states

134 Solution 14-12 We use the Spearman rho rank correlation coefficient test procedure to make this test n = 8 α = .05 The test is two-tailed The critical values of rs are ±.738, that is, -.738 and +.738

135 Figure 14.12 Rejection region: rs ≤ -.738 or rs ≥ +.738. Nonrejection region: rs between -.738 and +.738.

136 Critical Value of rs The critical value of rs is obtained from Table XIV in Appendix C for the given sample size and significance level. If the test is two-tailed, we use two critical values, one negative and one positive. However, we use only the negative value of rs if the test is left-tailed, and only the positive value of rs if the test is right-tailed.

137 Table 14.8
u    7    2    3    8    5    1    4    6
v    2    6    7    1    5    8    3    4
d    5   -4   -4    7    0   -7    1    2
d²   25   16   16   49   0    49   1    4     Σd² = 160

138 Solution 14-12 rs = 1 – 6Σd²/(n(n² – 1)) = 1 – 6(160)/(8(64 – 1)) = -.905
It is less than -.738. It falls in the rejection region. Hence, we reject H0.
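The ranking and the rs formula can be sketched with the data of Example 14-12. The helper `ranks` is a hypothetical name and assumes there are no tied values, which holds for both variables here; tied data would need average ranks instead.

```python
income = [29.85, 19.0, 19.18, 31.78, 25.22, 16.68, 23.98, 26.33]
mortality = [8.3, 10.1, 10.3, 7.1, 9.9, 11.5, 8.7, 9.8]

def ranks(values):
    """1-based ranks from smallest to largest; assumes no ties."""
    s = sorted(values)
    return [s.index(v) + 1 for v in values]

u, v = ranks(income), ranks(mortality)
d2 = sum((a - b) ** 2 for a, b in zip(u, v))   # sum of squared rank differences
n = len(u)
r_s = 1 - 6 * d2 / (n * (n ** 2 - 1))
reject = r_s <= -0.738 or r_s >= 0.738         # two-tailed critical values
print(u, v, d2, round(r_s, 3), reject)
```

The strongly negative rs reflects that states with higher per capita income in this sample tend to have lower infant mortality ranks.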

139 Decision Rule for the Spearman Rho Rank Correlation Coefficient
The null hypothesis is always H0: ρs = 0. The observed value of the test statistic is always the value of rs computed from the sample data. Let α denote the significance level, and –c and +c be the critical values for the Spearman rho rank correlation coefficient test obtained from Table XIV.

140 Decision Rule for the Spearman Rho Rank Correlation Coefficient cont.
For a two-tailed test, the alternative hypothesis is H1: ρs ≠ 0. If ±c are the critical values corresponding to sample size n and two-tailed α, we reject H0 if either rs ≤ -c or rs ≥ +c; that is, reject H0 if rs is “too small” or “too large.”

141 Decision Rule for the Spearman Rho Rank Correlation Coefficient cont.
For a right-tailed test, the alternative hypothesis is H1: ρs > 0. If +c is the critical value corresponding to sample size n and one-sided α, we reject H0 if rs ≥+c; that is, reject H0 if rs is “too large.”

142 Decision Rule for the Spearman Rho Rank Correlation Coefficient cont.
For a left-tailed test, the alternative hypothesis is H1: ρs < 0. If –c is the critical value corresponding to sample size n and one-sided α, we reject H0 if rs ≤ –c; that is, reject H0 if rs is “too small.”

143 THE RUNS TEST FOR RANDOMNESS
The Small-Sample Case Definition A run is a sequence of one or more consecutive occurrences of the same outcome in a sequence of occurrences in which there are only two outcomes. The number of runs in a sequence is denoted by R. The value of R obtained for a sequence of outcomes for a sample gives the observed value of the test statistic for the runs test for randomness.

144 Example 14-13 A college admissions office is interested in knowing whether applications for admission arrive randomly with respect to gender. The genders of 25 consecutively arriving applicants were recorded in the following order (here M denotes a male applicant and F a female applicant).

145 Example 14-13
M F M M F F F M F M M M F F F F M M M F F M F M M
Can you conclude that the applications for admission arrive randomly with respect to gender? Use α = .05.

146 Solution 14-13
H0: Applications arrive in a random order with respect to gender
H1: Applications do not arrive in a random order with respect to gender

147 Solution 14-13 Let n1 and n2 be the number of male and female applicants, respectively n1 = 13 and n2 = 12 Both n1 and n2 are less than 15 We use the runs test to check for randomness α = .05 The critical values are c1 = 8 and c2 = 19

148 Figure 14.13 Rejection region: R = 8 or lower and R = 19 or higher. Nonrejection region: R = 9 to 18.

149 Solution 14-13 The observed value of R = 13
It is between 9 and 18. It falls in the nonrejection region. Hence, we do not reject H0.
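Counting runs by hand is error-prone, so the sketch below counts them for the sequence in Example 14-13. It uses `itertools.groupby`, which groups consecutive equal items, so each group corresponds to one run; the critical values 8 and 19 are the ones quoted on the slide.

```python
from itertools import groupby

sequence = "M F M M F F F M F M M M F F F F M M M F F M F M M".split()

n1 = sequence.count("M")                    # number of male applicants
n2 = sequence.count("F")                    # number of female applicants
runs = sum(1 for _ in groupby(sequence))    # each group of equal neighbors is one run

c1, c2 = 8, 19                              # critical values from the slide
reject = runs <= c1 or runs >= c2
print(n1, n2, runs, reject)
```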

150 THE RUNS TEST FOR RANDOMNESS cont.
The Large-Sample Case For large values of n1 and n2, the distribution of R (the number of runs in the sample) is approximately normal with its mean and standard deviation given as
μR = 2n1n2/(n1 + n2) + 1 and σR = √[2n1n2(2n1n2 – n1 – n2)/((n1 + n2)²(n1 + n2 – 1))]

151 The Observed Value of z The observed value of z for R is calculated using the formula
z = (R – μR)/σR

152 Example 14-14 Refer to the preceding example on applications for admission. Suppose that the admissions officer examines 50 consecutive applications and observes that n1 = 22, n2 = 28, and R = 20, where n1 is the number of male applicants, n2 the number of female applicants, and R the number of runs. Can we conclude that the applications for admission arrive randomly with respect to gender? Use α = .01.

153 Solution 14-14 H0: Applications arrive in a random order with respect to gender H1: Applications do not arrive in a random order with respect to gender

154 Solution 14-14 n1 = 22 and n2 = 28 Both n1 and n2 are greater than 15
We use the normal distribution to make the runs test α = .01 The test is two-tailed The critical values of z are -2.58 and 2.58

155 Figure 14.14 Rejection regions: z to the left of -2.58 and z to the right of 2.58 (α/2 = .005 in each tail). Nonrejection region: z between -2.58 and 2.58 (an area of .4950 on each side of the mean).

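156 Solution 14-14
μR = 2n1n2/(n1 + n2) + 1 = 2(22)(28)/(22 + 28) + 1 = 25.64
σR = √[2n1n2(2n1n2 – n1 – n2)/((n1 + n2)²(n1 + n2 – 1))] = √[2(22)(28)(2(22)(28) – 22 – 28)/((22 + 28)²(22 + 28 – 1))] = 3.4478
z = (R – μR)/σR = (20 – 25.64)/3.4478 = -1.64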

157 Solution 14-14 z = -1.64
It is between -2.58 and 2.58. It falls in the nonrejection region. Hence, we do not reject H0.
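The large-sample runs test can be sketched with the numbers of Example 14-14. The function name `runs_test_z` is hypothetical; it implements the usual normal approximation for the number of runs R, with the mean 2n1n2/(n1 + n2) + 1 and the matching standard deviation.

```python
from math import sqrt

def runs_test_z(r: int, n1: int, n2: int) -> float:
    """Normal-approximation z for the number of runs R."""
    total = n1 + n2
    mu_r = 2 * n1 * n2 / total + 1
    var_r = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (total ** 2 * (total - 1))
    return (r - mu_r) / sqrt(var_r)

z = runs_test_z(r=20, n1=22, n2=28)
reject = abs(z) >= 2.58          # two-tailed critical values at alpha = .01
print(round(z, 2), reject)
```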

