Lesson 15 - 1 Nonparametric Statistics Overview

Objectives Understand Difference between Parametric and Nonparametric Statistical Procedures Nonparametric methods use techniques to test claims that are distribution free

Vocabulary Parametric statistical procedures – inferential procedures that rely on testing claims regarding parameters such as the population mean μ, the population standard deviation, σ, or the population proportion, p. Many times certain requirements had to be met before we could use those procedures. Nonparametric statistical procedures – inferential procedures that are not based on parameters, which require fewer requirements be satisfied to perform the tests. They do not require that the population follow a specific type of distribution. Efficiency – compares sample size for a nonparametric test to the equivalent parametric test. Example: If a nonparametric statistical test has an efficiency of 0.85, a sample size of 100 would be required in the nonparametric test to achieve the same results a sample of 85 would produce in the equivalent parametric test.

Nonparametric Advantages Most of the tests have very few requirements, so it is unlikely that these tests will be used improperly. For some nonparametric procedures, the computations are fairly easy. The procedures can be used for count data or rank data, so nonparametric methods can be used on data such as rankings of a movie as excellent, good, fair, or poor.

Nonparametric Disadvantages The results of the test are typically less powerful. Recall that the power of a test is the probability of rejecting the null hypothesis when the alternative hypothesis is true, that is, 1 minus the probability of making a Type II error. A Type II error occurs when a researcher does not reject the null hypothesis when the alternative hypothesis is true. Nonparametric procedures are less efficient than parametric procedures. This means that a larger sample size is required when conducting a nonparametric procedure to achieve the same results (the same power at the same level of significance) as the equivalent parametric procedure.

Power vs Efficiency
● The power of a test is the probability of rejecting H0 when H1 is true (1 minus the probability of a Type II error)
● Thus when both nonparametric and parametric procedures apply, for the less powerful nonparametric method
– The researcher does not reject H0 when H1 is true, with higher probability
– The researcher cannot distinguish between H0 and H1 as effectively
● The efficiency of a test refers to the sample size needed to match the results of the equivalent parametric test at a given level of significance
● Thus when both nonparametric and parametric procedures apply, for the nonparametric method
– The researcher requires larger sample sizes
– The researcher incurs higher costs associated with the larger number of subjects

Efficiency
Nonparametric Test | Parametric Test | Efficiency of Nonparametric Test
Sign test | Single-sample z-test or t-test | 0.955 (for small samples from a normal population); 0.75 (for samples of size 13 or larger if data are normal)
Mann–Whitney test | Inference about the difference of two means, independent samples | 0.955 (if data are normal)
Wilcoxon matched-pairs test | Inference about the difference of two means, dependent samples | 0.955 (if the differences are normal)
Kruskal–Wallis test | One-way ANOVA | 0.955 (if the data are normal); 0.864 (if the distributions are identical except for medians)
Spearman rank-correlation coefficient | Linear correlation | 0.912 (if the data are bivariate normal)
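The efficiency figures can be turned into a rough sample-size estimate. The sketch below assumes the definition from the Vocabulary slide (efficiency = parametric sample size divided by nonparametric sample size); the function name is hypothetical and used only for illustration.

```python
import math

def nonparametric_sample_size(parametric_n, efficiency):
    """Approximate sample size a nonparametric test needs to match a
    parametric test run on parametric_n observations.
    Assumes efficiency = parametric_n / nonparametric_n."""
    # round() guards against floating-point noise before rounding up
    return math.ceil(round(parametric_n / efficiency, 9))

# Vocabulary example: efficiency 0.85 and a parametric sample of 85
print(nonparametric_sample_size(85, 0.85))
# Mann-Whitney with normal data (efficiency 0.955) against a t-test on 100
print(nonparametric_sample_size(100, 0.955))
```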

Summary and Homework Summary –Nonparametric tests require few assumptions, and thus are applicable in situations where parametric tests are not –In particular, nonparametric tests can be used on rankings data (which cannot be analyzed by parametric tests) –When both are applicable, nonparametric tests are less efficient than parametric tests Homework –problems 1-5 from CD

Lesson 15 - 2 Runs Test for Randomness

Objectives Perform a runs test for randomness Runs tests are used to test whether it is reasonable to conclude data occur randomly, not whether the data are collected randomly.

Vocabulary Runs test for randomness – used to test claims that data have been obtained or occur randomly Run – sequence of similar events, items, or symbols that is followed by an event, item, or symbol that is mutually exclusive from the first event, item, or symbol Length – number of events, items, or symbols in a run

Runs Test for Randomness ●Assume that we have a set of data, and we wish to know whether it is random, or not ●Some examples  A researcher takes a systematic sample, choosing every 10 th person who passes by … and wants to check whether the gender of these people is random  A professor makes up a true/false test … and wants to check that the sequence of answers is random

Runs, Hits and Errors
A run is a sequence of similar events
– In flipping coins, a string of "heads" in a row
– In a series of patients, a string of "female patients" in a row
– In a series of experiments, a string of "measured value was more than 17.1" in a row
If the data are random, the number of runs is unlikely to be very small or very large
This forms the basis of the runs test

Runs Test (small case) Small-Sample Case: If n 1 ≤20 and n 2 ≤ 20, the test statistic in the runs test for randomness is r, the number of runs. Critical Values for a Runs Test for Randomness: Use Table IX (critical value at α = 0.05)

Runs Test (large case)
Large-Sample Case: If n1 > 20 or n2 > 20, the test statistic in the runs test for randomness is
$z = \dfrac{r - \mu_r}{\sigma_r}$
where
$\mu_r = \dfrac{2 n_1 n_2}{n} + 1$ and $\sigma_r = \sqrt{\dfrac{2 n_1 n_2 (2 n_1 n_2 - n)}{n^2 (n - 1)}}$
Let n represent the sample size of which there are two mutually exclusive types.
Let n1 represent the number of observations of the first type.
Let n2 represent the number of observations of the second type.
Let r represent the number of runs.
Critical Values for a Runs Test for Randomness: Use Table IV, the standard normal table.

Hypothesis Tests for Randomness Use Runs Test
Step 0 Requirements: 1) sample is a sequence of observations recorded in order of their occurrence 2) observations have two mutually exclusive categories.
Step 1 Hypotheses: H0: The sequence of data is random. H1: The sequence of data is not random.
Step 2 Level of Significance: (level of significance determines the critical value) Large-sample case: Determine a level of significance, based on the seriousness of making a Type I error. Small-sample case: we must use the level of significance α = 0.05.
Step 3 Compute Test Statistic: Small-Sample: r. Large-Sample: $z_0 = \dfrac{r - \mu_r}{\sigma_r}$
Step 4 Critical Value Comparison: Reject H0 if Small-Sample Case: r is outside the critical interval. Large-Sample Case: $z_0 < -z_{\alpha/2}$ or $z_0 > z_{\alpha/2}$
Step 5 Conclusion: Reject or Fail to Reject

Example 1 The following sequence was observed when flipping a coin: H, T, T, H, H, T, H, H, H, T, H, T, T, T, H, H The coin was flipped 16 times with 9 heads and 7 tails. There were 9 runs observed. Values n = 16 n 1 = 9 n 2 = 7 r = 9 Critical values from table IX (9,7) = 4, 14 Since 4 < r = 9 < 14, then we Fail to reject and conclude that we don’t have enough evidence to say that it is not random.
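The count in Example 1 can be checked with a short sketch. The critical values 4 and 14 are the Table IX values quoted on the slide; the boundary convention (reject when r is at or beyond a critical value) is an assumption read off the slide's "4 < r < 14" comparison, and the function names are hypothetical.

```python
def count_runs(seq):
    """Count runs: maximal blocks of identical consecutive symbols."""
    if not seq:
        return 0
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def small_sample_runs_decision(seq, lower_cv, upper_cv):
    """Return (r, reject). Assumes H0 is rejected when r is at or
    beyond either critical value taken from the runs-test table."""
    r = count_runs(seq)
    return r, (r <= lower_cv or r >= upper_cv)

flips = "HTTHHTHHHTHTTTHH"  # the 16 flips from Example 1
r, reject = small_sample_runs_decision(flips, 4, 14)
print(flips.count("H"), flips.count("T"), r, reject)
```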

Example 2 The following sequence was observed when flipping a coin: H, T, T, H, H, T, H, H, H, T, H, T, T, T, H, H, H, T, T, T, T, H, H, T, T, T, H, T, T, H, H, T, T, H, T, T, T, T
The coin was flipped 38 times with 16 heads and 22 tails. There were 18 runs observed.
Values: n = 38, n1 = 16, n2 = 22, r = 18
$\mu_r = \dfrac{2 n_1 n_2}{n} + 1 = 19.5263$
$\sigma_r = \sqrt{\dfrac{2 n_1 n_2 (2 n_1 n_2 - n)}{n^2 (n - 1)}} = 2.9624$
$z = \dfrac{r - \mu_r}{\sigma_r} = \dfrac{18 - 19.5263}{2.9624} = -0.515$
Since z (-0.515) > -zα/2 (-2.32), we fail to reject and conclude that we don't have enough evidence to say it's not random.
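The large-sample arithmetic in Example 2 can be reproduced with a minimal sketch, assuming the two categories are coded as the characters "H" and "T". The function name is hypothetical; the formulas are the ones given on the large-case slide.

```python
import math

def runs_test_z(seq):
    """Large-sample runs test: return (r, mu_r, sigma_r, z)."""
    symbols = sorted(set(seq))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two categories")
    n1 = seq.count(symbols[0])
    n2 = seq.count(symbols[1])
    n = n1 + n2
    # a new run starts wherever two neighbours differ
    r = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    mu = 2 * n1 * n2 / n + 1
    sigma = math.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1)))
    return r, mu, sigma, (r - mu) / sigma

# The 38 flips from Example 2, split in two pieces for readability
flips = "HTTHHTHHHTHTTTHH" + "HTTTTHHTTTHTTHHTTHTTTT"
r, mu, sigma, z = runs_test_z(flips)
print(r, round(mu, 4), round(sigma, 4), round(z, 3))
```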

Example 3, Using Confidence Intervals Trey flipped a coin 100 times and got 54 heads and 46 tails, so
– n = 100
– n1 = 54
– n2 = 46
$z = \dfrac{r - \mu_r}{\sigma_r}$
We transform this into a confidence interval, PE +/- MOE, where the point estimate is $\mu_r$ and the margin of error is $z_{\alpha/2} \sigma_r$.

Using Confidence Intervals
● The z-value for the α = 0.05 level of significance is 1.96
LB: 50.68 – 1.96 × 4.94 = 41.0 (40.99 before rounding) to UB: 50.68 + 1.96 × 4.94 = 60.4
● We reject the null hypothesis if there are 40 or fewer runs, or if there are 61 or more
● We do not reject the null hypothesis if there are 41 to 60 runs
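The interval for Example 3 can be computed directly. This is a sketch under the slide's setup (n1 = 54, n2 = 46, z = 1.96); the function name is hypothetical.

```python
import math

def runs_acceptance_bounds(n1, n2, z_crit=1.96):
    """Bounds mu_r -/+ z_crit * sigma_r on the number of runs for which
    the large-sample runs test does not reject randomness."""
    n = n1 + n2
    mu = 2 * n1 * n2 / n + 1
    sigma = math.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1)))
    return mu - z_crit * sigma, mu + z_crit * sigma

lb, ub = runs_acceptance_bounds(54, 46)
# Whole-number run counts that fall inside the interval
print(round(lb, 2), round(ub, 2), math.ceil(lb), math.floor(ub))
```

Because the number of runs is a whole number, the non-rejection region is every integer from the ceiling of the lower bound to the floor of the upper bound.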

Summary and Homework Summary –The runs test is a nonparametric test for the independence of a sequence of observations –The runs test counts the number of runs of consecutive similar observations –The critical values for small samples are given in tables –The critical values for large samples can be approximated by a calculation with the normal distribution Homework –problems 1, 2, 5, 6, 7, 8, 15 from the CD

Lesson 15 - 3 Inferences about Measures of Central Tendency

Objectives Conduct a one-sample sign test

Vocabulary One-sample sign test -- requires data converted to plus and minus signs to test a claim regarding the median –Change all data to + (above H 0 value) or – (below H 0 value) –Any values = to H 0 value change to 0

Sign Test ●Like the runs test, the test statistic used depends on the sample size ●In the small sample case, where the number of observations n is 25 or less, we use the number of +’s and the number of –’s directly ●In the large sample case, where the number of observations n is more than 25, we use a normal approximation

Signs Test for Central Tendencies
Test Statistic
Left-Tailed | Two-Tailed | Right-Tailed
H0: M = M0, H1: M < M0 | H0: M = M0, H1: M ≠ M0 | H0: M = M0, H1: M > M0
k = # of + signs | k = smaller # of + or – signs | k = # of – signs
Small-Sample Case: If n ≤ 25, the test statistic in the signs test is k, defined as above.
Large-Sample Case: If n > 25, the test statistic is
$z_0 = \dfrac{(k + \frac{1}{2}) - \frac{1}{2} n}{\frac{1}{2} \sqrt{n}}$
where k is defined as above and n = number of + and – signs (zeros excluded)
Critical Values for a Sign Test
Small-Sample Case: Use Table X to find the critical value for a one-sample sign test
Large-Sample Case: Use Table IV, standard normal table (one-tailed -zα; two-tailed -zα/2).

Hypothesis Tests for Central Tendency Using Signs Test
Step 0: Convert all data to +, – or 0 (based on H0)
Step 1 Hypotheses:
Left-Tailed | Two-Tailed | Right-Tailed
H0: Median = M0 | H0: Median = M0 | H0: Median = M0
H1: Median < M0 | H1: Median ≠ M0 | H1: Median > M0
Step 2 Level of Significance: (level of significance determines critical value) Determine a level of significance, based on the seriousness of making a Type I error. Small-sample case: Use Table X. Large-sample case: Use Table IV, standard normal (one-tailed -zα; two-tailed -zα/2).
Step 3 Compute Test Statistic: Small-Sample: k. Large-Sample: $z_0 = \dfrac{(k + \frac{1}{2}) - \frac{1}{2} n}{\frac{1}{2} \sqrt{n}}$
Step 4 Critical Value Comparison: Reject H0 if Small-Sample Case: k ≤ critical value. Large-Sample Case: z0 < -zα/2 (two-tailed) or z0 < -zα (one-tailed)
Step 5 Conclusion: Reject or Fail to Reject

Small Number Example A recent article in the school newspaper reported that the typical credit-card debt of a student is $500. Professor McCraith claims that the median credit-card debt of students at Joliet Junior College is different from $500. To test this claim, he obtains a random sample of 20 students enrolled at the college and asks them to disclose their credit-card debt.
Data: $6000 $0 $200 $0 $400 $1060 $0 $1200 $200 $250 $250 $580 $1000 $0 $0 $200 $400 $800 $700 $1000
Two-Tailed Test (Med ≠ 500), so k = the smaller of the number of + signs and the number of – signs
+ = 8, – = 12, k = 8, n = 20, CV = 5 (from Table X)
We reject H0 if k ≤ critical value (out in the tail). Since 8 > 5, we do not reject H0.
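The sign counts in this example can be verified with a short sketch. The critical value 5 is the Table X value quoted on the slide; the function name is hypothetical.

```python
def sign_counts(data, m0):
    """Return (# of + signs, # of - signs); values equal to m0 are dropped."""
    plus = sum(1 for x in data if x > m0)
    minus = sum(1 for x in data if x < m0)
    return plus, minus

debts = [6000, 0, 200, 0, 400, 1060, 0, 1200, 200, 250,
         250, 580, 1000, 0, 0, 200, 400, 800, 700, 1000]
plus, minus = sign_counts(debts, 500)
k = min(plus, minus)       # two-tailed test: smaller of the two counts
n = plus + minus           # zeros excluded
critical_value = 5         # from the slide (Table X)
print(plus, minus, k, n, k <= critical_value)
```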

Large Number Example A sports reporter claims that the median weight of offensive linemen in the NFL is greater than 300 pounds. He obtains a random sample of 30 offensive linemen and obtains the data shown in Table 4. Test the reporter's claim at the α = 0.1 level of significance.
Data: 285 310 300 300 320 308 310 293 329 293 326 310 297 301 315 332 305 340 242 310 312 329 320 300 311 286 309 292 287 305
Right-Tailed Test (Med > 300), so k = number of – signs
+ = 19, – = 8, 0 = 3, n = 30 – 3 = 27, k = 8
$z_0 = \dfrac{(k + \frac{1}{2}) - \frac{1}{2} n}{\frac{1}{2} \sqrt{n}} = \dfrac{8.5 - 13.5}{\frac{1}{2} \sqrt{27}} = \dfrac{-10}{\sqrt{27}} = -1.92$
Since z0 = -1.92 < -zα = -1.28, we reject H0 and conclude that the median weight is greater than 300 pounds.
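A minimal sketch of the large-sample calculation for the linemen data, using the test statistic from the Signs Test slide. The function name is hypothetical.

```python
import math

def sign_test_z(k, n):
    """Large-sample sign test statistic: ((k + 1/2) - n/2) / (sqrt(n)/2)."""
    return ((k + 0.5) - 0.5 * n) / (0.5 * math.sqrt(n))

weights = [285, 310, 300, 300, 320, 308, 310, 293, 329, 293,
           326, 310, 297, 301, 315, 332, 305, 340, 242, 310,
           312, 329, 320, 300, 311, 286, 309, 292, 287, 305]
plus = sum(1 for w in weights if w > 300)
minus = sum(1 for w in weights if w < 300)
n = plus + minus          # observations equal to 300 are excluded
k = minus                 # right-tailed test: k = number of - signs
z0 = sign_test_z(k, n)
print(plus, minus, n, round(z0, 2))
```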

Summary and Homework Summary –The sign test is a nonparametric test for the median, a measure of central tendency –This test counts the number of observations higher and lower than the assumed value of the median –The critical values for small samples are given in tables –The critical values for large samples can be approximated by a calculation with the normal distribution Homework –problems 5, 6, 10, 12 from the CD

Lesson 15 - 4 Inferences about the Differences between Two Medians: Dependent Samples

Objectives Test a claim about the difference between the medians of two dependent samples

Vocabulary Wilcoxon Matched-Pairs Signed-Ranks Test -- a nonparametric procedure that is used to test the equality of two population medians by dependent sampling. Ranks – 1 through n, with tied observations each awarded the sum of the tied ranks divided by the number tied. For example, if the 4th and 5th observations were tied, then both would receive a rank of 4.5. Signed-Ranks – ranks are constructed with absolute values and then the sign of the value (positive or negative) is applied to the ranking.

Parametric vs Nonparametric ●For our parametric test for matched-pairs (dependent samples), we  Compared the corresponding observations by subtracting one from the other  Performed a test of whether the mean is 0 ●For our nonparametric case for matched- pairs (dependent samples), we will  Compare the corresponding observations by subtracting one from the other  Perform a test of whether the median is 0

Hypothesis Tests Using Wilcoxon Test
Step 0: Compute the differences in the matched-pairs observations. Rank the absolute value of all sample differences from smallest to largest after discarding those differences that equal 0. Handle ties by finding the mean of the ranks for tied values. Assign negative values to the ranks where the differences are negative and positive values to the ranks where the differences are positive.
Step 1 Hypotheses:
Left-Tailed | Two-Tailed | Right-Tailed
H0: MD = 0, H1: MD < 0 | H0: MD = 0, H1: MD ≠ 0 | H0: MD = 0, H1: MD > 0
Step 2 Box Plot: Draw a boxplot of the differences to compare the sample data from the two populations. This helps to visualize the difference in the medians.
Step 3 Level of Significance: (level of significance determines the critical value) Determine a level of significance, based on the seriousness of making a Type I error. Small-sample case: Use Table XI. Large-sample case: Use Table IV.
Step 4 Compute Test Statistic:
Step 5 Critical Value Comparison: Reject H0 if test statistic < critical value
Step 6 Conclusion: Reject or Fail to Reject

Test Statistic for the Wilcoxon Matched-Pairs Signed-Ranks Test
Small-Sample Case (n ≤ 30):
Left-Tailed | Two-Tailed | Right-Tailed
H0: MD = 0, H1: MD < 0 | H0: MD = 0, H1: MD ≠ 0 | H0: MD = 0, H1: MD > 0
T = T+ | T = smaller of T+ or |T-| | T = |T-|
Large-Sample Case (n > 30):
$z_0 = \dfrac{T - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}}$
where T is the test statistic from the small-sample case.
Note: MD is the median of the differences of matched pairs. T+ is the sum of the ranks of the positive differences. T- is the sum of the ranks of the negative differences.

Critical Value for Wilcoxon Matched-Pairs Signed-Ranks Test
Small-Sample Case (n ≤ 30): Using α as the level of significance, the critical value is obtained from Table XI in Appendix A.
Left-Tailed | Two-Tailed | Right-Tailed
Tα | Tα/2 | Tα
Large-Sample Case (n > 30): Using α as the level of significance, the critical value is obtained from Table IV in Appendix A. The critical value is always in the left tail of the standard normal distribution.
Left-Tailed | Two-Tailed | Right-Tailed
-zα | -zα/2 | -zα

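Example 1 from 15.4
X | Y | D = Y-X | |D| | Signed Rank
11.5 | 26.0 | 14.5 | 14.5 | +7.5
14.1 | 26.2 | 12.1 | 12.1 | +5
19.3 | 24.6 | 5.3 | 5.3 | +3
35.0 | 30.8 | -4.2 | 4.2 | -2
15.9 | 37.5 | 21.6 | 21.6 | +11
21.5 | 36.0 | 14.5 | 14.5 | +7.5
11.7 | 25.9 | 14.2 | 14.2 | +6
17.1 | 16.9 | -0.2 | 0.2 | -1
27.3 | 50.2 | 22.9 | 22.9 | +12
13.8 | 33.1 | 19.3 | 19.3 | +10
43.2 | 99.9 | 56.7 | 56.7 | +14
11.2 | 26.1 | 14.9 | 14.9 | +9
34.2 | 43.8 | 9.6 | 9.6 | +4
26.7 | 63.8 | 37.1 | 37.1 | +13
T+ = 102, T- = |-3| = 3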

Example continued H0: Med dif = 0 Ha: Med dif > 0 (Right-tailed test) Test Statistic: |T-| = 3 From Table XI: T critical = T0.05 = 25 The test statistic is less than the critical value (3 < 25), so we reject the null hypothesis.
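The signed ranks for this example can be rebuilt with a short sketch. The helper names are hypothetical; ties share the mean of their ranks, as described in the Vocabulary slide, and differences are rounded to avoid floating-point noise when detecting ties.

```python
def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def signed_rank_sums(x, y):
    """Return (T+, |T-|) for the differences y - x, zeros discarded."""
    diffs = [round(b - a, 10) for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return t_plus, t_minus

x = [11.5, 14.1, 19.3, 35.0, 15.9, 21.5, 11.7, 17.1, 27.3, 13.8, 43.2, 11.2, 34.2, 26.7]
y = [26.0, 26.2, 24.6, 30.8, 37.5, 36.0, 25.9, 16.9, 50.2, 33.1, 99.9, 26.1, 43.8, 63.8]
t_plus, t_minus = signed_rank_sums(x, y)
print(t_plus, t_minus)
```

A quick consistency check: the two sums must add up to n(n + 1)/2, here 14 × 15 / 2 = 105.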

Summary and Homework Summary –The Wilcoxon sign test is a nonparametric test for comparing the median of two dependent samples –This test is a weighted count of the differences in signs between the paired observations –The critical values for small samples are given in tables –The critical values for large samples can be approximated by a calculation with the normal distribution Homework –problems 2, 4, 5, 8, 9, 15 from the CD

Lesson 15 - 5 Inferences about the Differences between Two Medians: Independent Samples

Objectives Test a claim about the difference between the medians of two independent samples

Vocabulary Mann–Whitney Test – nonparametric procedure used to test the equality of two population medians from independent samples Population “X” – is the population of interest from the problem statement

Mann-Whitney Test Similar in structure to Wilcoxon Matched- Pairs Signed-Ranks Test Because samples are independent different test statistics and critical values are used

Hypothesis Tests Using Mann–Whitney Test
Step 0 Requirements: 1. the samples are independent random samples and 2. the shapes of the distributions are the same (assumed to be met in our problems)
Step 1 Box Plots: Draw a side-by-side boxplot to compare the sample data from the two populations. This helps to visualize the difference in the medians.
Step 2 Hypotheses:
Left-Tailed | Two-Tailed | Right-Tailed
H0: Mx = My, H1: Mx < My | H0: Mx = My, H1: Mx ≠ My | H0: Mx = My, H1: Mx > My
Step 3 Ranks: Rank all sample observations from smallest to largest. Handle ties by finding the mean of the ranks for tied values. Find the sum of the ranks for the sample from population X.
Step 4 Level of Significance: (level of significance determines the critical value) Determine a level of significance, based on the seriousness of making a Type I error. Small-sample case: Use Table XII. Large-sample case: Use Table IV.
Step 5 Compute Test Statistic:
Step 6 Critical Value Comparison: Reject H0 if the test statistic is more extreme than the critical value
Step 7 Conclusion: Reject or Fail to Reject

Test Statistic for the Mann–Whitney Test The test statistic will depend on the size of the samples from each population. Let n1 represent the sample size for population X and n2 represent the sample size for population Y.
Small-Sample Case (n1 ≤ 20 and n2 ≤ 20): If S is the sum of the ranks corresponding to the sample from population X, then the test statistic, T, is given by
$T = S - \dfrac{n_1 (n_1 + 1)}{2}$
Note: The value of S is always obtained by summing the ranks of the sample data that correspond to Mx, the median of population X, in the hypothesis.
Large-Sample Case (n1 > 20 or n2 > 20): From the Central Limit Theorem, the test statistic is given by
$z_0 = \dfrac{T - \dfrac{n_1 n_2}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$
where T is the test statistic from the small-sample case.

Critical Value for Mann–Whitney Test
Small-Sample Case (n1 ≤ 20 and n2 ≤ 20): Using α as the level of significance, the critical value is obtained from Table XII in Appendix A.
Left-Tailed | Two-Tailed | Right-Tailed
$w_{\alpha}$ | $w_{\alpha/2}$ and $w_{1-\alpha/2} = n_1 n_2 - w_{\alpha/2}$ | $w_{1-\alpha} = n_1 n_2 - w_{\alpha}$
Large-Sample Case (n1 > 20 or n2 > 20): Using α as the level of significance, the critical value is obtained from Table IV in Appendix A.
Left-Tailed | Two-Tailed | Right-Tailed
$-z_{\alpha}$ | $-z_{\alpha/2}$ and $z_{\alpha/2}$ | $z_{\alpha}$

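Example 1 from 15.5
H = healthy sample, I = ill sample. The Order and Rank columns list the 22 combined observations from smallest to largest with their ranks (ties share the mean rank).
H | I | Order | Rank | Order | Rank
10 | 640 | 10 | 1.5 | 320 | 12.5
320 | 80 | 10 | 1.5 | 320 | 12.5
320 | 1280 | 80 | 3.5 | 320 | 12.5
320 | 160 | 80 | 3.5 | 320 | 12.5
320 | 640 | 160 | 7 | 640 | 18
80 | 640 | 160 | 7 | 640 | 18
160 | 1280 | 160 | 7 | 640 | 18
10 | 640 | 160 | 7 | 640 | 18
640 | 160 | 160 | 7 | 640 | 18
160 | 320 | 320 | 12.5 | 1280 | 21.5
320 | 160 | 320 | 12.5 | 1280 | 21.5
Sum of ranks: H = 101, I = 152 = S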

Example Cont. Hypothesis: H0: Med ill = Med healthy Ha: Med ill > Med healthy
Test Statistic: T = S – ½ n1 (n1 + 1) = 152 – ½ (11)(12) = 86, where S is the sum of the ranks obtained from the sample observations from the ill population and n1 is the number of observations in the sample of ill people
Critical Value: $w_{1-\alpha} = n_1 n_2 - w_{\alpha} = 121 - 41 = 80$
Conclusion: Since T > w1-α (86 > 80), we reject the null hypothesis and conclude that the median value for ill people is greater than the median value for healthy people
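The rank sum S and the statistic T for this example can be recomputed with a short sketch. The sample values are read from the Example 1 table (ill sample as population X); the helper names are hypothetical.

```python
def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_t(x, y):
    """Return (S, T): S = rank sum of sample x in the combined data,
    T = S - n1 (n1 + 1) / 2."""
    ranks = average_ranks(list(x) + list(y))
    n1 = len(x)
    s = sum(ranks[:n1])
    return s, s - n1 * (n1 + 1) / 2

ill = [640, 80, 1280, 160, 640, 640, 1280, 640, 160, 320, 160]
healthy = [10, 320, 320, 320, 320, 80, 160, 10, 640, 160, 320]
s, t = mann_whitney_t(ill, healthy)
print(s, t)
```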

Summary and Homework Summary –The Mann-Whitney test is a nonparametric test for comparing the median of two independent samples –This test is a sum of ranks of the combined dataset –The critical values for small samples are given in tables –The critical values for large samples can be approximated by a calculation with the normal distribution Homework –problems 4, 5, 9, 10, 12 from the CD

Lesson 15 - 6 Inferences Between Two Variables

Objectives Perform Spearman’s rank-correlation test

Vocabulary Rank-correlation test -- nonparametric procedure used to test claims regarding association between two variables. Spearman's rank-correlation coefficient -- test statistic, $r_s = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)}$

Association ●Parametric test for correlation:  Assumption of bivariate normal is difficult to verify  Used regression instead to test whether the slope is significantly different from 0 ●Nonparametric case for association:  Compare the relationship between two variables without assuming that they are bivariate normal  Perform a nonparametric test of whether the association is 0

Tale of Two Associations Similar to our previous hypothesis tests, we can have a two-tailed, a left-tailed, or a right- tailed alternate hypothesis –A two-tailed alternative hypothesis corresponds to a test of association –A left-tailed alternative hypothesis corresponds to a test of negative association –A right-tailed alternative hypothesis corresponds to a test of positive association

Test Statistic for Spearman's Rank-Correlation Test The test statistic will depend on the size of the sample, n, and on the sum of the squared differences ($d_i^2$).
Small Sample Case (n ≤ 100): Spearman's rank-correlation coefficient, $r_s$, is our test statistic
$r_s = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)}$
where $d_i$ = the difference in the ranks of the two observations (Yi – Xi) in the ith ordered pair.
Large Sample Case (n > 100): $z_0 = r_s \sqrt{n - 1}$

Critical Value for Spearman's Rank-Correlation Test
Small Sample Case (n ≤ 100): Using α as the level of significance, the critical value(s) is (are) obtained from Table XIII in Appendix A. For a two-tailed test, be sure to divide the level of significance, α, by 2.
 | Left-Tailed | Two-Tailed | Right-Tailed
Significance | α | α/2 | α
Decision Rule | Reject if rs < -CV | Reject if rs < -CV or rs > CV | Reject if rs > CV
Large Sample Case (n > 100): compare $z_0 = r_s \sqrt{n - 1}$ with the critical value from the standard normal distribution.

Hypothesis Tests Using Spearman's Rank-Correlation Test
Step 0 Requirements: 1. The data are a random sample of n ordered pairs. 2. Each pair of observations is two measurements taken on the same individual
Step 1 Hypotheses: (claim is made regarding relationship between two variables, X and Y) H0: see below H1: see below
Step 2 Ranks: Rank the X-values, and rank the Y-values. Compute the differences between ranks and then square these differences. Compute the sum of the squared differences.
Step 3 Level of Significance: (level of significance determines the critical value) Table XIII in Appendix A. (see below)
Step 4 Compute Test Statistic: $r_s = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)}$
Step 5 Critical Value Comparison:
 | Left-Tailed | Two-Tailed | Right-Tailed
Significance | α | α/2 | α
H0 | not associated | not associated | not associated
H1 | negatively associated | associated | positively associated
Decision Rule | Reject if rs < -CV | Reject if rs < -CV or rs > CV | Reject if rs > CV

Expectations
● If X and Y were positively associated, then
– Small ranks of X would tend to correspond to small ranks of Y
– Large ranks of X would tend to correspond to large ranks of Y
– The differences would tend to be small positive and small negative values
– The squared differences would tend to be small numbers
● If X and Y were negatively associated, then
– Small ranks of X would tend to correspond to large ranks of Y
– Large ranks of X would tend to correspond to small ranks of Y
– The differences would tend to be large positive and large negative values
– The squared differences would tend to be large numbers

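Example 1 from 15.6
Calculations (S = club-head speed, D = distance):
S | D | S-Rank | D-Rank | d = X - Y | d²
100 | 257 | 2.5 | 1 | 1.5 | 2.25
102 | 264 | 5 | 4 | 1 | 1
103 | 274 | 6 | 6 | 0 | 0
101 | 266 | 4 | 5 | -1 | 1
105 | 277 | 7.5 | 8 | -0.5 | 0.25
100 | 263 | 2.5 | 3 | -0.5 | 0.25
99 | 258 | 1 | 2 | -1 | 1
105 | 275 | 7.5 | 7 | 0.5 | 0.25
Ave: S = 102, D = 267. Sum of d² = 6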

Example 1 Continued Hypothesis: H0: X and Y are not associated Ha: X and Y are associated
Test Statistic: $r_s = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \dfrac{6(6)}{8(64 - 1)} = 1 - \dfrac{36}{8(63)} = 0.929$
Critical Value: 0.738 (from Table XIII)
Conclusion: Since rs > CV, we reject H0; therefore there is a relationship between club-head speed and distance.
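The coefficient in this example can be recomputed from the raw data with a short sketch. The helper names are hypothetical; ranks use the mean-rank rule for ties, and the formula is the one given on the Vocabulary slide.

```python
def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """Return (sum of squared rank differences, r_s)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

speed = [100, 102, 103, 101, 105, 100, 99, 105]
distance = [257, 264, 274, 266, 277, 263, 258, 275]
sum_d2, rs = spearman_rs(speed, distance)
print(sum_d2, round(rs, 3))
```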

Summary and Homework Summary –The Spearman rank-correlation test is a nonparametric test for testing the association of two variables –This test is a comparison of the ranks of the paired data values –The critical values for small samples are given in tables –The critical values for large samples can be approximated by a calculation with the normal distribution Homework –problems 3, 6, 7, 10 from the CD

Lesson 15 - 7 Test to See if Samples Come From Same Population

Objectives Test a claim using the Kruskal–Wallis test

Vocabulary Kruskal–Wallis Test -- nonparametric procedure used to test the claim that k (3 or more) independent samples come from populations with the same distribution.

Test of Means of 3 or more groups
● Parametric test of the means of three or more groups (one-way ANOVA):
– Compared the variability between the sample means with the variability within the samples
– Performed a test of whether all the population means are equal
● Nonparametric case for three or more groups:
– Combine all of the samples and rank this combined set of data
– Compare the rankings for the different groups

Kruskal-Wallis Test
● Assumptions:
– Samples are simple random samples from three or more populations
– Data can be ranked
● If the populations have the same distribution, we would expect the values of the samples, when combined into one large dataset, to be interspersed with each other
● Thus we expect the average rank of each sample to be about the same

Test Statistic for Kruskal–Wallis Test
$H = \dfrac{12}{N(N+1)} \sum \dfrac{1}{n_i} \left[ R_i - \dfrac{n_i (N+1)}{2} \right]^2$
A computational formula for the test statistic is
$H = \dfrac{12}{N(N+1)} \left[ \dfrac{R_1^2}{n_1} + \dfrac{R_2^2}{n_2} + \dots + \dfrac{R_k^2}{n_k} \right] - 3(N+1)$
where
$R_i$ is the sum of the ranks of the ith sample
$R_1^2$ is the square of the sum of the ranks for the first sample
$R_2^2$ is the square of the sum of the ranks for the second sample, and so on
n1 is the number of observations in the first sample
n2 is the number of observations in the second sample, and so on
N is the total number of observations (N = n1 + n2 + … + nk)
k is the number of populations being compared.

Test Statistic (cont) ●Large values of the test statistic H indicate that the R i ’s are different than expected ●If H is too large, then we reject the null hypothesis that the distributions are the same ●This always is a right-tailed test

Critical Value for Kruskal–Wallis Test Small-Sample Case When three populations are being compared and when the sample size from each population is 5 or less, the critical value is obtained from Table XIV in Appendix A. Large-Sample Case When four or more populations are being compared or the sample size from one population is more than 5, the critical value is χ² α with k – 1 degrees of freedom, where k is the number of populations and α is the level of significance.

Hypothesis Tests Using Kruskal–Wallis Test
Step 0 Requirements: 1. The samples are independent random samples. 2. The data can be ranked.
Step 1 Box Plots: Draw side-by-side boxplots to compare the sample data from the populations. Doing so helps to visualize the differences, if any, between the medians.
Step 2 Hypotheses: (claim is made regarding distribution of three or more populations) H0: the distributions of the populations are the same H1: the distributions of the populations are not the same
Step 3 Ranks: Rank all sample observations from smallest to largest. Handle ties by finding the mean of the ranks for tied values. Find the sum of the ranks for each sample.
Step 4 Level of Significance: (level of significance determines the critical value) The critical value is found from Table XIV for small samples. The critical value is χ²α with k – 1 degrees of freedom (found in Table VI) for large samples.
Step 5 Compute Test Statistic: $H = \dfrac{12}{N(N+1)} \left[ \dfrac{R_1^2}{n_1} + \dfrac{R_2^2}{n_2} + \dots + \dfrac{R_k^2}{n_k} \right] - 3(N+1)$
Step 6 Critical Value Comparison: We reject the null hypothesis if the test statistic is greater than the critical value.

Kruskal–Wallis Test Hypothesis In this test, the hypotheses are H 0 : The distributions of all of the populations are the same H 1 : The distributions of all of the populations are not the same This is a stronger hypothesis than in ANOVA, where only the means (and not the entire distributions) are compared

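Example 1 from 15.7
Each cell shows the observed value with its rank in the combined data in parentheses.
S | 20-29 | 40-49 | 60-69
1 | 54 (29) | 61 (31.5) | 44 (18)
2 | 43 (16) | 41 (14) | 65 (34.5)
3 | 38 (11.5) | 44 (18) | 62 (33)
4 | 30 (2) | 47 (21) | 53 (27.5)
5 | 61 (31.5) | 33 (3) | 51 (26)
6 | 53 (27.5) | 29 (1) | 49 (22.5)
7 | 35 (7.5) | 59 (30) | 49 (22.5)
8 | 34 (4.5) | 35 (7.5) | 42 (15)
9 | 39 (13) | 34 (4.5) | 35 (7.5)
10 | 46 (20) | 74 (36) | 44 (18)
11 | 50 (24.5) | 50 (24.5) | 37 (10)
12 | 35 (7.5) | 65 (34.5) | 38 (11.5)
Medians (Sums) | 41 (194.5) | 45.5 (225.5) | 46.5 (246)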

Example 1 (cont)
$H = \dfrac{12}{N(N+1)} \left[ \dfrac{R_1^2}{n_1} + \dfrac{R_2^2}{n_2} + \dots + \dfrac{R_k^2}{n_k} \right] - 3(N+1)$
$H = \dfrac{12}{36(36+1)} \left[ \dfrac{194.5^2}{12} + \dfrac{225.5^2}{12} + \dfrac{246^2}{12} \right] - 3(36+1) = 1.009$
Critical Value (Large-Sample Case): χ²α with 2 (3 – 1) degrees of freedom, where 3 is the number of populations and 0.05 is the level of significance. CV = 5.991
Conclusion: Since H < CV, we fail to reject H0; there is not enough evidence to conclude that the distributions are different.
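The rank sums and H for this example can be recomputed with a short sketch of the computational formula. The helper names are hypothetical. One entry is an assumption: the 40-49 value for subject 11 is taken to be 50 (rank 24.5), the value consistent with that column's rank sum of 225.5 and median of 45.5.

```python
def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(samples):
    """Return (rank sums, H) using H = 12/(N(N+1)) * sum(R_i^2/n_i) - 3(N+1)."""
    combined = [v for s in samples for v in s]
    ranks = average_ranks(combined)
    big_n = len(combined)
    rank_sums, total, start = [], 0.0, 0
    for s in samples:
        r = sum(ranks[start:start + len(s)])
        rank_sums.append(r)
        total += r ** 2 / len(s)
        start += len(s)
    return rank_sums, 12 / (big_n * (big_n + 1)) * total - 3 * (big_n + 1)

age_20_29 = [54, 43, 38, 30, 61, 53, 35, 34, 39, 46, 50, 35]
age_40_49 = [61, 41, 44, 47, 33, 29, 59, 35, 34, 74, 50, 65]  # 11th value assumed
age_60_69 = [44, 65, 62, 53, 51, 49, 49, 42, 35, 44, 37, 38]
rank_sums, h = kruskal_wallis_h([age_20_29, age_40_49, age_60_69])
print(rank_sums, round(h, 3))
```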

Summary and Homework Summary –The Kruskal-Wallis test is a nonparametric test for comparing the distributions of three or more populations –This test is a comparison of the rank sums of the populations –Critical values for small samples are given in tables –The critical values for large samples can be approximated by a calculation with the chi-square distribution Homework –problems 3, 6, 7, 10 from the CD

Lesson 15 - R Chapter 15 Review

Objectives Summarize the chapter Define the vocabulary used Complete all objectives Successfully answer any of the review exercises Use the technology to compute statistical data in the chapter

Nonparametric vs Parametric The following nonparametric methods analyze similar problems as parametric methods
The Problem | Parametric | Nonparametric
Independence of observations | No corresponding procedure | Runs test
Test of central tendency | z-test for the mean, t-test for the mean | Sign test for the median
Test of dependent samples | t-test of the mean of the differences | Wilcoxon signed rank test
Test of independent samples | t-tests of the difference in means | Mann-Whitney test
Correlation | Regression | Spearman's rank correlation
Centers of multiple groups | ANOVA (means) | Kruskal-Wallis test of distributions

Large vs Small Case Sample Sizes The following table breaks down small versus large sample sizes for each nonparametric test
Nonparametric | Small Sample | Large Sample
Runs Test | n1 ≤ 20 and n2 ≤ 20 | n1 > 20 or n2 > 20
Sign Test | n ≤ 25 | n > 25
Wilcoxon Test | n ≤ 30 | n > 30
Mann-Whitney Test | n1 ≤ 20 and n2 ≤ 20 | n1 > 20 or n2 > 20
Spearman's Test | n ≤ 100 | n > 100
Kruskal-Wallis Test | k = 3 and ni ≤ 5 | k > 3 or ni > 5

Problem 1
A difference between parametric statistical methods and nonparametric statistical methods is that
1) Nonparametric statistical methods are better
2) Parametric methods are always easier to compute
3) Nonparametric methods use few, if any, distribution assumptions
4) All of the above

Problem 2
Another name for nonparametric methods is
1) Distribution-free procedures
2) Normal distribution procedures
3) Mean and variance procedures
4) Descriptive methods

Problem 3
The runs test is a test of
1) Whether a set of data has a certain mean
2) Whether a set of data is random
3) Whether a set of data has a certain slope
4) Whether two sets of data have different slopes

Problem 4
In the runs test, if there are very few runs, then we are likely to
1) Reject or do not reject the null hypothesis depending on whether this is a two-tailed, left-tailed, or right-tailed test
2) Reject the null hypothesis that the data are random
3) Not reject the null hypothesis that the data are random
4) All of the above

Problem 5
The sign test is a test of
1) Whether a set of data has a certain mean
2) Whether a set of data is random
3) Whether a set of data has a certain slope
4) Whether a set of data has a certain median

Problem 6
The sign test counts
1) The number of values more than and less than the hypothesized mean
2) The sum of all the positive values of the variable
3) The z-score of the sample median
4) The number of values more than and less than the hypothesized median

Problem 7
The Wilcoxon signed-rank test is a test of
1) Whether two sets of data with matched observations have the same median
2) Whether one set of data has more positive or negative values
3) Whether one set of data has a certain slope
4) Whether two sets of data have the same slopes

Problem 8
To perform the Wilcoxon matched-pairs signed-ranks test, we
1) Compare the number of positive values for the two sets of data
2) Rank the differences of the matched observations
3) Switch the signs of all of the data values
4) All of the above

Problem 9
The Mann-Whitney test is a test of
1) Whether a set of data has a certain mean
2) Whether two sets of data are both random
3) Whether two independent samples have the same medians
4) Whether two independent samples have the same standard deviations

Problem 10
The Mann-Whitney test uses
1) The rankings of each sample's observations when the two samples are combined
2) A sum of the ranks of one sample's observations
3) Either a table or an approximation using the normal distribution to find the critical values
4) All of the above

Problem 11
Spearman's rank correlation test is a test of
1) Whether a set of ordered pairs of data has an association
2) Whether a set of ordered pairs of data is listed in a random order
3) Whether two sets of data have the same median
4) Whether two sets of data have the same slope

Problem 12
A group of students enter a contest that has two parts. If the X variable is the student's rank on the first part and the Y variable is the student's rank on the second part, then Spearman's rank correlation test
1) Can be applied because the data is ordered
2) Can be applied because the data is bivariate normal
3) Cannot be applied because ranks cannot be added
4) Cannot be applied because the two variables are dependent

Problem 13
The Kruskal-Wallis test is a test of
1) Whether three or more populations have the same means
2) Whether two variables have a positive association
3) Whether two variables are independent
4) Whether three or more populations have the same distributions

Problem 14
The Kruskal-Wallis test differs from ANOVA in that
1) ANOVA is a parametric procedure, Kruskal-Wallis is a nonparametric procedure
2) ANOVA analyzes the difference in means, Kruskal-Wallis analyzes the difference in distributions
3) ANOVA requires the computation of variances, Kruskal-Wallis requires the computation of ranks
4) All of the above

Summary and Homework
Summary
– Nonparametric statistics describes statistical methods that have few, if any, distribution assumptions
– Nonparametric methods apply in a wide variety of situations, but when both can be used, they are in general not as efficient as parametric methods
– Nonparametric statistical methods often use medians and rankings to perform the analysis
Homework
– problems 1-5 from CD
