1 Nonparametric Statistical Techniques Chapter 17.

Presentation transcript:

1 Nonparametric Statistical Techniques Chapter 17

2 Introduction The statistical techniques introduced in this chapter deal with ordinal data. We test whether the population locations differ. Because the tests do not refer to any population parameter, the procedures are called nonparametric.

3 17.1 Introduction When comparing two populations, the hypotheses generally are: $H_0$: The population locations are the same. $H_1$: (i) The locations differ, or (ii) population 1 is located to the right (left) of population 2, that is, the random variable $X_1$ is generally larger (smaller) than $X_2$.

4 Wilcoxon Rank Sum Test The problem characteristics of this test are: The problem objective is to compare two populations. The data are either ordinal or interval (but not normal). The samples are independent.

5 Wilcoxon Rank Sum Test – Example Example 17.1 Based on the two samples shown below, can we infer at the 5% significance level that the location of population 1 is to the left of the location of population 2? Sample 1: 22, 23, 20. Sample 2: 18, 27, 26. The hypotheses are: $H_0$: The two population locations are the same. $H_1$: The location of population 1 is to the left of the location of population 2.

6 Graphical Demonstration: Why use the sum of ranks to test locations? Two hypothetical populations and their corresponding samples are presented, the GREEN population and the PURPLE population. Let us rank the observations of the two samples together. If the locations of the two populations are about the same (the null hypothesis is true), we would expect the ranks to be evenly spread between the samples. In this case the rank sums of the two samples will be close to one another. [Figure: the two populations and their jointly ranked samples; one rank sum is shown as 37, the other is not preserved in the transcript.]

7 Graphical Demonstration: Why use the sum of ranks to test locations? Allow the GREEN population to shift to the left of the PURPLE population.

8 Graphical Demonstration: Why use the sum of ranks to test locations? The green sample is expected to shift to the left too. As a result, several observations exchange locations. What happens to the sums of ranks? [Figure: animation frames showing pairs of rank sums, including 37 and 41, and 45 and 33; one frame shows 38 with its partner not preserved in the transcript.]

9 Graphical Demonstration: Why use the sum of ranks to test locations? The “green” sum decreases, and the “purple” sum increases. Changing the relative location of the two populations affects the rank sums of the two samples. [Figure: the same animation frames of rank-sum pairs as on the previous slide.]

10 Wilcoxon Rank Sum Test – Example Example 17.1 – continued Test statistic: 1. Rank all six observations (1 for the smallest). Sample 1: 22 (rank 3), 23 (rank 4), 20 (rank 2). Sample 2: 18 (rank 1), 27 (rank 6), 26 (rank 5). 2. Calculate the sum of ranks for each sample: 9 for sample 1 and 12 for sample 2. 3. Let T = 9 be the test statistic. (We arbitrarily define the test statistic as the rank sum of sample 1.)
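The ranking step of Example 17.1 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the helper name `average_ranks` is hypothetical, and it gives tied values their average rank, which is the tie rule the deck uses in later examples.

```python
def average_ranks(values):
    """Return ranks (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


sample1 = [22, 23, 20]
sample2 = [18, 27, 26]

# Rank the two samples together, then split the ranks back by sample.
ranks = average_ranks(sample1 + sample2)
t1 = sum(ranks[:len(sample1)])  # rank sum of sample 1, the test statistic T
t2 = sum(ranks[len(sample1):])  # rank sum of sample 2
print(t1, t2)
```

The two rank sums always add up to n(n+1)/2 for n pooled observations (here 21), which is a quick check on the ranking.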

11 Wilcoxon Rank Sum Test – Rationale Example 17.1 – continued If T is sufficiently small, then most of the smaller observations come from population 1, and we reject the null hypothesis. Question: How small is sufficiently small? We need to look at the distribution of T.

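12 The distribution of T under $H_0$ for two samples of size 3 T is the rank sum of a sample of size 3. The 20 possible sets of ranks for that sample, with their rank sums T in parentheses, are: 1,2,3 (6); 1,2,4 (7); 1,2,5 (8); 1,2,6 (9); 1,3,4 (8); 1,3,5 (9); 1,3,6 (10); 1,4,5 (10); 1,4,6 (11); 1,5,6 (12); 2,3,4 (9); 2,3,5 (10); 2,3,6 (11); 2,4,5 (11); 2,4,6 (12); 2,5,6 (13); 3,4,5 (12); 3,4,6 (13); 3,5,6 (14); 4,5,6 (15). If $H_0$ is true (the two populations have the same location), each of these rankings is equally likely, with probability 1/20. For instance, a sample that received the ranks 1, 2, 3 has T = 6, and a sample that received the ranks 3, 4, 5 has T = 12.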

13 The distribution of T under $H_0$ for two samples of size 3 Only one of the 20 equally likely rankings (1,2,3) gives T = 6, the smallest possible value. The significance level is 5%, and under $H_0$, $P(T \le 6) = 1/20 = .05$. Thus, the critical value of T is 6.
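The null distribution of T for two samples of size 3 can be enumerated directly. The sketch below is an illustration rather than anything from the slides; it lists every set of three ranks out of 1 to 6, treats each as equally likely, and accumulates the lower-tail probability. The names `dist` and `lower_tail` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

n1, n2 = 3, 3

# Every set of ranks that sample 1 could receive among the pooled ranks 1..6.
rank_sets = list(combinations(range(1, n1 + n2 + 1), n1))

# Under H0 each set is equally likely, so P(T = t) = (number of sets with sum t) / 20.
counts = Counter(sum(s) for s in rank_sets)
dist = {t: Fraction(c, len(rank_sets)) for t, c in sorted(counts.items())}


def lower_tail(t):
    """P(T <= t) under H0."""
    return sum(p for value, p in dist.items() if value <= t)


for t, p in dist.items():
    print(t, p)
print("P(T <= 6) =", float(lower_tail(6)))
```

Because T = 6 is the only value whose lower-tail probability does not exceed 5%, it is the one-tail critical value for these sample sizes.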

14 Wilcoxon Rank Sum Test – Example Example 17.1 – continued Conclusion: $H_0$ is rejected if $T \le 6$. Since T = 9, there is insufficient evidence to conclude that population 1 is located to the left of population 2 at the 5% significance level.

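15 Critical values of the Wilcoxon Rank Sum Test ($\alpha = .025$ for two tail test, or $\alpha = .05$ for one tail test) Using the table: for two given samples of sizes $n_1$ and $n_2$, $P(T < T_L) = P(T > T_U) = \alpha$. For a two tail test: $P(T > 25) = .025$ if $n_1 = 4$ and $n_2 = 4$. For a one tail test: $P(T > 25) = .05$ if $n_1 = 4$ and $n_2 =$ [value not preserved in the transcript]. A similar table exists for $\alpha = .05$ (one tail test) and $\alpha = .10$ (two tail test). [Table: critical values $T_L$ and $T_U$ for each pair of sample sizes; the entries are not preserved in the transcript.]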

16 Wilcoxon rank sum test for samples where n > 10 The test statistic is approximately normally distributed with the following parameters: $E(T) = \frac{n_1(n_1 + n_2 + 1)}{2}$ and $\sigma_T = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$. Therefore, $Z = \frac{T - E(T)}{\sigma_T}$.

17 Wilcoxon rank sum test for samples where n > 10, Example Example 17.2 (using the Wilcoxon rank sum test with ordinal data) A pharmaceutical company is planning to introduce a new painkiller. To determine the effectiveness of the drug, 30 people were randomly selected: 15 were given the tested drug (Sample 1) and 15 were given aspirin (Sample 2). Each participant was asked to indicate which one of five statements best represented the effectiveness of the drug they took.

18 Wilcoxon rank sum test for samples where n > 10, Example Example 17.2 – continued [Table: summary of the experiment results; not preserved in the transcript.] Solution: The objective is to compare two populations of ordinal data. The two samples are independent. The Wilcoxon rank sum test is the appropriate technique to apply.

19 Wilcoxon rank sum test for samples where n > 10, Example The hypotheses: $H_0$: The locations of populations 1 and 2 are the same. $H_1$: The location of population 1 (received the new painkiller) is to the right of the location of population 2 (received aspirin). Note: A high score, selected from among the five possible scores 1, 2, 3, 4, 5, indicates high effectiveness. Solving by hand: To reject the null hypothesis, we need to show that z is “large enough”. First we rank the observations; second, we run a z-test with rejection region $Z > z_\alpha$.

20 Wilcoxon rank sum test for samples where n > 10, Example Ranking the raw data: These are the effectiveness scores provided by the experiment participants for each drug. There are three observations with an effectiveness score of 1. The original ranks for these observations are 1, 2, and 3. This tie is broken by giving each observation the average rank of 2. Sum of ranks: $T_1 = 276.5$, $T_2 = 188.5$.

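21 Wilcoxon rank sum test for samples where n > 10, Example To standardize the test statistic we need: $E(T) = n_1(n_1 + n_2 + 1)/2 = (15)(31)/2 = 232.5$ and $\sigma_T = [n_1 n_2 (n_1 + n_2 + 1)/12]^{1/2} = [(15)(15)(31)/12]^{1/2} = 24.11$, so that $z = (276.5 - 232.5)/24.11 = 1.83$.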

22 Wilcoxon rank sum test for samples where n > 10, Example For the 5% significance level, $z_\alpha = z_{.05} = 1.645$. Since z = 1.83 > 1.645, there is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At the 5% significance level, the new drug is perceived as more effective than aspirin.
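The normal approximation used in Example 17.2 can be checked numerically. The sketch below assumes only the rank sum T = 276.5 and the sample sizes of 15 reported in the example; the function names are hypothetical, and no correction for ties is applied to the standard deviation, matching the hand calculation.

```python
import math


def rank_sum_z(t, n1, n2):
    """Return E(T), sigma_T and z for the Wilcoxon rank sum statistic T of sample 1."""
    expected = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return expected, sigma, (t - expected) / sigma


def upper_tail_p(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


expected, sigma, z = rank_sum_z(276.5, 15, 15)
print(expected, round(sigma, 2), round(z, 2), round(upper_tail_p(z), 4))
```

For a right-tail alternative the null hypothesis is rejected when z exceeds 1.645, or equivalently when the upper-tail probability falls below .05.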

23 Wilcoxon rank sum test for samples where n > 10, Example Excel solution (Xm17-02)

24 Wilcoxon rank sum test for non-normal interval data, Example Retaining workers: The human resource manager of a large company wanted to compare how long business and non-business graduates worked for the company before quitting. Two samples of 25 business graduates and 20 non-business graduates were randomly selected. The data representing their time with the company were recorded.

25 Wilcoxon rank sum test for non-normal interval data, Example Retaining workers – continued Can the personnel manager conclude at the 5% significance level that a difference in duration of employment exists between business and non-business graduates?

26 Wilcoxon rank sum test for non-normal interval data, Example Solution The problem objective is to compare two populations of interval data. The samples are independent. The non-normality of the two populations is apparent from the sample histograms. [Figure: histograms for non-business graduates and business graduates.]

27 Wilcoxon rank sum test for non-normal interval data, Example Solution – continued The Wilcoxon rank sum test is the correct procedure to run. $H_0$: The two population locations are the same. $H_1$: The location of population 1 (business graduates) is different from the location of population 2 (non-business graduates).

28 Wilcoxon rank sum test for non-normal interval data, Example Solution – continued Solving by hand: The rejection region is $|z| > z_{\alpha/2} = z_{.025} = 1.96$. After the ranking process is completed, we have $T = T_{\text{business graduates}} = 463$, $E(T) = n_1(n_1 + n_2 + 1)/2 = 25(46)/2 = 575$, and $\sigma_T = [n_1 n_2 (n_1 + n_2 + 1)/12]^{1/2} = 43.8$, so $z = (463 - 575)/43.8 = -2.56$. Since $|z| = 2.56 > 1.96$, reject the null hypothesis.

29 Wilcoxon rank sum test for non-normal interval data, Example Excel solution (Workers.xls) There is strong evidence to infer that the duration of employment is different for business and non-business graduates.

30 Required conditions for nonparametric tests A rejection of the null hypothesis when performing a nonparametric test can occur due to: a different location, a different spread (variance), or a different shape (distribution). Since we are interested in the location, we require that the two distributions be identical except for location.

31 Sign Test and Wilcoxon Signed Rank Sum Test Two techniques for matched pairs experiments are introduced. The objective is to compare two populations. The data are either ordinal or interval (but not normal). The samples are matched by pairs.

32 The Sign Test This test is employed when: the problem objective is to compare two populations, the data are ordinal, and the experimental design is matched pairs. The hypotheses: $H_0$: The two population locations are the same. $H_1$: The two population locations differ, or population 1 is to the right (left) of population 2.

33 The Sign Test – Statistic and Sampling Distribution A matched pair experiment calls for a test of matched pair differences. The test statistic and sampling distribution Record the sign of all the matched-pair-differences. The number of positive (or negative) differences is the test statistic.

34 The Sign Test – Rationale The number of positive (or negative) differences is binomial, with n = the number of non-zero differences and p = the probability that a difference is positive (negative). If the two populations have the same location ($H_0$ is true), it is expected that the number of positive differences = the number of negative differences. Thus, under $H_0$: p = 0.5.

35 The Sign Test – Rationale The test statistic and sampling distribution. The hypotheses: $H_0$: The two population locations are the same. $H_1$: The two population locations are different. Equivalently, $H_0$: $p = .5$; $H_1$: $p \ne .5$.

36 The Sign Test – Statistic and Sampling Distribution The Test – continued The hypotheses tested: $H_0$: $p = .5$; $H_1$: $p \ne .5$. The binomial variable can be approximated by a normal variable if np > 5 and n(1 − p) > 5. The Z-statistic becomes $Z = \frac{x - np}{\sqrt{np(1-p)}} = \frac{x - .5n}{.5\sqrt{n}}$.

37 The Sign Test – Example Example 17.3 (Xm17-03) In an experiment to determine which car is perceived to have the more comfortable ride, 25 people took two rides: one in a European model and one in a North American car. Each person rated the cars on a scale of 1 (ride is very uncomfortable) to 5 (ride is very comfortable).

38 Do these data allow us to conclude at 5% significance level that the European car is perceived to be more comfortable? The Sign Test – Example

39 The Sign Test – Example Solution: We compare two populations. The data are ordinal. The experiment is a matched pairs experiment.

40 The Sign Test – Example Solution: The hypotheses are $H_0$: The two population locations are the same. $H_1$: The European car population is located to the right of the American car population. The test: There were 18 positive, 5 negative, and 2 zero differences. Thus, x = 18 and n = 23 (the zero differences are excluded). $Z = [x - np]/[np(1-p)]^{.5} = [18 - .5(23)]/[.5(23)^{.5}] = 2.71$. The rejection region is $z > z_\alpha$; for $\alpha = .05$ we have $z > 1.645$. The p-value = P(Z > 2.71) = .0034.
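The sign test arithmetic of Example 17.3 can be reproduced as follows. This is a sketch under the example's own numbers (18 positive differences out of 23 non-zero differences); the function names are hypothetical, and the exact binomial tail is included only as a cross-check on the normal approximation, not as something the slides compute.

```python
import math


def sign_test_z(positives, n):
    """z statistic for the sign test under H0: p = 0.5."""
    return (positives - 0.5 * n) / (0.5 * math.sqrt(n))


def exact_upper_p(positives, n):
    """Exact P(X >= positives) when X is binomial(n, 0.5)."""
    return sum(math.comb(n, x) for x in range(positives, n + 1)) / 2 ** n


x, n = 18, 23  # zero differences are dropped before counting n
z = sign_test_z(x, n)
p_normal = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z)
p_exact = exact_upper_p(x, n)
print(round(z, 2), round(p_normal, 4), p_exact)
```

Both tail probabilities are well below .05, so the normal approximation and the exact binomial calculation lead to the same decision here.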

41 The Sign Test – Example Excel solution (Xm17-03) Using the computer: Tools > Data Analysis Plus > Sign Test

42 The Sign Test – Example Conclusion: Since the p-value $< \alpha$, we reject the null hypothesis. At the 5% significance level there is sufficient evidence to infer that the European car is perceived as more comfortable than the American car.

43 The Sign Test – Example Checking the required conditions: Observe the sample histograms (Xm17-03). The populations are similar in shape and spread.

44 Wilcoxon Signed Rank Sum Test This test is used when the problem objective is to compare two populations, the data are interval but not normal, and the samples are matched pairs. The test statistic and sampling distribution: T is based on the rank sums of the absolute values of the positive and negative differences. When $n \le 30$, reject $H_0$ if $T > T_U$ or $T < T_L$ ($T_L$ and $T_U$ are tabulated values related to n). When n > 30, T is approximately normally distributed; use a Z-test.

45 Wilcoxon Signed Rank Sum Test, Example Example 17.4 Does a “flextime” work schedule help reduce the travel time of workers to work? A random sample of 32 workers was selected, and the workers recorded their travel time before and after the program was implemented. The hypotheses tested are $H_0$: The two population locations are the same. $H_1$: The two population locations are different.

46 Wilcoxon Signed Rank Sum Test, Example Example 17.4 – continued The hypotheses are $H_0$: The two population locations are the same. $H_1$: The two population locations are different. The rejection region: $|z| > z_{\alpha/2}$.

47 The data were sorted by the absolute value of the differences. Ties were broken by assigning the average rank to the tied observations. Average rank = (1 + 8)/2 = 4.5.

48 T is the rank sum of the positive differences: $T = T^+$ [value not preserved in the transcript]. $E(T) = n(n+1)/4 = 32(33)/4 = 264$. $\sigma_T = [n(n+1)(2n+1)/24]^{.5} = [32(33)(65)/24]^{.5} = 53.48$. The test statistic is $Z = \frac{T - E(T)}{\sigma_T}$.
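The mechanics of the signed rank sum statistic can be sketched as below. The travel-time data of Example 17.4 are not available in the transcript, so the `before` and `after` lists are invented purely for illustration (and are far too small for the normal approximation to be appropriate); the function name is hypothetical, and dropping zero differences before ranking is an assumption in line with the sign test above.

```python
import math


def signed_rank_z(before, after):
    """Return T+, E(T), sigma_T and z for the Wilcoxon signed rank sum test."""
    diffs = [b - a for b, a in zip(before, after) if b != a]  # drop zero differences
    n = len(diffs)
    # Rank the absolute differences, averaging the ranks of ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t_plus, expected, sigma, (t_plus - expected) / sigma


# Hypothetical travel times in minutes, for illustration only.
before = [30, 42, 25, 38, 51, 29]
after = [28, 40, 27, 33, 45, 29]
t_plus, expected, sigma, z = signed_rank_z(before, after)
print(t_plus, expected, round(sigma, 3), round(z, 3))

# Moments for the n = 32 workers of Example 17.4.
n = 32
print(n * (n + 1) / 4, round(math.sqrt(n * (n + 1) * (2 * n + 1) / 24), 2))
```

The last line reproduces E(T) = 264 and the standard deviation of about 53.48 used in the example.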

49 Wilcoxon Signed Rank Sum Test, Example Excel solution (Xm17-04)

50 Wilcoxon Signed Rank Sum Test, Example Solution – continued The rejection region for $\alpha = .05$ is $|z| > z_{.025} = 1.96$. Conclusion: Since |1.94| < 1.96, there is insufficient evidence to infer that the flextime program was effective at the 5% significance level.

51 Kruskal-Wallis Test The problem characteristics for this test are: The problem objective is to compare two or more populations. The data are either ordinal or interval but not normal. The samples are independent. The hypotheses are $H_0$: The locations of all the k populations are the same. $H_1$: At least two population locations differ.

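52 Kruskal-Wallis Test Statistic Rank the data from 1 (smallest) to n (largest). Calculate the rank sums $T_1, T_2, \ldots, T_k$ for all the k samples. Calculate the statistic H as follows: $H = \left[\frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{T_j^2}{n_j}\right] - 3(n+1)$, where $n_j$ is the size of sample j and $n = n_1 + \cdots + n_k$.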

53 Test Rationale and Rejection Region If all the populations have the same location ($H_0$ is true), the ranks should be evenly distributed among the k samples, and the statistic H will be small. Even distribution of ranks: $T_1 = 14$, $T_2 = 15$, $T_3 = 16$, H = .0889. Uneven distribution of ranks: $T_1 = 6$, $T_2 = 15$, $T_3 = 24$, H = 7.2.
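The two H values on this slide can be reproduced with the standard Kruskal-Wallis formula, H = [12/(n(n+1))] Σ T_j²/n_j − 3(n+1). The sketch below assumes three samples of size 3 each, which is what the rank sums imply (6 = 1+2+3 and 24 = 7+8+9 among nine pooled ranks); the function name is hypothetical.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """Kruskal-Wallis H from the rank sums T_j and the sample sizes n_j."""
    n = sum(sizes)
    weighted = sum(t * t / m for t, m in zip(rank_sums, sizes))
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


h_uneven = kruskal_wallis_h([6, 15, 24], [3, 3, 3])
h_even = kruskal_wallis_h([14, 15, 16], [3, 3, 3])
print(round(h_uneven, 4), round(h_even, 4))
```

The more unevenly the ranks are split among the samples, the larger H becomes, which is why only large values of H count against the null hypothesis.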

54 Test Rationale and Rejection Region Sampling distribution: When the sample sizes are $\ge 5$, H is approximately chi-squared distributed with k − 1 degrees of freedom. The rejection region: Since a large value of H justifies the rejection of $H_0$, we have $H > \chi^2_{\alpha, k-1}$.

55 Example 17.5 How do customers rate three shifts with respect to speed of service in a certain restaurant? Three samples of 10 customer response-cards were randomly selected, one sample from each shift. Customer ratings were recorded. The Kruskal-Wallis Test Example

56 Can we conclude that customers perceive the speed of service to be different among the three shifts at 5% significance level? The Kruskal-Wallis Test Example

57 The Kruskal-Wallis Test Example Solution The problem objective is to compare three populations. The data are ordinal. The hypotheses: $H_0$: The locations of all three populations are the same. $H_1$: At least two population locations differ.

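58 The Kruskal-Wallis Test Example Solution – continued Test statistic: after ranking, the rank sums $T_1$, $T_2$, and $T_3$ are computed [values not preserved in the transcript], with $n = n_1 + n_2 + n_3 = 10 + 10 + 10 = 30$.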

59 The Kruskal-Wallis Test Example Solution – continued The critical value: For $\alpha = .05$, $\chi^2_{\alpha, k-1} = \chi^2_{.05, 2} = 5.99$.
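With 2 degrees of freedom the chi-squared critical value has a closed form, because the upper-tail probability is P(χ² > x) = exp(−x/2). The short sketch below uses that identity, which is a standard property of the distribution rather than something stated on the slides, to recover the critical value and to compare it with the H = 2.64 reported for the example; the variable names are hypothetical.

```python
import math

alpha = 0.05

# For 2 degrees of freedom, P(chi2 > x) = exp(-x / 2), so the critical value is -2 ln(alpha).
critical = -2 * math.log(alpha)

h = 2.64  # Kruskal-Wallis statistic reported for the example
p_value = math.exp(-h / 2)  # upper-tail probability of h, valid for 2 degrees of freedom only
print(round(critical, 2), round(p_value, 3), h > critical)
```

For other degrees of freedom no such shortcut applies, and a chi-squared table or statistical software is needed.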

60 The Kruskal-Wallis Test Example Solution – Excel (Xm17-05)

61 The Kruskal-Wallis Test Example Conclusion: Since H = 2.64 < 5.99, do not reject the null hypothesis. There is insufficient evidence to conclude at the 5% significance level that there is a difference in customers’ perception regarding service speed among the three shifts.

62 Friedman Test The problem characteristics of this test are: The problem objective is to compare two or more populations. The data are either ordinal or interval but not normal. (For normal populations we use ANOVA.) The data are generated from a blocked experiment (samples are not independent). The hypotheses are $H_0$: The locations of all the k populations are the same. $H_1$: At least two population locations differ.

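63 Test Statistic and Rejection Region The test statistic is $F_r = \left[\frac{12}{b\,k(k+1)} \sum_{j=1}^{k} T_j^2\right] - 3b(k+1)$, where b = the number of blocks, k = the number of treatments, and $T_j$ is the rank sum of treatment j. The rejection region is $F_r > \chi^2_{\alpha, k-1}$.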

64 The Friedman Test Example Example 17.6 Four managers evaluate applicants for a job in an accounting firm on several dimensions. Eight applicants were randomly selected, and their evaluations by the four managers were recorded. Can we conclude at the 5% significance level that there are differences in the way managers evaluate candidates?

65 Solution The problem objective is to compare four populations Data are ordinal. This is a randomized block design experiment because each applicant (block) was ranked four times. The appropriate procedure is the Friedman test The Friedman Test Example

66 The Friedman Test Example Solution The hypotheses are $H_0$: The locations of all four populations are the same. $H_1$: At least two population locations differ. [Table: the data; not preserved in the transcript.]

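67 The Friedman Test Example How to rank, block by block: within each applicant (block), the managers’ scores are ranked, and tied scores receive averaged ranks. [Worked illustration for applicant 1 (scores, actual ranks, averaged ranks) not preserved in the transcript.] Rank sums: $T_1 = 21$, $T_2 = 10$, $T_3 = 24.5$, $T_4 =$ [value not preserved in the transcript].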

68 Solution In our problem: b = 8 (number of blocks) k = 4 (number of treatments, populations) The Friedman Test Example

69 The Friedman Test Example Solution We have $F_r = 10.61$. Let $\alpha = .05$; then $\chi^2_{.05, 4-1} = 7.81$.
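The reported $F_r$ can be checked against the standard Friedman formula, F_r = [12/(b k (k+1))] Σ T_j² − 3b(k+1). The transcript does not preserve the fourth rank sum, so the sketch below infers it from the fact that the ranks within each block add up to k(k+1)/2; that inference and the function name are assumptions of this illustration, not values taken from the slides.

```python
def friedman_fr(rank_sums, blocks):
    """Friedman statistic from the treatment rank sums T_j and the number of blocks b."""
    k = len(rank_sums)
    return 12 / (blocks * k * (k + 1)) * sum(t * t for t in rank_sums) - 3 * blocks * (k + 1)


b, k = 8, 4
known = [21, 10, 24.5]  # T1, T2, T3 as given in the example

# Each block contributes ranks 1..k, so all rank sums together equal b * k * (k + 1) / 2.
t4 = b * k * (k + 1) / 2 - sum(known)  # inferred, not stated in the transcript
fr = friedman_fr(known + [t4], b)
print(t4, round(fr, 2))
```

The result agrees with the reported value of 10.61, which supports the inferred fourth rank sum.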

70 The Friedman Test Example Solution – Excel (Xm17-06)

71 The Friedman Test Example Conclusion: Since $F_r = 10.61 > 7.81$, reject the null hypothesis. There is sufficient evidence to conclude at the 5% significance level that the managers’ evaluations differ.