
Published by Aaron Wells. Modified over 2 years ago.

1
Nonparametric Statistical Methods
Presented by Guo Cheng, Ning Liu, Faiza Khan, Zhenyu Zhang, Du Huang, Christopher Porcaro, Hongtao Zhao, Wei Huang

2
Introduction

3
Definition: Nonparametric (rank-based) methods are used when little is known about the population distribution from which the data are sampled. They are often used for small sample sizes, and when the data are measured on an ordinal scale so that only their ranks are meaningful.

4
Outline
1. Sign Test
2. Wilcoxon Signed Rank Test
3. Inferences for Two Independent Samples
4. Inferences for Several Independent Samples
5. Friedman Test
6. Spearman’s Rank Correlation
7. Kendall’s Rank Correlation Coefficient

5
1. Sign Test

6
Parameter of interest: the median. The median is used as the parameter because it is a better measure of central tendency than the mean for skewed distributions.

7
Hypothesis test: $H_0: \mu = \mu_0$ vs. $H_a: \mu > \mu_0$, where $\mu_0$ is a specified value and $\mu$ is the unknown median.

8
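Testing Procedure
Step 1: Given a random sample $x_1, x_2, \ldots, x_n$ from a population with unknown median $\mu$, count the number of $x_i$'s that exceed $\mu_0$ and denote this count by $s_+$. Then $s_- = n - s_+$ is the number below $\mu_0$ (assuming no $x_i$ equals $\mu_0$).
Step 2: Reject $H_0$ if $s_+$ is large, or equivalently if $s_-$ is small.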

9
How to reject $H_0$? To determine how large $s_+$ must be in order to reject $H_0$, we need the distribution of the corresponding random variable $S_+$.
$X_i$: random variable corresponding to the observed value $x_i$
$S_-$: random variable corresponding to $s_-$

10
Distribution of $S_+$ and $S_-$

11
Calculating the P-value
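Under $H_0$, and assuming a continuous population so that no observation equals $\mu_0$, each $X_i$ exceeds $\mu_0$ with probability 1/2 independently, so $S_+ \sim \mathrm{Bin}(n, 1/2)$ and the one-sided P-value is $P(S_+ \ge s_+)$. A minimal Python sketch; the function names are our own, not from the slides. It assumes SAS defines its sign statistic as $M = (s_+ - s_-)/2$ and that no thermostat reading equals 200:

```python
from math import comb


def sign_test_counts(data, mu0):
    """Return (s_plus, s_minus, n); observations equal to mu0 are dropped."""
    s_plus = sum(1 for x in data if x > mu0)
    s_minus = sum(1 for x in data if x < mu0)
    return s_plus, s_minus, s_plus + s_minus


def sign_test_pvalue(s_plus, n):
    """One-sided exact P-value P(S+ >= s_plus), where S+ ~ Bin(n, 1/2) under H0."""
    return sum(comb(n, k) for k in range(s_plus, n + 1)) / 2 ** n


# The t statistic in the SAS output implies n = 10; Sign M = 3 with
# M = (s+ - s-) / 2 then gives s+ = 8 and s- = 2 (assuming no ties at 200).
p_one = sign_test_pvalue(8, 10)
p_two = min(1.0, 2 * p_one)
print(p_one, p_two)
```

This prints 0.0546875 and 0.109375. The two-sided value rounds to the Pr >= |M| = 0.1094 reported by SAS.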

12
Rejection criteria

13
Large-sample z-test
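For large $n$, the binomial distribution of $S_+$ is approximately normal with mean $n/2$ and variance $n/4$, which gives $z = (s_+ - n/2 - 1/2)/\sqrt{n/4}$. The continuity correction of 1/2 is a common convention. Whether the slide used it is an assumption. A sketch with our own function names:

```python
from math import erfc, sqrt


def sign_test_z(s_plus, n):
    """Large-sample z for Ha: median > mu0, using a continuity correction of 1/2."""
    return (s_plus - n / 2 - 0.5) / sqrt(n / 4)


def upper_tail_p(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))


z = sign_test_z(8, 10)
print(round(z, 3), round(upper_tail_p(z), 3))
```

For example, $n = 10$ and $s_+ = 8$ give $z \approx 1.58$ and an approximate one-sided P-value of about 0.057. This is close to the exact binomial value $P(S_+ \ge 8) = 56/1024 \approx 0.0547$, but $n = 10$ is really too small for the approximation.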

14
Confidence Interval
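A distribution-free confidence interval for the median can be built from order statistics. Let $b$ be the largest integer with $P(S \le b) \le \alpha/2$ for $S \sim \mathrm{Bin}(n, 1/2)$. Then $[x_{(b+1)}, x_{(n-b)}]$ covers the median with probability $1 - 2P(S \le b) \ge 1 - \alpha$. This is a sketch of that standard construction; the function name `median_ci` is hypothetical:

```python
from math import comb


def median_ci(data, alpha=0.05):
    """Sign-test confidence interval [x_(b+1), x_(n-b)] for the population median.

    b is the largest integer with P(S <= b) <= alpha / 2 for S ~ Bin(n, 1/2),
    so the coverage is 1 - 2 * P(S <= b) >= 1 - alpha.
    """
    xs = sorted(data)
    n = len(xs)
    cdf, b = 0.0, -1
    for k in range(n + 1):
        cdf += comb(n, k) / 2 ** n
        if cdf > alpha / 2:
            break
        b = k
    if b < 0:
        raise ValueError("n is too small for this confidence level")
    # x_(b+1) and x_(n-b) in 1-based order-statistic notation
    return xs[b], xs[n - b - 1]


print(median_ci(range(1, 11)))  # toy input 1, 2, ..., 10
```

For $n = 10$ and $\alpha = 0.05$, $b = 1$. The interval is therefore $[x_{(2)}, x_{(9)}]$, with exact coverage $1 - 2 \cdot 11/1024 \approx 0.979$.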

15
Example

16
SAS code
DATA themostat;
  INPUT temp @@;
  datalines;
202.2 203.4 …
;
PROC UNIVARIATE DATA=themostat loccount mu0=200;
  VAR temp;
RUN;

17
SAS Output
Basic Statistical Measures
Location                  Variability
Mean    201.7700          Std Deviation          2.41019
Median  201.7500          Variance               5.80900
Mode    .                 Range                  8.30000
                          Interquartile Range    2.90000

Tests for Location: Mu0=200
Test           -Statistic-      -----p Value------
Student's t    t  2.322323      Pr > |t|     0.0453
Sign           M  3             Pr >= |M|    0.1094
Signed Rank    S  19.5          Pr >= |S|    0.048

18
2. Wilcoxon Signed Rank Test

19
Inventor: Frank Wilcoxon (2 September 1892, County Cork, Ireland – 18 November 1965, Tallahassee, Florida, USA) was a chemist and statistician known for developing several statistical tests.

20
What is it used for?
- Two related samples
- Matched samples
- Repeated measurements on a single sample

21
Hypothesis

22
Testing procedure
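The signed rank statistic $w_+$ is the sum of the ranks of $|x_i - \mu_0|$ over the observations with $x_i > \mu_0$. Below is a Python sketch. It drops zero differences and gives tied absolute differences their average rank, which is one common convention; the slide's own treatment of ties is not shown. The function names are ours:

```python
def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def signed_rank_w_plus(data, mu0):
    """Wilcoxon signed rank statistic w+ for H0: median = mu0."""
    diffs = [x - mu0 for x in data if x != mu0]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    return sum(r for r, d in zip(ranks, diffs) if d > 0)


print(signed_rank_w_plus([1.5, -0.5, 2.0, 3.0, -1.0], 0))  # toy input
```

For comparison, the SAS documentation for PROC UNIVARIATE describes its reported signed rank statistic as centered, $S = w_+ - n(n+1)/4$. Under that definition, S = 19.5 in the thermostat output corresponds to $w_+ = 47$, provided all $n = 10$ differences are nonzero.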

23
Example

24
SAS code
DATA thermo;
  INPUT temp @@;
  datalines;
202.2 203.4 …
;
PROC UNIVARIATE DATA=thermo loccount mu0=200;
  TITLE "Wilcoxon signed rank test for the thermostat data";
  VAR temp;
RUN;

25
SAS output (selected results)
Basic Statistical Measures
Location                  Variability
Mean    201.7700          Std Deviation          2.41019
Median  201.7500          Variance               5.80900
Mode    .                 Range                  8.30000
                          Interquartile Range    2.90000

Tests for Location: Mu0=200
Test           -Statistic-      -----p Value------
Student's t    t  2.322323      Pr > |t|     0.0453
Sign           M  3             Pr >= |M|    0.1094
Signed Rank    S  19.5          Pr >= |S|    0.048

26
Large-sample approximation

27
Derive E(x) & Var(x)
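One standard derivation runs as follows. Write the signed rank statistic as $W_+$ (our notation) and assume no ties, no zero differences, and a distribution symmetric about $\mu_0$. Under $H_0$, each rank $i$ belongs to a positive difference with probability 1/2, independently across $i$. Hence $W_+ = \sum_{i=1}^{n} i Z_i$ with $Z_i$ i.i.d. Bernoulli(1/2), and:

```latex
\begin{aligned}
E(W_+) &= \sum_{i=1}^{n} i\,E(Z_i) = \frac{1}{2}\sum_{i=1}^{n} i = \frac{n(n+1)}{4},\\
\operatorname{Var}(W_+) &= \sum_{i=1}^{n} i^2 \operatorname{Var}(Z_i) = \frac{1}{4}\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{24}.
\end{aligned}
```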

28
Rejection region:
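For large $n$, one rejects $H_0$ in favor of $\mu > \mu_0$ when $z = \dfrac{w_+ - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} \ge z_\alpha$. This sketch omits any continuity or tie correction, and the default `z_alpha=1.645` (the critical value $z_{0.05}$) is our choice:

```python
from math import sqrt


def signed_rank_z(w_plus, n):
    """Large-sample z for the Wilcoxon signed rank statistic (no ties, no continuity correction)."""
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    return (w_plus - mean) / sqrt(var)


def reject_upper(w_plus, n, z_alpha=1.645):
    """Upper-tailed test: reject H0 when z >= z_alpha."""
    return signed_rank_z(w_plus, n) >= z_alpha


print(round(signed_rank_z(47, 10), 2), reject_upper(47, 10))
```

For instance, $w_+ = 47$ with $n = 10$ gives $z \approx 1.99$, beyond $z_{0.05} = 1.645$.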

29
3. Inferences for Two Independent Samples

30
Hypothesis

31
Definition

32
Definition

33
Wilcoxon rank sum test

34
Mann-Whitney U test

35
Relationship between the two tests

36
Advantages

37
For large samples

38
For large samples

39
Treatment of ties

40
Example: To test whether the grades of two classes taught by the same teacher are the same, we randomly pick 7 students from Class A and 9 from Class B. Their scores are as follows:
A: 8.50 9.48 8.65 8.16 8.83 7.76 8.63
B: 8.27 8.20 8.25 8.14 9.00 8.10 7.20 8.32 7.70

41
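Example: pooled scores in ascending order, with class labels and ranks
Score  7.20  7.70  7.76  8.10  8.14  8.16  8.20  8.25
Class  B     B     A     B     B     A     B     B
Rank   1     2     3     4     5     6     7     8

Score  8.27  8.32  8.50  8.63  8.65  8.83  9.00  9.48
Class  B     B     A     A     A     A     B     A
Rank   9     10    11    12    13    14    15    16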
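The rank sum for Class A and the corresponding Mann-Whitney U can be computed directly from the example data. In this sketch (function names are ours), $U = W - n_1(n_1+1)/2$. This equals the number of (A, B) pairs in which the A score is larger. No ties occur in these data, so plain ranks suffice:

```python
def rank_sum(sample_a, sample_b):
    """Wilcoxon rank sum W of sample_a within the pooled sample (assumes no ties)."""
    pooled = sorted(sample_a + sample_b)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    return sum(rank[x] for x in sample_a)


def mann_whitney_u(sample_a, sample_b):
    """U = number of pairs (a, b) with a > b."""
    return sum(1 for a in sample_a for b in sample_b if a > b)


class_a = [8.50, 9.48, 8.65, 8.16, 8.83, 7.76, 8.63]
class_b = [8.27, 8.20, 8.25, 8.14, 9.00, 8.10, 7.20, 8.32, 7.70]

w = rank_sum(class_a, class_b)
n1 = len(class_a)
print(w, w - n1 * (n1 + 1) // 2, mann_whitney_u(class_a, class_b))
```

This prints 75 47 47. $W = 75$ matches the Sum of Scores for class A in the SAS output, and the two ways of computing $U$ agree.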

42
Example

43
Example

44
SAS code
Data exam;
  Input group $ score @@;
  Datalines;
A 8.50 A 9.48 A 8.65 A 8.16 A 8.83 A 7.76 A 8.63
B 8.27 B 8.20 B 8.25 B 8.14 B 9.00 B 8.10 B 7.20 B 8.32 B 7.70
;

45
SAS code
Proc npar1way data=exam wilcoxon;
  Var score;
  Class group;
  Exact wilcoxon;
Run;

46
Output
Wilcoxon Scores (Rank Sums) for Variable score Classified by Variable group
group  N  Sum of Scores  Expected Under H0  Std Dev Under H0  Mean Score
A      7  75.0           59.50              9.447222          10.714286
B      9  61.0           76.50              9.447222          6.777778

47
Output
Wilcoxon Two-Sample Test
Statistic (S)                   75.0000
Normal Approximation
  Z                             1.5878
  One-Sided Pr > Z              0.0562
  Two-Sided Pr > |Z|            0.1123
t Approximation
  One-Sided Pr > Z              0.0666
  Two-Sided Pr > |Z|            0.1332
Exact Test
  One-Sided Pr >= S             0.0571
  Two-Sided Pr >= |S - Mean|    0.1142
Z includes a continuity correction of 0.5.
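The normal approximation above can be reproduced from the no-ties moments $E(W) = n_1(N+1)/2$ and $\operatorname{Var}(W) = n_1 n_2 (N+1)/12$, where $N = n_1 + n_2$, applying the continuity correction of 0.5 toward the mean. A sketch; the function name is ours:

```python
from math import erfc, sqrt


def rank_sum_z(w, n1, n2):
    """Continuity-corrected z for the Wilcoxon rank sum W of the first sample."""
    n = n1 + n2
    mean = n1 * (n + 1) / 2
    sd = sqrt(n1 * n2 * (n + 1) / 12)
    diff = w - mean
    # shrink |W - E(W)| by 0.5, as the SAS note describes
    correction = 0.5 if diff > 0 else -0.5 if diff < 0 else 0.0
    return (diff - correction) / sd


z = rank_sum_z(75, 7, 9)
p_one_sided = 0.5 * erfc(z / sqrt(2))
print(round(z, 4), round(p_one_sided, 4))
```

With $W = 75$, $n_1 = 7$, and $n_2 = 9$, this gives $z = 15/\sqrt{89.25} \approx 1.5878$ and a one-sided P-value of about 0.0562, matching the Normal Approximation rows.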

48
Output

49
4. Inferences for Several Independent Samples

50
Introduction: If our data are normally distributed and the population standard deviations are equal, we can test for a difference among several populations using the one-way ANOVA F test.

51
When to use the Kruskal-Wallis test? What happens when our data are not normal? In that case we can use the nonparametric Kruskal-Wallis test to compare more than two populations, as long as the data come from continuous distributions. The idea of the Kruskal-Wallis (kw) rank test is to rank all the data from all groups together and then apply one-way ANOVA to the ranks rather than to the original data.

52
Kruskal-Wallis Test (kw test): A nonparametric method for testing whether samples originate from the same distribution. It is used to compare more than two independent samples.

53
Kruskal-Wallis Test: History
William Henry Kruskal (October 10, 1919 – April 21, 2005) obtained bachelor's and master's degrees in mathematics at Harvard University and received his Ph.D. from Columbia University in 1955.
Wilson Allen Wallis (November 5, 1912 – October 12, 1998) did his undergraduate work at the University of Minnesota and graduate work at the University of Chicago in 1933.

54
Kruskal-Wallis Test: Steps
1. State the hypotheses:
Null hypothesis ($H_0$): The populations from which the samples are drawn are identical.
Alternative hypothesis ($H_a$): At least one population differs from the others.

55
Kruskal-Wallis Test: Steps
2. Rank all the data together. The smallest value gets rank 1, the next smallest gets rank 2, and so on. Tied values get the average of the ranks they would have received if they were not tied.
3. Add up the ranks within each sample. Label these sums $L_1$, $L_2$, $L_3$, and $L_4$ (one per sample).

56
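Kruskal-Wallis Test: Steps
4. Find the test statistic:
$kw = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{L_i^2}{n_i} - 3(n+1)$
where $n$ = total number of observations in all samples, $n_i$ = number of observations in sample $i$, $k$ = number of samples, $L_i$ = rank sum of sample $i$, and $kw$ = test statistic.
5. Reject $H_0$ if $kw$ is greater than the chi-square table value with $k - 1$ degrees of freedom.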

57
Kruskal-Wallis Test: Example
An experiment was done to compare four different ways of teaching a concept to a class of students. In this experiment, 28 tenth-grade classes were randomly assigned to the four methods (7 classes per method). A 45-question test was given to each class. The average test scores of the classes are given in the following table. Apply the Kruskal-Wallis test to the test score data set.

58
Kruskal-Wallis Test: Example
Given data and ranks of the data values

59
Kruskal-Wallis Test: Example

60
Kruskal-Wallis Test: Example

61
SAS input
data test;
  input methodname $ scores @@;
  cards;
case 14.59 case 23.44 case 25.43 case 18.15 case 20.82 case 14.06 case 14.26
formula 20.27 formula 26.84 formula 14.71 formula 22.34 formula 19.49 formula 24.92 formula 20.20
equation 27.82 equation 24.92 equation 28.68 equation 23.32 equation 32.85 equation 33.90 equation 23.42
unitary 33.16 unitary 26.93 unitary 30.43 unitary 36.43 unitary 37.04 unitary 29.76 unitary 33.88
;
proc npar1way data=test wilcoxon;
  class methodname;
  var scores;
run;

62
SAS Output
Wilcoxon Scores (Rank Sums) for Variable scores Classified by Variable methodname
methodname  N  Sum of Scores  Expected Under H0  Std Dev Under H0  Mean Score
case        7   49.00         101.50             18.845498          7.000000
formula     7   66.50         101.50             18.845498          9.500000
equation    7  125.50         101.50             18.845498         17.928571
unitary     7  165.00         101.50             18.845498         23.571429
Average scores were used for ties.

Kruskal-Wallis Test
Chi-Square        18.1390
DF                3
Pr > Chi-Square   0.0004
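Plugging the rank sums from the output above into the kw formula reproduces the reported chi-square. This requires the usual tie correction: divide by $1 - \sum (t^3 - t)/(n^3 - n)$, where $t$ is the size of each group of tied values. It is an assumption, though consistent with the numbers, that SAS applies exactly this correction here. A sketch with our own function name:

```python
from collections import Counter


def kruskal_wallis(rank_sums, sizes, pooled_values=None):
    """kw = 12 / (n(n+1)) * sum(L_i^2 / n_i) - 3(n+1), optionally tie-corrected."""
    n = sum(sizes)
    kw = 12 / (n * (n + 1)) * sum(total ** 2 / size for total, size in zip(rank_sums, sizes))
    kw -= 3 * (n + 1)
    if pooled_values is not None:
        # Each group of t tied values contributes t^3 - t to the correction.
        ties = sum(t ** 3 - t for t in Counter(pooled_values).values())
        kw /= 1 - ties / (n ** 3 - n)
    return kw


scores = {
    "case": [14.59, 23.44, 25.43, 18.15, 20.82, 14.06, 14.26],
    "formula": [20.27, 26.84, 14.71, 22.34, 19.49, 24.92, 20.20],
    "equation": [27.82, 24.92, 28.68, 23.32, 32.85, 33.90, 23.42],
    "unitary": [33.16, 26.93, 30.43, 36.43, 37.04, 29.76, 33.88],
}
pooled = [x for group in scores.values() for x in group]
rank_sums = [49.0, 66.5, 125.5, 165.0]  # Sum of Scores column of the SAS output
sizes = [len(group) for group in scores.values()]

print(round(kruskal_wallis(rank_sums, sizes), 4))          # without tie correction
print(round(kruskal_wallis(rank_sums, sizes, pooled), 4))  # with tie correction
```

The only tie is 24.92, shared by one formula class and one equation class. The corrected value, about 18.139, agrees with the Chi-Square of 18.1390 and far exceeds 7.815, the 5% chi-square critical value with 3 degrees of freedom.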

63
5. Friedman Test

64
Introduction: A distribution-free, rank-based test for comparing treatments in a randomized block design is the Friedman test, named after the Nobel laureate economist Milton Friedman, who proposed it. The Friedman test is a nonparametric counterpart of repeated-measures ANOVA that can be performed on ordinal (ranked) data.

65
Steps in the Friedman test

66
Steps in the Friedman test
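The following sketch uses the usual Friedman statistic for $a$ treatments in $b$ blocks. Rank the treatments within each block, using average ranks for ties. Sum the ranks for each treatment to get $R_j$, and compute $fr = \frac{12}{ab(a+1)}\sum_j R_j^2 - 3b(a+1)$. Refer $fr$ to a chi-square distribution with $a - 1$ degrees of freedom. The symbols and function names are ours, since the slide's own notation is not shown:

```python
def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman statistic for a table with one row per block and one column per treatment."""
    b = len(blocks)     # number of blocks
    a = len(blocks[0])  # number of treatments
    rank_sums = [0.0] * a
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12 / (a * b * (a + 1)) * sum(r * r for r in rank_sums) - 3 * b * (a + 1)


# Toy input: every block orders the 4 treatments the same way, which gives
# the largest possible value b * (a - 1) = 9 (up to floating-point rounding).
print(friedman_statistic([[1, 2, 3, 4], [10, 20, 30, 40], [5, 6, 7, 8]]))
```

In the 8-treatment example with α = 0.025, $fr$ would be compared with the chi-square critical value for 7 degrees of freedom, $\chi^2_{7,\,0.025} = 16.013$.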

67
Example: We have 8 treatments, each observed in 3 blocks, with α = 0.025.

68
Define the null and alternative hypotheses:
$H_0$: There is no difference among the 8 treatments.
$H_a$: There is a difference among the 8 treatments.

69
Rank Sum

70
Friedman Test

71
Conclusion

72
6. Spearman’s Rank Correlation Coefficient

73
Introduction
- From Pearson to Spearman
- Spearman’s Rank Correlation Coefficient
- Large-Sample Approximation
- Hypothesis Test
- Examples

74
From Pearson to Spearman
Pearson’s:
- Measures only the degree of linear association
- Based on the assumption that the two variables are bivariate normal
Spearman’s:
- Takes into account only the ranks
- Measures the degree of monotone association
- Inferences on the rank correlation coefficient are distribution-free

75
From Pearson to Spearman

76
From Pearson to Spearman
Charles Edward Spearman (10 Sept. 1863 – 17 Sept. 1945)
As a psychologist: ① the general factor of intelligence ② the nature and causes of variations in humans
As a statistician: ① rank correlation ② two-way analysis ③ the correlation coefficient

77
Spearman’s Rank Correlation Coefficient

78
Spearman’s Rank Correlation Coefficient
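Spearman's coefficient is Pearson's correlation computed on the ranks. When there are no ties, it reduces to $r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$, where $d_i$ is the difference between the two ranks of observation $i$. Below is a sketch computing both versions for the wine data of Table 5.1, in that table's order. Tied values get average ranks, and the function and variable names are ours:

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Return (Pearson correlation of ranks, no-ties shortcut formula, sum of d^2)."""
    n = len(x)
    u, v = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    shortcut = 1 - 6 * sum_d2 / (n * (n * n - 1))
    mean = (n + 1) / 2  # mean rank, also with average ranks for ties
    suv = sum((a - mean) * (b - mean) for a, b in zip(u, v))
    suu = sum((a - mean) ** 2 for a in u)
    svv = sum((b - mean) ** 2 for b in v)
    return suv / sqrt(suu * svv), shortcut, sum_d2


wine = [2.5, 3.9, 2.9, 2.4, 2.9, 0.8, 9.1, 0.8, 0.7, 7.9,
        1.8, 1.9, 0.8, 6.5, 1.6, 5.8, 1.3, 1.2, 2.7]
deaths = [211, 167, 131, 191, 220, 297, 71, 211, 300, 107,
          167, 266, 227, 86, 207, 115, 285, 199, 172]
print(spearman(wine, deaths))
```

This gives $\sum d_i^2 = 2078.5$, consistent with the d column of Table 5.2. The no-ties formula gives about -0.823. The Pearson-on-ranks version, which accounts for the ties, gives about -0.829.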

79
Large-sample approximation

80
Hypothesis testing
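For large $n$, under $H_0$ of no association, $Z = r_s\sqrt{n-1}$ is approximately standard normal. A minimal two-sided test sketch; the function name and the default critical value 1.96 (α = 0.05) are our assumptions:

```python
from math import sqrt


def spearman_z_test(r_s, n, z_crit=1.96):
    """Large-sample test of H0: no association; returns (z, reject at the two-sided critical value)."""
    z = r_s * sqrt(n - 1)
    return z, abs(z) >= z_crit


print(spearman_z_test(-0.5, 17))  # toy input: z = -0.5 * 4 = -2.0
```

For the wine data, $r_s \approx -0.83$ with $n = 19$ gives $z \approx -3.5$.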

81
Example
Table 5.1 Wine Consumption and Heart Disease Deaths
Country      Wine (x)  Deaths (y)    Country      Wine (x)  Deaths (y)
Australia    2.5       211           Netherlands  1.8       167
Austria      3.9       167           New Zealand  1.9       266
Belgium      2.9       131           Norway       0.8       227
Canada       2.4       191           Spain        6.5       86
Denmark      2.9       220           Sweden       1.6       207
Finland      0.8       297           Switzerland  5.8       115
France       9.1       71            U.K.         1.3       285
Iceland      0.8       211           U.S.         1.2       199
Ireland      0.7       300           W. Germany   2.7       172
Italy        7.9       107

82
Example

83
Example
Table 5.2 Ranks of Wine Consumption and Heart Disease Deaths
i   Rank of x  Rank of y  d        i   Rank of x  Rank of y  d
1   11         12.5       -1.5     11  8          6.5        1.5
2   15         6.5        8.5      12  9          16         -7.0
3   13.5       5          8.5      13  3          15         -12.0
4   10         9          1.0      14  17         2          15.0
5   13.5       14         -0.5     15  7          11         -4.0
6   3          18         -15.0    16  16         4          12.0
7   19         1          18.0     17  6          17         -11.0
8   3          12.5       -9.5     18  5          10         -5.0
9   1          19         -18.0    19  12         8          4.0
10  18         3          15.0

84
Example

85
Example

86
7. Kendall’s Rank Correlation Coefficient

87
Kendall’s Tau
- A coefficient used to measure the association between two sets of ranked data.
- Named after the British statistician Maurice Kendall, who developed it in 1938.
- Ranges from -1.0 to 1.0.
- Tau-a (no ties) and tau-b (with ties).

88
Formula for Tau-a

89
Concordant and Discordant

90
Example 1: Kendall’s tau-a
Raw data for 11 students in 2 exams (Exam 1, Exam 2): 85 98 95 90 80 83 75 57 70 63 65 77 73 99 93 80 79 96 88 69 74

91
Ranks of exam results
Exam 1 (x)  Exam 2 (y)  c  d
1           2           9  1
2           1           9  0
3           3           8  0
4           5           6  1
5           4           6  0
6           7           4  1
7           6           4  0
8           9           2  1
9           8           2  0
10          11          0  1
11          10
C = 50, D = 5

92
Calculation for $\hat{\tau}$

93
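Steps for calculating $\hat{\tau}$
1. Sort the data by x in ascending order, keeping each y rank paired with its x.
2. For each y, count c (later y values that are larger) and d (later y values that are smaller).
3. Sum the counts to get C and D.
4. Use the formula to calculate $\hat{\tau}$.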
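Here is a Python sketch of these steps, using the ranks from the exam table above. Instead of tallying row by row, it compares every pair directly, which yields the same totals. It applies $\hat{\tau}_a = (C - D)/\binom{n}{2}$. Pairs tied in either variable count as neither concordant nor discordant. The function names are ours:

```python
def concordance_counts(x, y):
    """Return (C, D) over all pairs i < j; pairs tied in x or y count in neither."""
    c = d = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[j] - x[i]) * (y[j] - y[i])
            if s > 0:
                c += 1
            elif s < 0:
                d += 1
    return c, d


def kendall_tau_a(x, y):
    """tau-a = (C - D) / (n(n-1)/2)."""
    c, d = concordance_counts(x, y)
    n = len(x)
    return (c - d) / (n * (n - 1) / 2)


exam1 = list(range(1, 12))
exam2 = [2, 1, 3, 5, 4, 7, 6, 9, 8, 11, 10]
print(concordance_counts(exam1, exam2), round(kendall_tau_a(exam1, exam2), 3))
```

This prints (50, 5) 0.818, that is, $\hat{\tau} = (50 - 5)/55$.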

94
Formula for tau-b (with ties)

95
Example 2: Kendall’s tau-b
Wine consumption and heart disease deaths data
i   Country      x_i  y_i  c  d
1   Ireland      0.7  300  0  18
2   Iceland      0.8  211  3  11
3   Norway       0.8  227  2  13
4   Finland      0.8  297  0  15
5   U.S.         1.2  199  5  9
6   U.K.         1.3  285  0  13
7   Sweden       1.6  207  3  9
8   Netherlands  1.8  167  5  5
9   N.Z.         1.9  266  0  10
10  Canada       2.4  191  2  7
11  Australia    2.5  211  1  7
12  Germany      2.7  172  1  6
13  Belgium      2.9  131  1  4
14  Denmark      2.9  220  0  5
15  Austria      3.9  167  0  4
16  Switzerland  5.8  115  0  3
17  Spain        6.5  86   1  1
18  Italy        7.9  107  0  1
19  France       9.1  71   0  0
C = 24, D = 141
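The sketch below assumes the common form $\tau_b = (C - D)/\sqrt{(n_0 - n_1)(n_0 - n_2)}$, where $n_0 = n(n-1)/2$ and $n_1$, $n_2$ are the numbers of pairs tied in x and in y. That this is exactly the slide's form is an assumption. Pairs tied in either variable count as neither concordant nor discordant. Under that rule, the Belgium and Denmark pair (tied at x = 2.9) does not count. The sketch therefore gives C = 24, D = 141, and $\tau_b = -117/\sqrt{167 \times 169} \approx -0.696$:

```python
from collections import Counter
from math import sqrt


def kendall_tau_b(x, y):
    """Return (C, D, tau_b); pairs tied in x or y count toward neither C nor D."""
    n = len(x)
    c = d = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[j] - x[i]) * (y[j] - y[i])
            if s > 0:
                c += 1
            elif s < 0:
                d += 1
    n0 = n * (n - 1) // 2
    n1 = sum(t * (t - 1) // 2 for t in Counter(x).values())  # pairs tied in x
    n2 = sum(t * (t - 1) // 2 for t in Counter(y).values())  # pairs tied in y
    return c, d, (c - d) / sqrt((n0 - n1) * (n0 - n2))


# Wine consumption x and heart disease deaths y, in the table's order
wine = [0.7, 0.8, 0.8, 0.8, 1.2, 1.3, 1.6, 1.8, 1.9, 2.4,
        2.5, 2.7, 2.9, 2.9, 3.9, 5.8, 6.5, 7.9, 9.1]
deaths = [300, 211, 227, 297, 199, 285, 207, 167, 266, 191,
          211, 172, 131, 220, 167, 115, 86, 107, 71]
print(kendall_tau_b(wine, deaths))
```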

96
Calculation for tau-b

97
Hypothesis Test for τ
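Under $H_0$ of independence, with no ties, $E(\hat\tau) = 0$ and $\operatorname{Var}(\hat\tau) = \frac{2(2n+5)}{9n(n-1)}$. For large $n$, $z = \hat\tau / \sqrt{\operatorname{Var}(\hat\tau)}$ is therefore approximately standard normal. A sketch with our own function name, applied to Example 1:

```python
from math import erfc, sqrt


def kendall_z(tau, n):
    """Large-sample z for H0: tau = 0, using the no-ties null variance."""
    return tau / sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))


z = kendall_z(45 / 55, 11)            # Example 1: C = 50, D = 5, n = 11
p_two_sided = erfc(abs(z) / sqrt(2))  # 2 * P(Z >= |z|)
print(round(z, 2), p_two_sided)
```

For Example 1 this gives $z \approx 3.50$, a two-sided P-value well below 0.001.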

98
Hypothesis test results

99
Hypothesis test results

100

101
Example 1 extension
Exam 1 (x)  Exam 2 (y)  Kendall’s τ: c  d   Spearman r_s: |d_i|  d_i²
1           2           9               1   1                    1
2           1           9               0   1                    1
3           3           8               0   0                    0
4           5           6               1   1                    1
5           4           6               0   1                    1
6           7           4               1   1                    1
7           6           4               0   1                    1
8           9           2               1   1                    1
9           8           2               0   1                    1
10          11          0               1   1                    1
11          10                              1                    1
C = 50, D = 5
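Worked from this table with $n = 11$, $C = 50$, $D = 5$, and $\sum d_i^2 = 10$ from the last column, the two coefficients compare as follows:

```latex
\hat{\tau} = \frac{C - D}{n(n-1)/2} = \frac{50 - 5}{55} \approx 0.818,
\qquad
r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 10}{11 \times 120} \approx 0.955.
```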

102

103

104
SAS Code
Data exams;
  Input exam1 exam2 @@;
  Datalines;
85 98 95 …
;
Run;
Proc corr data=exams kendall;
  Var exam1 exam2;
Run;

105
SAS output

106
8. Conclusion

107
Summary: Nonparametric tests are very useful when we know little about the underlying distributions. In particular, when the distribution is clearly not normal and the sample is small, the t-test may not be appropriate, and nonparametric methods are the natural alternative. The median is a better measure of central tendency than the mean for non-normal (skewed) populations. The data may be ordinal, and sample sizes are often small.

108
Summary
In summary, we have briefly introduced some of the most common nonparametric methods:
- Sign test
- Wilcoxon rank sum test and signed rank test
- Kruskal-Wallis test
- Friedman test
- Spearman’s rank correlation
- Kendall’s rank correlation coefficient

109
Questions: Q1, Q2, …

110
The End. Thank you!
