Published by Aaron Wells. Modified about 1 year ago.

1
Nonparametric Statistical Methods
Presented by Guo Cheng, Ning Liu, Faiza Khan, Zhenyu Zhang, Du Huang, Christopher Porcaro, Hongtao Zhao, Wei Huang

2
Introduction

3
Definition
Nonparametric (rank-based) methods are used when:
Nothing is known about the population distribution from which the data are sampled.
The sample size is small.
The data are measured on an ordinal scale and only their ranks are meaningful.

4
Outline
1. Sign Test
2. Wilcoxon Signed Rank Test
3. Inferences for Two Independent Samples
4. Inferences for Several Independent Samples
5. Friedman Test
6. Spearman’s Rank Correlation
7. Kendall’s Rank Correlation Coefficient

5
1. Sign Test

6
Parameter of interest: Median
The median is used as the parameter of interest because, for skewed distributions, it is a better measure of central tendency than the mean.

7
Hypothesis test
$H_0: \mu = \mu_0$ vs. $H_a: \mu > \mu_0$, where $\mu_0$ is a specified value and $\mu$ is the unknown median.

8
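Testing Procedure
Step 1: Given a random sample $x_1, x_2, \ldots, x_n$ from a population with unknown median $\mu$, count the number of $x_i$ that exceed $\mu_0$ and denote this count by $s_+$. Then $s_- = n - s_+$ is the number of $x_i$ below $\mu_0$ (assuming no observation equals $\mu_0$).
Step 2: Reject $H_0$ if $s_+$ is large or, equivalently, if $s_-$ is small.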

9
How to reject $H_0$?
To determine how large $s_+$ must be in order to reject $H_0$, we need the distribution of the corresponding random variable $S_+$.
$X_i$: random variable corresponding to the observed value $x_i$
$S_+$: random variable corresponding to $s_+$
$S_-$: random variable corresponding to $s_-$
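The idea above can be sketched in a few lines of Python. Under $H_0$ each observation falls above $\mu_0$ with probability 1/2, so $S_+$ follows a Binomial(n, 1/2) distribution and the upper-tailed p-value is a binomial tail sum. The function name `sign_test_upper`, the convention of dropping observations equal to $\mu_0$, and the sample values are assumptions made for illustration; they are not taken from the presentation.

```python
from math import comb


def sign_test_upper(xs, mu0):
    """Exact upper-tailed sign test of H0: median = mu0 vs Ha: median > mu0."""
    # Assumption: observations equal to mu0 carry no sign and are dropped.
    kept = [x for x in xs if x != mu0]
    n = len(kept)
    s_plus = sum(1 for x in kept if x > mu0)
    s_minus = n - s_plus
    # Under H0, S+ ~ Binomial(n, 1/2), so P(S+ >= s_plus) is a tail sum.
    p_value = sum(comb(n, k) for k in range(s_plus, n + 1)) / 2 ** n
    return s_plus, s_minus, p_value


# Hypothetical readings, not the presentation's data.
sample = [201, 203, 199, 204, 202, 198, 205, 206, 207, 208]
print(sign_test_upper(sample, 200))  # (8, 2, 0.0546875)
```

With 8 of 10 hypothetical readings above 200, the tail sum is (45 + 10 + 1) / 1024, so $H_0$ would not be rejected at the 0.05 level.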

10
Distribution of $S_+$ and $S_-$

11
Calculating P-value

12
Rejection criteria

13
Large sample z-test
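The slide's formula was not preserved, so the following is a sketch of the usual normal approximation rather than the presenters' own derivation. Under $H_0$, $S_+$ has mean n/2 and variance n/4, and a continuity correction of 1/2 is commonly applied. The function name `sign_test_z` and the example counts are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def sign_test_z(s_plus, n, continuity=True):
    """Normal approximation to the upper-tailed sign test."""
    # Under H0, S+ ~ Binomial(n, 1/2): mean n/2, variance n/4.
    correction = 0.5 if continuity else 0.0
    z = (s_plus - n / 2 - correction) / sqrt(n / 4)
    # Upper-tail probability of the standard normal distribution.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical counts: 8 of 10 observations exceed mu0.
z, p = sign_test_z(8, 10)
print(round(z, 4), round(p, 4))
```

For small n the exact binomial tail is preferable; the approximation is meant for samples large enough that the binomial is close to normal.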

14
Confidence Interval

15
Example

16
SAS code
DATA thermostat;
  INPUT temp;
  DATALINES;
…
;
PROC UNIVARIATE DATA=thermostat LOCCOUNT MU0=200;
  VAR temp;
RUN;

17
SAS Output
Basic Statistical Measures
    Location            Variability
Mean                Std Deviation
Median              Variance
Mode      .         Range
                    Interquartile Range

Tests for Location: Mu0=200
Test           Statistic       p Value
Student's t    t               Pr > |t|
Sign           M    3          Pr >= |M|
Signed Rank    S    19.5       Pr >= |S|

18
2. Wilcoxon Signed Rank Test

19
Inventor
Frank Wilcoxon (2 September 1892, County Cork, Ireland – 18 November 1965, Tallahassee, Florida, USA) was a chemist and statistician known for developing several statistical tests.

20
What is it used for?
Two related samples
Matched samples
Repeated measurements on a single sample

21
Hypothesis

22
Testing procedure
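The slide's details were not preserved, so this is a sketch of the standard signed rank procedure rather than the presenters' exact wording: compute the differences $d_i = x_i - \mu_0$, rank the absolute differences (tied values share the average rank), and sum the ranks of the positive and negative differences separately. The function name `signed_rank`, the handling of zero differences, and the sample values are assumptions for illustration.

```python
def signed_rank(xs, mu0):
    """Return (w_plus, w_minus), the Wilcoxon signed rank sums."""
    # Assumption: zero differences are dropped before ranking.
    diffs = [x - mu0 for x in xs if x != mu0]
    abs_sorted = sorted(abs(d) for d in diffs)

    def midrank(v):
        # Average of the 1-based positions that v occupies in abs_sorted.
        return abs_sorted.index(v) + 1 + (abs_sorted.count(v) - 1) / 2

    w_plus = sum(midrank(abs(d)) for d in diffs if d > 0)
    w_minus = sum(midrank(abs(d)) for d in diffs if d < 0)
    return w_plus, w_minus


# Hypothetical readings, not the presentation's data.
sample = [201, 203, 199, 204, 202, 198, 205, 206, 207, 208]
print(signed_rank(sample, 200))  # (50.0, 5.0)
```

The two sums always add up to n(n+1)/2 (here 55). SAS's PROC UNIVARIATE documents its signed rank statistic in centered form, S = w+ − n(n+1)/4, so this hypothetical sample would give S = 50 − 27.5 = 22.5.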

23
Example

24
SAS code
DATA thermo;
  INPUT temp;
  DATALINES;
…
;
PROC UNIVARIATE DATA=thermo LOCCOUNT MU0=200;
  TITLE "Wilcoxon signed rank test the thermostat";
  VAR temp;
RUN;

25
SAS output (selected results)
Basic Statistical Measures
    Location            Variability
Mean                Std Deviation
Median              Variance
Mode      .         Range
                    Interquartile Range

Tests for Location: Mu0=200
Test           Statistic       p Value
Student's t    t               Pr > |t|
Sign           M    3          Pr >= |M|
Signed Rank    S    19.5       Pr >= |S|    0.048

26
Large sample approximation

27
Derive E(x) & Var(x)

28
Rejection region:

29
3. Inferences for Two Independent Samples

30
Hypothesis

31
Definition

32
Definition

33
Wilcoxon rank sum test

34
Mann-Whitney U test

35
Between two tests

36
Advantages

37
For large samples

38
For large samples

39
Treatment of ties

40
Example
To test whether two classes taught by the same teacher have the same grades, we randomly pick 7 students from Class A and 9 from Class B. Their scores are as follows:
A: 8.50 9.48 8.65 8.16 8.83 7.76 8.63
B: 8.27 8.20 8.25 8.14 9.00 8.10 7.20 8.32 7.70

41
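Example
Ordering all 16 scores from lowest to highest gives the following sequence of class labels:
B B A B B A B B B B A A A A B A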

42
Example

43
Example

44
SAS code
Data exam;
  Input group $ score;
  Datalines;
A 8.50
A 9.48
A 8.65
A 8.16
A 8.83
A 7.76
A 8.63
B 8.27
B 8.20
B 8.25
B 8.14
B 9.00
B 8.10
B 7.20
B 8.32
B 7.70
;

45
SAS code
Proc npar1way data=exam wilcoxon;
  Var score;
  Class group;
  Exact wilcoxon;
Run;
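As a cross-check on the rank sums that the SAS step computes, here is a small Python sketch using the same 16 scores. It pools both classes, assigns midranks, and derives the Mann-Whitney U statistics from the rank sums through U = W − n(n+1)/2 for each sample. The function name `rank_sums` is an illustrative assumption; the p-value calculation is left out.

```python
def rank_sums(a, b):
    """Wilcoxon rank sums and Mann-Whitney U statistics for two samples."""
    pooled = sorted(a + b)

    def midrank(v):
        # Average of the 1-based positions that v occupies in the pooled order.
        return pooled.index(v) + 1 + (pooled.count(v) - 1) / 2

    w_a = sum(midrank(v) for v in a)
    w_b = sum(midrank(v) for v in b)
    # Each U counts the pairs in which that sample's value is the larger one.
    u_a = w_a - len(a) * (len(a) + 1) / 2
    u_b = w_b - len(b) * (len(b) + 1) / 2
    return w_a, w_b, u_a, u_b


# Scores from the DATA step above.
class_a = [8.50, 9.48, 8.65, 8.16, 8.83, 7.76, 8.63]
class_b = [8.27, 8.20, 8.25, 8.14, 9.00, 8.10, 7.20, 8.32, 7.70]
print(rank_sums(class_a, class_b))  # (75.0, 61.0, 47.0, 16.0)
```

The two rank sums add up to 16 · 17 / 2 = 136 and the two U statistics add up to 7 · 9 = 63, which is a quick sanity check on the ranking.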

46
Output
Wilcoxon Scores (Rank Sums) for Variable score
Classified by Variable group
group   N   Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
A
B

47
Output
Wilcoxon Two-Sample Test
Statistic (S)
Normal Approximation
  Z
  One-Sided Pr > Z
  Two-Sided Pr > |Z|
t Approximation
  One-Sided Pr > Z
  Two-Sided Pr > |Z|
Exact Test
  One-Sided Pr >= S
  Two-Sided Pr >= |S - Mean|
Z includes a continuity correction of

48
Output

49
4. Inferences for Several Independent Samples

50
Introduction
If the data are normally distributed and the population standard deviations are equal, we can test for a difference among several populations using the one-way ANOVA F test.

51
When to use the Kruskal-Wallis test?
What happens when the data are not normal? In that case we use the nonparametric Kruskal-Wallis test to compare more than two populations, provided the data come from a continuous distribution. The idea of the kw rank test is to pool the data from all groups, rank them together, and then apply one-way ANOVA to the ranks rather than to the original data.

52
Kruskal-Wallis Test (kw Test)
A nonparametric method for testing whether samples originate from the same distribution.
Used for comparing more than two independent samples.

53
Kruskal-Wallis Test: History
William Henry Kruskal (October 10, 1919 – April 21, 2005)
Obtained Bachelor's and Master's degrees in Mathematics at Harvard University and received his Ph.D. from Columbia University.
Wilson Allen Wallis (November 5, 1912 – October 12, 1998)
Undergraduate work at the University of Minnesota and graduate work at the University of Chicago.

54
Kruskal-Wallis Test: Steps
1. State the hypotheses:
Null hypothesis ($H_0$): the samples come from identical populations.
Alternative hypothesis ($H_a$): at least one sample comes from a different population.

55
Kruskal-Wallis Test: Steps
2. Rank all the data together. The lowest value gets the lowest rank, and so on. Tied values each get the average of the ranks they would have received if they were not tied.
3. Add up the ranks within each sample. For four samples, label these sums $L_1$, $L_2$, $L_3$, and $L_4$.

56
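Kruskal-Wallis Test: Steps
4. Find the test statistic:
$kw = \frac{12}{n(n+1)} \sum_i \frac{L_i^2}{n_i} - 3(n+1)$
n = total number of observations in all samples
$n_i$ = number of observations in sample i
$L_i$ = rank sum of sample i
kw = test statistic
5. Reject $H_0$ if kw is greater than the chi-square table value with (number of samples − 1) degrees of freedom.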
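Steps 2 to 4 can be sketched in Python as follows. The sketch uses the standard form of the statistic, kw = 12 / (n(n+1)) · Σ L_i² / n_i − 3(n+1), with midranks for ties and no further tie correction. The function name `kruskal_wallis` and the three small samples are hypothetical and are not the presentation's test-score data.

```python
def kruskal_wallis(samples):
    """Return the rank sums L_i and the Kruskal-Wallis statistic kw."""
    pooled = sorted(v for s in samples for v in s)
    n = len(pooled)

    def midrank(v):
        # Tied values share the average of the ranks they span.
        return pooled.index(v) + 1 + (pooled.count(v) - 1) / 2

    rank_sums = [sum(midrank(v) for v in s) for s in samples]
    kw = (12 / (n * (n + 1))
          * sum(L ** 2 / len(s) for L, s in zip(rank_sums, samples))
          - 3 * (n + 1))
    return rank_sums, kw


# Hypothetical data: three samples of three observations each.
sums, kw = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(sums, round(kw, 4))  # [6.0, 15.0, 24.0] 7.2
```

With three samples the reference distribution is chi-square with 2 degrees of freedom, whose upper 5% point is 5.991, so these fully separated hypothetical samples would lead to rejecting $H_0$ at the 0.05 level.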

57
Kruskal-Wallis Test: Example
An experiment was done to compare four different ways of teaching a concept to a class of students. In this experiment, 28 tenth-grade classes were randomly assigned to the four methods (7 classes per method). A 45-question test was given to each class. The average test scores of the classes are given in the following table. Apply the Kruskal-Wallis test to the test scores data set.

58
Kruskal-Wallis Test: Example
Given data
Ranks of data values

59
Kruskal-Wallis Test: Example

60
Kruskal-Wallis Test: Example

61
SAS Input
data test;
  input methodname $ scores;
  cards;
case case case case case case case
Formula Formula Formula Formula Formula Formula Formula
Equation Equation Equation Equation Equation Equation Equation
Unitary Unitary Unitary Unitary Unitary Unitary Unitary
;
proc npar1way data=test wilcoxon;
  class methodname;
  var scores;
run;

62
SAS Output
Wilcoxon Scores (Rank Sums) for Variable scores
Classified by Variable methodname
methodname   N   Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
case
formula
equation
unitary
Average scores were used for ties.

Kruskal-Wallis Test
Chi-Square
DF                 3
Pr > Chi-Square

63
5. Friedman Test

64
Introduction
The Friedman test is a distribution-free, rank-based test for comparing treatments. It is named after the Nobel laureate economist Milton Friedman, who proposed it. The Friedman test is a nonparametric version of repeated-measures ANOVA that can be performed on ordinal (ranked) data.

65
Steps in the Friedman test

66
Steps in the Friedman test
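The content of the two steps slides was not preserved, so the following is a sketch of the standard Friedman procedure rather than the presenters' own steps: rank the treatments within each block, sum the ranks per treatment, and compute 12 / (b·k·(k+1)) · Σ R_j² − 3·b·(k+1), where b is the number of blocks and k the number of treatments. The function name `friedman` and the data are hypothetical, and no tie correction is applied.

```python
def friedman(blocks):
    """blocks: one row per block, one value per treatment in each row."""
    b = len(blocks)
    k = len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        ordered = sorted(row)
        for j, v in enumerate(row):
            # Midrank of v within its own block.
            rank_sums[j] += ordered.index(v) + 1 + (ordered.count(v) - 1) / 2
    stat = (12 / (b * k * (k + 1)) * sum(r ** 2 for r in rank_sums)
            - 3 * b * (k + 1))
    return rank_sums, stat


# Hypothetical data: 3 blocks (rows) by 3 treatments (columns).
sums, stat = friedman([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(sums, round(stat, 4))  # [3.0, 6.0, 9.0] 6.0
```

The statistic is compared with a chi-square distribution with k − 1 degrees of freedom. Ranking within blocks removes block-to-block differences, which is what distinguishes this test from the Kruskal-Wallis test.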

67
Example Now we have 8 treatments separated in 3 blocks, α =

68
Define the null and alternative hypotheses
$H_0$: There is no difference among the 8 treatments.
$H_a$: There is a difference among the 8 treatments.

69
Rank Sum

70
Friedman Test

71
Conclusion

72
6. Spearman’s Rank Correlation Coefficient

73
Introduction
From Pearson to Spearman
Spearman’s Rank Correlation Coefficient
Large-Sample Approximation
Hypothesis Test
Examples

74
From Pearson to Spearman
Pearson’s correlation coefficient:
Measures only the degree of linear association
Based on the assumption of bivariate normality of the two variables
Spearman’s rank correlation coefficient:
Takes into account only the ranks
Measures the degree of monotone association
Inferences on the rank correlation coefficient are distribution-free

75
From Pearson to Spearman

76
From Pearson to Spearman
Charles Edward Spearman (10 Sept – 17 Sept. 1945)
As a psychologist: ① General factor of intelligence ② The nature and causes of variations in human
As a statistician: ① Rank correlation ② Two-way analysis ③ Correlation coefficient

77
Spearman’s Rank Correlation Coefficient

78
Spearman’s Rank Correlation Coefficient
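Since the formula slides were not preserved, here is a sketch of one common way to compute Spearman's coefficient: replace each variable by its midranks and take the Pearson correlation of the ranks. Without ties this agrees with the shortcut $r_s = 1 - 6 \sum d_i^2 / (n(n^2 - 1))$, where $d_i$ is the difference between the two ranks of observation i. The function name `spearman` and the data are hypothetical.

```python
from math import sqrt


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the midranks."""
    def midranks(vals):
        ordered = sorted(vals)
        return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in vals]

    rx, ry = midranks(x), midranks(y)
    n = len(x)
    mean = (n + 1) / 2  # the mean of n midranks is (n + 1) / 2, ties or not
    sxy = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    sxx = sum((a - mean) ** 2 for a in rx)
    syy = sum((b - mean) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(spearman(x, y), 4))  # 0.8
```

For this hypothetical data the rank differences are −1, 1, −1, 1, 0, so the shortcut gives 1 − 6·4 / (5·24) = 0.8, matching the correlation of the ranks.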

79
Large sample approximation

80
Hypothesis testing

81
Example
Table 5.1 Wine Consumption and Heart Disease Deaths
Country        Wine consumption   Heart disease deaths
Australia      2.5                211
Austria        3.9                167
Belgium        2.9                131
Canada         2.4                191
Denmark        2.9                220
Finland        0.8                297
France         9.1                71
Iceland        0.8                211
Ireland        0.7                300
Italy
Netherlands
New Zealand
Norway
Spain          6.5                86
Sweden
Switzerland
U.K.
U.S.
W. Germany

82
Example

83
Example Table 5.2 Ranks of Wine Consumption and Heart Disease Deaths

84
Example

85
Example

86
7. Kendall’s Rank Correlation Coefficient

87
Kendall’s Tau
A coefficient used to measure the association between two sets of paired ranked data.
Named after the British statistician Maurice Kendall, who developed it.
Ranges from -1.0 to 1.0.
Tau-a (with no ties) and Tau-b (with ties)

88
Formula for Tau-a

89
Concordant and Discordant

90
Example 1: Kendall’s tau-a
Raw data for 11 students in 2 exams:
Exam 1   Exam 2

91
Ranks of exam results
Exam 1 (x)   Exam 2 (y)   c   d
C = 50, D = 5

92
Calculation for $\hat{\tau}$

93
Steps for calculating $\hat{\tau}$
1. Sort the data by x in ascending order and pair each y rank with its x.
2. Count c (concordant) and d (discordant) for each y.
3. Sum the counts to get C and D.
4. Use the formula to calculate $\hat{\tau}$.
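The steps above can be sketched in Python. Rather than sorting by x and counting per row, this version checks every pair of observations directly, which yields the same totals C and D when there are no ties. Tau-a is then (C − D) divided by the number of pairs, n(n − 1)/2. The function name `kendall_tau_a` and the four data points are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Return (C, D, tau_a) for paired observations without ties."""
    c = d = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            c += 1  # concordant: x and y move in the same direction
        elif s < 0:
            d += 1  # discordant: x and y move in opposite directions
    n = len(x)
    return c, d, (c - d) / (n * (n - 1) / 2)


# Hypothetical ranks for four observations.
c, d, tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(c, d, round(tau, 4))  # 5 1 0.6667
```

Applied to the totals quoted for the exam example, C = 50 and D = 5 with n = 11 give (50 − 5) / 55 ≈ 0.818.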

94
Formula for tau-b (with ties)

95
Example 2: Kendall’s tau-b
Wine consumption and heart disease deaths data
i    Country        x_i   y_i   c   d
1    Ireland
2    Iceland
3    Norway
4    Finland
5    U.S.
6    U.K.
7    Sweden
8    Netherlands
9    N. Z.
10   Canada
11   Australia
12   Germany
13   Belgium
14   Denmark
15   Austria
16   Switzerland
17   Spain
18   Italy
19   France
C = 25, D = 141

96
Calculation for tau-b

97
Hypothesis Test for τ

98
Hypothesis test results

99
Hypothesis test results

101
Example 1 extension
Exam 1 (x)   Exam 2 (y)
Kendall's τ
Spearman $r_s$
C = 50, D = 5

104
SAS Code
Data exams;
  Input exam1 exam2;
  Datalines;
…
;
Run;
Proc corr data=exams kendall;
  Var exam1 exam2;
Run;

105
SAS output

106
8. Conclusion

107
Summary
Nonparametric tests are useful when little is known about the population distribution. In particular, when the distribution is not normal and the t-test cannot be relied on, nonparametric methods are the alternative.
The median is a better measure of central tendency than the mean for skewed, non-normal populations.
The data can be ordinal, and the sample size is usually small.

108
Summary
We have briefly introduced some of the most common nonparametric methods:
Sign test
Wilcoxon rank sum test and signed rank test
Kruskal-Wallis test
Friedman test
Spearman’s rank correlation
Kendall’s rank correlation coefficient

109
Questions
Q1 Q2 …

110
The End. Thank you!
