Nonparametric Statistical Methods. Presented by Guo Cheng, Ning Liu, Faiza Khan, Zhenyu Zhang, Du Huang, Christopher Porcaro, Hongtao Zhao, Wei Huang

2 Introduction

3 Definition: Nonparametric methods are rank-based methods used when we have little or no knowledge of the population distribution from which the data are sampled. They are used for small sample sizes, and when the data are measured on an ordinal scale so that only their ranks are meaningful.

4 Outline
1. Sign Test
2. Wilcoxon Signed Rank Test
3. Inferences for Two Independent Samples
4. Inferences for Several Independent Samples
5. Friedman Test
6. Spearman’s Rank Correlation
7. Kendall’s Rank Correlation Coefficient

5 1. Sign Test

6 Parameter of interest: the median. The median is used because, for skewed distributions, it is a better measure of central tendency than the mean.

7 Hypothesis test: $H_0\colon \mu = \mu_0$ vs. $H_a\colon \mu > \mu_0$, where $\mu_0$ is a specified value and $\mu$ is the unknown median.

8 Testing Procedure. Step 1: Given a random sample $x_1, x_2, \dots, x_n$ from a population with unknown median $\mu$, count the number of $x_i$'s that exceed $\mu_0$. Denote this count by $s_+$, and let $s_- = n - s_+$. Step 2: Reject $H_0$ if $s_+$ is large (equivalently, if $s_-$ is small).
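As a minimal sketch of Step 1 (in Python; the slides themselves use SAS), the helper below counts $s_+$ and $s_-$. It assumes that observations exactly equal to $\mu_0$ are discarded, a common convention the slide does not state; the function name `sign_counts` and the sample values are hypothetical.

```python
def sign_counts(sample, mu0):
    """Return (s_plus, s_minus) for the sign test of H0: median = mu0.

    Assumption: observations equal to mu0 are dropped, so n counts only
    the observations that differ from mu0.
    """
    nonzero = [x for x in sample if x != mu0]
    s_plus = sum(1 for x in nonzero if x > mu0)  # number of x_i exceeding mu0
    s_minus = len(nonzero) - s_plus              # s_- = n - s_+
    return s_plus, s_minus


# Hypothetical readings; the value 200.0 equals mu0 and is dropped.
print(sign_counts([201.5, 198.2, 203.0, 200.0, 204.1, 199.3, 202.7], 200))  # (4, 2)
```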

9 How large must $s_+$ be to reject $H_0$? To answer this, we need the distribution of the corresponding random variable $S_+$. Notation: $X_i$ is the random variable corresponding to the observed value $x_i$, and $S_+$ and $S_-$ are the random variables corresponding to $s_+$ and $s_-$.

10 Distribution of $S_+$ and $S_-$
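For reference, the standard null distribution, assuming a continuous population so that under $H_0$ each $X_i$ exceeds $\mu_0$ with probability $1/2$, independently of the others:

```latex
S_+ \sim \mathrm{Bin}\!\left(n, \tfrac{1}{2}\right), \qquad
S_- = n - S_+ \sim \mathrm{Bin}\!\left(n, \tfrac{1}{2}\right),

P(S_+ = s) = \binom{n}{s} \left(\tfrac{1}{2}\right)^{n}, \qquad s = 0, 1, \dots, n.
```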

11 Calculating the P-value
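Because $S_+ \sim \mathrm{Bin}(n, 1/2)$ under $H_0$, the exact upper-tailed P-value is $P(S_+ \ge s_+)$. A small Python sketch; the function name and the counts in the example are hypothetical.

```python
from math import comb


def sign_test_p_value(s_plus, n):
    """Exact upper-tailed P-value P(S+ >= s_plus), with S+ ~ Binomial(n, 1/2) under H0."""
    return sum(comb(n, i) for i in range(s_plus, n + 1)) / 2 ** n


# Hypothetical: 8 of n = 10 nonzero differences are positive
print(round(sign_test_p_value(8, 10), 4))  # 0.0547
```

$H_0$ is rejected at level $\alpha$ when this P-value is at most $\alpha$.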

12 Rejection criteria

13 Large-sample z-test
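For large $n$, $S_+$ is approximately normal with mean $n/2$ and variance $n/4$ under $H_0$. The sketch below uses a continuity correction of $1/2$, which is an assumption about the slide's exact form; the function names and the counts are hypothetical.

```python
from math import erf, sqrt


def sign_test_z(s_plus, n):
    """Upper-tailed large-sample z for the sign test:
    z = (s_plus - n/2 - 1/2) / sqrt(n/4)  (continuity-corrected)."""
    return (s_plus - n / 2 - 0.5) / sqrt(n / 4)


def upper_normal_tail(z):
    """P(Z >= z) for a standard normal Z, via the error function."""
    return 0.5 * (1 - erf(z / sqrt(2)))


z = sign_test_z(30, 40)         # hypothetical: 30 of 40 observations above mu0
print(z, upper_normal_tail(z))  # reject H0 at level alpha if the tail <= alpha
```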

14 Confidence Interval
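A common sign-test interval for the median uses order statistics: with $b$ the largest integer such that $P(S \le b) \le \alpha/2$ for $S \sim \mathrm{Bin}(n, 1/2)$, the interval $[x_{(b+1)}, x_{(n-b)}]$ has confidence level $1 - 2P(S \le b) \ge 1 - \alpha$. A sketch under that construction; the function names are hypothetical.

```python
from math import comb


def binom_half_cdf(b, n):
    """P(S <= b) for S ~ Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(b + 1)) / 2 ** n


def median_ci(sample, alpha=0.05):
    """Sign-test confidence interval [x_(b+1), x_(n-b)] for the median, where b is
    the largest integer with P(S <= b) <= alpha/2.
    Returns (lower, upper, actual_confidence_level)."""
    x = sorted(sample)
    n = len(x)
    b = -1
    while b + 1 < n and binom_half_cdf(b + 1, n) <= alpha / 2:
        b += 1
    if b < 0:
        raise ValueError("sample too small for this confidence level")
    # 0-based indexing: x_(b+1) is x[b] and x_(n-b) is x[n-b-1]
    return x[b], x[n - b - 1], 1 - 2 * binom_half_cdf(b, n)


print(median_ci(list(range(1, 11))))  # (2, 9, 0.978515625)
```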

15 Example

16 SAS code
DATA themostat;
  INPUT temp;
  datalines;
…
;
PROC UNIVARIATE DATA=themostat loccount mu0=200;
  VAR temp;
RUN;

17 SAS Output
Basic Statistical Measures
Location: Mean, Median, Mode .
Variability: Std Deviation, Variance, Range, Interquartile Range
Tests for Location: Mu0=200
Test           Statistic     p Value
Student's t    t             Pr > |t|
Sign           M = 3         Pr >= |M|
Signed Rank    S = 19.5      Pr >= |S|

18 2. Wilcoxon Signed Rank Test

19 Inventor: Frank Wilcoxon (2 September 1892, County Cork, Ireland – 18 November 1965, Tallahassee, Florida, USA) was a chemist and statistician, known for developing several statistical tests.

20 What is it used for? Comparing two related samples, such as matched samples or repeated measurements on a single sample.

21 Hypothesis

22 Testing procedure
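The usual procedure computes $d_i = x_i - \mu_0$, drops zero differences, ranks the $|d_i|$ (tied values get average ranks), and sums the ranks of the positive differences to get $w_+$; a large $w_+$ supports $H_a\colon \mu > \mu_0$. A Python sketch of that procedure; dropping zeros is an assumed convention, and the function names and data are hypothetical.

```python
from collections import Counter


def midranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    counts = Counter(values)
    first = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)  # lowest rank position occupied by each value
    return [first[v] + (counts[v] - 1) / 2 for v in values]


def signed_rank_w_plus(sample, mu0):
    """Return (w_plus, n): the sum of the ranks of |d_i| over the positive
    differences d_i = x_i - mu0, and the number of nonzero differences.
    Assumption: zero differences are dropped before ranking."""
    d = [x - mu0 for x in sample if x != mu0]
    ranks = midranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    return w_plus, len(d)


# Hypothetical readings tested against mu0 = 200
print(signed_rank_w_plus([201.5, 198.2, 203.0, 200.0, 204.1, 199.3, 202.7], 200))  # (17.0, 6)
```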

23 Example

24 SAS code
DATA thermo;
  INPUT temp;
  datalines;
…
;
PROC UNIVARIATE DATA=thermo loccount mu0=200;
  TITLE "Wilcoxon signed rank test for the thermostat data";
  VAR temp;
RUN;

25 SAS outputs (selected results)
Basic Statistical Measures
Location: Mean, Median, Mode .
Variability: Std Deviation, Variance, Range, Interquartile Range
Tests for Location: Mu0=200
Test           Statistic     p Value
Student's t    t             Pr > |t|
Sign           M = 3         Pr >= |M|
Signed Rank    S = 19.5      Pr >= |S|   0.048

26 Large-sample approximation

27 Derive E(x) and Var(x)
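A standard derivation of the null mean and variance of the signed rank statistic $W_+$, assuming a continuous distribution symmetric about $\mu_0$: under $H_0$ each rank $i$ is equally likely to belong to a positive or a negative difference, independently of the other ranks, so $W_+ = \sum_{i=1}^{n} i Z_i$ with $Z_1, \dots, Z_n$ i.i.d. Bernoulli(1/2).

```latex
E(W_+) = \sum_{i=1}^{n} i \cdot \tfrac{1}{2} = \frac{n(n+1)}{4},
\qquad
\mathrm{Var}(W_+) = \sum_{i=1}^{n} i^{2} \cdot \tfrac{1}{4} = \frac{n(n+1)(2n+1)}{24}.
```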

28 Rejection region
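With those moments, the large-sample statistic is $z = \bigl(w_+ - n(n+1)/4\bigr) / \sqrt{n(n+1)(2n+1)/24}$, and for $H_a\colon \mu > \mu_0$ one rejects $H_0$ when $z > z_\alpha$. The sketch applies no continuity or tie correction (some texts subtract $1/2$ in the numerator); the function names and the value of $w_+$ are hypothetical.

```python
from math import sqrt


def signed_rank_z(w_plus, n):
    """Large-sample z for the Wilcoxon signed rank test, using
    E(W+) = n(n+1)/4 and Var(W+) = n(n+1)(2n+1)/24 under H0."""
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    return (w_plus - mean) / sqrt(var)


def reject_upper(z, z_alpha=1.645):
    """Upper-tailed rejection region z > z_alpha (1.645 corresponds to alpha = 0.05)."""
    return z > z_alpha


z = signed_rank_z(50, 10)  # hypothetical: w+ = 50 with n = 10
print(round(z, 3), reject_upper(z))  # 2.293 True
```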

29 3. Inferences for Two Independent Samples

30 Hypothesis

31 Definition

32 Definition

33 Wilcoxon rank sum test

34 Mann-Whitney U test

35 Relationship between the two tests
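The two statistics are linked by $U_1 = W_1 - n_1(n_1+1)/2$, where $W_1$ is the rank sum of sample 1 in the pooled data and $U_1$ counts the pairs $(x, y)$ in which the sample-1 value exceeds the sample-2 value (tied pairs count $1/2$). A Python sketch applied to the exam scores listed in the SAS example of this section; the function names are hypothetical.

```python
from collections import Counter


def midranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    counts = Counter(values)
    first = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
    return [first[v] + (counts[v] - 1) / 2 for v in values]


def rank_sum_and_u(sample1, sample2):
    """Wilcoxon rank sum W1 of sample1 within the pooled data, and the
    Mann-Whitney statistic U1 = W1 - n1(n1+1)/2."""
    ranks = midranks(list(sample1) + list(sample2))
    n1 = len(sample1)
    w1 = sum(ranks[:n1])  # ranks of sample1 come first in the pooled list
    u1 = w1 - n1 * (n1 + 1) / 2
    return w1, u1


# Exam scores from the SAS example (Class A: n1 = 7, Class B: n2 = 9)
class_a = [8.50, 9.48, 8.65, 8.16, 8.83, 7.76, 8.63]
class_b = [8.27, 8.20, 8.25, 8.14, 9.00, 8.10, 7.20, 8.32, 7.70]
print(rank_sum_and_u(class_a, class_b))  # (75.0, 47.0)
```

Counting directly, 47 of the 63 (A, B) pairs have the Class A score higher, which matches $U_1$.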

36 Advantages

37 For large samples

38 For large samples
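Under $H_0$ and without ties, $E(W_1) = n_1(N+1)/2$ and $\mathrm{Var}(W_1) = n_1 n_2 (N+1)/12$ with $N = n_1 + n_2$, so $W_1$ can be standardized to an approximate $z$. The sketch offers an optional continuity correction of $1/2$ (the SAS output in this section notes that its $Z$ includes one); the function name is hypothetical, and $W_1 = 75$ is the Class A rank sum computed from the listed exam scores.

```python
from math import copysign, sqrt


def rank_sum_z(w1, n1, n2, continuity=True):
    """Large-sample z for the Wilcoxon rank sum W1 (no ties assumed):
    E(W1) = n1(N+1)/2 and Var(W1) = n1*n2*(N+1)/12 with N = n1 + n2.
    The optional continuity correction moves W1 half a unit toward E(W1)."""
    N = n1 + n2
    mean = n1 * (N + 1) / 2
    sd = sqrt(n1 * n2 * (N + 1) / 12)
    diff = w1 - mean
    if continuity:
        diff = copysign(max(abs(diff) - 0.5, 0.0), diff)
    return diff / sd


print(rank_sum_z(75, 7, 9))                    # about 1.59
print(rank_sum_z(75, 7, 9, continuity=False))  # about 1.64
```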

39 Treatment of ties
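A common treatment of ties assigns each tied value the average (mid) rank and reduces the null variance of $W_1$ to $\frac{n_1 n_2}{12}\bigl[(N+1) - \sum_j (t_j^3 - t_j)/(N(N-1))\bigr]$, where $t_j$ is the size of the $j$-th group of tied values. This is a sketch of that standard adjustment, not necessarily the slide's exact formula; the function names and data are hypothetical.

```python
from collections import Counter


def midranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    counts = Counter(values)
    first = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
    return [first[v] + (counts[v] - 1) / 2 for v in values]


def rank_sum_null_variance(pooled, n1):
    """Null variance of the rank sum of the first n1 pooled values when
    midranks are used: n1*n2/12 * [(N + 1) - sum(t^3 - t) / (N(N - 1))]."""
    N = len(pooled)
    n2 = N - n1
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())  # 0 when no ties
    return n1 * n2 * ((N + 1) - tie_term / (N * (N - 1))) / 12


pooled = [1, 2, 2, 3]                     # hypothetical data with one tied pair
print(midranks(pooled))                   # [1.0, 2.5, 2.5, 4.0]
print(rank_sum_null_variance(pooled, 2))  # smaller than the no-tie value 20/12
```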

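40 Example: To test whether the grades of two classes taught by the same teacher are the same, we randomly pick 7 students from Class A and 9 from Class B. Their scores are as follows. A: 8.50, 9.48, 8.65, 8.16, 8.83, 7.76, 8.63. B: 8.27, 8.20, 8.25, 8.14, 9.00, 8.10, 7.20, 8.32, 7.70.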

41 Example: class labels of the 16 pooled scores in increasing order: BBABBABB BBAAAABA

42 Example

43 Example

44 SAS code
Data exam;
  Input group $ score @@;  /* @@ reads several group-score pairs from each line */
  Datalines;
A 8.50 A 9.48 A 8.65 A 8.16 A 8.83 A 7.76 A 8.63
B 8.27 B 8.20 B 8.25 B 8.14 B 9.00 B 8.10 B 7.20 B 8.32 B 7.70
;

45 SAS code
Proc npar1way data=exam wilcoxon;
  Var score;
  Class group;
  Exact wilcoxon;
Run;

46 Output: Wilcoxon Scores (Rank Sums) for Variable score Classified by Variable group
Columns: group, N, Sum of Scores, Expected Under H0, Std Dev Under H0, Mean Score
Rows: A, B

47 Output: Wilcoxon Two-Sample Test
Statistic (S)
Normal Approximation: Z; One-Sided Pr > Z; Two-Sided Pr > |Z|
t Approximation: One-Sided Pr > Z; Two-Sided Pr > |Z|
Exact Test: One-Sided Pr >= S; Two-Sided Pr >= |S - Mean|
Z includes a continuity correction of

48 Output

49 4. Inferences for Several Independent Samples

50 Introduction: If the data are normally distributed and the population standard deviations are equal, we can test for a difference among several population means using the one-way ANOVA F test.

51 When to use the Kruskal-Wallis test? What if the data are not normal? Then we can use the nonparametric Kruskal-Wallis test to compare more than two populations, as long as the data come from continuous distributions. The idea of the Kruskal-Wallis (kw) rank test is to rank all the data from all groups together and then apply one-way ANOVA to the ranks rather than to the original data.

52 Kruskal-Wallis Test (kw Test): a nonparametric method for testing whether samples originate from the same distribution. It is used to compare more than two independent samples.

53 Kruskal-Wallis Test: History. William Henry Kruskal (October 10, 1919 – April 21, 2005) obtained his bachelor's and master's degrees in mathematics at Harvard University and received his Ph.D. from Columbia University. Wilson Allen Wallis (November 5, 1912 – October 12, 1998) did his undergraduate work at the University of Minnesota and his graduate work at the University of Chicago.

54 Kruskal-Wallis Test: Steps. 1. State the hypotheses. Null hypothesis ($H_0$): the populations from which the samples are drawn are identical. Alternative hypothesis ($H_a$): at least one population differs from the others.

55 Kruskal-Wallis Test: Steps (continued). 2. Rank all the data together: the lowest value gets rank 1, and so on. Tied values get the average of the ranks they would have received if they were not tied. 3. Add up the ranks within each sample and label these rank sums $L_1$, $L_2$, $L_3$, and $L_4$ (one per sample; the example has four samples).

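56 Kruskal-Wallis Test: Steps (continued). 4. Compute the test statistic kw (also denoted $H$), where $n$ is the total number of observations in all samples and $L_i$ is the rank sum of sample $i$. 5. Reject $H_0$ if $H$ exceeds the upper-$\alpha$ critical value of the chi-square distribution with $k - 1$ degrees of freedom, where $k$ is the number of samples.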
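The standard form of the statistic, using $n$ and $L_i$ as defined in step 4 plus $n_i$ (the size of sample $i$), is $H = \frac{12}{n(n+1)} \sum_i \frac{L_i^2}{n_i} - 3(n+1)$. A Python sketch without the tie correction that SAS applies; the function names and data are hypothetical.

```python
from collections import Counter


def midranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    counts = Counter(values)
    first = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
    return [first[v] + (counts[v] - 1) / 2 for v in values]


def kruskal_wallis_h(*samples):
    """H = 12 / (n(n+1)) * sum(L_i^2 / n_i) - 3(n+1), where L_i is the rank sum
    of sample i in the pooled ranking (midranks for ties, no tie correction)."""
    pooled = [x for s in samples for x in s]
    ranks = midranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for s in samples:
        L = sum(ranks[start:start + len(s)])  # rank sum L_i of this sample
        total += L ** 2 / len(s)
        start += len(s)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical data: three completely separated groups
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # about 7.2
```

With $k = 3$ samples here, $H$ is compared with the chi-square critical value on 2 degrees of freedom (5.991 at $\alpha = 0.05$); for the four teaching methods, $k - 1 = 3$ and the 5% critical value is 7.815.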

57 Kruskal-Wallis Test: Example. An experiment was done to compare four different ways of teaching a concept to a class of students. In this experiment, 28 tenth-grade classes were randomly assigned to the four methods (7 classes per method). A 45-question test was given to each class. The average test scores of the classes are given in the following table. Apply the Kruskal-Wallis test to the test score data.

58 Kruskal-Wallis Test: Example (Given Data; Ranks of Data Values)

59 Kruskal-Wallis Test: Example

60 Kruskal-Wallis Test: Example

61 SAS Input
/* 28 observations: 7 class average scores for each method (case, formula, equation, unitary).  */
/* Spell and capitalize each method name consistently: SAS treats "case" and "Case" as different groups. */
data test;
  input methodname $ scores;
  cards;
…
;
proc npar1way data=test wilcoxon;  /* with more than two groups, WILCOXON gives the Kruskal-Wallis test */
  class methodname;
  var scores;
run;

62 SAS Output
Wilcoxon Scores (Rank Sums) for Variable scores Classified by Variable methodname
Columns: methodname, N, Sum of Scores, Expected Under H0, Std Dev Under H0, Mean Score
Rows: case, formula, equation, unitary
Average scores were used for ties.
Kruskal-Wallis Test: Chi-Square; DF = 3; Pr > Chi-Square

63 5. Friedman Test

64 Introduction: A distribution-free, rank-based test for comparing treatments in a randomized block design is the Friedman test, named after the Nobel laureate economist Milton Friedman, who proposed it. The Friedman test is a rank-based counterpart of repeated-measures ANOVA that can be applied to ordinal (ranked) data.

65 Steps in the Friedman test

66 Steps in the Friedman test
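A common form of the Friedman statistic for $b$ blocks and $k$ treatments ranks the responses within each block, sums the ranks for each treatment to get $R_1, \dots, R_k$, and computes $F_r = \frac{12}{bk(k+1)} \sum_j R_j^2 - 3b(k+1)$, compared with the chi-square distribution on $k - 1$ degrees of freedom. A Python sketch without tie correction; the function names and data are hypothetical.

```python
from collections import Counter


def midranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    counts = Counter(values)
    first = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
    return [first[v] + (counts[v] - 1) / 2 for v in values]


def friedman_statistic(blocks):
    """blocks: b rows, each holding the k treatment responses of one block.
    Returns (Fr, [R_1, ..., R_k]) with Fr = 12/(b k (k+1)) * sum R_j^2 - 3 b (k+1)."""
    b, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(midranks(row)):  # rank within the block
            rank_sums[j] += r
    fr = 12 / (b * k * (k + 1)) * sum(R ** 2 for R in rank_sums) - 3 * b * (k + 1)
    return fr, rank_sums


# Hypothetical: 3 blocks that rank 3 treatments identically (perfect agreement)
print(friedman_statistic([[1, 2, 3], [1, 2, 3], [1, 2, 3]]))
```

For the 8-treatment example, $k - 1 = 7$ degrees of freedom, with a 5% critical value of 14.067.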

67 Example: We have 8 treatments arranged in 3 blocks, tested at significance level α.

68 Define the null and alternative hypotheses. $H_0$: there is no difference among the 8 treatments. $H_a$: at least two of the 8 treatments differ.

69 Rank Sum

70 Friedman Test

71 Conclusion

72 6. Spearman’s Rank Correlation Coefficient

73 Introduction: From Pearson to Spearman; Spearman’s Rank Correlation Coefficient; Large-Sample Approximation; Hypothesis Test; Examples

74 From Pearson to Spearman
Pearson’s coefficient: measures only the degree of linear association; inferences on it assume that the two variables are bivariate normal.
Spearman’s coefficient: takes into account only the ranks; measures the degree of monotone association; inferences on it are distribution-free.

75 From Pearson to Spearman

76 From Pearson to Spearman: Charles Edward Spearman (born 10 September; died 17 September 1945)
As a psychologist: ① the general factor of intelligence; ② the nature and causes of variations in human …
As a statistician: ① rank correlation; ② two-way analysis; ③ the correlation coefficient

77 Spearman’s Rank Correlation Coefficient

78 Spearman’s Rank Correlation Coefficient
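Spearman's $r_s$ is the Pearson correlation of the ranks; when there are no ties it reduces to $r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$, where $d_i$ is the difference between the two ranks of item $i$. A Python sketch computing both forms; the function names and data are hypothetical.

```python
from collections import Counter
from math import sqrt


def midranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    counts = Counter(values)
    first = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
    return [first[v] + (counts[v] - 1) / 2 for v in values]


def spearman_rs(x, y):
    """Pearson correlation of the (mid)ranks of x and y; valid with ties."""
    rx, ry = midranks(x), midranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def spearman_rs_no_ties(x, y):
    """Shortcut 1 - 6 * sum(d_i^2) / (n(n^2 - 1)); valid only without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(midranks(x), midranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


x, y = [1, 2, 3, 4, 5], [2, 1, 4, 3, 5]  # hypothetical paired data
print(spearman_rs(x, y), spearman_rs_no_ties(x, y))  # 0.8 0.8
```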

79 Large-sample approximation

80 Hypothesis testing
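Under $H_0$ of no association, $r_s$ has mean 0 and variance $1/(n-1)$, so for large $n$ the statistic $z = r_s\sqrt{n-1}$ is approximately standard normal. A Python sketch of a two-sided test at $\alpha = 0.05$ (a one-sided alternative would compare $z$ with $z_\alpha$ instead); the function names and values are hypothetical.

```python
from math import sqrt


def spearman_z(rs, n):
    """Large-sample statistic z = r_s * sqrt(n - 1), approximately N(0, 1) under H0."""
    return rs * sqrt(n - 1)


def reject_two_sided(z, z_half_alpha=1.96):
    """Two-sided test of H0: no association; 1.96 corresponds to alpha = 0.05."""
    return abs(z) > z_half_alpha


z = spearman_z(0.5, 17)          # hypothetical: r_s = 0.5 from n = 17 pairs
print(z, reject_two_sided(z))    # 2.0 True
```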

81 Example: Table 5.1 Wine Consumption and Heart Disease Deaths
Country        Wine consumption   Heart disease deaths
Australia      2.5                211
Austria        3.9                167
Belgium        2.9                131
Canada         2.4                191
Denmark        2.9                220
Finland        0.8                297
France         9.1                71
Iceland        0.8                211
Ireland        0.7                300
Italy
Netherlands
New Zealand
Norway
Spain          6.5                86
Sweden
Switzerland
U.K.
U.S.
W. Germany

82 Example

83 Example: Table 5.2 Ranks of Wine Consumption and Heart Disease Deaths

84 Example

85 Example

86 7. Kendall’s Rank Correlation Coefficient

87 Kendall’s Tau: a coefficient used to measure the association between two rankings of the same items (paired ranked data). It is named after the British statistician Maurice Kendall, who developed it. It ranges from -1.0 to 1.0. There are two versions: tau-a (no ties) and tau-b (adjusted for ties).

88 Formula for Tau-a

89 Concordant and Discordant Pairs

90 Example 1: Kendall’s tau-a. Raw data for 11 students on 2 exams (columns: Exam 1, Exam 2)

91 Ranks of exam results (columns: Exam 1 rank x, Exam 2 rank y, c, d); totals C = 50, D = 5

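92 Calculation for $\hat{\tau}$

93 Steps for calculating $\hat{\tau}$:
1. Sort the data by x in ascending order, keeping each y rank paired with its x.
2. For each y, count c (later y ranks that are larger) and d (later y ranks that are smaller).
3. Sum these counts to get C and D.
4. Use the formula to calculate $\hat{\tau}$.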
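The four steps above translate directly into code. A Python sketch assuming no ties (tau-a), with $\hat{\tau} = (C - D)/\bigl(n(n-1)/2\bigr)$; the function names and ranks are hypothetical.

```python
def kendall_counts(x, y):
    """Sort the pairs by x, then for each y count the later y values that are
    larger (concordant, c) or smaller (discordant, d); return totals (C, D).
    Assumes no ties in x or y."""
    ys = [yi for _, yi in sorted(zip(x, y))]  # step 1: y ranks in x order
    C = D = 0
    for i in range(len(ys)):                  # steps 2-3: count and sum
        for j in range(i + 1, len(ys)):
            if ys[j] > ys[i]:
                C += 1
            elif ys[j] < ys[i]:
                D += 1
    return C, D


def kendall_tau_a(x, y):
    """Step 4: tau-a = (C - D) / (n(n-1)/2)."""
    n = len(x)
    C, D = kendall_counts(x, y)
    return (C - D) / (n * (n - 1) / 2)


x, y = [1, 2, 3, 4, 5], [2, 1, 4, 3, 5]  # hypothetical ranks
print(kendall_counts(x, y), kendall_tau_a(x, y))  # (8, 2) 0.6
```

With Example 1's totals C = 50 and D = 5 for n = 11 students (so n(n-1)/2 = 55), the same formula gives $\hat{\tau} = 45/55 \approx 0.818$.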

94 Formula for tau-b (with ties)
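A common definition is $\tau_b = (C - D)/\sqrt{(n_0 - n_1)(n_0 - n_2)}$ with $n_0 = n(n-1)/2$, where $n_1$ and $n_2$ are the numbers of pairs tied in x and in y; pairs tied in either variable count as neither concordant nor discordant. The slide's own formula may be written differently. A Python sketch; the function name and data are hypothetical.

```python
from collections import Counter
from math import sqrt


def kendall_tau_b(x, y):
    """tau-b = (C - D) / sqrt((n0 - n1)(n0 - n2)), where n0 = n(n-1)/2 and
    n1, n2 count the pairs tied in x and in y, respectively."""
    n = len(x)
    C = D = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                C += 1
            elif s < 0:
                D += 1  # s == 0 (a tie in x or y) adds to neither count
    n0 = n * (n - 1) / 2
    n1 = sum(t * (t - 1) / 2 for t in Counter(x).values())
    n2 = sum(t * (t - 1) / 2 for t in Counter(y).values())
    return (C - D) / sqrt((n0 - n1) * (n0 - n2))


print(kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 4]))  # 5 / sqrt(30), about 0.913
```

Without ties, $n_1 = n_2 = 0$ and tau-b equals tau-a.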

95 Example 2: Kendall’s tau-b for the wine consumption and heart disease deaths data. Columns: i, Country, $x_i$, $y_i$, c, d. Countries in order: Ireland, Iceland, Norway, Finland, U.S., U.K., Sweden, Netherlands, N.Z., Canada, Australia, Germany, Belgium, Denmark, Austria, Switzerland, Spain, Italy, France. Totals: C = 25, D = 141

96 Calculation for tau-b

97 Hypothesis Test for τ
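A standard large-sample test of $H_0\colon \tau = 0$ uses $E(\hat{\tau}) = 0$ and $\mathrm{Var}(\hat{\tau}) = \frac{2(2n+5)}{9n(n-1)}$ under $H_0$ (no ties), so $z = \hat{\tau}/\sqrt{\mathrm{Var}(\hat{\tau})} = 3\hat{\tau}\sqrt{n(n-1)}/\sqrt{2(2n+5)}$. A Python sketch applied to Example 1's totals; the function name is hypothetical.

```python
from math import sqrt


def kendall_z(tau, n):
    """Large-sample z for H0: tau = 0, using Var(tau_hat) = 2(2n+5) / (9n(n-1))."""
    return tau / sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))


# Example 1 totals: C = 50, D = 5, n = 11, so n(n-1)/2 = 55
tau_hat = (50 - 5) / 55
print(round(kendall_z(tau_hat, 11), 2))  # 3.5
```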

98 Hypothesis test results

99 Hypothesis test results


101 Example 1 extension: Exam 1 (x) and Exam 2 (y); Kendall's τ and Spearman's $r_s$; C = 50, D = 5


104 SAS Code
Data exams;
  Input exam1 exam2;
  Datalines;
…
;
Run;
Proc corr data=exams kendall;
  Var exam1 exam2;
Run;

105 SAS output

106 8. Conclusion

107 Summary: Nonparametric tests are very useful when we know little about the underlying distributions. In particular, when the distribution is clearly not normal and the sample is small, the t-test may not be valid, and nonparametric methods provide an alternative. For skewed (non-normal) populations, the median is a better measure of central tendency than the mean. Nonparametric methods also apply to ordinal data, and they are often used with small samples.

108 Summary: In this presentation we have briefly introduced some of the most common nonparametric methods: the sign test; the Wilcoxon signed rank test and rank sum test; the Kruskal-Wallis test; the Friedman test; Spearman’s rank correlation; and Kendall’s rank correlation coefficient.

109 Questions: Q1, Q2, …

110 The End. Thank you!

