# Nonparametric Statistical Methods Presented by Guo Cheng, Ning Liu, Faiza Khan, Zhenyu Zhang, Du Huang, Christopher Porcaro, Hongtao Zhao, Wei Huang


Introduction

Definition

- Nonparametric (rank-based) methods are used when we know little or nothing about the distribution of the population from which the data are sampled.
- They are useful for small samples, where assumptions such as normality are hard to check.
- They are used when the data are measured on an ordinal scale, so that only their ranks are meaningful.

Outline

1. Sign Test
2. Wilcoxon Signed Rank Test
3. Inferences for Two Independent Samples
4. Inferences for Several Independent Samples
5. Friedman Test
6. Spearman's Rank Correlation
7. Kendall's Rank Correlation Coefficient

1. Sign Test

Parameter of interest: the median. The median is used as the parameter because, for skewed distributions, it is a better measure of the center of the data than the mean.

Hypothesis test: $H_0: \mu = \mu_0$ vs. $H_a: \mu > \mu_0$, where $\mu_0$ is a specified value and $\mu$ is the unknown median.

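Testing Procedure

Step 1: Given a random sample $x_1, x_2, \dots, x_n$ from a population with unknown median $\mu$, count the number of $x_i$'s that exceed $\mu_0$. Denote this count by $s_+$, and let $s_- = n - s_+$. Observations exactly equal to $\mu_0$ are discarded, and $n$ is reduced accordingly.

Step 2: Reject $H_0$ if $s_+$ is large or, equivalently, if $s_-$ is small.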

How to reject $H_0$? To determine how large $s_+$ must be in order to reject $H_0$, we need the distribution of the corresponding random variable $S_+$. Here $X_i$ denotes the random variable corresponding to the observed value $x_i$, and $S_-$ denotes the random variable corresponding to $s_-$.

Distribution of $S_+$ and $S_-$

Calculating P-value
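Under $H_0$, each observation exceeds $\mu_0$ with probability 1/2. Therefore $S_+$ has a Binomial$(n, 1/2)$ distribution, and the P-value for $H_a: \mu > \mu_0$ is $P(S_+ \ge s_+)$. Below is a minimal Python sketch of the exact test, assuming continuous data. The function name `sign_test` is hypothetical, and observations equal to $\mu_0$ are dropped.

```python
from math import comb


def sign_test(sample, mu0):
    """Exact sign test of H0: median = mu0.

    Returns (s_plus, n, p_upper, p_two_sided); observations equal to mu0
    are dropped before counting.
    """
    kept = [x for x in sample if x != mu0]
    n = len(kept)
    s_plus = sum(1 for x in kept if x > mu0)
    s_minus = n - s_plus

    def upper_tail(k):
        # P(S >= k) when S ~ Binomial(n, 1/2)
        return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

    p_upper = upper_tail(s_plus)  # for Ha: median > mu0
    p_two = min(1.0, 2 * upper_tail(max(s_plus, s_minus)))
    return s_plus, n, p_upper, p_two
```

For example, with $n = 10$ and $s_+ = 8$, the upper-tail P-value is $56/1024 \approx 0.0547$ and the two-sided value is $\approx 0.1094$. The two-sided value matches the Sign row of the thermostat output, provided that sample has $n = 10$, as the reported t statistic implies. SAS reports $M = (s_+ - s_-)/2 = 3$.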

Rejection criteria

Large sample z-test
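For large $n$, the binomial distribution of $S_+$ is approximately normal with mean $n/2$ and variance $n/4$. This gives $z = (s_+ - n/2 - 1/2)/\sqrt{n/4}$, where the $1/2$ is a continuity correction, and $H_0$ is rejected when $z > z_\alpha$. Below is a short Python sketch. The function name is hypothetical, and the continuity correction is optional.

```python
from math import erfc, sqrt


def sign_test_z(s_plus, n, continuity=True):
    """Large-sample z statistic and upper-tail P-value for Ha: median > mu0."""
    cc = 0.5 if continuity else 0.0
    z = (s_plus - n / 2 - cc) / sqrt(n / 4)
    p_upper = 0.5 * erfc(z / sqrt(2))  # P(Z >= z) for a standard normal Z
    return z, p_upper
```

For instance, $s_+ = 60$ out of $n = 100$ gives $z = 9.5/5 = 1.9$ with the correction.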

Confidence Interval
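A distribution-free confidence interval for the median follows from the same binomial distribution. Let $x_{(1)} \le \dots \le x_{(n)}$ be the ordered observations, and let $b$ be the largest integer with $P(S_+ \le b) \le \alpha/2$. Then $[x_{(b+1)}, x_{(n-b)}]$ covers the median with probability $1 - 2P(S_+ \le b) \ge 1 - \alpha$. This is a sketch of that construction, assuming continuous data; the function name `median_ci` is hypothetical.

```python
from math import comb


def median_ci(sample, alpha=0.05):
    """Sign-test-based CI for the median.

    Returns (lower, upper, exact_coverage), or None when n is too small
    for the requested level.
    """
    x = sorted(sample)
    n = len(x)

    def cdf(k):
        # P(S <= k) for S ~ Binomial(n, 1/2)
        return sum(comb(n, j) for j in range(k + 1)) / 2 ** n

    if cdf(0) > alpha / 2:
        return None
    b = 0
    while cdf(b + 1) <= alpha / 2:
        b += 1
    # 1-based order statistics x_(b+1), x_(n-b) -> 0-based indices b, n-b-1
    return x[b], x[n - b - 1], 1 - 2 * cdf(b)
```

For $n = 10$ and $\alpha = 0.05$ this gives $b = 1$, i.e. the interval $[x_{(2)}, x_{(9)}]$ with exact coverage $1 - 22/1024 \approx 0.979$.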

Example

SAS code

```sas
DATA themostat;
  INPUT temp @@;
  datalines;
202.2 203.4 …
;
PROC UNIVARIATE DATA=themostat loccount mu0=200;
  VAR temp;
RUN;
```

SAS Output

Basic Statistical Measures

| Location | | Variability | |
|---|---|---|---|
| Mean | 201.7700 | Std Deviation | 2.41019 |
| Median | 201.7500 | Variance | 5.80900 |
| Mode | . | Range | 8.30000 |
| | | Interquartile Range | 2.90000 |

Tests for Location: Mu0=200

| Test | Statistic | p Value |
|---|---|---|
| Student's t | t = 2.322323 | Pr > \|t\| = 0.0453 |
| Sign | M = 3 | Pr >= \|M\| = 0.1094 |
| Signed Rank | S = 19.5 | Pr >= \|S\| = 0.048 |

2. Wilcoxon Signed Rank Test

Inventor: Frank Wilcoxon (2 September 1892, County Cork, Ireland – 18 November 1965, Tallahassee, Florida, USA) was a chemist and statistician, known for developing several statistical tests.

What is it used for?

- Two related samples
- Matched samples
- Repeated measurements on a single sample

Hypothesis

Testing procedure
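The signed rank statistic ranks the absolute differences $|x_i - \mu_0|$ and adds up the ranks that belong to positive differences, $W_+ = \sum_{i:\, d_i > 0} r_i$. Below is a minimal Python sketch, assuming zero differences are dropped and tied absolute differences receive midranks. The helper names are hypothetical.

```python
def midranks(values):
    """1-based ranks; tied values share the average of their positions."""
    return [sum(w < v for w in values) + (sum(w == v for w in values) + 1) / 2
            for v in values]


def signed_rank_statistic(sample, mu0):
    """Return (W+, n) for H0: median = mu0, dropping zero differences."""
    d = [x - mu0 for x in sample if x != mu0]
    ranks = midranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    return w_plus, len(d)
```

With decimal data such as the thermostat readings, round the differences before ranking so that floating-point noise does not break genuine ties.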

Example

SAS code

```sas
DATA thermo;
  INPUT temp @@;
  datalines;
202.2 203.4 …
;
PROC UNIVARIATE DATA=thermo loccount mu0=200;
  TITLE "Wilcoxon signed rank test for the thermostat";
  VAR temp;
RUN;
```

SAS output (selected results)

Basic Statistical Measures

| Location | | Variability | |
|---|---|---|---|
| Mean | 201.7700 | Std Deviation | 2.41019 |
| Median | 201.7500 | Variance | 5.80900 |
| Mode | . | Range | 8.30000 |
| | | Interquartile Range | 2.90000 |

Tests for Location: Mu0=200

| Test | Statistic | p Value |
|---|---|---|
| Student's t | t = 2.322323 | Pr > \|t\| = 0.0453 |
| Sign | M = 3 | Pr >= \|M\| = 0.1094 |
| Signed Rank | S = 19.5 | Pr >= \|S\| = 0.048 |
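For small $n$, the exact null distribution of $W_+$ can be counted directly. Under $H_0$, each rank $1, \dots, n$ is attached to a positive difference with probability 1/2, independently of the others, so all $2^n$ sign patterns are equally likely. SAS reports the centered statistic $S = W_+ - n(n+1)/4$. With $n = 10$ (as implied by the t statistic), $S = 19.5$ corresponds to $W_+ = 47$. Below is a Python sketch of the exact two-sided P-value, assuming no ties among the $|d_i|$. The function names are hypothetical.

```python
def signed_rank_null_counts(n):
    """counts[w] = number of the 2**n sign patterns giving W+ = w (no ties)."""
    max_w = n * (n + 1) // 2
    counts = [1] + [0] * max_w
    for rank in range(1, n + 1):
        # each rank either joins W+ (positive sign) or not (negative sign)
        for w in range(max_w, rank - 1, -1):
            counts[w] += counts[w - rank]
    return counts


def signed_rank_exact_p(w_plus, n):
    """Exact two-sided P-value: 2 * min(P(W+ >= w), P(W+ <= w)), capped at 1."""
    counts = signed_rank_null_counts(n)
    total = 2 ** n
    upper = sum(c for w, c in enumerate(counts) if w >= w_plus) / total
    lower = sum(c for w, c in enumerate(counts) if w <= w_plus) / total
    return min(1.0, 2 * min(upper, lower))
```

For $W_+ = 47$ and $n = 10$, this returns $50/1024 \approx 0.0488$. That agrees with the Signed Rank row of the output, which is printed there as 0.048.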

Large sample approximation

Derive E(x) and Var(x)
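Below is one standard derivation for the signed rank statistic $W_+$; the notation is assumed, since the slide's own formulas did not survive. Under $H_0$, with a continuous distribution symmetric about $\mu_0$, rank $i$ carries a positive sign with probability 1/2, independently of the other ranks. Writing $Z_i$ for that indicator:

$$
\begin{aligned}
W_+ &= \sum_{i=1}^{n} i\,Z_i, \qquad Z_i \overset{\text{iid}}{\sim} \text{Bernoulli}(1/2),\\
E(W_+) &= \sum_{i=1}^{n} \frac{i}{2} = \frac{n(n+1)}{4},\\
\operatorname{Var}(W_+) &= \sum_{i=1}^{n} \frac{i^2}{4} = \frac{n(n+1)(2n+1)}{24}.
\end{aligned}
$$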

Rejection region
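Standardizing with these moments gives the large-sample test $z = (W_+ - n(n+1)/4)/\sqrt{n(n+1)(2n+1)/24}$. $H_0$ is rejected in favor of $\mu > \mu_0$ when $z > z_\alpha$. Below is a Python sketch. It applies no tie correction to the variance, and the optional continuity correction moves $W_+$ half a unit toward its mean. The names are hypothetical.

```python
from math import copysign, erfc, sqrt


def signed_rank_z(w_plus, n, continuity=True):
    """Large-sample z and upper-tail P-value for the Wilcoxon signed rank test."""
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    diff = w_plus - mean
    if continuity:
        diff = copysign(max(abs(diff) - 0.5, 0.0), diff)
    z = diff / sqrt(var)
    return z, 0.5 * erfc(z / sqrt(2))  # P(Z >= z)
```

For $W_+ = 47$ and $n = 10$, this gives $z \approx 1.99$ without the correction.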

3. Inferences for Two Independent Samples

Hypothesis

Definition

Wilcoxon rank sum test

Mann-Whitney U test
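The two statistics carry the same information. The Mann-Whitney $U$ for sample 1 counts the pairs $(x_i, y_j)$ with $x_i > y_j$, with ties counted as 1/2. When there are no ties, $U_1 = W_1 - n_1(n_1+1)/2$, where $W_1$ is the Wilcoxon rank sum of sample 1 in the pooled ranking. Below is a Python sketch that checks this identity on the class scores from the example below. The helper names are hypothetical.

```python
def mann_whitney_u(x, y):
    """U = #{x_i > y_j} + 0.5 * #{x_i == y_j} over all pairs."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def rank_sum(x, y):
    """Wilcoxon rank sum of x in the pooled sample (assumes no ties)."""
    pooled = sorted(x + y)
    return sum(pooled.index(a) + 1 for a in x)


A = [8.50, 9.48, 8.65, 8.16, 8.83, 7.76, 8.63]
B = [8.27, 8.20, 8.25, 8.14, 9.00, 8.10, 7.20, 8.32, 7.70]

w_a = rank_sum(A, B)                            # Wilcoxon W for class A
u_a = mann_whitney_u(A, B)                      # Mann-Whitney U for class A
assert u_a == w_a - len(A) * (len(A) + 1) / 2   # U = W - n1(n1 + 1)/2
```

Here $W_A = 75$ and $U_A = 47$. The two U statistics always add up to $n_1 n_2 = 63$.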

Relation between the two tests

For large samples
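For large samples, the standard null moments (no ties) are given below. Here $N = n_1 + n_2$, $W_1$ is the rank sum of sample 1, and $U_1 = W_1 - n_1(n_1+1)/2$; this notation is assumed rather than taken from the slides. The resulting $z$ is compared with standard normal critical values, often after subtracting a continuity correction of 1/2 from $|W_1 - E(W_1)|$.

$$
\begin{aligned}
E(W_1) &= \frac{n_1(N+1)}{2}, & \operatorname{Var}(W_1) &= \frac{n_1 n_2 (N+1)}{12},\\
E(U_1) &= \frac{n_1 n_2}{2}, & \operatorname{Var}(U_1) &= \frac{n_1 n_2 (N+1)}{12},
\end{aligned}
\qquad
z = \frac{W_1 - E(W_1)}{\sqrt{\operatorname{Var}(W_1)}} = \frac{U_1 - E(U_1)}{\sqrt{\operatorname{Var}(U_1)}}.
$$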

Treatment of ties

Example: To test whether the grades of two classes taught by the same teacher are the same, we randomly pick 7 students from Class A and 9 from Class B. Their scores are as follows:

- A: 8.50, 9.48, 8.65, 8.16, 8.83, 7.76, 8.63
- B: 8.27, 8.20, 8.25, 8.14, 9.00, 8.10, 7.20, 8.32, 7.70

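Example

| Score | 7.20 | 7.70 | 7.76 | 8.10 | 8.14 | 8.16 | 8.20 | 8.25 |
|---|---|---|---|---|---|---|---|---|
| Class | B | B | A | B | B | A | B | B |
| Rank | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |

| Score | 8.27 | 8.32 | 8.50 | 8.63 | 8.65 | 8.83 | 9.00 | 9.48 |
|---|---|---|---|---|---|---|---|---|
| Class | B | B | A | A | A | A | B | A |
| Rank | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |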
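The numbers in the SAS output for this example can be reproduced from the pooled ranking. We have $W_A = 75$, $E(W_A) = 7 \cdot 17/2 = 59.5$ and $\operatorname{Var}(W_A) = 7 \cdot 9 \cdot 17/12 = 89.25$. With SAS's continuity correction of 0.5, $z = (75 - 0.5 - 59.5)/\sqrt{89.25} \approx 1.588$. Below is a Python sketch, assuming no ties. The names are hypothetical.

```python
from math import copysign, erfc, sqrt

A = [8.50, 9.48, 8.65, 8.16, 8.83, 7.76, 8.63]
B = [8.27, 8.20, 8.25, 8.14, 9.00, 8.10, 7.20, 8.32, 7.70]


def rank_sum_test(x, y, continuity=0.5):
    """Wilcoxon rank sum test of x vs y with the normal approximation.

    Assumes no ties. Returns (W, E(W), sd(W), z, one-sided P(Z >= z)).
    """
    n1, n2 = len(x), len(y)
    N = n1 + n2
    pooled = sorted(x + y)
    w = sum(pooled.index(a) + 1 for a in x)   # rank sum of x
    mean = n1 * (N + 1) / 2
    sd = sqrt(n1 * n2 * (N + 1) / 12)
    diff = w - mean
    # shrink |W - E(W)| by the continuity correction before standardizing
    z = copysign(max(abs(diff) - continuity, 0.0), diff) / sd
    return w, mean, sd, z, 0.5 * erfc(z / sqrt(2))


w, mean, sd, z, p_one = rank_sum_test(A, B)
```

This gives $z \approx 1.5878$ and one-sided and two-sided P-values of about 0.0562 and 0.1123, as in the Normal Approximation rows of the output.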

SAS code

```sas
Data exam;
  Input group $ score @@;
  Datalines;
A 8.50 A 9.48 A 8.65 A 8.16 A 8.83 A 7.76 A 8.63
B 8.27 B 8.20 B 8.25 B 8.14 B 9.00 B 8.10 B 7.20 B 8.32 B 7.70
;

Proc npar1way data=exam wilcoxon;
  Var score;
  Class group;
  Exact wilcoxon;
Run;
```

Output

Wilcoxon Scores (Rank Sums) for Variable score Classified by Variable group

| group | N | Sum of Scores | Expected Under H0 | Std Dev Under H0 | Mean Score |
|---|---|---|---|---|---|
| A | 7 | 75.0 | 59.50 | 9.447222 | 10.714286 |
| B | 9 | 61.0 | 76.50 | 9.447222 | 6.777778 |

Output

Wilcoxon Two-Sample Test

| | Quantity | Value |
|---|---|---|
| | Statistic (S) | 75.0000 |
| Normal Approximation | Z | 1.5878 |
| | One-Sided Pr > Z | 0.0562 |
| | Two-Sided Pr > \|Z\| | 0.1123 |
| t Approximation | One-Sided Pr > Z | 0.0666 |
| | Two-Sided Pr > \|Z\| | 0.1332 |
| Exact Test | One-Sided Pr >= S | 0.0571 |
| | Two-Sided Pr >= \|S - Mean\| | 0.1142 |

Z includes a continuity correction of 0.5.
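The exact P-values come from the exact null distribution of the rank sum. Under $H_0$, and with no ties, every set of $n_1 = 7$ ranks out of $N = 16$ is equally likely. So $P(W_A \ge 75)$ is a count over $\binom{16}{7} = 11440$ rank sets. Below is a brute-force Python sketch, which is fast enough for samples this small. The function name is hypothetical.

```python
from itertools import combinations
from math import comb


def exact_rank_sum_upper(w_obs, n1, n2):
    """Exact P(W >= w_obs) for the rank sum of n1 out of N = n1 + n2
    untied ranks, by enumerating all C(N, n1) equally likely rank sets."""
    N = n1 + n2
    count = sum(1 for ranks in combinations(range(1, N + 1), n1)
                if sum(ranks) >= w_obs)
    return count, count / comb(N, n1)


count, p_one = exact_rank_sum_upper(75, 7, 9)
```

Here 653 of the 11440 rank sets qualify, so $P(W_A \ge 75) = 653/11440 \approx 0.0571$. The distribution is symmetric about 59.5, so doubling this gives the two-sided value 0.1142.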

Output

4. Inferences for Several Independent Samples

Introduction: If our data are normally distributed and the population standard deviations are equal, we can test for a difference among the means of several populations with the one-way ANOVA F test.

When to use the Kruskal-Wallis test? What happens when our data are not normal? In that case, we can use the nonparametric Kruskal-Wallis test to compare more than two populations, as long as the data come from continuous distributions. The idea of the kw rank test is to rank all the observations from all groups together, then apply one-way ANOVA to the ranks rather than to the original data.

Kruskal-Wallis Test (kw test): a nonparametric method for testing whether samples originate from the same distribution. It is used to compare more than two independent samples.

Kruskal-Wallis Test: History

- William Henry Kruskal (October 10th, 1919 – April 21st, 2005) obtained his bachelor's and master's degrees in mathematics at Harvard University and received his Ph.D. from Columbia University in 1955.
- Wilson Allen Wallis (November 5th, 1912 – October 12th, 1998) did his undergraduate work at the University of Minnesota and graduate work at the University of Chicago in 1933.

Kruskal-Wallis Test: Steps

1. Create hypotheses. Null hypothesis ($H_0$): the populations from which the samples come are identical. Alternative hypothesis ($H_a$): at least one of them is different.

Kruskal-Wallis Test: Steps

2. Rank all the data together. The lowest value gets the lowest rank, and so on. Tied values get the average of the ranks they would have received had they not been tied.
3. Add up the ranks within each sample. Label these sums $L_1$, $L_2$, $L_3$, and $L_4$.

Kruskal-Wallis Test: Steps

4. Find the test statistic kw, where $n$ is the total number of observations in all samples and $L_i$ is the rank sum of sample $i$.
5. Reject $H_0$ if kw is greater than the chi-square table value with $k - 1$ degrees of freedom, where $k$ is the number of samples.
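Below is the usual form of the statistic in the notation above. Here $n_i$ (assumed notation) is the number of observations in sample $i$ and $k$ is the number of samples. When there are ties, the statistic is commonly divided by $1 - \sum (t^3 - t)/(n^3 - n)$, where $t$ runs over the sizes of the tied groups.

$$
kw = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{L_i^2}{n_i} - 3(n+1),
\qquad \text{reject } H_0 \text{ if } kw > \chi^2_{k-1,\,\alpha}.
$$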

Kruskal-Wallis Test: Example. An experiment compared four different ways of teaching a concept to a class of students. In this experiment, 28 tenth-grade classes were randomly assigned to the four methods (7 classes per method). A 45-question test was given to each class. The average test scores of the classes are given in the following table. Apply the Kruskal-Wallis test to the test score data.

Kruskal-Wallis Test: Example. Given data and ranks of the data values.

SAS Input

```sas
data test;
  input methodname $ scores @@;
  cards;
case 14.59 case 23.44 case 25.43 case 18.15 case 20.82 case 14.06 case 14.26
formula 20.27 formula 26.84 formula 14.71 formula 22.34 formula 19.49 formula 24.92 formula 20.20
equation 27.82 equation 24.92 equation 28.68 equation 23.32 equation 32.85 equation 33.90 equation 23.42
unitary 33.16 unitary 26.93 unitary 30.43 unitary 36.43 unitary 37.04 unitary 29.76 unitary 33.88
;
proc npar1way data=test wilcoxon;
  class methodname;
  var scores;
run;
```

SAS Output

Wilcoxon Scores (Rank Sums) for Variable scores Classified by Variable methodname

| methodname | N | Sum of Scores | Expected Under H0 | Std Dev Under H0 | Mean Score |
|---|---|---|---|---|---|
| case | 7 | 49.00 | 101.50 | 18.845498 | 7.000000 |
| formula | 7 | 66.50 | 101.50 | 18.845498 | 9.500000 |
| equation | 7 | 125.50 | 101.50 | 18.845498 | 17.928571 |
| unitary | 7 | 165.00 | 101.50 | 18.845498 | 23.571429 |

Average scores were used for ties.

Kruskal-Wallis Test

| Chi-Square | DF | Pr > Chi-Square |
|---|---|---|
| 18.1390 | 3 | 0.0004 |
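The rank sums and the chi-square value in this output can be checked with a short Python sketch of the kw formula. It uses midranks for the tie at 24.92 and the usual tie correction. The function name is hypothetical.

```python
data = {
    "case":     [14.59, 23.44, 25.43, 18.15, 20.82, 14.06, 14.26],
    "formula":  [20.27, 26.84, 14.71, 22.34, 19.49, 24.92, 20.20],
    "equation": [27.82, 24.92, 28.68, 23.32, 32.85, 33.90, 23.42],
    "unitary":  [33.16, 26.93, 30.43, 36.43, 37.04, 29.76, 33.88],
}


def kruskal_wallis(groups):
    """Return (rank sums L_i, tie-corrected kw statistic)."""
    pooled = sorted(v for g in groups.values() for v in g)
    n = len(pooled)
    # 1-based positions of each distinct value in the pooled ordering
    positions = {}
    for pos, v in enumerate(pooled, start=1):
        positions.setdefault(v, []).append(pos)
    midrank = {v: sum(p) / len(p) for v, p in positions.items()}
    rank_sums = {name: sum(midrank[v] for v in g) for name, g in groups.items()}
    kw = (12 / (n * (n + 1))
          * sum(L ** 2 / len(groups[name]) for name, L in rank_sums.items())
          - 3 * (n + 1))
    ties = sum(len(p) ** 3 - len(p) for p in positions.values())
    return rank_sums, kw / (1 - ties / (n ** 3 - n))


rank_sums, kw = kruskal_wallis(data)
```

It returns rank sums of 49, 66.5, 125.5 and 165 and $kw \approx 18.139$; without the tie correction the value is about 18.134. Either way, kw far exceeds the 5% chi-square critical value with 3 degrees of freedom (7.815), so $H_0$ is rejected, consistent with the P-value of 0.0004.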

5. Friedman Test

Introduction: The Friedman test is a distribution-free, rank-based test for comparing treatments when the observations are arranged in blocks. It is named after the Nobel Laureate economist Milton Friedman, who proposed it. The Friedman test is a version of repeated-measures ANOVA that can be performed on ordinal (ranked) data.

Steps in the Friedman test
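The Friedman statistic ranks the $k$ treatments separately within each of the $b$ blocks and forms the treatment rank sums $R_1, \dots, R_k$. It then computes $F_r = \frac{12}{bk(k+1)} \sum_{j=1}^{k} R_j^2 - 3b(k+1)$, which is compared with the chi-square distribution with $k - 1$ degrees of freedom. Below is a Python sketch, assuming no ties within a block. The function name and the toy data are hypothetical and are not the example's data.

```python
def friedman_statistic(blocks):
    """Friedman F_r for a list of blocks, each listing the k treatment responses.

    Treatments are ranked 1..k within each block (no ties assumed).
    """
    b, k = len(blocks), len(blocks[0])
    rank_sums = [0] * k
    for block in blocks:
        order = sorted(range(k), key=lambda j: block[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    fr = 12 / (b * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * b * (k + 1)
    return rank_sums, fr


# hypothetical toy data: 3 blocks x 3 treatments, same ordering in every block
rank_sums, fr = friedman_statistic([[1, 2, 3], [10, 20, 30], [5, 6, 7]])
```

In this toy data every block ranks the treatments identically, so $F_r$ reaches its maximum $b(k-1) = 6$. In the 8-treatment example, $F_r$ would be compared with $\chi^2_{7,\,0.025} \approx 16.01$ under the chi-square approximation.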

Example: We have 8 treatments arranged in 3 blocks, with $\alpha = 0.025$.

Define the null and alternative hypotheses:

- $H_0$: there is no difference among the 8 treatments.
- $H_a$: there is a difference among the 8 treatments.

Rank Sum

Friedman Test

Conclusion

6. Spearman's Rank Correlation Coefficient

Introduction

- From Pearson to Spearman
- Spearman's rank correlation coefficient
- Large-sample approximation
- Hypothesis test
- Examples

From Pearson to Spearman

- Pearson's correlation coefficient:
  - measures only the degree of linear association;
  - its inferences are based on the assumption that the two variables are bivariate normal.
- Spearman's rank correlation coefficient:
  - takes into account only the ranks;
  - measures the degree of monotone association;
  - inferences on the rank correlation coefficient are distribution-free.

From Pearson to Spearman

From Pearson to Spearman: Charles Edward Spearman (10 Sept. 1863 – 17 Sept. 1945)

- As a psychologist: ① the general factor of intelligence; ② the nature and causes of variations in human
- As a statistician: ① rank correlation; ② two-way analysis; ③ correlation coefficient

Spearman's Rank Correlation Coefficient
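Spearman's $r_s$ is Pearson's correlation coefficient computed on the ranks $u_i$ of the $x_i$ and $v_i$ of the $y_i$. When there are no ties, it simplifies to $r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$ with $d_i = u_i - v_i$. Below is a Python sketch using midranks for ties. The helper names are hypothetical.

```python
from math import sqrt


def midranks(values):
    """1-based ranks; tied values share the average of their positions."""
    return [sum(w < v for w in values) + (sum(w == v for w in values) + 1) / 2
            for v in values]


def spearman(x, y):
    """Spearman's r_s as the Pearson correlation of the midranks."""
    u, v = midranks(x), midranks(y)
    n = len(u)
    mean = (n + 1) / 2  # ranks (or midranks) of n values always average (n + 1)/2
    suv = sum((a - mean) * (b - mean) for a, b in zip(u, v))
    suu = sum((a - mean) ** 2 for a in u)
    svv = sum((b - mean) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def spearman_no_ties(x, y):
    """Shortcut 1 - 6 * sum(d^2) / (n(n^2 - 1)), exact only without ties."""
    u, v = midranks(x), midranks(y)
    n = len(u)
    return 1 - 6 * sum((a - b) ** 2 for a, b in zip(u, v)) / (n * (n * n - 1))
```

For example, `spearman([1, 2, 3], [1, 8, 27])` is 1: $y = x^3$ is monotone but not linear, so the rank correlation is perfect even though Pearson's $r$ is below 1.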

Large-sample approximation

Hypothesis testing
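For large $n$, under the null hypothesis that $X$ and $Y$ are independent, $r_s$ has mean 0 and variance $1/(n-1)$. This leads to the standard approximation below, stated here in assumed notation:

$$
z = r_s\sqrt{n-1} \;\overset{\text{approx.}}{\sim}\; N(0, 1) \text{ under } H_0,
\qquad \text{reject } H_0 \text{ (two-sided) if } |z| > z_{\alpha/2}.
$$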

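Example

Table 5.1 Wine Consumption and Heart Disease Deaths

| Country | Wine consumption | Heart disease deaths |
|---|---|---|
| Australia | 2.5 | 211 |
| Austria | 3.9 | 167 |
| Belgium | 2.9 | 131 |
| Canada | 2.4 | 191 |
| Denmark | 2.9 | 220 |
| Finland | 0.8 | 297 |
| France | 9.1 | 71 |
| Iceland | 0.8 | 211 |
| Ireland | 0.7 | 300 |
| Italy | 7.9 | 107 |
| Netherlands | 1.8 | 167 |
| New Zealand | 1.9 | 266 |
| Norway | 0.8 | 227 |
| Spain | 6.5 | 86 |
| Sweden | 1.6 | 207 |
| Switzerland | 5.8 | 115 |
| U.K. | 1.3 | 285 |
| U.S. | 1.2 | 199 |
| W. Germany | 2.7 | 172 |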

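Example

Table 5.2 Ranks of Wine Consumption and Heart Disease Deaths

| i | Rank of wine consumption | Rank of heart disease deaths | d_i |
|---|---|---|---|
| 1 | 11 | 12.5 | -1.5 |
| 2 | 15 | 6.5 | 8.5 |
| 3 | 13.5 | 5 | 8.5 |
| 4 | 10 | 9 | 1.0 |
| 5 | 13.5 | 14 | -0.5 |
| 6 | 3 | 18 | -15.0 |
| 7 | 19 | 1 | 18.0 |
| 8 | 3 | 12.5 | -9.5 |
| 9 | 1 | 19 | -18.0 |
| 10 | 18 | 3 | 15.0 |
| 11 | 8 | 6.5 | 1.5 |
| 12 | 9 | 16 | -7.0 |
| 13 | 3 | 15 | -12.0 |
| 14 | 17 | 2 | 15.0 |
| 15 | 7 | 11 | -4.0 |
| 16 | 16 | 4 | 12.0 |
| 17 | 6 | 17 | -11.0 |
| 18 | 5 | 10 | -5.0 |
| 19 | 12 | 8 | 4.0 |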
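The ranks in Table 5.2 and the resulting coefficient can be reproduced with a short Python sketch. Because both variables contain ties, $r_s$ is computed as the Pearson correlation of the midranks, and the no-ties shortcut is shown for comparison. Variable and helper names are hypothetical.

```python
from math import sqrt

# Table 5.1: country -> (wine consumption, heart disease deaths)
table = {
    "Australia": (2.5, 211), "Austria": (3.9, 167), "Belgium": (2.9, 131),
    "Canada": (2.4, 191), "Denmark": (2.9, 220), "Finland": (0.8, 297),
    "France": (9.1, 71), "Iceland": (0.8, 211), "Ireland": (0.7, 300),
    "Italy": (7.9, 107), "Netherlands": (1.8, 167), "New Zealand": (1.9, 266),
    "Norway": (0.8, 227), "Spain": (6.5, 86), "Sweden": (1.6, 207),
    "Switzerland": (5.8, 115), "U.K.": (1.3, 285), "U.S.": (1.2, 199),
    "W. Germany": (2.7, 172),
}


def midranks(values):
    """1-based ranks; tied values share the average of their positions."""
    return [sum(w < v for w in values) + (sum(w == v for w in values) + 1) / 2
            for v in values]


u = midranks([wine for wine, _ in table.values()])
v = midranks([deaths for _, deaths in table.values()])
n = len(u)
sum_d2 = sum((a - b) ** 2 for a, b in zip(u, v))
mean = (n + 1) / 2
suv = sum((a - mean) * (b - mean) for a, b in zip(u, v))
r_s = suv / sqrt(sum((a - mean) ** 2 for a in u) * sum((b - mean) ** 2 for b in v))
r_s_shortcut = 1 - 6 * sum_d2 / (n * (n * n - 1))   # ignores the ties
z = r_s * sqrt(n - 1)                               # large-sample statistic
```

This gives $\sum d_i^2 = 2078.5$, $r_s \approx -0.829$ (the shortcut gives $\approx -0.823$) and $z \approx -3.52$. That indicates a strong negative association between wine consumption and heart disease deaths.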

7. Kendall's Rank Correlation Coefficient

Kendall's Tau

- A coefficient used to measure the association between two sets of ranked data.
- Named after the British statistician Maurice Kendall, who developed it in 1938.
- Ranges from -1.0 to 1.0.
- Two versions: tau-a (no ties) and tau-b (adjusted for ties).

Formula for tau-a

Concordant and discordant pairs

Example 1: Kendall's tau-a

Raw data for 11 students in 2 exams (Exam 1, Exam 2): 85 98 95 90 80 83 75 57 70 63 65 77 73 99 93 80 79 96 88 69 74

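Ranks of exam results

| Exam 1 rank (x) | Exam 2 rank (y) | c | d |
|---|---|---|---|
| 1 | 2 | 9 | 1 |
| 2 | 1 | 9 | 0 |
| 3 | 3 | 8 | 0 |
| 4 | 5 | 6 | 1 |
| 5 | 4 | 6 | 0 |
| 6 | 7 | 4 | 1 |
| 7 | 6 | 4 | 0 |
| 8 | 9 | 2 | 1 |
| 9 | 8 | 2 | 0 |
| 10 | 11 | 0 | 1 |
| 11 | 10 | | |

C = 50, D = 5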

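Calculation for $\hat{\tau}$

Steps for calculating $\hat{\tau}$:

1. Sort the data by x in ascending order, and pair each y rank with its x.
2. For each y, count c (the number of later y's that are larger) and d (the number of later y's that are smaller).
3. Sum these counts to get C and D.
4. Use the formula to calculate $\hat{\tau}$.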
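The four steps translate directly into code, with $\hat{\tau} = (C - D)/\binom{n}{2}$ for tau-a. Below is a Python sketch for untied data, applied to the exam ranks in the table above. The function name is hypothetical.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a = (C - D) / (n(n-1)/2), assuming no ties."""
    # Step 1: sort the pairs by x and keep the y ranks in that order
    ys = [b for _, b in sorted(zip(x, y))]
    n = len(ys)
    # Step 2: for each y, count later y's that are larger (c) or smaller (d)
    c = [sum(1 for later in ys[i + 1:] if later > ys[i]) for i in range(n)]
    d = [sum(1 for later in ys[i + 1:] if later < ys[i]) for i in range(n)]
    # Step 3: sum the counts
    C, D = sum(c), sum(d)
    # Step 4: apply the formula
    return C, D, (C - D) / (n * (n - 1) / 2)


exam1 = list(range(1, 12))
exam2 = [2, 1, 3, 5, 4, 7, 6, 9, 8, 11, 10]
C, D, tau_a = kendall_tau_a(exam1, exam2)
```

This gives $C = 50$, $D = 5$ and $\hat{\tau} = 45/55 \approx 0.818$.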

Formula for tau-b (with ties)

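Example 2: Kendall's tau-b (wine consumption and heart disease deaths data)

| i | Country | x_i | y_i | c | d |
|---|---|---|---|---|---|
| 1 | Ireland | 0.7 | 300 | 0 | 18 |
| 2 | Iceland | 0.8 | 211 | 3 | 11 |
| 3 | Norway | 0.8 | 227 | 2 | 13 |
| 4 | Finland | 0.8 | 297 | 0 | 15 |
| 5 | U.S. | 1.2 | 199 | 5 | 9 |
| 6 | U.K. | 1.3 | 285 | 0 | 13 |
| 7 | Sweden | 1.6 | 207 | 3 | 9 |
| 8 | Netherlands | 1.8 | 167 | 5 | 5 |
| 9 | N.Z. | 1.9 | 266 | 0 | 10 |
| 10 | Canada | 2.4 | 191 | 2 | 7 |
| 11 | Australia | 2.5 | 211 | 1 | 7 |
| 12 | Germany | 2.7 | 172 | 1 | 6 |
| 13 | Belgium | 2.9 | 131 | 1 | 4 |
| 14 | Denmark | 2.9 | 220 | 0 | 5 |
| 15 | Austria | 3.9 | 167 | 0 | 4 |
| 16 | Switzerland | 5.8 | 115 | 0 | 3 |
| 17 | Spain | 6.5 | 86 | 1 | 1 |
| 18 | Italy | 7.9 | 107 | 0 | 1 |
| 19 | France | 9.1 | 71 | 0 | 0 |

C = 24, D = 141. Pairs tied in x or in y are counted as neither concordant nor discordant.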

Calculation for tau-b
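Tau-b divides $C - D$ by $\sqrt{(N - T_x)(N - T_y)}$, where $N = n(n-1)/2$ and $T_x$, $T_y$ are the numbers of pairs tied in $x$ and in $y$. A pair tied in either variable counts as neither concordant nor discordant; Belgium and Denmark, both at 2.9, are an example. Below is a Python sketch of this definition on the wine data. The function name is hypothetical.

```python
from math import sqrt

# (wine consumption x_i, heart disease deaths y_i), sorted by x
pairs = [(0.7, 300), (0.8, 211), (0.8, 227), (0.8, 297), (1.2, 199),
         (1.3, 285), (1.6, 207), (1.8, 167), (1.9, 266), (2.4, 191),
         (2.5, 211), (2.7, 172), (2.9, 131), (2.9, 220), (3.9, 167),
         (5.8, 115), (6.5, 86), (7.9, 107), (9.1, 71)]


def kendall_tau_b(pairs):
    """Return (C, D, Tx, Ty, tau_b) by checking every pair directly."""
    n = len(pairs)
    C = D = tx = ty = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = pairs[i][0] - pairs[j][0]
            dy = pairs[i][1] - pairs[j][1]
            tx += int(dx == 0)        # pair tied in x
            ty += int(dy == 0)        # pair tied in y
            if dx * dy > 0:
                C += 1                # concordant
            elif dx * dy < 0:
                D += 1                # discordant
    N = n * (n - 1) // 2
    return C, D, tx, ty, (C - D) / sqrt((N - tx) * (N - ty))


C, D, tx, ty, tau_b = kendall_tau_b(pairs)
```

Counted this way, there are $C = 24$ concordant and $D = 141$ discordant pairs, with $T_x = 4$ and $T_y = 2$. This gives $\tau_b = -117/\sqrt{167 \cdot 169} \approx -0.696$.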

Hypothesis test for $\tau$
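Consider testing $H_0: \tau = 0$ (independence) with untied data. For large $n$, $\hat{\tau}$ is approximately normal with mean 0 and variance $2(2n+5)/[9n(n-1)]$, so $z = \hat{\tau}/\sqrt{2(2n+5)/[9n(n-1)]}$. Below is a Python sketch. The function name is hypothetical, and ties would require a modified variance.

```python
from math import erfc, sqrt


def kendall_z(tau, n):
    """Large-sample z and two-sided P-value for H0: tau = 0 (no ties)."""
    var = 2 * (2 * n + 5) / (9 * n * (n - 1))
    z = tau / sqrt(var)
    return z, erfc(abs(z) / sqrt(2))   # 2 * P(Z >= |z|)


z, p = kendall_z(45 / 55, 11)          # exam example: tau-hat = 45/55, n = 11
```

For the exam data this gives $z \approx 3.50$, so $H_0$ would be rejected at any conventional level.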

Hypothesis test results

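Example 1 extension: Kendall's τ and Spearman's $r_s$

| Exam 1 x | Exam 2 y | c (Kendall) | d (Kendall) | \|d_i\| (Spearman) | d_i² (Spearman) |
|---|---|---|---|---|---|
| 1 | 2 | 9 | 1 | 1 | 1 |
| 2 | 1 | 9 | 0 | 1 | 1 |
| 3 | 3 | 8 | 0 | 0 | 0 |
| 4 | 5 | 6 | 1 | 1 | 1 |
| 5 | 4 | 6 | 0 | 1 | 1 |
| 6 | 7 | 4 | 1 | 1 | 1 |
| 7 | 6 | 4 | 0 | 1 | 1 |
| 8 | 9 | 2 | 1 | 1 | 1 |
| 9 | 8 | 2 | 0 | 1 | 1 |
| 10 | 11 | 0 | 1 | 1 | 1 |
| 11 | 10 | | | 1 | 1 |

C = 50, D = 5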


SAS Code

```sas
Data exams;
  Input exam1 exam2 @@;
  Datalines;
85 98 95 …
;
Run;

Proc corr data=exams kendall;
  Var exam1 exam2;
Run;
```

SAS output

8. Conclusion

Summary: Nonparametric tests are very useful when we know little about the population distributions. In particular, when the data are clearly not normal and the sample is small, the t-test may not be valid, so we turn to nonparametric methods. The median is a better measure of central tendency for non-normal populations. The data can be ordinal, and the sample size is usually small.

Summary: We have briefly introduced some of the most common nonparametric methods:

- Sign test
- Wilcoxon rank sum test and signed rank test
- Kruskal-Wallis test
- Friedman test
- Spearman's rank correlation
- Kendall's rank correlation coefficient

Questions: Q1, Q2, …

The End. Thank you!
