Nonparametric Methods

CHAPTER 14: NONPARAMETRIC METHODS

LEARNING OBJECTIVES
1. Distinguish between:
   a. parametric and nonparametric methods
   b. rank-sum tests and signed-rank tests
   c. Pearson and Spearman correlation coefficients
2. List the advantages and disadvantages of nonparametric methods
3. Give the equation for the sum of the first n integers
4. List the assumptions necessary to perform hypothesis tests by nonparametric methods
5. Be able to apply the sign test to paired data
6. Know when and how to use Fisher's exact test

RATIONALE FOR NONPARAMETRIC METHODS
A. Parametric methods: statistical techniques that enable us to determine whether there is a significant difference between two sample means, under the underlying assumptions of normality, homogeneity of variances, and linearity
B. Nonparametric methods
   1. developed for conditions in which the assumptions necessary for using parametric methods cannot be made
   2. sometimes called distribution-free methods because it is not necessary to assume that the observations are normally distributed
   3. appropriate for dealing with data that are measured on a nominal or ordinal scale and whose distribution is unknown

ADVANTAGES AND DISADVANTAGES OF NONPARAMETRIC METHODS
A. Nonparametric advantages
   1. They do not have restrictive assumptions such as normality of the observations. In practice, data are often nonnormal, or the sample size is not large enough to gain the benefit of the central limit theorem. At most, the distribution should be somewhat symmetric.
   2. Computations can be performed quickly and easily, a prime advantage when a quick preliminary indication of results is needed.
   3. They are well suited to experiments or surveys that yield outcomes that are difficult to quantify. In such cases, the parametric methods, although statistically more powerful, may yield less reliable results than the nonparametric methods, which tend to be less sensitive to the errors inherent in ordinal measures.

ADVANTAGES AND DISADVANTAGES OF NONPARAMETRIC METHODS
B. Nonparametric disadvantages
   1. They are less efficient than comparable parametric tests (i.e., they require a larger sample size to reject a false hypothesis).
   2. Hypotheses tested with nonparametric methods are less specific than those tested comparably with parametric methods.
   3. They do not take advantage of all the special characteristics of a distribution. Consequently, these methods do not fully use the information known about the distribution.
C. Nonparametric methods should be viewed as complementary statistical methods rather than as attractive alternatives. An inherent characteristic is that they deal with ranks rather than with the values of the observations.

Design                          Parametric test                  Nonparametric test
One sample                      One-sample t test                One-sample sign test
Two independent samples         Two-sample independent t test    Wilcoxon rank-sum test; Mann-Whitney U test
Two dependent samples           Paired t test                    Wilcoxon signed-rank test; sign test
Correlation                     Pearson r                        Spearman (rho) rank-order correlation
Multiple groups (one factor)    One-way ANOVA                    Kruskal-Wallis one-way ANOVA

WILCOXON RANK-SUM TEST AND MANN-WHITNEY U TEST
A. The two are mathematically equivalent procedures.
B. Both are used to test the null hypothesis that there is no difference between the two population distributions.
C. Based on ranks from two independent samples, they correspond to the t test for two independent samples, except that no assumptions are necessary regarding normality or equality of variances.
D. They are excellent alternatives to the t test if your data are strongly skewed.

WILCOXON RANK-SUM TEST AND MANN-WHITNEY U TEST
E. Procedure
   1. Combine the observations from both samples and arrange them in an array from the smallest to the largest.
   2. Assign ranks to each of the observations.
   3. List the ranks from one sample separately from those of the other.
   4. Separately sum the ranks for the first and second samples.

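WILCOXON RANK-SUM TEST AND MANN-WHITNEY U TEST
F. Given the hypothesis that the average of the ranks is approximately equal for both samples, the test statistic $W_1$ (the sum of the ranks of the first sample) should not differ significantly from $W_e$ (the expected sum of the ranks). Because the average of all $N = n_1 + n_2$ ranks is $(N + 1)/2$, the expected sum of ranks is
   $W_e = \frac{n_1(n_1 + n_2 + 1)}{2}$
   and the standard error, $\sigma_W$, of the $W_1$ values obtained from repeated samples is
   $\sigma_W = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$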

Wilcoxon Rank-Sum Test for Two Independent Samples

Mothers bearing low-birth-weight babies (n1 = 7)
X:      0     0     1     2     3     3     4
Rank:   1.5*  1.5*  3     4     5.5*  5.5*  7.5*
W1 = 28.5, mean rank = 4.1

Mothers bearing normal-birth-weight babies (n2 = 8)
X:      4     5     6     7     8     9     10    11
Rank:   7.5*  9     10    11    12    13    14    15
W2 = 91.5, mean rank = 11.4

* tied ranks (each tied value receives the mean of the ranks it spans)

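WILCOXON RANK-SUM TEST AND MANN-WHITNEY U TEST
G. Regardless of the shape of the population distribution, the sampling distribution of the sum of a subset of ranks is approximately normal. The test of significance regarding the equality of the distributions is therefore
   $Z = \frac{W_1 - W_e}{\sigma_W}$
H. Obtaining the expected rank sum to compute the Z score (birth-weight example, $n_1 = 7$, $n_2 = 8$):
   $W_e = \frac{7(7 + 8 + 1)}{2} = 56, \qquad \sigma_W = \sqrt{\frac{7(8)(7 + 8 + 1)}{12}} \approx 8.64$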

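WILCOXON RANK-SUM TEST AND MANN-WHITNEY U TEST
I. Determining whether the observed sum differs significantly from its expected value:
   $Z = \frac{28.5 - 56}{8.64} \approx -3.18$
   Because |Z| = 3.18 exceeds 1.96, the difference is significant at the 0.05 level (two-sided).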
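The rank-sum calculation can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper names `midranks` and `rank_sum_z` are hypothetical, the low-birth-weight group is taken to be the seven values 0, 0, 1, 2, 3, 3, 4 implied by its tied ranks and by W1 = 28.5, and the standard error uses the usual large-sample formula sqrt(n1 n2 (N + 1)/12) with no correction for ties.

```python
import math


def midranks(values):
    """Rank values 1..N; tied values get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # advance j to the end of the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def rank_sum_z(sample1, sample2):
    """Wilcoxon rank sums W1, W2 and the normal-approximation Z for W1."""
    n1, n2 = len(sample1), len(sample2)
    n = n1 + n2
    ranks = midranks(list(sample1) + list(sample2))
    w1 = sum(ranks[:n1])  # rank sum of the first sample
    w2 = sum(ranks[n1:])  # rank sum of the second sample
    w_e = n1 * (n + 1) / 2  # expected W1 under H0
    sigma_w = math.sqrt(n1 * n2 * (n + 1) / 12)  # no tie correction
    return w1, w2, w_e, sigma_w, (w1 - w_e) / sigma_w


low = [0, 0, 1, 2, 3, 3, 4]  # low-birth-weight group (assumed, see lead-in)
normal = [4, 5, 6, 7, 8, 9, 10, 11]  # normal-birth-weight group
w1, w2, w_e, sigma_w, z = rank_sum_z(low, normal)
print(w1, w2, w_e, round(sigma_w, 2), round(z, 2))
```

Run as written, it should print 28.5 91.5 56.0 8.64 -3.18. Because U = W1 - n1(n1 + 1)/2 for the first sample, the equivalent Mann-Whitney statistic here is U = 28.5 - 28 = 0.5, which is why the two tests are described as mathematically equivalent.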

WILCOXON SIGNED-RANK TEST
A. The counterpart to the paired t test for matched observations: we assume that we have a series of pairs of dependent observations.
B. We wish to test the hypothesis that the median of the first sample equals the median of the second, that is, that there is no tendency for the differences between the outcomes before and after some condition to favor either the before or the after condition.
C. The procedure is to obtain the differences between individual pairs of observations. Pairs yielding a difference of zero are eliminated from the computation, and the sample size is reduced accordingly.

WILCOXON SIGNED-RANK TEST
D. Performing the test
   1. rank the absolute differences by assigning ranks of 1 for the smallest to n for the largest
   2. ties (pairs with a difference of zero) are eliminated from the analysis
   3. the signs of the original differences are restored to each rank
   4. the sum of the positive ranks, $W_1$, is obtained and serves as the test statistic
   5. if the null hypothesis is true, we would expect the sum of the positive ranks to equal that of the negative ranks

Wilcoxon Signed-Rank Test: Number of Cigarettes Smoked per Day

Subject   Xb: before   Xa: after   d = Xa - Xb   |d|   Rank r_d
1         8            5           -3            3     3 (-)
2         13           15          +2            2     2 (+)
3         24           11          -13           13    9 (-)
4         15           19          +4            4     4 (+)
5         7            0           -7            7     7 (-)
6         11           12          +1            1     1 (+)
7         20           15          -5            5     5 (-)
8         22           0           -22           22    10 (-)
9         6            0           -6            6     6 (-)
10        15           6           -9            9     8 (-)
11        20           20          0             -     -

   $\sum r_d = \frac{n(n+1)}{2} = \frac{10(11)}{2} = 55$
   $\sum r_d(+) = W_1 = 7, \qquad \sum r_d(-) = W_2 = 48, \qquad W_e = \frac{\sum r_d}{2} = \frac{55}{2} = 27.5$

WILCOXON SIGNED-RANK TEST
E. The sum of all ranks is $n(n + 1)/2$.
F. Under the null hypothesis, we assume that the sum of the ranks of the positive d's equals the sum of the ranks of the negative d's; that is, each is half of the total sum of ranks: $W_e = n(n + 1)/4$.
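The sum-of-ranks formula (learning objective 3, the sum of the first n integers) follows from writing the ranks forward and backward and adding term by term; the last line applies it to the n = 10 nonzero differences in the cigarette example.

```latex
S = 1 + 2 + \cdots + n, \qquad S = n + (n-1) + \cdots + 1
2S = \underbrace{(n+1) + (n+1) + \cdots + (n+1)}_{n \text{ terms}} = n(n+1)
\;\Rightarrow\; S = \frac{n(n+1)}{2}, \qquad W_e = \frac{S}{2} = \frac{n(n+1)}{4}
n = 10:\quad S = \frac{10(11)}{2} = 55, \qquad W_e = \frac{55}{2} = 27.5
```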

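WILCOXON SIGNED-RANK TEST (Data from Table 14.2)
G. Z test for the difference between sums of matched ranks:
   $Z = \frac{W_1 - W_e}{\sigma_W}, \qquad \sigma_W = \sqrt{\frac{n(n+1)(2n+1)}{24}}$
   For the cigarette data, $\sigma_W = \sqrt{10(11)(21)/24} \approx 9.81$ and $Z = (7 - 27.5)/9.81 \approx -2.09$.
H. The test has a power efficiency of 92% relative to the paired t test when the data satisfy the assumption of normality.
I. The normal approximation requires at least eight pairs. For smaller sample sizes, an exact test is required.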
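A minimal Python sketch of the signed-rank calculation for the cigarette data follows. Assumptions: the function name `signed_rank_z` is hypothetical, differences are taken as d = Xa - Xb as in the table, and the standard error is the usual large-sample sqrt(n(n + 1)(2n + 1)/24) with no tie or continuity correction.

```python
import math


def signed_rank_z(before, after):
    """Wilcoxon signed-rank sums and normal-approximation Z (d = after - before)."""
    # Drop pairs whose difference is zero; n shrinks accordingly.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    # Rank |d| from 1 (smallest) to n (largest); tied |d| get their mean rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2
        i = j + 1
    # Restore the signs and sum positive and negative ranks separately.
    w_pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w_e = n * (n + 1) / 4  # expected sum of positive ranks under H0
    sigma_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return n, w_pos, w_neg, w_e, (w_pos - w_e) / sigma_w


before = [8, 13, 24, 15, 7, 11, 20, 22, 6, 15, 20]  # Xb
after = [5, 15, 11, 19, 0, 12, 15, 0, 0, 6, 20]  # Xa
n, w1, w2, w_e, z = signed_rank_z(before, after)
print(n, w1, w2, w_e, round(z, 2))
```

It should print 10 7.0 48.0 27.5 -2.09. With n = 10 nonzero differences the eight-pair rule of thumb is met, and |Z| = 2.09 exceeds 1.96, so the shift toward fewer cigarettes is significant at the 0.05 level (two-sided).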

KRUSKAL-WALLIS ONE-WAY ANOVA BY RANKS
A. A nonparametric equivalent of the one-way ANOVA, used for comparing more than two groups
B. The groups are independent.
C. The populations from which the samples are selected are not normally distributed, or the samples do not have equal variances.
D. It can also be used when ordered outcomes exist, that is, ordinal data rather than the interval or ratio data necessary to use an ANOVA.

KRUSKAL-WALLIS ONE-WAY ANOVA BY RANKS
E. Procedure
   1. combine the observations of the various groups
   2. arrange them in order of magnitude from lowest to highest
   3. assign ranks to each of the observations and replace them in each of the groups
   4. the original ratio data have thereby been converted into ordinal (ranked) data
   5. the ranks assigned to observations in each of the k groups are added separately to give k rank sums
   6. the test statistic, H, is computed from the rank sums

KRUSKAL-WALLIS ONE-WAY ANOVA BY RANKS
F. Test statistic:
   $H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1)$
   In this equation, k = the number of groups, $n_j$ = the number of observations in the jth group, N = the number of observations in all the groups combined, and $R_j$ = the sum of the ranks in the jth group.

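KRUSKAL-WALLIS ONE-WAY ANOVA BY RANKS

Ranks:
A         B         C
4         2         7
9         8         13
3         10        14
1         11        12
5         6         --
R1 = 22   R2 = 37   R3 = 46

Observations:
A         B         C
96        68        115
128       124       149
83        132       166
61        135       147
101       109       --

G. Calculation of the statistic H (N = 14; n1 = n2 = 5, n3 = 4):
   $H = \frac{12}{14(15)}\left(\frac{22^2}{5} + \frac{37^2}{5} + \frac{46^2}{4}\right) - 3(15) = \frac{12}{210}(899.6) - 45 \approx 6.41$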
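The H computation can be checked with a short Python sketch that works from the raw observations. The function name `kruskal_wallis_h` is hypothetical, and this version applies no tie correction (the example contains no ties).

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H (no tie correction) and the k rank sums."""
    pooled = sorted(x for g in groups for x in g)
    n_total = len(pooled)
    # Rank of each distinct value: mean of the 1-based positions it occupies.
    rank_of = {}
    for value in set(pooled):
        positions = [i + 1 for i, x in enumerate(pooled) if x == value]
        rank_of[value] = sum(positions) / len(positions)
    rank_sums = [sum(rank_of[x] for x in g) for g in groups]
    h = 12 / (n_total * (n_total + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    return rank_sums, h


a = [96, 128, 83, 61, 101]
b = [68, 124, 132, 135, 109]
c = [115, 149, 166, 147]
rank_sums, h = kruskal_wallis_h(a, b, c)
print(rank_sums, round(h, 2))
```

It should print [22.0, 37.0, 46.0] 6.41. Under the usual chi-square approximation with k - 1 = 2 df, the 0.05 critical value is 5.99, so H = 6.41 would be significant at that level.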

TIED OBSERVATIONS
A. When two or more scores are tied, each score is given the mean of the ranks for which it is tied.
B. Because H is somewhat influenced by ties, you may wish to correct for ties in computing H:
   $H_c = \frac{H}{1 - \dfrac{\sum (T^3 - T)}{N^3 - N}}$
   where T is the number of tied observations in a tied group of scores, and the sum runs over all tied groups.
C. The result of correcting for ties is to increase the value of H and thus make the results more significant than they would be if H remained uncorrected.
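A sketch of the tie correction, assuming the standard correction factor C = 1 - Σ(T³ - T)/(N³ - N) by which H is divided; the function names are hypothetical. The Kruskal-Wallis observations contain no ties, so the pooled birth-weight values are reused only to illustrate a case with ties.

```python
from collections import Counter


def tie_correction(values):
    """C = 1 - sum(T^3 - T) / (N^3 - N), summed over groups of tied values."""
    n = len(values)
    tie_sizes = [t for t in Counter(values).values() if t > 1]
    return 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)


def corrected_h(h, values):
    """Tie-corrected H: the uncorrected H divided by C (C <= 1, so H grows)."""
    return h / tie_correction(values)


# Pooled Kruskal-Wallis observations: no ties, so C = 1 and H is unchanged.
kw_pooled = [96, 128, 83, 61, 101, 68, 124, 132, 135, 109, 115, 149, 166, 147]
# Pooled birth-weight observations: three tied pairs (0, 3 and 4).
bw_pooled = [0, 0, 1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9, 10, 11]
print(tie_correction(kw_pooled), round(tie_correction(bw_pooled), 4))
```

It should print 1.0 0.9946. Because C is at most 1, dividing by it can only increase H, which is the effect described in point C.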

THE SIGN TEST
A. One of the simplest statistical tests; it focuses on the median rather than the mean as the measure of central tendency.
B. The only assumption made in performing the test is that the variables come from a continuous distribution.
C. It is called the sign test because we use pluses and minuses as the new data in performing the calculations.
D. We illustrate its use with a single sample and with paired samples.
E. It is useful when we are not able to use the t test because the assumption of normality has been violated.

SINGLE SAMPLE
A. We wish to test the H0 that the sample median is equal to the population median m.
B. We assign + to observations that fall above the population median and - to those that fall below.
C. A tie is given a 0 and is not counted.
D. If H0 is true (the medians are the same), we expect an equal number of each: 50% pluses and 50% minuses.
E. We can use the binomial distribution to determine whether the number of plus signs deviates significantly from the expected number.
(See Example 1, page 267.)

PAIRED SAMPLES
A. The sign test is also suitable for experiments with paired data, such as before and after, or treatment and control.
B. Only one assumption must be satisfied: the different pairs must be independent. Only the direction of change in each pair is recorded, as a plus or minus sign.
C. If there is no treatment effect, we expect an equal number of pluses and minuses.
D. The H0 tested by the paired-samples sign test is that the median of the observations listed first is the same as that of the observations listed second in each pair.
(See Example 2, page 268.)
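Both forms of the sign test reduce to counting signs and evaluating a binomial tail with p = 0.5. The sketch below is a minimal illustration: the function names are hypothetical, it returns a two-sided P value (twice the smaller tail, capped at 1), and, purely as an illustration, it applies the paired version to the before/after cigarette counts from the signed-rank example.

```python
from math import comb


def sign_test(plus, minus):
    """Exact two-sided sign-test P value from the counts of + and - signs."""
    n = plus + minus  # ties (0 signs) are excluded before calling
    k = min(plus, minus)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def signs_vs_median(observations, m):
    """Single sample: + above the hypothesized median m, - below; ties dropped."""
    plus = sum(x > m for x in observations)
    minus = sum(x < m for x in observations)
    return plus, minus


def signs_paired(before, after):
    """Paired samples: + when after > before, - when after < before; ties dropped."""
    plus = sum(a > b for b, a in zip(before, after))
    minus = sum(a < b for b, a in zip(before, after))
    return plus, minus


before = [8, 13, 24, 15, 7, 11, 20, 22, 6, 15, 20]
after = [5, 15, 11, 19, 0, 12, 15, 0, 0, 6, 20]
plus, minus = signs_paired(before, after)
print(plus, minus, sign_test(plus, minus))
```

It should print 3 7 0.34375. For these data the sign test is far from significant, whereas the signed-rank test on the same pairs gives Z of about -2.09; ignoring the sizes of the differences costs power.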

Pascal's Triangle
n   Binomial coefficients     Denominator of p
1   1 1                       2
2   1 2 1                     4
3   1 3 3 1                   8
4   1 4 6 4 1                 16
5   1 5 10 10 5 1             32
6   1 6 15 20 15 6 1          64

Pascal's Triangle
n   Binomial probabilities (p = 0.5)   Denominator of p
1   .50 .50                            2
2   .25 .50 .25                        4
3   .125 .375 .375 .125                8

Sign Test: Matched Pairs (Before and After)
Does not require that the underlying population be normally distributed.
Based on a median difference of zero.
Uses a binomial distribution.

Partial Binomial Distribution (n = 20, p = 0.50)
Left S   p       Right S
0        .0000   20
1        .0000   19
2        .0002   18
3        .0013   17
4        .0059   16
5        .0207   15
6        .0577   14
7        .1316   13
8        .2517   12
9        .4119   11
10       .5881   10
p is the one-tailed probability P(S ≤ Left S), which by symmetry equals P(S ≥ Right S).
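The coefficients in Pascal's triangle and the cumulative probabilities in the partial binomial table can be regenerated with a few lines of Python; the helper names `pascal_row` and `cumulative_left_tail` are hypothetical.

```python
from math import comb


def pascal_row(n):
    """Row n of Pascal's triangle: the binomial coefficients C(n, 0), ..., C(n, n)."""
    return [comb(n, k) for k in range(n + 1)]


def cumulative_left_tail(n, s):
    """P(S <= s) for S ~ Binomial(n, 0.5); by symmetry this equals P(S >= n - s)."""
    return sum(pascal_row(n)[: s + 1]) / 2 ** n  # denominator of p is 2^n


for n in range(1, 7):
    print(n, pascal_row(n), 2 ** n)

# Partial binomial table for n = 20: Left S, cumulative p, Right S
for s in range(11):
    print(s, f"{cumulative_left_tail(20, s):.4f}", 20 - s)
```

The loops print the rows of Pascal's triangle with their denominators and then the same cumulative column as the partial binomial table, with a leading zero (for example 0.0207 for Left S = 5).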

SPEARMAN RANK-ORDER CORRELATION COEFFICIENT
A. We obtain perfect correlation if the ranks for variables x and y are equal for each individual.
B. Conversely, lack of association is measured by examining the differences, D, between the paired ranks.
C. The Spearman rank-order correlation coefficient, rs, can be derived from the Pearson correlation coefficient, r (it is r computed on the ranks).
D. Like the Pearson correlation coefficient, the Spearman rank-order correlation coefficient may take on values from -1 to +1.
E. Values close to +1 or -1 indicate high correlation.
F. Values close to zero indicate a lack of association.
G. The + and - signs indicate whether the correlation is positive or negative.

SPEARMAN RANK-ORDER CORRELATION COEFFICIENT
H. Ties in rank are handled by averaging the ranks.
I. Test statistic, with n - 2 df:
   $t = \frac{r_s\sqrt{n - 2}}{\sqrt{1 - r_s^2}}$

SPEARMAN RANK-ORDER CORRELATION COEFFICIENT
J. Use rs whenever you are unable to meet the assumptions for Pearson r.
K. When those assumptions are met, Pearson r is preferable to the Spearman rs, because the power of rs is not as great as that of r.
L. The Spearman rs is most appropriate when you have either ordinal data or data that are sufficiently skewed that the Pearson r assumptions are not met.

Table: Patients Ranked by Smoking and Severity of Illness (columns: Patient; R1, rank by number of cigarettes smoked per day; R2, rank by severity of illness; D = R1 - R2; D^2). Totals: n = 8 patients, ΣD^2 = 24.

RHO
   $r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6(24)}{8(8^2 - 1)} = .71$
   $t = \frac{r_s\sqrt{n - 2}}{\sqrt{1 - r_s^2}} = \frac{.71\sqrt{6}}{\sqrt{1 - .71^2}} = \frac{1.74}{.70} = 2.49$
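A sketch of the Spearman computation: `spearman_rs` applies the D² formula to average ranks (the shortcut formula is exact only when there are no ties), and `spearman_t` is the t statistic with n - 2 df. Both function names are hypothetical, and the usage line works directly from n = 8 and ΣD² = 24.

```python
import math


def midranks(values):
    """Rank values 1..N; tied values get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """r_s = 1 - 6*sum(D^2) / (n(n^2 - 1)) computed on (average) ranks."""
    rx, ry = midranks(x), midranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def spearman_t(rs, n):
    """t statistic with n - 2 df for testing H0: no rank correlation."""
    return rs * math.sqrt(n - 2) / math.sqrt(1 - rs ** 2)


# Smoking/severity example: n = 8 patients and sum(D^2) = 24.
rs = 1 - 6 * 24 / (8 * (8 ** 2 - 1))
print(round(rs, 2), round(spearman_t(rs, 8), 2))
```

It should print 0.71 2.5. With the unrounded rs = 5/7, t is exactly 2.5; the 2.49 in the calculation above reflects rounding of intermediate values. Either way t exceeds 2.447, the two-sided 0.05 critical value for 6 df.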

FISHER'S EXACT TEST
A. For use with small sample sizes
B. Computes directly the probability of observing a particular set of frequencies in a 2 x 2 table:
   $P = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{N!\,a!\,b!\,c!\,d!}$
   where a, b, c, and d are the frequencies of the 2 x 2 table and N is the sample size
C. There has been some controversy as to whether it is appropriate to use Fisher's exact test in the health sciences, because the model requires that the marginal totals in the 2 x 2 table be fixed, and they seldom are in actual health science settings.
D. Some statisticians use this test anyway because it tends to give conservative values of P; that is, the true P value is actually less than the computed one.
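The table probability in point B, and a one-sided P value obtained by summing it over tables with the same margins, can be sketched with exact fractions. The function names are hypothetical, the one-sided direction (tables whose a cell is no larger than observed) is a choice made for illustration, and two-sided definitions differ between software packages.

```python
from fractions import Fraction
from math import factorial


def table_probability(a, b, c, d):
    """P of one 2 x 2 table given its margins: (a+b)!(c+d)!(a+c)!(b+d)! / (N! a! b! c! d!)."""
    n = a + b + c + d
    num = factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)
    den = factorial(n) * factorial(a) * factorial(b) * factorial(c) * factorial(d)
    return Fraction(num, den)


def fisher_one_sided(a, b, c, d):
    """Sum P over tables with the same margins whose a cell is <= the observed a."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lowest = max(0, row1 + col1 - n)  # smallest a that keeps every cell >= 0
    return sum(
        table_probability(x, row1 - x, col1 - x, n - row1 - col1 + x)
        for x in range(lowest, a + 1)
    )
```

As a check, with both the first row and the first column totaling 3 out of N = 6, the four possible tables have probabilities 1/20, 9/20, 9/20 and 1/20, which sum to 1.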

CONCLUSION
These nonparametric methods correspond to parametric methods such as the t test, the paired t test, and the correlation coefficient. Their primary advantage is that they do not involve restrictive assumptions such as normality and homogeneity of variance. Their major disadvantage is that they are less efficient than the corresponding parametric methods. The methods described here (the Wilcoxon signed-rank test, the Kruskal-Wallis test, the Mann-Whitney U test, the sign test, the Spearman rank-order correlation coefficient, and Fisher's exact test) are the nonparametric methods used most frequently in the health sciences.