
1 Non-Parametric Methods
Dr. Mohammed Alahmed

2 Learning Objectives
1. Distinguish parametric and nonparametric test procedures.
2. Explain commonly used nonparametric test procedures.
3. Perform hypothesis tests using nonparametric procedures.

3 Introduction
In the previous sections we learned a lot about one-sample, two-sample, and paired t-tests, ANOVA, and regression. All of these tests had some basic assumptions:
1. the individual samples were approximately normal;
2. the individual samples came from populations with approximately equal variance;
3. we preferred that the individual samples were of a size greater than 30.
Methods of estimation and hypothesis testing have been based on these assumptions.

4 These procedures are usually called parametric statistical methods because the parametric form of the distribution is assumed to be known. If these assumptions about the shape of the distribution are not made, and/or if the central-limit theorem seems inapplicable because of a small sample size, then nonparametric statistical methods, which make fewer assumptions about the distributional shape, must be used. Nonparametric tests typically focus on the median (rather than the mean) and involve fairly straightforward procedures such as ordering and counting. Most nonparametric methods are based on ranks instead of the original data.

5 Statistical Testing
Test | Parametric | Nonparametric
One quantitative response variable | One-sample t-test | Sign test
One quantitative response variable, two values from paired samples | Paired-sample t-test | Wilcoxon signed-rank test
One quantitative response variable, one qualitative independent variable with two groups | Two-independent-sample t-test | Wilcoxon rank-sum or Mann-Whitney test
One quantitative response variable, one qualitative independent variable with three or more groups | ANOVA | Kruskal-Wallis test

6 The Sign Test
The sign test is used to test hypotheses about a single population median, rather than about the mean as in the parametric (one-sample t) test. Suppose the null hypothesis is that the median of the distribution equals m0 (often zero). Let S+ = the number of values greater than m0, with values equal to m0 discarded. If the null hypothesis is true, each remaining value is equally likely to lie above or below m0, so S+ follows a binomial distribution with n trials and success probability 0.5. When the sample is large, this binomial distribution can be approximated by a normal distribution.

7 Conducting a sign test
State the hypotheses:
– H0: median = m0 and H1: median ≠ m0 (two-tailed)
– H1: median > m0 (right-tailed)
– H1: median < m0 (left-tailed)
Convert the data to plus (+) and minus (−) signs:
– change each value above m0 to + and each value below m0 to −;
– change any value equal to m0 to 0.

8 Compare the number of + and − signs (ignore 0's):
– If the number of + signs and the number of − signs are approximately equal, the null hypothesis is not likely to be rejected.
– If they are not approximately equal, the null hypothesis is likely to be rejected.

9 Test Statistic:
When n ≤ 20, the test statistic is the smaller number (x) of + or − signs.
When n > 20, the test statistic is
$z = \dfrac{(X + 0.5) - n/2}{\sqrt{n}/2}$
where X is the smaller number of + or − signs and n is the sample size, i.e., the total number of + and − signs (zeros excluded).
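The rules above can be sketched in Python (standard library only). This is an illustration under stated assumptions, not a reference implementation: `sign_test` is a hypothetical helper name, and the probability it reports is the one-tailed probability of a sign count as small as x. That is the p-value when H1 predicts the side with fewer signs; double it for a two-tailed test.

```python
import math

def sign_test(data, m0):
    """Sign test for H0: median = m0 (illustrative sketch, not a library API)."""
    plus = sum(1 for v in data if v > m0)    # values above m0 become '+'
    minus = sum(1 for v in data if v < m0)   # values below m0 become '-'
    n = plus + minus                         # values equal to m0 (zeros) are dropped
    x = min(plus, minus)                     # test statistic: smaller sign count
    if n <= 20:
        # Exact one-tailed probability P(X <= x) for X ~ Binomial(n, 0.5)
        p_one = sum(math.comb(n, k) for k in range(x + 1)) / 2 ** n
        return {"n": n, "plus": plus, "minus": minus, "x": x, "p_one_tailed": p_one}
    # n > 20: normal approximation with a 0.5 continuity correction
    z = ((x + 0.5) - n / 2) / (math.sqrt(n) / 2)
    p_one = 0.5 * math.erfc(-z / math.sqrt(2))   # Phi(z), the lower-tail area
    return {"n": n, "plus": plus, "minus": minus, "x": x, "z": z, "p_one_tailed": p_one}
```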

10 Example Recent studies of the private practices of physicians suggested that the median length of each patient visit was 22 minutes. It is believed that the median visit length in practices is shorter than 22 minutes. A random sample of 20 visits in practices yielded, in order, the following visit lengths: 9.4 13.4 15.6 16.2 16.4 16.8 18.1 18.7 18.9 19.1 19.3 20.1 20.4 21.6 21.9 23.4 23.5 24.8 24.9 26.8 Based on these data, is there sufficient evidence to conclude that the median visit length in practices is shorter than 22 minutes?

11 Solution: We are interested in testing H0: m = 22 vs. H1: m < 22 (a left-tailed test).


13 Exact test (binomial):
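The exact binomial calculation for the visit-length example can be reproduced as follows (Python, standard library only). Under H0 the number of + signs among the nonzero differences follows Binomial(n, 0.5); because H1 is m < 22, a small number of + signs counts as evidence against H0, so the one-tailed p-value is P(X ≤ number of + signs).

```python
import math

visits = [9.4, 13.4, 15.6, 16.2, 16.4, 16.8, 18.1, 18.7, 18.9, 19.1,
          19.3, 20.1, 20.4, 21.6, 21.9, 23.4, 23.5, 24.8, 24.9, 26.8]
m0 = 22.0

plus = sum(v > m0 for v in visits)    # visits longer than 22 min ('+')
minus = sum(v < m0 for v in visits)   # visits shorter than 22 min ('-')
n = plus + minus                      # no visit equals 22, so n = 20

# H1: median < 22 predicts few '+' signs, so p = P(X <= plus), X ~ Bin(n, 0.5)
p_value = sum(math.comb(n, k) for k in range(plus + 1)) / 2 ** n
print(plus, minus, round(p_value, 4))   # 5 15 0.0207
```

With p ≈ 0.0207 < 0.05, this calculation leads to rejecting H0 at the 5% level: the data support a median visit length shorter than 22 minutes.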

14 The Wilcoxon Signed-Rank Test
The Wilcoxon signed-rank test is another nonparametric test used for paired data; it is the nonparametric counterpart of the paired t-test. We wish to test the hypothesis that the median of the paired differences is zero. The test is nonparametric because it is based on the ranks of the differences rather than on their actual values, whereas the paired t-test uses the actual values. Use the Wilcoxon signed-rank test if the normality assumption of the paired t-test is violated.

15 Procedure
The first step in this test is to compute ranks for each observation, as follows:
1. Obtain the difference scores, $d_i = x_{1i} - x_{2i}$, and arrange the differences $d_i$ in order of absolute value.
2. Count the number of differences with the same absolute value.
3. Ignore the observations where $d_i = 0$, and rank the remaining observations from 1, for the observation with the lowest absolute value, up to n, for the observation with the highest absolute value.
4. If any absolute differences are equal, average their ranks.
5. Compute the rank sum R1 of the positive differences and the rank sum R2 of the negative differences.
6. Compare the smaller of the two rank sums with the T value obtained from the Appendix of Wilcoxon T values (Table 11).
7. If n ≥ 16, use the normal approximation.
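A minimal Python sketch of steps 1–7, assuming two equal-length lists of paired observations. The helper names (`average_ranks`, `signed_rank_sums`, `signed_rank_z`) are hypothetical, and the normal approximation shown is one common continuity-corrected form without a correction for ties. It is run on the drug/placebo sleep data from the example that follows.

```python
import math

def average_ranks(values):
    """Ranks 1..n in ascending order; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # advance j to the end of the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2   # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def signed_rank_sums(x1, x2):
    """Steps 1-5: return (R1, R2, n) for paired samples x1, x2."""
    # Round to suppress float noise (in floats, 6.1 - 5.2 != 7.9 - 7.0),
    # which would otherwise hide genuine ties among the |d_i|.
    d = [round(a - b, 10) for a, b in zip(x1, x2)]
    d = [v for v in d if v != 0]                     # step 3: drop zero differences
    ranks = average_ranks([abs(v) for v in d])       # steps 1-4
    r1 = sum(r for v, r in zip(d, ranks) if v > 0)   # positive differences
    r2 = sum(r for v, r in zip(d, ranks) if v < 0)   # negative differences
    return r1, r2, len(d)

def signed_rank_z(r1, n):
    """Step 7 (n >= 16): continuity-corrected normal approximation, no tie correction."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (abs(r1 - mean) - 0.5) / sd

drug = [6.1, 7.0, 8.2, 7.6, 6.5, 8.4, 6.9, 6.7, 7.4, 5.8]
placebo = [5.2, 7.9, 3.9, 4.7, 5.3, 5.4, 4.2, 6.1, 3.8, 6.3]
r1, r2, n = signed_rank_sums(drug, placebo)
print(r1, r2, n)   # 50.5 4.5 10
```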

16 Example
Patient | Hours of sleep: Drug | Hours of sleep: Placebo | Difference | Rank ignoring sign
1 | 6.1 | 5.2 | 0.9 | 3.5*
2 | 7.0 | 7.9 | −0.9 | 3.5*
3 | 8.2 | 3.9 | 4.3 | 10
4 | 7.6 | 4.7 | 2.9 | 7
5 | 6.5 | 5.3 | 1.2 | 5
6 | 8.4 | 5.4 | 3.0 | 8
7 | 6.9 | 4.2 | 2.7 | 6
8 | 6.7 | 6.1 | 0.6 | 2
9 | 7.4 | 3.8 | 3.6 | 9
10 | 5.8 | 6.3 | −0.5 | 1
* The 3rd and 4th ranks are tied, hence averaged.
R = smaller of R1 (50.5) and R2 (4.5), so R = 4.5, which is significant at the 2% level (see Table 11), indicating that the drug (a hypnotic) is more effective than placebo.


18 Example Twelve adult males were put on a diet in a weight-reducing plan. Weights were recorded before and after the diet. The data are shown in the table below. Use the Wilcoxon signed-rank test to determine whether the plan was successful. Use α = 0.05.
Before | 186 | 171 | 177 | 168 | 191 | 172 | 177 | 191 | 170 | 171 | 188 | 187
After | 188 | 177 | 176 | 169 | 196 | 172 | 165 | 190 | 165 | 180 | 181 | 172
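A sketch of the rank-sum computation for the diet data (Python, standard library only; `average_ranks` is a hypothetical helper name). The differences are taken as before − after, so a positive difference means weight was lost; the one zero difference is discarded, leaving n = 11.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2
        i = j + 1
    return ranks

before = [186, 171, 177, 168, 191, 172, 177, 191, 170, 171, 188, 187]
after = [188, 177, 176, 169, 196, 172, 165, 190, 165, 180, 181, 172]

d = [b - a for b, a in zip(before, after) if b != a]   # drop zero differences
ranks = average_ranks([abs(v) for v in d])
r_plus = sum(r for v, r in zip(d, ranks) if v > 0)     # weight lost
r_minus = sum(r for v, r in zip(d, ranks) if v < 0)    # weight gained
print(len(d), r_plus, r_minus)   # 11 38.5 27.5
```

Because the plan counts as successful only if weights fall, H1 here is one-sided (median of before − after > 0). The smaller rank sum, 27.5, would then be compared with the one-tailed critical value for n = 11 in Table 11.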


20 The Wilcoxon Rank-Sum Test
The Wilcoxon rank-sum test is a nonparametric analog of the t-test for two independent samples. Here we do NOT have paired data, but rather n1 values from group 1 and n2 values from group 2. We want to test whether the values in the two groups are samples from different distributions, i.e., to determine whether two independent samples came from the same (identical) population.

21 Procedure
1. Rank the data of both groups together in ascending order.
2. If any values are equal, average their ranks.
3. Compute the rank sum R1 of the first sample (the choice of sample is arbitrary).
4. Compare this sum with the critical ranges given in Table 12.
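A Python sketch of this procedure (standard library only), applied to the birth-weight data from the example that follows. The helper `average_ranks` is a hypothetical name; it implements the tie handling described above (averaged ranks).

```python
def average_ranks(values):
    """1-based ranks of the pooled data; ties share the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2
        i = j + 1
    return ranks

non_smokers = [3.99, 3.79, 3.60, 3.73, 3.21, 3.60, 4.08, 3.61,
               3.83, 3.31, 4.13, 3.26, 3.54, 3.51, 2.71]
heavy_smokers = [3.18, 2.84, 2.90, 3.27, 3.85, 3.52, 3.23,
                 2.76, 3.60, 3.75, 3.59, 3.63, 2.38, 2.34]

pooled = non_smokers + heavy_smokers          # step 1: rank both groups together
ranks = average_ranks(pooled)                 # step 2: ties get averaged ranks
r1 = sum(ranks[:len(non_smokers)])            # step 3: rank sum of group 1
r2 = sum(ranks[len(non_smokers):])
print(r1, r2)   # 272.0 163.0
```

As a check, R1 + R2 must equal N(N + 1)/2 = 29 × 30/2 = 435.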

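22 Example
Birth weight (kg) and rank, by group:
Non-smokers (n = 15) | Rank | Heavy smokers (n = 14) | Rank
3.99 | 27 | 3.18 | 7
3.79 | 24 | 2.84 | 5
3.60* | 18 | 2.90 | 6
3.73 | 22 | 3.27 | 11
3.21 | 8 | 3.85 | 26
3.60* | 18 | 3.52 | 14
4.08 | 28 | 3.23 | 9
3.61 | 20 | 2.76 | 4
3.83 | 25 | 3.60* | 18
3.31 | 12 | 3.75 | 23
4.13 | 29 | 3.59 | 16
3.26 | 10 | 3.63 | 21
3.54 | 15 | 2.38 | 2
3.51 | 13 | 2.34 | 1
2.71 | 3 | |
Sum | 272 | | 163
* The three values of 3.60 occupy ranks 17, 18 and 19; because they are tied, each receives the average rank 18.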

23 H0: the observations come from the same population

24 H0: m1 = m2 vs. H1: m1 ≠ m2
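For larger samples, the rank sum is often referred to a normal distribution instead of Table 12. The sketch below uses one common continuity-corrected form, without a correction for ties, and plugs in R1 = 272 for the non-smokers (n1 = 15, n2 = 14) from the birth-weight example; treat it as an illustration rather than as the slide's own calculation. Under H0, E(R1) = n1(n1 + n2 + 1)/2 and Var(R1) = n1·n2·(n1 + n2 + 1)/12.

```python
import math

n1, n2 = 15, 14        # non-smokers, heavy smokers
r1 = 272               # rank sum of the non-smokers (Sum = 272 in the table)

mean_r1 = n1 * (n1 + n2 + 1) / 2          # E(R1) under H0 = 225
var_r1 = n1 * n2 * (n1 + n2 + 1) / 12     # Var(R1) under H0 = 525 (no tie correction)
z = (abs(r1 - mean_r1) - 0.5) / math.sqrt(var_r1)   # continuity-corrected z
p_two_sided = math.erfc(z / math.sqrt(2))            # equals 2 * (1 - Phi(z))
print(round(z, 3), round(p_two_sided, 3))   # 2.029 0.042
```

Under this approximation z ≈ 2.03 and the two-sided p ≈ 0.042, so H0: m1 = m2 would be rejected at α = 0.05.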

25 Kruskal-Wallis One-Way Analysis of Variance
In some instances we want to compare more than two samples, but either the underlying distribution is far from normal or we have ordinal data. In these situations, a nonparametric alternative to one-way ANOVA is the Kruskal-Wallis test. Like all nonparametric tests, it focuses on ranks, counting and medians. The hypotheses are written as:
H0: All k populations have the same median.
H1: Not all of the k population medians are the same.

26 The Kruskal-Wallis test
To compare the medians of k samples (k > 2) using nonparametric methods, use the following procedure:
1. Pool the observations over all samples, thus constructing a combined sample of size $n = \sum n_i$.
2. Assign ranks to the individual observations, using the average rank in the case of tied observations.
3. Compute the rank sum $R_i$ for each of the k samples.
4. If there are no ties, compute the test statistic
$H = \dfrac{12}{n(n+1)} \sum_{i=1}^{k} \dfrac{R_i^2}{n_i} - 3(n+1)$

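27 Under the null hypothesis, the statistic H has an approximate chi-square distribution with k − 1 degrees of freedom ($\chi^2_{k-1}$). The approximation is adequate when each group contains at least 5 observations. For a level α test: reject H0 if $H > \chi^2_{k-1,\,1-\alpha}$ (the upper α point of the $\chi^2_{k-1}$ distribution); otherwise, do not reject H0.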
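A minimal Python sketch of the Kruskal-Wallis statistic and the χ² decision rule. It computes H = 12/(n(n + 1)) Σ R_i²/n_i − 3(n + 1), the no-ties form (averaged ranks are assigned, but no tie correction is applied). The function names are hypothetical, and the group data are hypothetical toy values, five per group, chosen only so the arithmetic can be checked by hand. The χ² critical values are the standard upper-5% table values.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H statistic (no tie correction) for a list of k samples."""
    pooled = [v for g in groups for v in g]   # combined sample of size n
    ranks = average_ranks(pooled)
    n = len(pooled)
    h_sum, start = 0.0, 0
    for g in groups:
        r_i = sum(ranks[start:start + len(g)])   # rank sum R_i of group i
        h_sum += r_i ** 2 / len(g)
        start += len(g)
    return 12 * h_sum / (n * (n + 1)) - 3 * (n + 1)

# Upper 5% points of the chi-square distribution (standard table values)
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

groups = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]  # hypothetical
h = kruskal_wallis_h(groups)
df = len(groups) - 1
print(round(h, 3), h > CHI2_CRIT_05[df])   # 12.5 True
```

Here the rank sums are 15, 40 and 65, so H = 12 × 1210/240 − 48 = 12.5 > 5.991 and H0 would be rejected at the 5% level.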

28 Example: Depression
Does physical exercise alleviate depression? We find some depressed people and check that they are all equivalently depressed to begin with. Then we allocate each person randomly to one of three groups: no exercise; 20 minutes of jogging per day; or 60 minutes of jogging per day. At the end of a month, we ask each participant to rate how depressed they now feel, on a Likert scale that runs from 1 ("totally miserable") through to 100 ("ecstatically happy"). The appropriate test here is the Kruskal-Wallis test. We have three separate groups of participants, each of whom gives us a single score on a rating scale. Ratings are examples of an ordinal scale of measurement, so the data are not suitable for a parametric test. The Kruskal-Wallis test will tell us whether the differences between the groups are so large that they are unlikely to have occurred by chance.

29 Data: rating on depression scale
No exercise | Jogging for 20 minutes | Jogging for 60 minutes
23 | 22 | 59
26 | 27 | [illegible]
51 | 39 | 38
49 | 29 | 49
58 | 46 | 56
37 | 48 | 60
29 | 49 | 56
[illegible] | 65 | 62



32 H0: All populations have the same median.
H1: Not all of the population medians are the same.
Conclusion: Since p-value < α, reject H0.

33 Key Concepts
These methods can be used when:
– the data cannot be measured on a quantitative scale;
– the numerical scale of measurement is arbitrarily set by the researcher; or
– the parametric assumptions, such as normality or constant variance, are seriously violated.

34 Hypothesis Testing: Categorical Data

35 Introduction In Chapters 7 and 8, the basic methods of hypothesis testing for continuous data were presented. If the variable under study is not continuous but is instead classified into categories, which may or may not be ordered, then different methods of inference should be used.

36 Categorical data analysis deals with discrete data that can be organized into categories. The data are organized into a contingency table, and the χ² distribution is used to analyze them.

37 The independent (explanatory) variable is categorical (nominal or ordinal), and the dependent (response) variable is categorical (nominal or ordinal).
Special cases:
– 2×2 (each variable has 2 levels)
– Nominal/Nominal
– Nominal/Ordinal
– Ordinal/Ordinal

38 Contingency Tables
Contingency tables represent all combinations of levels of the explanatory and response variables. The numbers in the table are counts of the number of cases in each cell, and the row and column totals are called marginal counts. A contingency table is also known as a crosstabulation, because it counts the cases that fall into each pairing of a row category with a column category.
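To make the idea of a crosstabulation concrete, here is a small Python sketch (standard library only) that counts each (row, column) pairing and adds the marginal counts. The `crosstab` helper and the records are hypothetical and only illustrate the counting.

```python
from collections import Counter

def crosstab(pairs):
    """Cell counts plus row and column marginal counts for (row, col) pairs."""
    cells = Counter(pairs)                        # count of each (row, col) pairing
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    row_totals = {r: sum(cells[(r, c)] for c in cols) for r in rows}
    col_totals = {c: sum(cells[(r, c)] for r in rows) for c in cols}
    return cells, row_totals, col_totals

records = [("smoker", "positive"), ("smoker", "negative"),
           ("non-smoker", "negative"), ("smoker", "positive"),
           ("non-smoker", "negative")]            # hypothetical records
cells, row_totals, col_totals = crosstab(records)
print(cells[("smoker", "positive")], row_totals, col_totals)
```

Both sets of marginal counts add up to the total number of cases (here 5).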

39 Chi-Square (χ²) and Frequency Data
For chi-square, the data are frequencies rather than numerical scores. Chi-square is used when both variables are measured on a nominal or ordinal scale; it can also be applied to interval or ratio data that have been categorized into a small number of groups. It assumes that the observations are randomly sampled from the population and that all observations are independent (an individual can appear only once in a table, and there are no overlapping categories). It makes no assumptions about the shape of the distribution or the homogeneity of variances. Chi-square is based on the differences between observed and expected frequencies.

40 Chi-Square Statistic
The chi-square statistic measures how far the observed values (O) are from the expected values (E):
$\chi^2 = \sum \dfrac{(O - E)^2}{E}$
where the sum is taken over all cells in the table. When χ² is large, there is evidence that H0 is false.

41 Two nonparametric hypothesis tests use the chi-square statistic:
1. the chi-square test for goodness of fit;
2. the chi-square test for independence.
Assumptions:
– independent observations;
– a sample size of at least 10;
– random sampling;
– all observations must be used;
– for the test to be accurate, the expected frequency in each cell should be at least 5.

42 Goodness-of-Fit Test
A goodness-of-fit test is an inferential procedure used to determine whether a frequency distribution follows a claimed distribution. The chi-square test for goodness of fit is a nonparametric test for nominal or ordinal data: it uses frequency data from a sample to test hypotheses about the shape or proportions of a population.

43 Each individual in the sample is classified into one category on the scale of measurement. The data, called observed frequencies, simply count how many individuals from the sample are in each category. The hypotheses for these tests are written a little differently than we have seen in the past, because they are usually stated in words rather than in symbols.

44 Example (Example 10.40, page 401 in the book)
Diastolic blood-pressure measurements were collected at home in a community-wide screening program of 14,736 adults ages 30−69 in East Boston, as part of a nationwide study to detect and treat hypertensive people. The people in the study were each screened in the home, with two measurements taken during one visit. A frequency distribution of the mean diastolic blood pressure is given in the table below in 10-mm Hg intervals.
Group (mm Hg) | < 50 | 50– | 60– | 70– | 80– | 90– | 100– | 110– | Total
Observed frequency | 57 | 330 | 2132 | 4584 | 4604 | 2119 | 659 | 251 | 14,736

45 We would like to assume these measurements came from an underlying normal distribution, so that we can use parametric methods. We want to test:
– H0: the random variable follows a normal distribution
– H1: the random variable does not follow a normal distribution
How can this hypothesis be tested?
– Estimate the parameters (mean and standard deviation) from the data.
– Compute the expected count in each group under the fitted normal distribution.
– Compute the chi-square statistic, Σ(O − E)²/E, over the groups.
– Under the null hypothesis, this statistic has an approximate chi-squared distribution with degrees of freedom equal to the number of groups minus 1 minus the number of estimated parameters.
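The steps above can be sketched in Python (standard library only). This is an illustration under assumptions: `gof_normal` is a hypothetical helper, the group boundaries are taken exactly as printed in the table (first group below 50, last group 110 and above), and the mean and standard deviation must be estimated from the raw data, which this transcript does not reproduce. The textbook's expected counts (Table 10.22) may rest on slightly different boundary conventions.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2); also handles x = +/- infinity."""
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2)))

def gof_normal(observed, cut_points, mu, sigma):
    """Chi-square goodness of fit to a normal distribution (illustrative sketch).

    observed   -- counts for len(cut_points) + 1 groups; the first group lies
                  below cut_points[0], the last at or above cut_points[-1]
    mu, sigma  -- mean and SD estimated from the data (2 estimated parameters)
    """
    n = sum(observed)
    edges = [-math.inf] + list(cut_points) + [math.inf]
    expected = [n * (normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma))
                for lo, hi in zip(edges[:-1], edges[1:])]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - 2        # groups - 1 - estimated parameters
    return expected, chi2, df

observed = [57, 330, 2132, 4584, 4604, 2119, 659, 251]
cut_points = [50, 60, 70, 80, 90, 100, 110]
```

Calling `gof_normal(observed, cut_points, mu, sigma)` with the sample estimates of the mean and SD returns the expected counts, the χ² statistic, and df = 8 − 1 − 2 = 5.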


47 Enter the expected frequency from Table 10.22

48 Conclusion: We reject the null hypothesis. Thus, the normal model does not provide an adequate fit to the data.

49 Test of Independence
The chi-square test of independence is probably the most frequently used hypothesis test in the social sciences. It is used to determine whether there is an association between a row variable and a column variable in a contingency table constructed from sample data.

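50 Hypotheses:
H0: The row variable is independent of the column variable.
H1: The row variable is dependent on (related to) the column variable.
Test statistic:
$\chi^2 = \sum_{i}\sum_{j} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$
where $O_{ij}$ is the observed count and $E_{ij} = (\text{row } i \text{ total})(\text{column } j \text{ total})/n$ is the expected count in cell (i, j), and n is the grand total.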

51 Example
Smoking | Lung cancer positive (Obs. / Exp.) | Lung cancer negative (Obs. / Exp.) | Total
Smoker | 15 / 7.67 | 8 / 15.33 | 23
Non-smoker | 5 / 12.33 | 32 / 24.67 | 37
Total | 20 | 40 | 60
The aim is to determine whether there is an association between smoking and lung cancer.

52 Hypotheses:
H0: There is no relationship between smoking and lung cancer.
H1: The two variables are associated.
Test statistic: χ² = 17.045

53 α = 0.05
df = (2 − 1)(2 − 1) = 1
Critical value: $\chi^2_{1,\,0.95} = 3.841$ (the upper 5% point of the χ² distribution with 1 df, from the χ² table)
Test statistic: χ² = 17.045
Decision: Since 17.045 > 3.841, reject H0 at α = 0.05.
Conclusion: There is evidence of a relationship between smoking and lung cancer.
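The calculation for this 2 × 2 table can be checked in Python (standard library only). The expected counts are computed as row total × column total / n without rounding, which gives χ² ≈ 17.06; the slide's 17.045 results from first rounding the expected counts to two decimals, and the decision is the same either way. The p-value line relies on the fact that a χ² variable with 1 df is the square of a standard normal variable, so it is valid only for df = 1.

```python
import math

observed = [[15, 8],    # smoker: lung cancer positive, negative
            [5, 32]]    # non-smoker: lung cancer positive, negative
row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]
n = sum(row_tot)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_tot[i] * col_tot[j] / n   # expected count under independence
        chi2 += (o - e) ** 2 / e
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = math.erfc(math.sqrt(chi2 / 2))   # upper-tail chi-square p, df = 1 only
print(round(chi2, 2), df, p_value < 0.001)   # 17.06 1 True
```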

54 Using SPSS

55 Conclusion: Since p-value < α, reject H0.

