Presentation on theme: "KRUSKAL-WALIS ANOVA BY RANK (Nonparametric test) this non - parametric test makes no assumptions about the distribution of the data (e.g., normality) More."— Presentation transcript:
KRUSKAL-WALIS ANOVA BY RANK (Nonparametric test) this non - parametric test makes no assumptions about the distribution of the data (e.g., normality) More than 2 groups – a nonparametric alternative to the one way ANOVA. Median of >3 groups are tested It uses the ranks of the data rather than their raw values to calculate the statistic Definition A non - parametric test (distribution - free) used to compare three or more independent groups of sampled data.
Steps of Kruskal-Walis Test All observations from k samples (k groups) are combined into a single series and arranged in order magnitude from smallest to largest. The observations are then replaced by ranks. The smallest observation is replaced by rank 1, the next to smallest by rank 2 and the largest by rank N. The sum of the ranks in each sample (column) is taken. The Kruskal-Walis Test determines whether these sums of ranks are so disparate that they are not likely to come from same population or not. H value is compared to a table of critical values for U based on the sample size of each group. If H exceeds the critical value for H at some significance level (usually 0.05) it means that there is evidence to reject the null hypothesis in favor of the alternative hypothesis.
The test statistic Where k = number of sample n = number of observations in each group. N = the number of observations in all samples combined. R = sum of ranks in in each group Steps of Kruskal-Walis Test
Example: The effects of 2 drugs on reaction time to a certain stimulus were studied in 3 groups of experimental animals. Group 3 served as a control and other two groups were treated by drug A & B. Group AGroup B Group C Group C Data (in ascending order, from all groups): Rank: Data from 3 independent groups are given in the table
Example: Ranks from 3 groups are given in the table This value is compared to a table of critical values for H based. Table shows that when the n j are 5, 4 and 4, calculated H is more than table value (>10.68) at p< The null hypothesis is rejected at the 0.01 level of significance. Conclusion: We conclude that there is a difference in the average reaction time among the three groups. Group AGroup BGroup C R 1 = 55R 2 = 26R 3 = 10
Adjustment for Tied values Since the values of H is somewhat influenced by ties, H is corrected /adjusted by dividing it by Where T = t 3 – t (when t is the number of tied observations in a group of tied values) and N = number of observation in all k groups together, that is, Therefore H corrected for ties is
Adjustment for Tied values Since there were 2 tied values in our group of ties, we have T = 2 3 – 2 = 6 and. Therefore, Therefore H corrected for ties is
The results of this test indicate that there is a significant difference or not between groups. The Tukey multiple comparisons are used to specify which groups differ. KRUSKAL-WALIS TEST
Spearman Rank Correlation Coefficient The conventional correlation coefficient (Pearson’s correlation coefficient) assumes that two variables being measured jointly follows Normal distribution. Nonparametric measures of correlation (distribution free) is Spearman’s rank correlation coefficient. Spearman’s rank correlation coefficient is calculated from ranks rather than from the original observations. The x and y variables to be correlated are each ranked separately from smallest to biggest and then the rankings are correlated with one another. It is designated by r s.
First rank the values fro low to higher. Find the difference (d i ) between the ranks of X i and the ranks of Y i. Square each d i and find the sum of the squared values (d i 2 ). Compute Steps Of Spearman Rank Correlation Coefficient
For testing significance of r s, compute z value Taking ties into account, correction is necessary. However, unless a substantial number of ties are involved, the resulting modifications are minor and, for all practical purposes, can be ignored. Compare calculated value with table value to find p value. If p<0.05, the relation is significant. Steps of Spearman Rank Correlation Coefficient
Example The following are the numbers of hours which 10 students studied for an examination and the grades which they received: No of hours studiedGrade in exam
Solution: Ranking from low to high of the observations in x and y and dtermine the d and d 2. Rank of x Rank of ydd
Solution: Test of significance of r s The null hypothesis is there is no correlation, H 0 : r s =0 For r s = 0.97 and n = 10, we get z value is Since z = 2.91 exceeds (table value), the null hypothesis must be rejected at 0.01 level of significance, we conclude that there is a relationship between study time and grades of the students.
In epidemiology, odds ratio are often used to measure the strength of the association between a risk factor and the disease. This parameter should only be calculated if the chi-square test indicate that there is a relationship between the exposure and disease. Odds ratio is used in case-control studies in which there is a group of diseased people (cases) and a group of undiseased people (control). The odds is defined as the ratio of the probability of an event (disease) happening over the chance that will no happen. The odds ratio then becomes the ratio of the odds in the exposed group over the odds in the unexposed group. Case control study Disease presentDisease absent Exposedab Unexposedcd Odds of disease in the exposed group is a/b and in the unexposed group is c/d. The odds ratio is then, Odds ratio
Confidence interval for odds ratio is: Odds ratio Example: The following data from a case-control study of myocardial infarction (MI) and alcohol intake Does this suggest that there is an association between disease and alcohol consumption? CI: 1.26 – 4.06 The interval does not include 1. With 95% CI, the true odds ratio lies between 1.26 and Since this interval does not include null value 1, the observed association is statistically significant at 5% level. MI caseControl Alcohol7152 No alcohol2948
The risk of an event is the probability that an event will occur within a stated period of time. It is the incidence rate of disease in the exposed group divided by the incidence of disease in the unexposed group is used only when you determined the number of people developing disease in each group over a period of time, i.e., in cohort or prospective studies. The risk of developing the disease within the follow-up time is a/(a+b) for the exposed population and c/(c+d) for unexposed population. Incidence (risk) in the exposed group: a/(a+b) Risk in the unexposed: c/(c+d) Relative risk = Relative Risk (RR)
Numbers of women in cohort study of serum ferritin and anemia Relative Risk (RR) Anemia at 2 nd survey Not anemia at 2 nd survey Total Serum ferritin <20 g 1 st survey 7815 Serum ferritin >20 g 1 st survey Relative risk = = 7 x 15/(2 x 15) = 3.5. Interpretation: Here the relative risk, RR is 3.5. This is interpreted as a woman is 3.5 times more likely to become anemic if her serum ferritin is below 20 mg/l.