KRUSKAL-WALLIS ANOVA BY RANK (Nonparametric test)

KRUSKAL-WALLIS ANOVA BY RANK (Nonparametric test) Definition: a non-parametric (distribution-free) test used to compare three or more independent groups of sampled data. It makes no assumptions about the distribution of the data (e.g., normality) and is the nonparametric alternative to the one-way ANOVA for more than two groups. The medians of three or more groups are compared, and the test statistic is calculated from the ranks of the data rather than their raw values.

Steps of the Kruskal-Wallis Test All observations from the k samples (k groups) are combined into a single series and arranged in order of magnitude from smallest to largest. The observations are then replaced by ranks: the smallest observation receives rank 1, the next smallest rank 2, and the largest rank N. The sum of the ranks in each sample (column) is taken. The Kruskal-Wallis test determines whether these rank sums are so disparate that they are unlikely to have come from the same population. The calculated H is compared to a table of critical values for H based on the sample size of each group. If H exceeds the critical value at some significance level (usually 0.05), there is evidence to reject the null hypothesis in favor of the alternative hypothesis.

Steps of the Kruskal-Wallis Test The test statistic is

H = [12 / (N(N + 1))] × Σ (Ri² / ni) − 3(N + 1)

where k = number of samples (groups), ni = number of observations in the i-th group, N = number of observations in all samples combined, and Ri = sum of the ranks in the i-th group.
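As a minimal sketch of this formula in Python (the function name is ours; `scipy.stats.kruskal` provides an equivalent, tie-corrected implementation):

```python
from collections import Counter

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic; tied observations receive average ranks."""
    pooled = sorted(x for g in groups for x in g)
    N = len(pooled)
    # Position of the first occurrence of each distinct value (1-based).
    first = {}
    for i, v in enumerate(pooled, start=1):
        first.setdefault(v, i)
    counts = Counter(pooled)
    # Average rank for each value handles ties (e.g. two 8s share rank 6.5).
    rank = {v: first[v] + (counts[v] - 1) / 2 for v in counts}
    # H = 12/(N(N+1)) * sum(Ri^2 / ni) - 3(N+1)
    s = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12.0 / (N * (N + 1)) * s - 3 * (N + 1)

# The three groups from the reaction-time example below:
H = kruskal_wallis_h([17, 20, 40, 31, 35], [8, 7, 9, 8], [2, 5, 4, 3])
```

This reproduces the worked example's value of H ≈ 10.68 (before the tie correction).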

Example: The effects of two drugs on reaction time to a certain stimulus were studied in three groups of experimental animals. Group C served as a control, and the other two groups were treated with drug A and drug B. Data from the three independent groups:

Group A: 17, 20, 40, 31, 35
Group B: 8, 7, 9, 8
Group C: 2, 5, 4, 3

Data (in ascending order, from all groups): 2 3 4 5 7 8 8 9 17 20 31 35 40
Rank: 1 2 3 4 5 6.5 6.5 8 9 10 11 12 13

Example: Ranks for the three groups:

Group A: 9, 10, 13, 11, 12   R1 = 55
Group B: 6.5, 5, 8, 6.5      R2 = 26
Group C: 1, 4, 3, 2          R3 = 10

H = [12 / (13 × 14)] × (55²/5 + 26²/4 + 10²/4) − 3 × 14 = 10.68

This value is compared to a table of critical values for H. For group sizes nj of 5, 4 and 4, the calculated H = 10.68 exceeds the tabled critical value, with p < 0.009. The null hypothesis is rejected at the 0.01 level of significance. Conclusion: there is a difference in the average reaction time among the three groups.

Adjustment for Tied Values Since the value of H is somewhat influenced by ties, H is corrected by dividing it by

1 − ΣT / (N³ − N)

where T = t³ − t (t being the number of tied observations in a group of tied values) and N = number of observations in all k groups together. Therefore H corrected for ties is

Hc = H / [1 − ΣT / (N³ − N)]

Adjustment for Tied Values Since there was one group of 2 tied values (the two 8s), we have T = 2³ − 2 = 6 and N³ − N = 13³ − 13 = 2184. Therefore H corrected for ties is

Hc = 10.68 / (1 − 6/2184) = 10.68 / 0.9973 ≈ 10.71

The correction barely changes H, so the conclusion is unchanged.
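The tie correction can be sketched as follows (a helper of our own naming; it takes the uncorrected H from the example and the pooled observations):

```python
from collections import Counter

def tie_corrected_h(h, pooled):
    """Correct H for ties: divide by 1 - sum(t^3 - t) / (N^3 - N)."""
    n = len(pooled)
    # Each group of t tied values contributes T = t^3 - t.
    t_sum = sum(t**3 - t for t in Counter(pooled).values() if t > 1)
    return h / (1 - t_sum / (n**3 - n))

# All 13 observations from the example; the two 8s are the only tie.
pooled = [17, 20, 40, 31, 35, 8, 7, 9, 8, 2, 5, 4, 3]
h_corrected = tie_corrected_h(10.68, pooled)
```

With a single pair of ties among 13 observations the correction factor is 1 − 6/2184 ≈ 0.9973, so the corrected H is only slightly larger than the uncorrected one.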

KRUSKAL-WALLIS TEST The result of this test indicates only whether or not there is a significant difference between the groups. If it is significant, Tukey-type multiple comparisons are then used to specify which groups differ.

Spearman Rank Correlation Coefficient The conventional correlation coefficient (Pearson's correlation coefficient) assumes that the two variables being measured jointly follow a Normal distribution. A nonparametric (distribution-free) measure of correlation is Spearman's rank correlation coefficient. It is calculated from ranks rather than from the original observations: the x and y variables to be correlated are each ranked separately from smallest to largest, and then the rankings are correlated with one another. It is designated rs.

Steps of the Spearman Rank Correlation Coefficient First rank the values from low to high. Find the difference (di) between the rank of each Xi and the rank of the corresponding Yi. Square each di and find the sum of the squared values (Σdi²). Compute

rs = 1 − 6 Σdi² / [n(n² − 1)]

Steps of the Spearman Rank Correlation Coefficient For testing the significance of rs, compute the z value

z = rs √(n − 1)

Taking ties into account would require a correction; however, unless a substantial number of ties are involved, the resulting modifications are minor and, for all practical purposes, can be ignored. Compare the calculated value with the table value to find the p value. If p < 0.05, the relation is significant.

Example The following are the numbers of hours which 10 students studied for an examination and the grades which they received:

No. of hours studied (x): 9, 5, 11, 13, 10, 5, 18, 15, 2, 8
Grade in exam (y):        56, 44, 79, 72, 70, 54, 94, 85, 33, 65

Solution: Rank the observations in x and y from low to high and determine d and d²:

Rank of x   Rank of y     d      d²
5           4             1.0    1.00
2.5         2             0.5    0.25
7           8            -1.0    1.00
8           7             1.0    1.00
6           6             0.0    0.00
2.5         3            -0.5    0.25
10          10            0.0    0.00
9           9             0.0    0.00
1           1             0.0    0.00
4           5            -1.0    1.00

Σd² = 4.5, so rs = 1 − 6(4.5) / [10(10² − 1)] = 1 − 27/990 = 0.97.

Solution: Test of significance of rs The null hypothesis is that there is no correlation, H0: rs = 0. For rs = 0.97 and n = 10, the z value is

z = 0.97 √(10 − 1) = 0.97 × 3 = 2.91

Since z = 2.91 exceeds 2.575 (the table value for the 0.01 level), the null hypothesis is rejected at the 0.01 level of significance; we conclude that there is a relationship between study time and the grades of the students.
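The steps above can be sketched in Python (helper names are ours; `scipy.stats.spearmanr` gives an equivalent coefficient):

```python
def average_ranks(values):
    """Rank values from low to high; tied values share an average rank."""
    s = sorted(values)
    # first index + (count + 1)/2 gives the average of the tied positions.
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]

def spearman_rs(x, y):
    """rs = 1 - 6 * sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

hours  = [9, 5, 11, 13, 10, 5, 18, 15, 2, 8]
grades = [56, 44, 79, 72, 70, 54, 94, 85, 33, 65]
rs = spearman_rs(hours, grades)           # about 0.97
z = rs * (len(hours) - 1) ** 0.5          # z = rs * sqrt(n - 1)
```

With rs unrounded the z value is about 2.92; the slides' 2.91 comes from rounding rs to 0.97 before multiplying, and either way z exceeds 2.575.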

Odds ratio In epidemiology, odds ratios are often used to measure the strength of the association between a risk factor and a disease. This parameter should only be calculated if the chi-square test indicates that there is a relationship between the exposure and the disease. The odds ratio is used in case-control studies, in which there is a group of diseased people (cases) and a group of non-diseased people (controls). The odds are defined as the ratio of the probability that an event (disease) happens to the probability that it will not happen. The odds ratio is then the ratio of the odds in the exposed group to the odds in the unexposed group.

Case-control study:
             Disease present   Disease absent
Exposed      a                 b
Unexposed    c                 d

The odds of disease in the exposed group are a/b and in the unexposed group c/d. The odds ratio is then

OR = (a/b) / (c/d) = ad / bc

Odds ratio Example: the following data come from a case-control study of myocardial infarction (MI) and alcohol intake. Does this suggest that there is an association between disease and alcohol consumption?

             MI case   Control
Alcohol      71        52
No alcohol   29        48

OR = (71 × 48) / (52 × 29) = 2.26

The 95% confidence interval for the odds ratio is 1.26 – 4.05. With 95% confidence, the true odds ratio lies between 1.26 and 4.05. Since this interval does not include the null value 1, the observed association is statistically significant at the 5% level.
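A sketch of the odds ratio with its log-scale (Woolf) 95% confidence interval, applied to the MI data above (the helper name is ours):

```python
from math import log, exp, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with a Woolf (log-based) confidence interval."""
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR): sqrt(1/a + 1/b + 1/c + 1/d).
    se = sqrt(1/a + 1/b + 1/c + 1/d)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# MI / alcohol table: a=71, b=52, c=29, d=48.
or_, lo, hi = odds_ratio_ci(71, 52, 29, 48)
```

This reproduces OR ≈ 2.26 with the interval (1.26, 4.05), which excludes 1.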

Relative Risk (RR) The risk of an event is the probability that the event will occur within a stated period of time. The relative risk is the incidence rate of disease in the exposed group divided by the incidence rate in the unexposed group. It is used only when the number of people developing the disease in each group is determined over a period of time, i.e., in cohort (prospective) studies. The risk of developing the disease within the follow-up time is a/(a+b) for the exposed population and c/(c+d) for the unexposed population.

Incidence (risk) in the exposed group: a/(a+b)
Risk in the unexposed group: c/(c+d)

Relative risk = [a/(a+b)] / [c/(c+d)]

Relative Risk (RR) Numbers of women in a cohort study of serum ferritin and anemia:

                                       Anemic at 2nd survey   Not anemic at 2nd survey   Total
Serum ferritin <20 mg/l at 1st survey  7                      8                          15
Serum ferritin >20 mg/l at 1st survey  2                      13                         15

Relative risk = (7/15) / (2/15) = (7 × 15) / (2 × 15) = 3.5

Interpretation: here the relative risk, RR, is 3.5. This is interpreted as: a woman is 3.5 times more likely to become anemic if her serum ferritin is below 20 mg/l.
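The relative-risk calculation can be sketched as (helper name is ours):

```python
def relative_risk(a, b, c, d):
    """Relative risk for a cohort 2x2 table: [a/(a+b)] / [c/(c+d)]."""
    risk_exposed = a / (a + b)      # incidence in the exposed group
    risk_unexposed = c / (c + d)    # incidence in the unexposed group
    return risk_exposed / risk_unexposed

# Serum ferritin / anemia table: a=7, b=8, c=2, d=13.
rr = relative_risk(7, 8, 2, 13)
```

This reproduces RR = (7/15) / (2/15) = 3.5 for the ferritin example.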

The 95% confidence interval for RR is obtained on the log scale:

exp[ ln(RR) ± 1.96 √( 1/a − 1/(a+b) + 1/c − 1/(c+d) ) ]
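As a sketch of this interval applied to the ferritin example (standard log-scale method; the helper name is ours):

```python
from math import log, exp, sqrt

def relative_risk_ci(a, b, c, d, z=1.96):
    """RR with a log-scale 95% confidence interval."""
    rr = (a / (a + b)) / (c / (c + d))
    # Standard error of ln(RR): sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d)).
    se = sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
    lower = exp(log(rr) - z * se)
    upper = exp(log(rr) + z * se)
    return rr, lower, upper

# Ferritin / anemia table: a=7, b=8, c=2, d=13.
rr, lo, hi = relative_risk_ci(7, 8, 2, 13)
```

For this small cohort the interval is roughly 0.86 to 14.2; since it includes 1, the elevated RR of 3.5 is not statistically significant at the 5% level.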