1 Measures of agreement
Natalie Robinson, Centre for Evidence-based Veterinary Medicine

2 Why might we measure agreement?
Measures of reliability
To compare 2 or more different methods
E.g. SNAP FeLV test vs. virus isolation

3 Why might we measure agreement?
To look at inter-rater reliability
E.g. several ‘raters’ using the same body condition scoring method on the same animals

4 Why might we measure agreement?
To look at repeatability: intra-rater reliability / test-retest reliability
E.g. the same ‘rater’ using the same BCS method on the same animals on 2 days in a row

5 Categorical/Ordinal data
Binary/nominal/ordinal data
E.g. positive or negative test result, breeds of dog, grade of disease (mild, moderate, severe)
Measures: percentage agreement; Cohen’s Kappa; weighted Kappa (ordinal data)
Lots of variations, e.g. Fleiss’ Kappa
Banerjee and Capozzoli (1999) Beyond kappa: A review of interrater agreement measures. The Canadian Journal of Statistics, 27, 3-23.
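The slides calculate kappa in SPSS; as a minimal sketch only, the same statistic can be obtained in Python with scikit-learn (the paired results below are hypothetical):

```python
# Minimal sketch: Cohen's kappa for two tests/raters on the same samples,
# using scikit-learn rather than the slides' SPSS workflow. Data are made up.
from sklearn.metrics import cohen_kappa_score

test_a = ["pos", "pos", "neg", "neg", "pos", "neg"]
test_b = ["pos", "neg", "neg", "neg", "pos", "neg"]

kappa = cohen_kappa_score(test_a, test_b)  # unweighted Cohen's kappa
print(round(kappa, 2))
```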

6 Percentage agreement 2 different tests performed on 100 samples
                     Test A
               +ve         -ve
Test B   +ve    27           2
         -ve     5          66
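From the table, the two tests agree on the 27 samples both call positive and the 66 both call negative, so the observed (percentage) agreement is

$$ p_o = \frac{27 + 66}{100} = 0.93 \; (93\%) $$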

7 So why don’t we just use this…?
Some agreement will occur by chance
This depends on the number of categories and the frequency of each category
For example…
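Continuing the worked example above, the marginal totals are 29 and 71 for Test B and 32 and 68 for Test A, so the agreement expected by chance alone is

$$ p_e = \frac{29}{100}\times\frac{32}{100} + \frac{71}{100}\times\frac{68}{100} = 0.0928 + 0.4828 = 0.5756 $$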

8 Cohen’s Kappa Agreement > expected by chance?
Can only compare two raters/methods at a time
Values usually between 0 and 1: 0 = agreement no better than chance, 1 = perfect agreement
Negative values are possible (agreement worse than chance)
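Cohen’s kappa compares observed agreement with the agreement expected by chance; using the worked values above,

$$ \kappa = \frac{p_o - p_e}{1 - p_e} = \frac{0.93 - 0.5756}{1 - 0.5756} \approx 0.84 $$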

9 Getting your data into SPSS
If the data are in ‘long form’ (one ‘case’ per row), they will need to be entered as frequencies instead

10 Getting your data into SPSS
Can do this by producing an ‘n x n’ table, where n is the no. of categories
In SPSS, select ‘Analyze’, then ‘Descriptive Statistics’, then ‘Crosstabs’

11 Getting your data into SPSS
Select the 2 variables you want to compare
This will generate an ‘n x n’ table – use this to enter the frequency data into a new dataset
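Outside SPSS, the same crosstab step can be sketched in Python with pandas (the column names here are hypothetical):

```python
# Minimal sketch: turn long-form data (one case per row) into an n x n
# frequency table, the equivalent of SPSS Crosstabs. Column names are made up.
import pandas as pd

df = pd.DataFrame({
    "test_a": ["pos", "pos", "neg", "neg", "pos", "neg"],
    "test_b": ["pos", "neg", "neg", "neg", "pos", "neg"],
})

# n x n table, where n is the number of categories
print(pd.crosstab(df["test_a"], df["test_b"]))
```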

12 Getting your data into SPSS

13 Getting your data into SPSS
So your dataset should look something like this, where the ‘count’ is the frequency from your ‘n x n’ table…

14 What results will I get? Point estimate with standard error
95% confidence intervals, computed from the standard error (see below)
P value – indicates significance but not magnitude
Will generally be significant if Kappa > 0 unless the sample size is small
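This is the usual normal-approximation interval around the point estimate (a standard formula, not spelled out on the slide):

$$ \text{95\% CI} = \hat{\kappa} \pm 1.96 \times SE(\hat{\kappa}) $$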

15 What is a ‘good’ K value?

Cohen’s Kappa    Landis & Koch (1977)    McHugh (2012)
0.00–0.20        Slight                  None
0.21–0.40        Fair                    Minimal
0.41–0.60        Moderate                Weak
0.61–0.80        Substantial             Moderate
0.81–1.00        Almost perfect          Strong

Landis and Koch (1977) The measurement of observer agreement for categorical data. Biometrics, 33:
McHugh (2012) Interrater reliability: The Kappa Statistic. Biochem Med (Zagreb), 22:

16 Weighted Kappa Ordinal data
Takes into account intermediate levels of agreement

                              Clinician A
                      Mild    Moderate    Severe
Clinician B  Mild       24           5         2
             Moderate   10          26         8
             Severe      1          11        13
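A minimal weighted-kappa sketch in Python with scikit-learn (the slides use SPSS/GraphPad; the ratings below are hypothetical):

```python
# Minimal sketch: weighted kappa for ordinal grades with scikit-learn.
# Passing labels fixes the ordinal order; the ratings are made up.
from sklearn.metrics import cohen_kappa_score

grades = ["mild", "moderate", "severe"]
clinician_a = ["mild", "mild", "moderate", "severe", "moderate", "mild"]
clinician_b = ["mild", "moderate", "moderate", "severe", "severe", "mild"]

# "linear" weights give partial credit for near-misses; "quadratic" penalises
# larger disagreements more heavily.
kappa_w = cohen_kappa_score(clinician_a, clinician_b,
                            labels=grades, weights="linear")
print(round(kappa_w, 2))
```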

17 Continuous data Scale/numerical/discrete data e.g. Patient age
Rating on a visual analogue scale

18 Continuous data Need ‘degrees’ of agreement
Incorrect to use e.g. Pearson’s correlation
Instead: intraclass correlation; Lin’s concordance correlation coefficient; Bland-Altman plot
Bland JM, Altman DG. (1986) Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, i.
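A minimal sketch of a Bland-Altman plot in Python (numpy/matplotlib, rather than the SPSS workflow on the slides; the paired measurements are hypothetical):

```python
# Minimal sketch: Bland-Altman plot = difference between methods plotted
# against their mean, with the mean difference (bias) and 95% limits of
# agreement. Data are made up.
import numpy as np
import matplotlib.pyplot as plt

method_1 = np.array([10.2, 12.5, 9.8, 14.1, 11.0, 13.3])
method_2 = np.array([10.6, 12.1, 10.4, 14.5, 11.3, 12.9])

mean = (method_1 + method_2) / 2
diff = method_1 - method_2
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)            # half-width of the limits of agreement

plt.scatter(mean, diff)
plt.axhline(bias)                        # mean difference (bias)
plt.axhline(bias + loa, linestyle="--")  # upper limit of agreement
plt.axhline(bias - loa, linestyle="--")  # lower limit of agreement
plt.xlabel("Mean of the two methods")
plt.ylabel("Difference between methods")
plt.show()
```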

19 Intraclass correlation
Values 0–1: 0 = no agreement, 1 = perfect agreement
Use the same guidelines as Kappa for interpretation:

ICC              Landis & Koch (1977)    McHugh (2012)
0.00–0.20        Slight                  None
0.21–0.40        Fair                    Minimal
0.41–0.60        Moderate                Weak
0.61–0.80        Substantial             Moderate
0.81–1.00        Almost perfect          Strong

20 Options in SPSS Should I select…
Consistency or absolute agreement?
One-way random / two-way random / two-way fixed model?
May have slightly different terminology in different stats programs
This article explains it well…
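As a non-SPSS illustration of these options, the pingouin package reports all the Shrout & Fleiss ICC variants at once from long-form data (the column names and values below are hypothetical):

```python
# Minimal sketch: ICC variants with pingouin rather than SPSS. ICC2 is the
# two-way random, absolute-agreement, single-measures ICC; ICC3 is two-way
# mixed, consistency; the 'k' variants are the average-measures versions.
# Data and column names are made up.
import pandas as pd
import pingouin as pg

df = pd.DataFrame({
    "animal": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "rater":  ["A", "B"] * 5,
    "age":    [3.0, 3.5, 7.0, 7.0, 2.0, 2.5, 10.0, 9.5, 6.0, 6.5],
})

icc = pg.intraclass_corr(data=df, targets="animal", raters="rater", ratings="age")
print(icc[["Type", "Description", "ICC", "CI95%"]])
```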

21 Absolute agreement or consistency?
E.g. Measure 2 is always 1 point higher than Measure 1
Consistency would be perfect
Absolute agreement would not be
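A small numerical illustration of the difference (made-up numbers, not from the slides):

```python
# A constant +1 offset: correlation/consistency is perfect, but the two
# measures never actually agree in absolute terms.
import numpy as np

measure_1 = np.array([2.0, 4.0, 6.0, 8.0])
measure_2 = measure_1 + 1.0                     # always 1 point higher

print(np.corrcoef(measure_1, measure_2)[0, 1])  # 1.0 - perfect correlation
print(np.mean(measure_2 - measure_1))           # 1.0 - systematic bias remains
```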

22 One way or two way model? E.g. raters recording no. of cells in sample
One way = don’t have the same raters for all ratees:

Sample No    Raters
Sample 1     Raters A + B
Sample 2     Raters B + C
Sample 3     Raters A + C
Sample 4     Raters B + D
Sample 5     Raters A + D

Two way = do have the same raters for all ratees:

Sample No    Raters
Sample 1     Raters A, B + C
Sample 2     Raters A, B + C
Sample 3     Raters A, B + C
Sample 4     Raters A, B + C
Sample 5     Raters A, B + C

23 Random or mixed model? One way model is always random; two way can be random or mixed
Random = a random sample of raters from a population of ‘potential raters’
E.g. two examiners marking exam papers
These are a ‘sample’ of the population of all possible examiners who could mark the paper

24 Random or mixed model? Mixed = a whole population of raters
i.e. the raters are the only possible raters anyone would be interested in
Rare! Usually there will always be another potential rater

25 What will my output look like?
Point estimate / 95% confidence interval
P value
Single measures or average measures?

26 Single or average measures
Single measures = reliability of one rater
How accurate would a single person be, making measurements on their own?
Usually more appropriate: future studies will likely not use multiple raters for each measurement
Average measures = reliability of the different raters averaged together
Will be higher than single measures
Using this is not usually justified
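One way to see why average measures are always higher: with k raters, the average-measures ICC relates to the single-measures ICC via the Spearman-Brown formula,

$$ ICC_{\text{average}} = \frac{k \cdot ICC_{\text{single}}}{1 + (k - 1)\, ICC_{\text{single}}} $$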

27 What to report? Which program was used
% agreement + Kappa/ICC
Point estimate (95% confidence interval)
P value?
ICC – type of model selected, consistency/absolute agreement
“Cohen’s kappa (κ) was calculated for categorical variables such as breed. Intra-class correlation coefficient (ICC) was calculated for age, in a two-way random model with measures of absolute agreement”
Robinson et al. (in press) Agreement between veterinary patient data collected from different sources. The Veterinary Journal.

28 Exercises
Calculate the Kappa for dog breed data collected from two different sources
Calculate the ICC for cat age data collected from two different sources

29 References
Landis and Koch (1977) The measurement of observer agreement for categorical data. Biometrics, 33:
McHugh (2012) Interrater reliability: The Kappa Statistic. Biochem Med (Zagreb), 22:
Bland JM, Altman DG. (1986) Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, i.
Banerjee and Capozzoli (1999) Beyond kappa: A review of interrater agreement measures. The Canadian Journal of Statistics, 27, 3-23.
Petrie and Sabin (2009) Medical Statistics at a Glance. 3rd Ed.
Robinson et al. (in press) Agreement between veterinary patient data collected from different sources. The Veterinary Journal.
Computing ICC in SPSS:
GraphPad Kappa/Weighted Kappa calculator:

