1 Interpreting Kappa in Observational Research: Baserate Matters Cornelia Taylor Bruckner Vanderbilt University

2 Acknowledgements Paul Yoder, Craig Kennedy, Niels Waller, Andrew Tomarken, the MRDD training grant, and the KC Quant core

3 Overview Agreement is a proxy for accuracy. Agreement statistics 101: chance agreement, the agreement matrix, and baserate. Kappa and baserate, a paradox. Estimating accuracy from kappa. Applied example.

4 Framing as observational coding I will frame today’s talk within observational measurement, but the concepts apply to many other situations, e.g., agreement between clinicians on a diagnosis, or agreement between reporters on child symptoms (e.g., mothers and fathers).

5 “Rater accuracy”: A fictitious session Madeline Scientist writes a script for an interval-coded observation session, specifying the presence or absence of the target behavior in each interval. Two coders (Eager Beaver and Slack Jack), blind to the script, are asked to code the session. The accuracy of each coder with respect to the script is then calculated.

6 Accuracy of Eager Beaver (EB) with the session (interval data)
                      EB occurrence   EB nonoccurrence
True occurrence             .90              .10
True nonoccurrence          .01              .99

7 Accuracy of Slack Jack (SJ) with the session (interval data)
                      SJ occurrence   SJ nonoccurrence
True occurrence             .50              .50
True nonoccurrence          .30              .70

8 Who has the best accuracy? Eager Beaver, of course; Slack Jack was not very accurate. Notice that accuracy is about agreement with both the occurrence and the nonoccurrence of the behavior.

9 We don’t always know the truth It is great when we know the true occurrence and nonoccurrence of behaviors, but in the real world we deal with agreement between fallible observers.

10 Agreement between raters Point-by-point interobserver agreement is achieved when independent observers see the same thing (behavior, event) at the same time.

11 Difference between agreement and accuracy Agreement can be directly measured; accuracy cannot, because we don’t know the “truth” of a session. However, agreement is used as a proxy for accuracy, and accuracy can be estimated from agreement. The method for this estimation is the focus of today’s talk.

12 Percent agreement The proportion of intervals that were agreed upon: agreements / (agreements + disagreements). Takes into account both occurrence and nonoccurrence agreement. Varies from 0% to 100%.

13 Occurrence and Nonoccurrence agreement Occurrence agreement: of the intervals in which either coder recorded the behavior, the proportion that were agreed upon (positive agreement). Nonoccurrence agreement: of the intervals in which either coder recorded a nonoccurrence, the proportion that were agreed upon (negative agreement).
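A minimal sketch of these three statistics in Python, following the definitions above; the function names are my own, and the example counts are the collapsed 2x2 table for "happy" shown later in the talk (slide 19).

```python
# Agreement statistics for a 2x2 table of interval counts, using the
# definitions above. Cell layout:
#   a = intervals in which both coders recorded an occurrence
#   b, c = intervals in which the coders disagreed
#   d = intervals in which both coders recorded a nonoccurrence

def percent_agreement(a, b, c, d):
    """Agreements / (agreements + disagreements), as a proportion."""
    return (a + d) / (a + b + c + d)

def occurrence_agreement(a, b, c, d):
    """Agreed occurrences out of intervals where either coder recorded the behavior."""
    return a / (a + b + c)

def nonoccurrence_agreement(a, b, c, d):
    """Agreed nonoccurrences out of intervals where either coder recorded a nonoccurrence."""
    return d / (d + b + c)

# Example: the collapsed "happy" table shown later in the talk (slide 19).
a, b, c, d = 60, 10, 7, 123
print(percent_agreement(a, b, c, d))        # 0.915
print(occurrence_agreement(a, b, c, d))     # 60/77  ~ 0.78
print(nonoccurrence_agreement(a, b, c, d))  # 123/140 ~ 0.88
```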

14 Problem with agreement statistics We assume that agreement is due to accuracy, but agreement statistics do not control for chance agreement, so agreement could be due only to chance.

15 Chance agreement and point-by-point agreement: occurrence agreement and nonoccurrence agreement.

16 Agreement matrix (rows: Slack Jack; columns: Eager Beaver)
            happy   sad   angry   puzzled   other   total
happy          60     5       1         1       3      70
sad             1    40       4         2       0      47
angry           7     3      12         0       0      22
puzzled         5     5       4        30       6      50
other           0     0       0         1      10      11
total          73    53      21        34      19     200

17 Using a 2x2 table to check agreement on individual codes When IOA is computed on the total code set, it is an omnibus measure of agreement; it does not tell us about agreement on any one code. To know agreement on a particular code, the confusion matrix needs to be collapsed into a 2x2 table.

18 Agreement matrix (rows: Eager Beaver; columns: Slack Jack)
            happy   sad   angry   puzzled   other   total
happy          60     9       1         0       0      70
sad             6    40       0         1       0      47
angry           0     7      12         2       1      22
puzzled         0     4       3        30      13      50
other           1     0       0         1       9      11
total          67    60      16        34      23     200

19 Collapsed 2x2 table for the code "happy" (rows: Eager Beaver; columns: Slack Jack)
                      happy   all other emotions   total
happy                    60                   10      70
all other emotions        7                  123     130
total                    67                  133     200

20 Baserate in a 2x2 table
                      happy   all other emotions   total
happy                    60                   10      70
all other emotions        7                  123     130
total                    67                  133     200
Best estimate of the baserate for "happy" = (67 + 70) / (2 * 200) = .34
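A sketch of the collapsing step and of the baserate estimate in Python; the counts are my reading of the slide-18 matrix, and the helper names (collapse, estimated_baserate) are my own.

```python
# Collapse a square confusion matrix (codes in the same order on rows and
# columns) into a 2x2 table for one target code, then estimate that code's
# baserate as the average of the two observers' proportions (slide 20).
codes = ["happy", "sad", "angry", "puzzled", "other"]

# Rows: Eager Beaver, columns: Slack Jack (counts from slide 18).
matrix = [
    [60,  9,  1,  0,  0],
    [ 6, 40,  0,  1,  0],
    [ 0,  7, 12,  2,  1],
    [ 0,  4,  3, 30, 13],
    [ 1,  0,  0,  1,  9],
]

def collapse(matrix, codes, target):
    i = codes.index(target)
    n = sum(sum(row) for row in matrix)
    a = matrix[i][i]                       # both observers coded the target
    b = sum(matrix[i]) - a                 # rater 1 coded the target, rater 2 did not
    c = sum(row[i] for row in matrix) - a  # rater 2 coded the target, rater 1 did not
    d = n - a - b - c                      # neither observer coded the target
    return a, b, c, d

def estimated_baserate(a, b, c, d):
    n = a + b + c + d
    return ((a + b) + (a + c)) / (2 * n)   # average of the two observers' marginals

a, b, c, d = collapse(matrix, codes, "happy")
print(a, b, c, d)                      # 60 10 7 123 -- the 2x2 table on slide 19
print(estimated_baserate(a, b, c, d))  # (70 + 67) / 400 = 0.3425, i.e. ~.34
```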

21 Review Defined accuracy. Described the relationship between chance agreement and IOA. Created a 2x2 table. Calculated a best estimate of the base rate.

22 Kappa Kappa is an agreement statistic that controls for chance agreement. Before kappa there was a sense that we should control for chance, but we did not know how. Cohen’s 1960 paper has been cited over 7,000 times.

23 Definition of Kappa Kappa is the proportion of observed non-chance agreement out of all possible non-chance agreement: K = (Po - Pe) / (1 - Pe)

24 Definition of Terms Po = the proportion of events for which there is observed agreement (the same metric as percent agreement). Pe = the proportion of events for which agreement would be expected by chance alone, defined as the probability of two raters coding the same behavior at the same time by chance.

25 Agreement matrix for EB and SJ, with chance agreement in parentheses
                            happy (EB)   all other emotions (EB)   total
happy (SJ)                   .36 (.33)                       .36     .72
all other emotions (SJ)            .10                 .18 (.15)     .28
total                              .46                       .54
Po = .36 + .18 = .54; Pe = .33 + .15 = .48; K = (.54 - .48) / (1 - .48) = .12
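A minimal sketch of the kappa calculation for a table of proportions like the one above; the function name and cell layout are my own.

```python
# Cohen's kappa from a table of observed proportions:
# p[i][j] = proportion of intervals coded i by rater 1 (SJ) and j by rater 2 (EB).
def cohens_kappa(p):
    k = len(p)
    po = sum(p[i][i] for i in range(k))                        # observed agreement
    rows = [sum(p[i]) for i in range(k)]                       # rater 1 marginals
    cols = [sum(p[i][j] for i in range(k)) for j in range(k)]  # rater 2 marginals
    pe = sum(rows[i] * cols[i] for i in range(k))              # chance agreement
    return (po - pe) / (1 - pe)

# Proportions from the 2x2 table above (happy vs. all other emotions).
p = [[0.36, 0.36],
     [0.10, 0.18]]
print(cohens_kappa(p))  # ~0.11; rounding Pe to .48, as on the slide, gives .12
```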

26 What determines the value of kappa? Accuracy and base rate. Increasing accuracy increases observed agreement; therefore kappa is a consistent estimator of accuracy if base rate is held constant. If accuracy is held constant, kappa will decrease as the estimated true base rate deviates from .5.

27 Obtained kappa, across baserate, for 80% accuracy (figure)

28 Obtained kappa, across baserate, for 80% and 99% accuracy (figure)

29 Obtained kappa, across baserate, from 80% to 99% accuracy (figure; curves for 80%, 85%, 90%, 95%, and 99% accuracy)
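The curves on these slides can be approximated with a simple model of my own (not the simulation behind the talk's figures): assume the true baserate is p and that each observer, independently, codes every interval correctly with probability equal to their accuracy.

```python
# Expected kappa for two independent observers who each code every interval
# correctly with probability `accuracy`, when the true baserate is `p`
# (the same accuracy is assumed for occurrences and nonoccurrences).
def expected_kappa(accuracy, p):
    a = accuracy
    q = p * a + (1 - p) * (1 - a)   # each observer's marginal rate of coding an occurrence
    po = a * a + (1 - a) * (1 - a)  # agreement: both correct, or both wrong (and so agreeing)
    pe = q * q + (1 - q) * (1 - q)  # chance agreement from the (identical) marginals
    return (po - pe) / (1 - pe)

for acc in (0.80, 0.85, 0.90, 0.95, 0.99):
    print(acc, [round(expected_kappa(acc, p), 2) for p in (0.05, 0.10, 0.25, 0.50)])
# At every accuracy, kappa peaks at a baserate of .5 and falls toward 0
# as the baserate approaches 0 or 1, which is the pattern in the figures.
```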

30 Bottom line When we observe behaviors that are high or low baserate, our kappas will be low. This is important for researchers studying low-baserate behaviors: many of the behaviors we observe in young children with developmental disabilities are very low baserate.

31 Criterion values for IOA Cohen never suggested using criterion values for kappa, yet many professional organizations recommend criterion values for IOA. E.g., the Council for Exceptional Children, Division for Research, recommendations 2005: “Data are collected on the reliability or inter-observer agreement (IOA) associated with each dependent variable, and IOA levels meet minimal standards (e.g., IOA = 80%; Kappa = .60)”

32 Criterion accuracy? Setting a criterion for kappa independent of baserate is not useful. If we can estimate accuracy (and I am suggesting that we can), we need to consider what sufficient accuracy would be.

33 Criterion accuracy cont. If we consider 80% agreement sufficient, then would we consider 80% accuracy sufficient? If we used 80% accuracy as a criterion, acceptable kappa could be as low as .19, depending on baserate.
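Under the same simple equal-accuracy model sketched above (my assumption, not necessarily the model behind the handout's table), accuracy can be solved for in closed form from an obtained kappa and the estimated baserate; the baserate of .27 in the second example is purely illustrative.

```python
import math

# Back out observer accuracy from an obtained kappa and the estimated baserate
# (the average of the two observers' marginals, as computed on slide 20),
# under the same equal-accuracy model as above.
def estimated_accuracy(kappa, baserate):
    q = baserate
    pe = q * q + (1 - q) * (1 - q)  # chance agreement implied by the baserate
    po = kappa * (1 - pe) + pe      # observed agreement implied by kappa
    # Under the model, po = a^2 + (1 - a)^2; solve for a, taking the root >= .5.
    return (1 + math.sqrt(2 * po - 1)) / 2

print(round(estimated_accuracy(0.60, 0.50), 2))  # ~0.89: a kappa of .60 at a .5 baserate
print(round(estimated_accuracy(0.19, 0.27), 2))  # ~0.80: a kappa near .19 at a low baserate
```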

34 Why it is really important not to use criterion kappas There is a belief that the quality of data will be higher if kappa is higher. This is only true if there is no associated loss of content or construct validity. The processes of collapsing and redefining codes often result in a loss of validity.

35 Applied example See handout for formulas and data

36 Use the table on the first page of your handout to determine the accuracy of raters from baserate and kappa

37 .32.85

38 Recommendations Calculate agreement for each code using a 2x2 table. Use the table to determine the accuracy of observers from baserate and obtained kappa. Report kappa and accuracy.

39 Software to calculate kappa Comkappa, developed by Bakeman, calculates kappa, the SE of kappa, kappa max, and weighted kappa. MOOSES, developed by Jon Tapp, calculates kappa on the total code set and on individual codes; it can be used with live coding, video coding, and transcription. SPSS.

40 Challenge The challenge is to change the standards of observational research that demand kappas above a criterion of .6: editors, PIs, collaborators.

