
1 Funded through the ESRC’s Researcher Development Initiative. Department of Education, University of Oxford. Session 3.3: Inter-rater reliability

2

3 Interrater reliability
Aim of the co-judge procedure, to discern:
- consistency within each coder
- consistency between coders
Take care when making inferences based on little information; phenomena that are impossible to code become missing values.

4 Interrater reliability
- Percent agreement: common but not recommended.
- Cohen’s kappa coefficient: kappa is the proportion of the optimum improvement over chance attained by the coders; 1 = perfect agreement, 0 = agreement no better than expected by chance, -1 = perfect disagreement. Kappas over .40 are considered a moderate level of agreement (though there is no clear basis for this “guideline”).
- Correlation between different raters.
- Intraclass correlation: agreement among multiple raters, corrected for the number of raters using the Spearman-Brown formula (r); a sketch of this correction follows below.
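A minimal Python sketch of the Spearman-Brown correction for the reliability of the mean of k coders; the function name and the example numbers are illustrative, not from the slides.

# Reliability of the average of k coders' ratings, given the mean pairwise
# correlation between coders (Spearman-Brown). Example values are hypothetical.

def spearman_brown(mean_r, k):
    """Reliability of the mean rating of k coders."""
    return k * mean_r / (1 + (k - 1) * mean_r)

print(spearman_brown(0.60, 3))  # 3 coders with mean r = .60 -> about .82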

5 Interrater reliability of categorical IV (1)
Percent exact agreement = (number of observations agreed on) / (total number of observations).
Example: a categorical IV with 3 discrete scale steps and 12 ratings, 9 of which are the same for both raters: percent exact agreement = 9/12 = .75.
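A minimal Python sketch of this computation; the 12 ratings below are hypothetical, chosen so that 9 of 12 agree, as in the slide's example.

# Percent exact agreement for two coders (hypothetical ratings).
rater1 = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
rater2 = [1, 2, 3, 1, 2, 3, 1, 2, 3, 2, 3, 1]

agreed = sum(a == b for a, b in zip(rater1, rater2))
print(agreed / len(rater1))  # 9 / 12 = 0.75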

6 Interrater reliability of categorical IV (2): unweighted kappa
Positive kappa values indicate how much the raters agree over and above chance alone; negative values indicate disagreement. If the agreement matrix is irregular, kappa will not be calculated, or will be misleading.
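A sketch of the unweighted kappa computation from a 3 x 3 agreement matrix, showing the "improvement over chance" idea; the counts are hypothetical, not the slide's data.

import numpy as np

# Unweighted kappa from an agreement (confusion) matrix: observed agreement p_o
# versus agreement expected by chance p_e. Counts are hypothetical.
table = np.array([[4, 1, 0],
                  [1, 3, 1],
                  [0, 1, 1]])

n = table.sum()
p_o = np.trace(table) / n                                     # observed agreement
p_e = (table.sum(axis=0) * table.sum(axis=1)).sum() / n**2    # chance agreement
kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 2))                                        # about .47 for these counts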

7 Interrater reliability of categorical IV (3): unweighted kappa in SPSS
CROSSTABS
  /TABLES=rater1 BY rater2
  /FORMAT=AVALUE TABLES
  /STATISTIC=KAPPA
  /CELLS=COUNT
  /COUNT ROUND CELL.

8 Interrater reliability of categorical IV (4): kappa in irregular matrices
If rater 2 is systematically “above” rater 1 when coding an ordinal scale, kappa will be misleading (the slide contrasts K = .51 with K = -.16). It is possible to “fill up” the agreement matrix with zeros.

9 Interrater reliability of categorical IV (5): kappa in irregular matrices
If there are no observations in some row or column, kappa will not be calculated. It is possible to “fill up” the agreement matrix with zeros (the slide contrasts a matrix where K cannot be estimated with one where K = .47).
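In Python's scikit-learn (not used in the original slides) the agreement table is built over the categories passed via labels, which is the programmatic analogue of "filling up" missing rows or columns with zeros. A sketch with hypothetical codes:

from sklearn.metrics import cohen_kappa_score

# Hypothetical codes: rater 2 never uses category 3. Passing the full scale via
# `labels` keeps the agreement table square (zero cells for the unused category),
# so kappa can still be computed.
rater1 = [1, 1, 2, 2, 3, 3, 1, 2]
rater2 = [1, 1, 2, 2, 2, 2, 1, 2]

print(cohen_kappa_score(rater1, rater2, labels=[1, 2, 3]))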

10 Interrater reliability of categorical IV (6): weighted kappa using a SAS macro
PROC FREQ DATA = int.interrater1;
  TABLES rater1 * rater2 / AGREE;
  TEST KAPPA;
RUN;
Papers and macros are available for estimating kappa when there are unequal or misaligned rows and columns, or multiple raters.
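A hedged sketch of weighted kappa in Python, assuming scikit-learn (the SAS PROC FREQ above requests the agreement statistics via the AGREE option); the ordinal ratings are hypothetical.

from sklearn.metrics import cohen_kappa_score

# Weighted kappa for an ordinal scale: near-misses are penalised less than
# distant disagreements. Ratings are hypothetical.
rater1 = [1, 2, 2, 3, 3, 4, 4, 5]
rater2 = [1, 2, 3, 3, 4, 4, 5, 5]

print(cohen_kappa_score(rater1, rater2, weights="linear"))
print(cohen_kappa_score(rater1, rater2, weights="quadratic"))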

11 Interrater reliability of continuous IV (1)
Average correlation: r = (.873 + .879 + .866) / 3 = .873.
Check that the coders code in the same direction!
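A sketch of the average pairwise correlation among three coders; the scores are hypothetical (the slide's own correlations were .873, .879 and .866).

import numpy as np

# Average pairwise Pearson correlation among three coders' continuous ratings.
# Rows are coders, columns are the rated objects; the numbers are hypothetical.
scores = np.array([
    [3.0, 4.5, 2.0, 5.0, 3.5, 4.0],   # coder 1
    [3.2, 4.4, 2.1, 4.8, 3.6, 4.1],   # coder 2
    [2.9, 4.6, 2.2, 5.1, 3.4, 3.9],   # coder 3
])

corr = np.corrcoef(scores)                # 3 x 3 correlation matrix
pairs = corr[np.triu_indices(3, k=1)]     # the three pairwise correlations
print(pairs.mean())                       # also check that all pairs are positive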

12 Interrater reliability of continuous IV (2)

13 Interrater reliability of continuous IV (3)
- Design 1: one-way random effects model, when each study is rated by a different pair of coders.
- Design 2: two-way random effects model, when a random pair of coders rates all studies.
- Design 3: two-way mixed effects model, when ONE pair of coders rates all studies (see the ICC sketch below).
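These three designs map onto ICC(1), ICC(2) and ICC(3). A sketch using the pingouin package (an assumption; the slides do not name software for this step), with hypothetical ratings of 5 studies by 3 coders in long format:

import pandas as pd
import pingouin as pg   # assumption: pingouin is installed; not part of the original slides

# Hypothetical long-format data: 3 coders each rate the same 5 studies.
long = pd.DataFrame({
    "study":  [1, 2, 3, 4, 5] * 3,
    "coder":  ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "rating": [4, 3, 5, 2, 4,   4, 3, 4, 2, 5,   5, 3, 5, 2, 4],
})

# ICC1 corresponds to design 1 (one-way random), ICC2 to design 2 (two-way random),
# ICC3 to design 3 (two-way mixed); the *k variants are for the mean of the coders.
icc = pg.intraclass_corr(data=long, targets="study", raters="coder", ratings="rating")
print(icc[["Type", "ICC"]])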

14 Comparison of methods (from Orwin, p. 153, in Cooper & Hedges, 1994)
Kappa can be low even when the agreement rate (AR) is good, when there is little variability across items and the coders agree; a sketch follows below.
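A sketch of this situation with hypothetical codes: almost every observation falls into one category, so the agreement rate is high but kappa is near zero.

from sklearn.metrics import cohen_kappa_score

# 19 of 20 codes are category 1 for each rater, so chance agreement is very high:
# the raters agree on 18 of 20 items (AR = .90) yet kappa is close to zero.
rater1 = [1] * 18 + [1, 2]
rater2 = [1] * 18 + [2, 1]

agreement_rate = sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)
print(agreement_rate, cohen_kappa_score(rater1, rater2))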

15 Interrater reliability in meta-analysis and primary study

16 Interrater reliability in meta-analysis vs. in other contexts
- Meta-analysis: coding of independent variables.
- How many co-judges?
- How many objects to co-judge? (A sub-sample of studies versus a sub-sample of codings.)
- Use of a “gold standard” (i.e., one “master coder”).
- Coder drift (cf. observer drift): are coders consistent over time?
- Your qualitative analysis is only as good as the quality of your categorisation of qualitative data.

