
1 Funded through the ESRC’s Researcher Development Initiative. Department of Education, University of Oxford. Session 3.3: Inter-rater reliability

2

3 Interrater reliability
Aim of the co-judge procedure, to discern:
- consistency within each coder
- consistency between coders
Take care when making inferences based on little information; phenomena that are impossible to code become missing values.

4 Interrater reliability
- Percent agreement: common but not recommended.
- Cohen’s kappa coefficient: kappa is the proportion of the optimum improvement over chance attained by the coders; 1 = perfect agreement, 0 = agreement no better than expected by chance, -1 = perfect disagreement. Kappas over .40 are considered a moderate level of agreement (though there is no clear basis for this “guideline”).
- Correlation between different raters.
- Intraclass correlation: agreement among multiple raters, corrected for the number of raters using the Spearman-Brown formula (r); a sketch of this correction follows below.
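A minimal Python sketch of the Spearman-Brown correction for the reliability of the mean of k coders; the function name and the example numbers are illustrative, not from the slides.

# Reliability of the average of k coders' ratings, given the mean pairwise
# correlation between coders (Spearman-Brown). Example values are hypothetical.

def spearman_brown(mean_r, k):
    """Reliability of the mean rating of k coders."""
    return k * mean_r / (1 + (k - 1) * mean_r)

print(spearman_brown(0.60, 3))  # 3 coders with mean r = .60 -> about .82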

5 Interrater reliability of categorical IV (1)
Percent exact agreement = (number of observations agreed on) / (total number of observations).
Example: a categorical IV with 3 discrete scale steps and 12 ratings, 9 of which are the same for both raters: percent exact agreement = 9/12 = .75.
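A minimal Python sketch of this computation; the 12 ratings below are hypothetical, chosen so that 9 of 12 agree, as in the slide's example.

# Percent exact agreement for two coders (hypothetical ratings).
rater1 = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
rater2 = [1, 2, 3, 1, 2, 3, 1, 2, 3, 2, 3, 1]

agreed = sum(a == b for a, b in zip(rater1, rater2))
print(agreed / len(rater1))  # 9 / 12 = 0.75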

6 Interrater reliability of categorical IV (2): unweighted kappa
Positive kappa values indicate how much the raters agree over and above chance alone; negative values indicate disagreement. If the agreement matrix is irregular, kappa will not be calculated, or will be misleading.
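A sketch of the unweighted kappa computation from a 3 x 3 agreement matrix, showing the "improvement over chance" idea; the counts are hypothetical, not the slide's data.

import numpy as np

# Unweighted kappa from an agreement (confusion) matrix: observed agreement p_o
# versus agreement expected by chance p_e. Counts are hypothetical.
table = np.array([[4, 1, 0],
                  [1, 3, 1],
                  [0, 1, 1]])

n = table.sum()
p_o = np.trace(table) / n                                     # observed agreement
p_e = (table.sum(axis=0) * table.sum(axis=1)).sum() / n**2    # chance agreement
kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 2))                                        # about .47 for these counts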

7 Interrater reliability of categorical IV (3): unweighted kappa in SPSS
CROSSTABS
  /TABLES=rater1 BY rater2
  /FORMAT=AVALUE TABLES
  /STATISTIC=KAPPA
  /CELLS=COUNT
  /COUNT ROUND CELL.

8 Interrater reliability of categorical IV (4): kappa in irregular matrices
If rater 2 is systematically “above” rater 1 when coding an ordinal scale, kappa will be misleading (the slide contrasts K = .51 with K = -.16). It is possible to “fill up” the agreement matrix with zeros.

9 Interrater reliability of categorical IV (5): kappa in irregular matrices
If there are no observations in some row or column, kappa will not be calculated. It is possible to “fill up” the agreement matrix with zeros (the slide contrasts a matrix where K cannot be estimated with one where K = .47).
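In Python's scikit-learn (not used in the original slides) the agreement table is built over the categories passed via labels, which is the programmatic analogue of "filling up" missing rows or columns with zeros. A sketch with hypothetical codes:

from sklearn.metrics import cohen_kappa_score

# Hypothetical codes: rater 2 never uses category 3. Passing the full scale via
# `labels` keeps the agreement table square (zero cells for the unused category),
# so kappa can still be computed.
rater1 = [1, 1, 2, 2, 3, 3, 1, 2]
rater2 = [1, 1, 2, 2, 2, 2, 1, 2]

print(cohen_kappa_score(rater1, rater2, labels=[1, 2, 3]))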

10 Interrater reliability of categorical IV (6): weighted kappa using a SAS macro
PROC FREQ DATA = int.interrater1;
  TABLES rater1 * rater2 / AGREE;
  TEST KAPPA;
RUN;
Papers and macros are available for estimating kappa when there are unequal or misaligned rows and columns, or multiple raters.
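A hedged sketch of weighted kappa in Python, assuming scikit-learn (the SAS PROC FREQ above requests the agreement statistics via the AGREE option); the ordinal ratings are hypothetical.

from sklearn.metrics import cohen_kappa_score

# Weighted kappa for an ordinal scale: near-misses are penalised less than
# distant disagreements. Ratings are hypothetical.
rater1 = [1, 2, 2, 3, 3, 4, 4, 5]
rater2 = [1, 2, 3, 3, 4, 4, 5, 5]

print(cohen_kappa_score(rater1, rater2, weights="linear"))
print(cohen_kappa_score(rater1, rater2, weights="quadratic"))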

11 Interrater reliability of continuous IV (1)
Average correlation: r = (.873 + .879 + .866) / 3 = .873.
Check that the coders code in the same direction!
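A sketch of the average pairwise correlation among three coders; the scores are hypothetical (the slide's own correlations were .873, .879 and .866).

import numpy as np

# Average pairwise Pearson correlation among three coders' continuous ratings.
# Rows are coders, columns are the rated objects; the numbers are hypothetical.
scores = np.array([
    [3.0, 4.5, 2.0, 5.0, 3.5, 4.0],   # coder 1
    [3.2, 4.4, 2.1, 4.8, 3.6, 4.1],   # coder 2
    [2.9, 4.6, 2.2, 5.1, 3.4, 3.9],   # coder 3
])

corr = np.corrcoef(scores)                # 3 x 3 correlation matrix
pairs = corr[np.triu_indices(3, k=1)]     # the three pairwise correlations
print(pairs.mean())                       # also check that all pairs are positive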

12 Interrater reliability of continuous IV (2)

13 Interrater reliability of continuous IV (3)
- Design 1: one-way random effects model, when each study is rated by a different pair of coders.
- Design 2: two-way random effects model, when a random pair of coders rates all studies.
- Design 3: two-way mixed effects model, when ONE pair of coders rates all studies (see the ICC sketch below).
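These three designs map onto ICC(1), ICC(2) and ICC(3). A sketch using the pingouin package (an assumption; the slides do not name software for this step), with hypothetical ratings of 5 studies by 3 coders in long format:

import pandas as pd
import pingouin as pg   # assumption: pingouin is installed; not part of the original slides

# Hypothetical long-format data: 3 coders each rate the same 5 studies.
long = pd.DataFrame({
    "study":  [1, 2, 3, 4, 5] * 3,
    "coder":  ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "rating": [4, 3, 5, 2, 4,   4, 3, 4, 2, 5,   5, 3, 5, 2, 4],
})

# ICC1 corresponds to design 1 (one-way random), ICC2 to design 2 (two-way random),
# ICC3 to design 3 (two-way mixed); the *k variants are for the mean of the coders.
icc = pg.intraclass_corr(data=long, targets="study", raters="coder", ratings="rating")
print(icc[["Type", "ICC"]])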

14 Comparison of methods (from Orwin, p. 153, in Cooper & Hedges, 1994)
Kappa can be low even when the agreement rate (AR) is good, when there is little variability across items and the coders agree; a sketch follows below.
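A sketch of this situation with hypothetical codes: almost every observation falls into one category, so the agreement rate is high but kappa is near zero.

from sklearn.metrics import cohen_kappa_score

# 19 of 20 codes are category 1 for each rater, so chance agreement is very high:
# the raters agree on 18 of 20 items (AR = .90) yet kappa is close to zero.
rater1 = [1] * 18 + [1, 2]
rater2 = [1] * 18 + [2, 1]

agreement_rate = sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)
print(agreement_rate, cohen_kappa_score(rater1, rater2))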

15 Interrater reliability in meta-analysis and primary study

16 Interrater reliability in meta-analysis vs. in other contexts
- Meta-analysis: coding of independent variables.
- How many co-judges?
- How many objects to co-judge? (A sub-sample of studies versus a sub-sample of codings.)
- Use of a “gold standard” (i.e., one “master coder”).
- Coder drift (cf. observer drift): are coders consistent over time?
- Your qualitative analysis is only as good as the quality of your categorisation of qualitative data.

