Measures of agreement
Natalie Robinson, Centre for Evidence-based Veterinary Medicine

Why might we measure agreement? Measures of reliability: to compare 2 or more different methods, e.g. the SNAP FeLV test vs. virus isolation.

Why might we measure agreement? To look at inter-rater reliability, e.g. several 'raters' using the same body condition scoring (BCS) method on the same animals.

Why might we measure agreement? To look at repeatability (intra-rater or test-retest reliability), e.g. the same 'rater' using the same BCS method on the same animals on 2 consecutive days.

Categorical/ordinal data
Binary, nominal or ordinal data, e.g. a positive or negative test result, breed of dog, or grade of disease (mild, moderate, severe).
Suitable measures: percentage agreement, Cohen's kappa, weighted kappa (for ordinal data). Many variations exist, e.g. Fleiss' kappa.
Banerjee and Capozzoli (1999) Beyond kappa: A review of interrater agreement measures. The Canadian Journal of Statistics, 27, 3-23.

Percentage agreement
2 different tests performed on 100 samples:

                Test A +ve   Test A -ve
Test B +ve          27            2
Test B -ve           5           66

Agreement = (27 + 66) / 100 = 93%
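
Percentage agreement is simply the proportion of samples on which the two tests give the same result, i.e. the diagonal of the table. As an illustrative aside (not part of the original slides), a minimal Python sketch using the counts above:

```python
import numpy as np

# Counts from the 2 x 2 table above: rows = Test B (+ve, -ve), columns = Test A (+ve, -ve)
table = np.array([[27, 2],
                  [5, 66]])

# Samples where the two tests agree lie on the diagonal
percent_agreement = np.trace(table) / table.sum() * 100
print(f"Percentage agreement: {percent_agreement:.0f}%")  # -> 93%
```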

So why don’t we just use this…? Some agreement will occur by chance Depends on the number of categories/frequency of each category For example…

Cohen’s Kappa Agreement > expected by chance? Can only compare two raters/methods at a time Values between 0 and 1 0 = agreement no better than chance 1 = perfect agreement Negative values are possible

Getting your data into SPSS If the data are in 'long form' (one 'case' per row), you will need to enter them as frequencies instead.

Getting your data into SPSS You can do this by producing an 'n x n' table, where n is the no. of categories. In SPSS, select 'Analyze', then 'Descriptive Statistics', then 'Crosstabs'.

Getting your data into SPSS Select the 2 variables you want to compare. This will generate an 'n x n' table - use it to enter the frequency data into a new dataset.

Getting your data into SPSS So your dataset should look something like this, where the 'count' is the frequency from your 'n x n' table…
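
For comparison, the same 'n x n' frequency table can be produced outside SPSS; a minimal pandas sketch, assuming hypothetical long-form data with one case per row and made-up column names source_1 and source_2:

```python
import pandas as pd

# Hypothetical long-form data: one case per row, one column per data source
df = pd.DataFrame({
    "source_1": ["Labrador", "Poodle", "Labrador", "Terrier", "Poodle"],
    "source_2": ["Labrador", "Poodle", "Terrier", "Terrier", "Poodle"],
})

# n x n table of frequencies - the equivalent of SPSS Analyze > Descriptive Statistics > Crosstabs
freq_table = pd.crosstab(df["source_1"], df["source_2"])
print(freq_table)
```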

What results will I get? A point estimate with its standard error. 95% confidence intervals: the point estimate +/- 1.96 x SE. A P value, which indicates significance but not magnitude - it will generally be significant whenever kappa > 0 unless the sample size is small.
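
The 95% confidence interval is just the point estimate plus or minus 1.96 standard errors; a quick sketch with made-up numbers:

```python
# Hypothetical kappa point estimate and standard error taken from the output
kappa, se = 0.72, 0.05

lower, upper = kappa - 1.96 * se, kappa + 1.96 * se
print(f"kappa = {kappa:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```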

What is a ‘good’ K value? Cohen’s Kappa Landis & Koch (1977) McHugh (2012) 0.00-0.20 Slight None 0.21-0.40 Fair Minimal 0.41-0.60 Moderate Weak 0.61-0.80 Substantial 0.81-0.90 Almost perfect Strong 0.91-1.00 Landis and Koch (1977) The measurement of observer agreement for categorical data. Biometrics, 33: 159-174 McHugh (2012) Interrater reliability: The Kappa Statistic. Biochem Med (Zagreb), 22: 276-282

Weighted kappa
For ordinal data - takes into account intermediate levels of agreement.

                     Clinician A
              Mild   Moderate   Severe
Clinician B
  Mild          24          5        2
  Moderate      10         26        8
  Severe         1         11       13

GraphPad calculator: http://graphpad.com/quickcalcs/kappa1.cfm
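
If you prefer code to the GraphPad calculator, scikit-learn's cohen_kappa_score accepts linear or quadratic weights, which give partial credit for near-misses on an ordinal scale; a sketch with hypothetical severity grades (not the counts in the table above):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ordinal grades from two clinicians: 1 = mild, 2 = moderate, 3 = severe
clinician_a = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2]
clinician_b = [1, 2, 3, 3, 1, 1, 3, 2, 1, 2]

unweighted = cohen_kappa_score(clinician_a, clinician_b)
weighted = cohen_kappa_score(clinician_a, clinician_b, weights="linear")
print(f"Unweighted kappa = {unweighted:.2f}, linearly weighted kappa = {weighted:.2f}")
```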

Continuous data Scale, numerical or discrete data, e.g. patient age, or a rating on a visual analogue scale.

Continuous data Need 'degrees' of agreement - it is incorrect to use e.g. Pearson's correlation. Suitable approaches: intraclass correlation, Lin's concordance correlation coefficient, Bland-Altman plot. Bland JM, Altman DG (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, i, 307-310.
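
A Bland-Altman plot is the difference between the two methods plotted against their mean, with 'limits of agreement' drawn at the mean difference plus or minus 1.96 standard deviations; a minimal matplotlib sketch using simulated measurements (not real data):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Simulated paired measurements of the same quantity by two methods
method_1 = rng.normal(50, 10, size=60)
method_2 = method_1 + rng.normal(1.5, 3, size=60)  # method 2 reads a little higher

mean = (method_1 + method_2) / 2
diff = method_1 - method_2
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)  # half-width of the limits of agreement

plt.scatter(mean, diff, s=15)
plt.axhline(bias, color="black")
plt.axhline(bias + loa, color="black", linestyle="--")
plt.axhline(bias - loa, color="black", linestyle="--")
plt.xlabel("Mean of the two methods")
plt.ylabel("Difference (method 1 - method 2)")
plt.title("Bland-Altman plot")
plt.show()
```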

Intraclass correlation (ICC) Values range from 0 to 1: 0 = no agreement, 1 = perfect agreement. Use the same guidelines as for kappa when interpreting the ICC (see the Landis & Koch 1977 / McHugh 2012 table above).

Options in SPSS Should I select… consistency or absolute agreement? A one-way random, two-way random, or two-way fixed model? Terminology may differ slightly between stats programs. This article explains it well: http://neoacademic.com/2011/11/16/computing-intraclass-correlations-icc-as-estimates-of-interrater-reliability-in-spss

Absolute agreement or consistency? E.g. if Measure 2 is always 1 point higher than Measure 1, consistency would be perfect but absolute agreement would not be.

One-way or two-way model? E.g. raters recording the no. of cells in a sample.

One-way = you do not have the same raters for all ratees, e.g.:
Sample 1: Raters A + B
Sample 2: Raters B + C
Sample 3: Raters A + C
Sample 4: Raters B + D
Sample 5: Raters A + D

Two-way = you do have the same raters for all ratees, e.g.:
Samples 1-5: Raters A, B + C

Random or mixed model? A one-way model is always random; a two-way model can be random or mixed. Random = the raters are a random sample from a population of 'potential raters', e.g. two examiners marking exam papers are a 'sample' of all possible examiners who could mark the papers.

Random or mixed model? Mixed = the raters are the whole population of raters, i.e. the only possible raters anyone would be interested in. This is rare - usually there will always be another potential rater.

What will my output look like? A point estimate with a 95% confidence interval; a P value; and both 'single measures' and 'average measures' estimates.

Single or average measures? Single measures = the reliability of one rater: how accurate would a single person be, making the measurements on their own? This is usually more appropriate, as future studies will probably not use multiple raters for each measurement. Average measures = the reliability of the different raters' scores averaged together. This will be higher than the single-measures value, but using it is not usually justified.
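
Outside SPSS, one option for the ICC is the pingouin package (assumed here; check its documentation). Its output lists the six ICC forms in one table, so the two-way random, absolute-agreement, single-measures value discussed above corresponds to the row labelled ICC2, and ICC2k is the average-measures version; a sketch with made-up ratings:

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-form ratings: every rater scores every subject (a two-way design)
df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "rater":   ["A", "B", "C"] * 4,
    "score":   [8, 9, 8, 5, 6, 5, 3, 3, 4, 7, 8, 7],
})

icc = pg.intraclass_corr(data=df, targets="subject", raters="rater", ratings="score")
# ICC2  = two-way random model, absolute agreement, single measures
# ICC2k = the same model, average measures (higher, and not usually what you report)
print(icc[["Type", "ICC", "CI95%"]])
```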

What to report? Which program was used; the % agreement plus the kappa/ICC; the point estimate (with 95% confidence interval); perhaps the P value; and for the ICC, the type of model selected and whether consistency or absolute agreement was used. For example: "Cohen's kappa (κ) was calculated for categorical variables such as breed. Intra-class correlation coefficient (ICC) was calculated for age, in a two-way random model with measures of absolute agreement" - Robinson et al. (in press) Agreement between veterinary patient data collected from different sources. The Veterinary Journal.

Exercises Calculate the kappa for dog breed data collected from two different sources. Calculate the ICC for cat age data collected from two different sources.

References
Landis JR, Koch GG (1977). The measurement of observer agreement for categorical data. Biometrics, 33: 159-174.
McHugh ML (2012). Interrater reliability: the kappa statistic. Biochem Med (Zagreb), 22: 276-282.
Bland JM, Altman DG (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, i, 307-310.
Banerjee and Capozzoli (1999). Beyond kappa: a review of interrater agreement measures. The Canadian Journal of Statistics, 27, 3-23.
Petrie and Sabin (2009). Medical Statistics at a Glance. 3rd ed.
Robinson et al. (in press). Agreement between veterinary patient data collected from different sources. The Veterinary Journal. http://www.sciencedirect.com/science/article/pii/S1090023315001653
Computing the ICC in SPSS: http://neoacademic.com/2011/11/16/computing-intraclass-correlations-icc-as-estimates-of-interrater-reliability-in-spss
GraphPad kappa/weighted kappa calculator: http://graphpad.com/quickcalcs/kappa1.cfm