Introduction: different situations for agreement. Two raters, each with a single reading. More than two raters, each with a single reading. More than two raters, each with multiple readings: agreement within a rater, agreement among raters based on means, and agreement among raters based on individual readings.
Existing Approaches (1): agreement between two raters, each with a single reading. Categorical data: kappa and weighted kappa. Continuous data: Concordance Correlation Coefficient (CCC) and Intraclass Correlation Coefficient (ICC).
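As a point of reference for the two-rater, single-reading case, the following is a minimal sketch of Lin's CCC and Cohen's kappa. The function names are illustrative, and the use of 1/n (rather than 1/(n-1)) moments in the CCC is an assumption about the estimator.

```python
def lin_ccc(x, y):
    """Concordance correlation coefficient for paired readings x and y.

    Uses 1/n moments (an assumption about the estimator being illustrated).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Penalises both lack of correlation and a shift between the means.
    return 2.0 * sxy / (sxx + syy + (mx - my) ** 2)


def cohen_kappa(table):
    """Cohen's kappa from a square table of counts (rows: rater 1, columns: rater 2)."""
    t = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(t)) / total
    row_tot = [sum(row) / total for row in table]
    col_tot = [sum(table[i][j] for i in range(t)) / total for j in range(t)]
    # Agreement expected by chance if the two raters were independent.
    expected = sum(r * c for r, c in zip(row_tot, col_tot))
    return (observed - expected) / (1.0 - expected)
```

A constant shift between raters lowers the CCC even when the correlation is perfect, which is the behaviour that separates agreement from plain correlation.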
Existing Approaches (2): agreement among more than two raters, each with a single reading. Lin (1989): no inference. Barnhart, Haber and Song (2001, 2002): GEE. King and Chinchilli (2001, 2001): U-statistics. Carrasco and Jover (2003): variance components.
Existing Approaches (3): agreement among more than two raters, each with multiple readings. Barnhart (2005): intra-rater, inter-rater (based on means), and total (based on individual observations) agreement, with the GEE method used to model the first and second moments.
Unified Approach: agreement among k (k ≥ 2) raters, where each rater measures each of the n subjects multiple (m) times. Separates intra-rater agreement from inter-rater agreement. Measures relative agreement, precision, and accuracy, and measures absolute agreement through the Total Deviation Index (TDI) and Coverage Probability (CP).
Unified Approach - summary: the GEE method is used to estimate all agreement indices and to draw inferences on them. All agreement indices are expressed as functions of variance components. Data: continuous, binary, or ordinal. Most currently popular methods become special cases of this approach.
Unified Approach - model: the model is set up with a subject effect, a subject-by-rater effect, an error effect, and a rater effect.
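One common way to write a model with these four effects is sketched below. All symbols are assumptions introduced here for illustration, as is the choice to treat the rater effects as fixed and the other effects as random.

```latex
% Reading l of subject i by rater j (symbols are illustrative assumptions)
y_{ijl} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ijl},
\qquad i = 1,\dots,n,\quad j = 1,\dots,k,\quad l = 1,\dots,m

% Random effects: subject, subject-by-rater, error
\operatorname{Var}(\alpha_i) = \sigma_\alpha^2, \qquad
\operatorname{Var}(\gamma_{ij}) = \sigma_\gamma^2, \qquad
\operatorname{Var}(e_{ijl}) = \sigma_e^2

% Fixed rater effects, summarised by their pairwise spread
\sigma_\beta^2 = \frac{1}{k(k-1)} \sum_{j=1}^{k-1} \sum_{j'=j+1}^{k} (\beta_j - \beta_{j'})^2
```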
Unified Approach - targets. Intra-rater agreement: overall, are the k raters consistent with themselves? Inter-rater agreement (agreement based on means): overall, do the k raters agree with each other based on the average of the m readings? Total agreement (agreement based on individual readings): overall, do the k raters agree with each other based on any individual one of the m readings?
Unified Approach – agreement(intra): across all k raters, how well does each rater reproduce his or her own readings?
Unified Approach – precision(intra) and MSD. Precision(intra): for any rater j, the proportion of the variance that is attributable to the subjects (the same as the intra-rater agreement index). MSD(intra): examines absolute agreement independently of the total data range.
Unified Approach – TDI(intra): for each rater j, 100π% of observations are within TDI(intra) units of their replicate readings from the same rater. The definition involves the cumulative normal distribution and the absolute value.
Unified Approach – CP(intra): for each rater j, the proportion of observations that are within δ units of their replicate readings from the same rater.
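The relation between MSD, TDI, and CP can be sketched as follows. The sketch assumes that the difference between two replicate readings is normal with mean zero and variance equal to the MSD; the function names are illustrative. Under that assumption, CP equals the chi-square (df = 1) distribution function evaluated at δ²/MSD, which is the same as 2Φ(δ/√MSD) − 1.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def tdi(msd, pi):
    """Boundary that captures a proportion `pi` of the absolute differences."""
    return _STD_NORMAL.inv_cdf((1.0 + pi) / 2.0) * math.sqrt(msd)


def cp(msd, delta):
    """Proportion of absolute differences that fall within `delta`."""
    return 2.0 * _STD_NORMAL.cdf(delta / math.sqrt(msd)) - 1.0
```

The two functions are inverses of each other for a fixed MSD: `cp(msd, tdi(msd, pi))` returns `pi` up to rounding.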
Unified Approach – agreement(inter): across all k raters, how well do the raters reproduce each other's results based on the average of the multiple readings?
Unified Approach – precision(inter): for any two raters, the proportion of the variance that is attributable to the subjects, based on the average of the m readings.
Unified Approach – accuracy(inter): how close the means of the different raters are to each other.
Unified Approach – TDI(inter): across all k raters, 100π% of the averaged readings are within TDI(inter) units of the averaged readings from another rater.
Unified Approach – CP(inter): for each rater j, the proportion of averaged readings that are within δ units of the averaged readings from another rater.
Unified Approach – agreement(total): across all k raters, how well do the raters reproduce each other's results based on the individual readings?
Unified Approach – precision(total): for any two raters, the proportion of the variance that is attributable to the subjects, based on the individual readings.
Unified Approach – accuracy(total): how close the means of the different raters are to each other.
Unified Approach – TDI(total): across all k raters, 100π% of the readings are within TDI(total) units of the readings from another rater.
Unified Approach – CP(total): for each rater j, the proportion of readings that are within δ units of the readings from another rater.
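Unified Approach: summary table of the statistics (agreement, precision, accuracy, MSD, TDI(π), CP(δ)) for the intra, inter, and total cases and for m = 1; accuracy is NA for the intra case. The formulas use the inverse cumulative normal distribution and the central chi-square distribution with df = 1.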
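A hedged reconstruction of how such indices can be written as functions of variance components is sketched below. The symbols are assumptions: σ_α², σ_γ², and σ_e² stand for the subject, subject-by-rater, and error variances, and σ_β² for the spread of the rater means. The expressions should be checked against the cited sources before use.

```latex
% Intra-rater (symbols are illustrative assumptions)
\mathrm{ccc}_{\mathrm{intra}} = \mathrm{precision}_{\mathrm{intra}}
  = \frac{\sigma_\alpha^2 + \sigma_\gamma^2}{\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2},
\qquad \mathrm{MSD}_{\mathrm{intra}} = 2\sigma_e^2

% Inter-rater (based on the average of m readings)
\mathrm{precision}_{\mathrm{inter}}
  = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2/m},
\qquad
\mathrm{accuracy}_{\mathrm{inter}}
  = \frac{\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2/m}
         {\sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_e^2/m + \sigma_\beta^2}

\mathrm{ccc}_{\mathrm{inter}}
  = \mathrm{precision}_{\mathrm{inter}} \times \mathrm{accuracy}_{\mathrm{inter}},
\qquad
\mathrm{MSD}_{\mathrm{inter}} = 2\left(\sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2/m\right)

% Total (based on individual readings): replace \sigma_e^2/m by \sigma_e^2
\mathrm{ccc}_{\mathrm{total}}
  = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2},
\qquad
\mathrm{MSD}_{\mathrm{total}} = 2\left(\sigma_\beta^2 + \sigma_\gamma^2 + \sigma_e^2\right)

% Absolute indices under normality, for any of the three MSDs
\mathrm{TDI}(\pi) = \Phi^{-1}\!\left(\frac{1+\pi}{2}\right)\sqrt{\mathrm{MSD}},
\qquad
\mathrm{CP}(\delta) = \chi^2_{1}\!\left(\frac{\delta^2}{\mathrm{MSD}}\right)
```

Here Φ⁻¹ is the inverse cumulative normal distribution and χ²₁(·) is the distribution function of the central chi-square distribution with df = 1.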
Estimation and Inference: estimate all means, variance components, and their variances and covariances by the GEE method. Estimate all indices from these estimates. Estimate the variances of all indices from these estimates and the delta method.
Estimation and Inference (2): the covariance of two replicate readings, one coming from one rater and the other coming from a different rater.
Estimation and Inference (3): the variance within each combination (i, j), i.e., each cell. The pooled value is therefore the average of all cell variances.
Estimation and Inference (4): the variance of a replicate reading from rater j, and the covariance of two replicate readings that both come from rater j.
Estimation and Inference (5): the GEE method is used to estimate all indices by estimating the means and all variance components.
Estimation and Inference (8): notation for the estimating equations. The working variance-covariance structure, where "working" means assuming a normal distribution, and the derivative matrix of the expectation with respect to all the parameters.
Estimation and Inference (9): the GEE method provides estimates of all means, estimates of all variance components, estimates of the variances of all variance components, and estimates of the covariances between any two variance components.
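To make the role of the variance components concrete, here is a minimal sketch that estimates them from balanced data with simple ANOVA-type moment estimators and then forms three CCC indices. This is a swapped-in technique, not the GEE procedure of the slides, and it yields no standard errors. The function names, the mixed-model layout (random subject and subject-by-rater effects, fixed rater effects), and the index formulas are assumptions.

```python
from statistics import mean, variance


def variance_components(y):
    """Moment estimates from balanced data; y[i][j] lists m >= 2 readings
    of subject i by rater j (n >= 2 subjects, k >= 2 raters)."""
    n, k, m = len(y), len(y[0]), len(y[0][0])
    cell = [[mean(y[i][j]) for j in range(k)] for i in range(n)]
    subj = [mean(row) for row in cell]
    rater = [mean(cell[i][j] for i in range(n)) for j in range(k)]
    grand = mean(subj)

    # Error variance: the average of all within-cell variances.
    sigma_e2 = mean(variance(y[i][j]) for i in range(n) for j in range(k))

    # Mean squares of the balanced two-way layout with replication.
    ms_subject = k * m * sum((s - grand) ** 2 for s in subj) / (n - 1)
    ms_inter = m * sum(
        (cell[i][j] - subj[i] - rater[j] + grand) ** 2
        for i in range(n) for j in range(k)
    ) / ((n - 1) * (k - 1))
    ms_rater = n * m * sum((r - grand) ** 2 for r in rater) / (k - 1)

    # Negative moment estimates are truncated at zero.
    return {
        "sigma_alpha2": max((ms_subject - ms_inter) / (k * m), 0.0),
        "sigma_gamma2": max((ms_inter - sigma_e2) / m, 0.0),
        "sigma_e2": sigma_e2,
        # Spread of the rater effects, sum(beta_j^2) / (k - 1).
        "sigma_beta2": max((ms_rater - ms_inter) / (n * m), 0.0),
    }


def ccc_indices(vc, m):
    """Intra, inter, and total CCC from the components (assumed formulas)."""
    a, g = vc["sigma_alpha2"], vc["sigma_gamma2"]
    e, b = vc["sigma_e2"], vc["sigma_beta2"]
    return {
        "ccc_intra": (a + g) / (a + g + e),
        "ccc_inter": a / (a + g + e / m + b),
        "ccc_total": a / (a + b + g + e),
    }
```

For example, with two subjects, two raters, and two readings per cell, `variance_components([[[1, 3], [2, 4]], [[5, 7], [6, 8]]])` gives an error variance of 2, a subject variance of 8, no interaction variance, and a rater spread of 0.5.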
Estimation and Inference (10): the delta method is used to estimate the variances of all indices.
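The delta method step can be sketched as follows: given estimates of the variance components and their covariance matrix, the variance of an index g(θ) is approximated by ∇g' Σ ∇g. The index formula, the function names, and the numerical gradient are illustrative assumptions; any numbers passed in are hypothetical.

```python
def ccc_total(theta):
    """Total CCC from theta = (subject, rater, interaction, error) components."""
    a, b, g, e = theta
    return a / (a + b + g + e)


def delta_method_variance(func, theta, cov, rel_step=1e-6):
    """First-order (delta method) variance of func(theta).

    `cov` is the covariance matrix of the estimates in `theta`;
    the gradient is approximated by central differences.
    """
    p = len(theta)
    grad = []
    for i in range(p):
        h = rel_step * max(1.0, abs(theta[i]))
        up, down = list(theta), list(theta)
        up[i] += h
        down[i] -= h
        grad.append((func(up) - func(down)) / (2.0 * h))
    # Quadratic form grad' * cov * grad.
    return sum(grad[i] * cov[i][j] * grad[j] for i in range(p) for j in range(p))
```

A numerical gradient keeps the sketch short; an analytic gradient would be the more usual choice when the index has a closed form.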
Estimation and Inference (18): transformations for variances. Z-transformation: CCC indices and precision indices. Logit transformation: accuracy and CP indices. Log transformation: TDI indices.
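A minimal sketch of how these three transformations can turn an estimate and its standard error into a confidence limit is given below. It assumes one-sided 95% limits and a delta-method standard error on the transformed scale; the function names are illustrative.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.95)  # one-sided 95% normal quantile


def lower_limit_z(est, se):
    """Lower limit for a CCC or precision index via Fisher's Z-transformation."""
    z = math.atanh(est)               # 0.5 * log((1 + est) / (1 - est))
    se_z = se / (1.0 - est ** 2)      # delta-method standard error of Z
    return math.tanh(z - Z95 * se_z)


def lower_limit_logit(est, se):
    """Lower limit for an accuracy or CP index via the logit transformation."""
    t = math.log(est / (1.0 - est))
    se_t = se / (est * (1.0 - est))
    return 1.0 / (1.0 + math.exp(-(t - Z95 * se_t)))


def upper_limit_log(est, se):
    """Upper limit for a TDI index via the log transformation."""
    return math.exp(math.log(est) + Z95 * se / est)
```

Working on the transformed scale keeps the limits inside the valid range of each index: below 1 for CCC, precision, accuracy, and CP, and above 0 for TDI.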
Simulation Study: three types of data (binary, ordinal, normal) and three cases for each type of data (k=2, m=1; k=4, m=1; k=2, m=3). For each case: 1000 random samples with sample size n=20. For binary and ordinal data: inferences obtained with transformation vs. without transformation. For normal data: transformation.
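For the normal case, one sample of such a study could be generated along the following lines. This is a sketch only: the function name and all parameter values are hypothetical, and the additive layout (subject, rater, subject-by-rater, and error effects) is an assumption about the data-generating model.

```python
import random


def simulate_readings(n, k, m, sd_subject, sd_interaction, sd_error,
                      rater_effects, mu=0.0, seed=0):
    """Return data[i][j] = list of m normal readings of subject i by rater j."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        alpha = rng.gauss(0.0, sd_subject)          # subject effect
        row = []
        for j in range(k):
            gamma = rng.gauss(0.0, sd_interaction)  # subject-by-rater effect
            row.append([
                mu + alpha + rater_effects[j] + gamma + rng.gauss(0.0, sd_error)
                for _ in range(m)
            ])
        data.append(row)
    return data


# One sample for the case k=2, m=3, n=20 (parameter values are hypothetical).
sample = simulate_readings(n=20, k=2, m=3, sd_subject=3.0, sd_interaction=0.5,
                           sd_error=1.0, rater_effects=[-0.25, 0.25], seed=1)
```

Repeating the call with 1000 different seeds would give the 1000 random samples for one case.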
Simulation Study (2), conclusions: the algorithm works well for all three types of data, both in estimates and in inferences. For binary and ordinal data, there is no need for transformation. For normal data, Carrasco's method is superior to ours, but for categorical data ours is superior. For ordinal data, Carrasco's method and ours are similar.
Example One: Sigma method vs. HemoCue method in measuring the DCHLb level in patients' serum. 299 samples, each sample collected twice by each method. Range: 50-2000 mg/dL.
Example One – HemoCue method. Figure: first readings vs. second readings.
Example One – Sigma method. Figure: first readings vs. second readings.
Example One – HemoCue vs. Sigma. Figure: HemoCue's averages vs. Sigma's averages.
Example One – analysis result (1)
Statistics        Estimates   95% CI*   Allowance
ccc_inter         0.9866      0.9818    0.9775
ccc_total         0.9859      0.9809
precision_intra   0.9986      0.9982    0.9943
precision_inter   0.9866      0.9818
precision_total   0.9860      0.9809
accuracy_inter    0.9999      0.9974
accuracy_total    0.9999      0.9974
Example One – analysis result (2)
Statistics        Estimates   95% CI*   Allowance
TDI intra(0.9)    41.0903     47.2713   75
TDI inter(0.9)    127.273     149.799   150
TDI total(0.9)    130.548     152.678
CP intra(75)      0.9973      0.9942    0.9
CP inter(150)     0.9475      0.9170    0.9
CP total(150)     0.9412      0.9102
*: for all CCC, precision, accuracy, and CP indices, the 95% lower limits are reported. For all TDI indices, the 95% upper limits are reported.
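The reported TDI and CP estimates can be cross-checked against each other. The sketch below assumes that both come from the same MSD under normality: it backs out √MSD from each TDI(0.9) estimate and recomputes the CP at the stated allowance. The function name is illustrative.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def cp_from_tdi(tdi_value, pi, delta):
    """CP at `delta` implied by a TDI at coverage `pi`, assuming normality."""
    root_msd = tdi_value / _STD_NORMAL.inv_cdf((1.0 + pi) / 2.0)
    return 2.0 * _STD_NORMAL.cdf(delta / root_msd) - 1.0


# (label, reported TDI(0.9) estimate, allowance delta) taken from the table
for label, tdi_value, delta in [("intra", 41.0903, 75),
                                ("inter", 127.273, 150),
                                ("total", 130.548, 150)]:
    print(label, round(cp_from_tdi(tdi_value, 0.9, delta), 4))
```

The recomputed values agree with the three CP estimates in the table to about three decimal places; the value implied by the total TDI matches the last CP row (0.9412).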
Example Two: Hemagglutinin Inhibition (HAI) assay for antibody to Influenza A (H3N2) in rabbit serum samples from two labs. 64 rabbit serum samples, each measured twice by each lab. Antibody level: negative, positive, or highly positive.
Example Two – Lab one (rows: first reading; columns: second reading)
                  Negative   Positive   Highly positive
Negative          6          1          0
Positive          0          49         0
Highly positive   0          0          8
Example Two – Lab two (rows: first reading; columns: second reading)
                  Negative   Positive   Highly positive
Negative          2          0          0
Positive          0          22         2
Highly positive   0          5          33
Example Two: lab one vs. lab two (rows: lab one first reading; columns: lab two first reading)
                  Negative   Positive   Highly positive
Negative          2          5          0
Positive          0          19         30
Highly positive   0          0          8
Example Two: lab one vs. lab two (rows: lab one second reading; columns: lab two second reading)
                  Negative   Positive   Highly positive
Negative          2          4          0
Positive          0          23         27
Highly positive   0          0          8
Example Two – analysis result
Statistics        Estimates   95% CI*   Allowance
ccc_inter         0.37225     0.22039   0.4375
ccc_total         0.35776     0.20970
precision_intra   0.88361     0.79692   0.75
precision_inter   0.56795     0.4359
precision_total   0.53489     0.39999
accuracy_inter    0.65543     0.51586
accuracy_total    0.66885     0.53561
Conclusions (1): when data are continuous and m goes to infinity, the agreement indices are the same as those proposed by Barnhart (2005), both in estimates and in inferences. Improvements: precision indices and accuracy indices, TDIs and CP, and variance components.
Conclusions (2): when m=1, the agreement index degenerates into the OCCC as proposed by King (2002) and Carrasco (2003) for continuous data. Improvements for categorical data: King's method approximates kappa and weighted kappa, whereas our estimates (without transformation) are exactly the same as kappa and weighted kappa, both in estimate and in inference. Our estimates are superior to Carrasco's estimates when precision and accuracy are high. Covariate adjustment becomes available.
Conclusions (3): when data are continuous, k=2 and m=1, the agreement index degenerates to the original CCC by Lin (1989). When data are binary, k=2 and m=1, the agreement index degenerates into kappa, both in estimate and in inference.
Conclusions (4): when data are ordinal, k=2 and m=1, the agreement index degenerates into weighted kappa with a particular weight set, both in estimate and in inference.
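The link between the CCC and weighted kappa can be illustrated numerically. The sketch below assumes the usual quadratic (squared-distance) weights, 1 − (i − j)²/(t − 1)² for t categories, which is an assumption about the weight set meant here. It also assumes integer category scores 0, ..., t−1 and 1/n moments for the CCC; the function names are illustrative.

```python
def weighted_kappa_quadratic(table):
    """Weighted kappa with weights 1 - (i - j)^2 / (t - 1)^2 from a count table."""
    t = len(table)
    total = sum(sum(row) for row in table)
    p = [[c / total for c in row] for row in table]
    row_m = [sum(row) for row in p]
    col_m = [sum(p[i][j] for i in range(t)) for j in range(t)]

    def w(i, j):
        return 1.0 - (i - j) ** 2 / (t - 1) ** 2

    observed = sum(w(i, j) * p[i][j] for i in range(t) for j in range(t))
    expected = sum(w(i, j) * row_m[i] * col_m[j] for i in range(t) for j in range(t))
    return (observed - expected) / (1.0 - expected)


def ccc_from_table(table):
    """CCC of the category scores 0..t-1, using 1/n moments."""
    t = len(table)
    total = sum(sum(row) for row in table)
    cells = [(i, j, table[i][j] / total) for i in range(t) for j in range(t)]
    mx = sum(i * p for i, _, p in cells)
    my = sum(j * p for _, j, p in cells)
    sxx = sum((i - mx) ** 2 * p for i, _, p in cells)
    syy = sum((j - my) ** 2 * p for _, j, p in cells)
    sxy = sum((i - mx) * (j - my) * p for i, j, p in cells)
    return 2.0 * sxy / (sxx + syy + (mx - my) ** 2)


# Lab one first reading (rows) vs. lab two first reading (columns), Example Two.
first_readings = [[2, 5, 0], [0, 19, 30], [0, 0, 8]]
print(weighted_kappa_quadratic(first_readings), ccc_from_table(first_readings))
```

The two printed values coincide up to rounding, because both equal one minus the ratio of the observed mean squared score difference to the value expected under independence.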
Conclusions (5): unified approach. Relative agreement indices: CCC with precision and accuracy, which depend on the data range. Absolute agreement: total deviation indices and coverage probability, which rely on the normal assumption. The link function needs more work. The approach requires balanced data.
References
Bartko, J. J. (1966). The intraclass correlation coefficient as a measure of reliability. Psychological Reports 19, 3-11.
Barnhart, H. X. and Williamson, J. M. (2001). Modeling concordance correlation via GEE to evaluate reproducibility. Biometrics 57, 931-940.
Barnhart, H. X., Song, J. and Haber, M. J. (2005). Assessing intra, inter and total agreement with replicated readings. Statistics in Medicine 19, 255-270.
Carrasco, J. L. and Jover, L. (2003). Estimating the generalized concordance correlation coefficient through variance components. Biometrics 59, 849-858.
Fleiss, J., Cohen, J. and Everitt, B. (1969). Large sample standard errors of kappa and weighted kappa. Psychological Bulletin 72, 323-327.
King, T. S. and Chinchilli, V. M. (2001). A generalized concordance correlation coefficient for continuous and categorical data. Statistics in Medicine 20, 2131-2147.
Lin, L. I. (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics 45, 255-268.
Lin, L. I., Hedayat, A. S., Sinha, B. and Yang, M. (2002). Statistical methods in assessing agreement: models, issues & tools. Journal of American Statistical Association 97(457), 257-270.
Wu, W. (2006). A unified approach for assessing agreement. Ph.D. thesis, UIC.