1 Comparing Diagnostic Accuracies of Two Tests in Studies with Verification Bias Marina Kondratovich, Ph.D. Division of Biostatistics, Center for Devices and Radiological Health, U.S. Food and Drug Administration. No official support or endorsement by the Food and Drug Administration of this presentation is intended or should be inferred. September, 2005

2 Outline
Introduction: examples, diagnostic accuracy, verification bias
I. Ratio of true positive rates and ratio of false positive rates
II. Multiple imputation
III. Types of missingness in subsets
Summary

3 Comparison of two qualitative tests, $T_1$ and $T_2$, or combinations of them

|           | $T_1$ Pos | $T_1$ Neg |
|-----------|-----------|-----------|
| $T_2$ Pos | A         | B         |
| $T_2$ Neg | C         | D         |

Total: N

Examples:
- Cervical cancer: $T_1$ = Pap test (categorical values); $T_2$ = HPV test (qualitative test); reference method = colposcopy/biopsy.
- Prostate cancer: $T_1$ = DRE (qualitative test); $T_2$ = PSA (quantitative test with a cutoff of 4 ng/mL); reference method = biopsy.
- Abnormal cells on a Pap slide: $T_1$ = manual reading of a Pap slide; $T_2$ = computer-aided reading of a Pap slide; reference method = reading of the slide by an Adjudication Committee.

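4 Diagnostic Accuracy of a Medical Test

[Figure: ROC space with $1 - Sp$ on the horizontal axis and $Se$ on the vertical axis; test $T_1$ is the point $(x_1, y_1)$, and the angles $\theta_1$ and $\theta_2$ mark the slopes of two lines through it.]

Pair 1: sensitivity = TPR and specificity = TNR, plotted as the point $(x_1, y_1)$, where $x_1 = FPR = 1 - Sp_1$ and $y_1 = TPR = Se_1$.

Pair 2: likelihood ratios.
- $PLR_1 = Se_1/(1 - Sp_1) = y_1/x_1 = \tan\theta_1$ (slope of the line from the origin to $T_1$), related to PPV.
- $NLR_1 = (1 - Se_1)/Sp_1 = (1 - y_1)/(1 - x_1) = \tan\theta_2$ (slope of the line from $T_1$ to the point $(1, 1)$), related to NPV.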
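The definitions on this slide can be checked numerically. The sketch below computes both pairs (sensitivity/specificity and PLR/NLR) from a 2x2 table of test result against true disease status. The function name `accuracy_measures` and the counts are hypothetical and are not taken from the presentation.

```python
def accuracy_measures(tp, fn, fp, tn):
    """Return Se, Sp, PLR and NLR for one test from fully verified counts.

    tp, fn: diseased subjects with positive / negative test results
    fp, tn: non-diseased subjects with positive / negative test results
    """
    y = tp / (tp + fn)        # TPR = Se
    x = fp / (fp + tn)        # FPR = 1 - Sp
    return {
        "Se": y,
        "Sp": 1 - x,
        "PLR": y / x,              # slope of the line from (0, 0) to (x, y)
        "NLR": (1 - y) / (1 - x),  # slope of the line from (x, y) to (1, 1)
    }


# Hypothetical counts, for illustration only.
measures = accuracy_measures(tp=80, fn=20, fp=10, tn=90)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```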

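5 Boolean Combinations OR and AND of $T_1$ and a Random Test

[Figure: ROC space ($1 - Sp$ against $Se$) showing $T_1$ at $(x_1, y_1)$, the angles $\theta_1$ and $\theta_2$, and the line labeled "$T_1$ OR Random Test".]

Random test: positive with probability $\alpha$, negative with probability $1 - \alpha$.

Combination OR:
$Se_{OR} = Se_1 + (1 - Se_1)\alpha = y_1 + (1 - y_1)\alpha$
$Sp_{OR} = Sp_1 (1 - \alpha) = (1 - x_1)(1 - \alpha)$

The combinations ($T_1$ OR Random Test) lie on the line
$y - y_1 = NLR_1 (x - x_1) = \frac{1 - y_1}{1 - x_1}(x - x_1)$,
so NLR($T_1$ OR Random Test) $= (1 - y_1)/(1 - x_1)$.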

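6 Boolean Combinations OR and AND of $T_1$ and a Random Test (cont.)

[Figure: ROC space ($1 - Sp$ against $Se$) showing $T_1$ at $(x_1, y_1)$, the angles $\theta_1$ and $\theta_2$, and the line labeled "$T_1$ AND Random Test".]

Random test: positive with probability $\alpha$, negative with probability $1 - \alpha$.

Combination AND:
$Se_{AND} = Se_1 \alpha = y_1 \alpha$
$Sp_{AND} = Sp_1 + (1 - Sp_1)(1 - \alpha) = (1 - x_1) + x_1 (1 - \alpha)$

The combinations ($T_1$ AND Random Test) lie on the line
$y - y_1 = PLR_1 (x - x_1) = \frac{y_1}{x_1}(x - x_1)$,
so PLR($T_1$ AND Random Test) $= y_1/x_1$.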
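A small numerical check of the two combination slides: mixing $T_1$ with a random test by OR leaves the NLR unchanged, and mixing by AND leaves the PLR unchanged, for any $\alpha$. This is a sketch; the function names and the operating point $(x_1, y_1) = (0.1, 0.8)$ are illustrative assumptions, not values from the presentation.

```python
def or_with_random(x1, y1, alpha):
    """ROC point (FPR, TPR) of 'T1 OR random test' (positive w.p. alpha)."""
    se = y1 + (1 - y1) * alpha
    sp = (1 - x1) * (1 - alpha)
    return 1 - sp, se


def and_with_random(x1, y1, alpha):
    """ROC point (FPR, TPR) of 'T1 AND random test' (positive w.p. alpha)."""
    se = y1 * alpha
    sp = (1 - x1) + x1 * (1 - alpha)
    return 1 - sp, se


def plr(x, y):
    return y / x


def nlr(x, y):
    return (1 - y) / (1 - x)


x1, y1 = 0.1, 0.8  # hypothetical T1, for illustration only
for alpha in (0.2, 0.5, 0.8):
    x_or, y_or = or_with_random(x1, y1, alpha)
    x_and, y_and = and_with_random(x1, y1, alpha)
    # NLR of the OR combination and PLR of the AND combination
    # should match NLR_1 and PLR_1 for every alpha.
    print(alpha, round(nlr(x_or, y_or), 6), round(plr(x_and, y_and), 6))
```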

7 Comparing Medical Tests

[Figure: ROC space ($1 - Sp$ against $Se$) divided around $T_1$ into four regions, labeled: PPV > PPV$_1$ and NPV > NPV$_1$; PPV < PPV$_1$ and NPV > NPV$_1$; PPV > PPV$_1$ and NPV < NPV$_1$; PPV < PPV$_1$ and NPV < NPV$_1$.]

More detail in: Biggerstaff, B.J. Comparing diagnostic tests: a simple graphic using likelihood ratios. Statistics in Medicine 2000, 19:649-663.

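8 Formal Model: Prospective study, comparison of two qualitative tests, $T_1$ and $T_2$, or combinations of them

All subjects (total N):

|           | $T_1$ Pos | $T_1$ Neg |
|-----------|-----------|-----------|
| $T_2$ Pos | A         | B         |
| $T_2$ Neg | C         | D         |

Disease D+ (total $N_1$):

|           | $T_1$ Pos | $T_1$ Neg |
|-----------|-----------|-----------|
| $T_2$ Pos | $a_1$     | $b_1$     |
| $T_2$ Neg | $c_1$     | $d_1$     |

Non-Disease D- (total $N_0$):

|           | $T_1$ Pos | $T_1$ Neg |
|-----------|-----------|-----------|
| $T_2$ Pos | $a_0$     | $b_0$     |
| $T_2$ Neg | $c_0$     | $d_0$     |

$a_1 + a_0 = A$; $b_1 + b_0 = B$; $c_1 + c_0 = C$; $d_1 + d_0 = D$; $N_1 + N_0 = N$.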

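9 Example: condition of interest = cervical disease; $T_1$ = Pap test; $T_2$ = biomarker; reference = colposcopy/biopsy

All subjects (total 7,000):

|           | $T_1$ Pos | $T_1$ Neg |
|-----------|-----------|-----------|
| $T_2$ Pos | 43        | 285       |
| $T_2$ Neg | 71        | 6,601     |

Disease D+:

|           | $T_1$ Pos | $T_1$ Neg | Row total |
|-----------|-----------|-----------|-----------|
| $T_2$ Pos | 13        | 15        | 28        |
| $T_2$ Neg | 1         | not given |           |
| Column total | 14     |           |           |

Non-Disease D-:

|           | $T_1$ Pos | $T_1$ Neg | Row total |
|-----------|-----------|-----------|-----------|
| $T_2$ Pos | 30        | 270       | 300       |
| $T_2$ Neg | 70        | not given |           |
| Column total | 100    |           |           |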

10 Verification Bias
In studies for the evaluation of diagnostic devices, the reference (gold) standard is sometimes not applied to all study subjects. If the process by which subjects are selected for verification depends on the results of the medical tests, then a statistical analysis of the accuracies of these tests without proper correction is biased. This bias is often referred to as verification bias (or by its variants: work-up bias, referral bias, and validation bias).

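11 I. Ratio of True Positive Rates and Ratio of False Positive Rates

Estimates of sensitivities and specificities based only on verified results are biased.

All subjects (total N):

|           | $T_1$ Pos | $T_1$ Neg |
|-----------|-----------|-----------|
| $T_2$ Pos | A         | B         |
| $T_2$ Neg | C         | D         |

Disease D+ (total $[N_1]$):

|           | $T_1$ Pos | $T_1$ Neg |
|-----------|-----------|-----------|
| $T_2$ Pos | $a_1$     | $b_1$     |
| $T_2$ Neg | $c_1$     | $[d_1]$   |

Non-Disease D- (total $[N_0]$):

|           | $T_1$ Pos | $T_1$ Neg |
|-----------|-----------|-----------|
| $T_2$ Pos | $a_0$     | $b_0$     |
| $T_2$ Neg | $c_0$     | $[d_0]$   |

Not all subjects (or none) with both negative results were verified by the reference method, so the bracketed quantities are not fully observed. The ratio of sensitivities and the ratio of false positive rates are nevertheless unbiased [2].

[2] Schatzkin, A., Connor, R.J., Taylor, P.R., and Bunnag, B. Comparing new and old screening tests when a reference procedure cannot be performed on all screeners. American Journal of Epidemiology 1987, Vol. 125, N. 4, p. 672-678.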

12 I. Ratio of TP Rates and Ratio of FP Rates (cont.)

Statement of the problem:
$Se_2/Se_1 = y_2/y_1 = R_y$
$(1 - Sp_2)/(1 - Sp_1) = x_2/x_1 = R_x$

Can we draw conclusions about the effectiveness of Test 2 if we know only the ratio of true positive rates and the ratio of false positive rates between Test 1 and Test 2?

For the sake of simplicity, assume that Test 2 has the higher theoretical sensitivity, $Se_2/Se_1 = R_y > 1$ (true parameters, not estimates).

13 I. Ratio of TP Rates and Ratio of FP Rates (cont.)

[Figure: ROC space ($1 - Sp$ against $Se$) with $T_1$ at $(x_1, y_1)$.]

A) $Se_2/Se_1 = R_y > 1$ (increase in sensitivity);
$(1 - Sp_2)/(1 - Sp_1) = R_x < 1$ (decrease in false positive rate).

For any Test 1, Test 2 is effective (superior to Test 1).

14 I. Ratio of TP Rates and Ratio of FP Rates (cont.)

[Figure: ROC space ($1 - Sp$ against $Se$) with $T_1$ at $(x_1, y_1)$.]

B) $Se_2/Se_1 = R_y > 1$ (increase in sensitivity);
$(1 - Sp_2)/(1 - Sp_1) = R_x > 1$ (increase in false positive rate);
$R_y \ge R_x > 1$.

For any Test 1, Test 2 is effective (superior to Test 1, because the PPV and NPV of Test 2 are higher than those of Test 1).

It is easy to show that $PLR_2 = Se_2/(1 - Sp_2) = (R_y/R_x) \cdot PLR_1$, and hence $PLR_2 \ge PLR_1$.
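The PLR identity quoted on this slide follows directly from the definitions of $R_y$ and $R_x$. A short derivation, using only the symbols already defined in the talk:

```latex
\begin{align*}
PLR_2 &= \frac{Se_2}{1 - Sp_2} = \frac{y_2}{x_2}
       = \frac{R_y\, y_1}{R_x\, x_1}
       = \frac{R_y}{R_x}\, PLR_1, \\
R_y \ge R_x &\;\Longrightarrow\; PLR_2 \ge PLR_1 .
\end{align*}
```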

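15 I. Ratio of TP Rates and Ratio of FP Rates (cont.)

Example: condition of interest = cervical disease; $T_1$ = Pap test; $T_2$ = biomarker; reference = colposcopy/biopsy.

All subjects (total 7,000):

|           | $T_1$ Pos | $T_1$ Neg |
|-----------|-----------|-----------|
| $T_2$ Pos | 43        | 285       |
| $T_2$ Neg | 71        | 6,601     |

Disease D+:

|           | $T_1$ Pos | $T_1$ Neg | Row total |
|-----------|-----------|-----------|-----------|
| $T_2$ Pos | 13        | 15        | 28        |
| $T_2$ Neg | 1         | not given |           |
| Column total | 14     |           |           |

Non-Disease D-:

|           | $T_1$ Pos | $T_1$ Neg | Row total |
|-----------|-----------|-----------|-----------|
| $T_2$ Pos | 30        | 270       | 300       |
| $T_2$ Neg | 70        | not given |           |
| Column total | 100    |           |           |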
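The two ratios can be computed from the verified cells of this example without knowing the unverified both-negative cells: both tests share the same denominator ($N_1$ for diseased subjects, $N_0$ for non-diseased subjects), so it cancels. A minimal sketch, in which the function name `rate_ratios` is an assumption made for illustration:

```python
def rate_ratios(a1, b1, c1, a0, b0, c0):
    """Ratios of TP and FP rates of T2 relative to T1.

    a, b, c are the (T1+,T2+), (T1-,T2+), (T1+,T2-) cells; suffix 1 means
    diseased (D+), suffix 0 means non-diseased (D-). The d cells are not
    needed because N1 and N0 cancel in each ratio.
    """
    r_y = (a1 + b1) / (a1 + c1)  # Se2 / Se1
    r_x = (a0 + b0) / (a0 + c0)  # (1 - Sp2) / (1 - Sp1)
    return r_y, r_x


# Counts from the cervical disease example on this slide.
print(rate_ratios(a1=13, b1=15, c1=1, a0=30, b0=270, c0=70))  # (2.0, 3.0)
```

For this example the biomarker doubles the true positive rate of the Pap test and triples its false positive rate.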

16 I. Ratio of TP Rates and Ratio of FP Rates (cont.)

[Figure: ROC space ($1 - Sp$ against $Se$) with $T_1$ at $(x_1, y_1)$ and the line labeled "$T_1$ OR Random Test".]

C) $Se_2/Se_1 = R_y > 1$ (increase in sensitivity);
$(1 - Sp_2)/(1 - Sp_1) = R_x > 1$ (increase in false positive rate);
$R_y < R_x$.

The increase in the false positive rate is higher than the increase in the true positive rate. Can we draw conclusions about the effectiveness of Test 2?

17 I. Ratio of TP Rates and Ratio of FP Rates (cont.)

[Figure: ROC space ($1 - Sp$ against $Se$) with $T_1$ at $(x_1, y_1)$ and the line labeled "$T_1$ OR Random Test".]

Theorem: Test 2 is above the line of the combination ($T_1$ OR Random Test) if
$(R_x - 1)/(R_y - 1) < PLR_1/NLR_1$.

Example: $R_y = 2$ and $R_x = 3$, so $(R_x - 1)/(R_y - 1) = (3 - 1)/(2 - 1) = 2$. The conclusion depends on the accuracy of $T_1$:
- if $PLR_1/NLR_1 > 2$, then $T_2$ is superior for confirming the absence of disease (higher NPV, lower PPV);
- if $PLR_1/NLR_1 < 2$, then $T_2$ is inferior overall (lower NPV, lower PPV).
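The theorem can be checked against the geometry it describes. The sketch below evaluates the inequality for a given $T_1$ and, separately, tests whether the point $T_2 = (R_x x_1, R_y y_1)$ lies above the line $y - y_1 = NLR_1 (x - x_1)$. The function names and the two $T_1$ operating points are hypothetical, chosen only so that $T_2$ remains a valid point in ROC space.

```python
def theorem_condition(x1, y1, r_x, r_y):
    """True if (R_x - 1)/(R_y - 1) < PLR_1/NLR_1 (requires R_y > 1)."""
    plr1 = y1 / x1
    nlr1 = (1 - y1) / (1 - x1)
    return (r_x - 1) / (r_y - 1) < plr1 / nlr1


def above_or_line(x1, y1, r_x, r_y):
    """True if T2 = (R_x*x1, R_y*y1) lies above the 'T1 OR random' line."""
    nlr1 = (1 - y1) / (1 - x1)
    x2, y2 = r_x * x1, r_y * y1
    return y2 > y1 + nlr1 * (x2 - x1)


r_y, r_x = 2, 3  # ratios from the example on this slide
for x1, y1 in [(0.1, 0.4), (0.3, 0.4)]:  # hypothetical T1 operating points
    print((x1, y1),
          theorem_condition(x1, y1, r_x, r_y),
          above_or_line(x1, y1, r_x, r_y))
```

With $T_1 = (0.1, 0.4)$ the ratio $PLR_1/NLR_1$ is 6, which exceeds 2, and $T_2$ lies above the line; with $T_1 = (0.3, 0.4)$ the ratio is about 1.56 and $T_2$ lies below it.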

18 I. Ratio of TP Rates and Ratio of FP Rates (cont.)

C) $Se_2/Se_1 = R_y > 1$ (increase in sensitivity);
$(1 - Sp_2)/(1 - Sp_1) = R_x > 1$ (increase in false positive rate);
$R_y < R_x$ (the increase in FPR is higher than the increase in TPR).

For situation C, drawing conclusions about the effectiveness of Test 2 requires information about the diagnostic accuracy of Test 1.

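19 I. Ratio of TP Rates and Ratio of FP Rates (cont.)

Because $Se_2 \le 1$, $Se_2/Se_1 = R_y > 1$ implies $Se_1 \le 1/R_y$.
Because $1 - Sp_2 \le 1$, $(1 - Sp_2)/(1 - Sp_1) = R_x > 1$ implies $1 - Sp_1 \le 1/R_x$.

[Figure: ROC space restricted to $x_1 \le 1/R_x$ and $y_1 \le 1/R_y$, with a hyperbola and green and red areas.]

If $T_1$ is in the green area, then $T_2$ is superior for confirming the absence of disease (higher NPV and lower PPV).
If $T_1$ is in the red area, then $T_2$ is inferior overall (lower NPV and lower PPV).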

20 I. Ratio of TP Rates and Ratio of FP Rates (cont.)

Summary:
If, in a clinical study comparing the accuracies of Test 2 and Test 1, the increase in the TP rate of Test 2 is anticipated to be statistically higher than the increase in its FP rate, then conclusions about the effectiveness of Test 2 can be made without information about the diagnostic accuracy of Test 1.

In most practical situations, the increase in the FP rate of Test 2 is anticipated to be higher than the increase in the TP rate (or the sample size is too small to demonstrate that the increase in the TP rate is statistically higher than the increase in the FP rate). Information about the diagnostic accuracy of Test 1 is then needed to draw conclusions about the effectiveness of Test 2.

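21 II. Verification Bias: Subjects Negative on Both Tests

If a random sample of the subjects with negative results on both tests is verified by the reference standard, then unbiased estimates of the sensitivities and specificities of Test 1 and Test 2 can be constructed.

All subjects (total N):

|           | $T_1$ Pos | $T_1$ Neg |
|-----------|-----------|-----------|
| $T_2$ Pos | A         | B         |
| $T_2$ Neg | C         | D         |

Disease D+ (total $[N_1]$):

|           | $T_1$ Pos | $T_1$ Neg |
|-----------|-----------|-----------|
| $T_2$ Pos | $a_1$     | $b_1$     |
| $T_2$ Neg | $c_1$     | $[d_1]$   |

Non-Disease D- (total $[N_0]$):

|           | $T_1$ Pos | $T_1$ Neg |
|-----------|-----------|-----------|
| $T_2$ Pos | $a_0$     | $b_0$     |
| $T_2$ Neg | $c_0$     | $[d_0]$   |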

22 II. Verification Bias: Bias Correction

Verification bias correction procedures:
1. Begg, C.B., Greenes, R.A. (1983) Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics 39, 207-215.
2. Hawkins, D.M., Garrett, J.A., Stephenson, B. (2001) Some issues in resolution of diagnostic tests using an imperfect gold standard. Statistics in Medicine 20, 1987-2001.

Multiple Imputation
The absence of the disease status for some subjects can be considered a missing data problem. Multiple imputation is a Monte Carlo simulation in which the missing disease status of each unverified subject is replaced by simulated plausible values based on the observed data. Each of the imputed datasets is analyzed separately, and the diagnostic accuracies of the tests are evaluated. The results are then combined to produce estimates and confidence intervals that incorporate the uncertainty due to the missing disease status of some subjects.
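The impute, analyze, and combine steps described above can be sketched for a single cell of the 2x2 table. This is a simplified illustration, not the procedure used in the presentation. It assumes that verification within the cell is missing completely at random, draws the cell's disease probability from a Beta posterior with a uniform prior (an assumed imputation model), and pools the results with Rubin's rules. The function name and the counts are hypothetical.

```python
import random
import statistics


def impute_cell_prevalence(n_total, n_verified, n_diseased, m=50, seed=0):
    """Multiple-imputation estimate of the diseased fraction in one cell.

    n_total:    all subjects in the cell
    n_verified: subjects with a reference-standard result
    n_diseased: verified subjects found to be diseased
    Returns (pooled estimate, total variance) using Rubin's rules.
    """
    rng = random.Random(seed)
    n_missing = n_total - n_verified
    estimates, variances = [], []
    for _ in range(m):
        # Imputation step: draw a plausible disease probability, then
        # simulate the missing disease status of each unverified subject.
        p = rng.betavariate(n_diseased + 1, n_verified - n_diseased + 1)
        imputed = sum(rng.random() < p for _ in range(n_missing))
        # Analysis step: complete-data estimate and its variance.
        q = (n_diseased + imputed) / n_total
        estimates.append(q)
        variances.append(q * (1 - q) / n_total)
    # Combination step (Rubin's rules).
    q_bar = statistics.mean(estimates)
    within = statistics.mean(variances)
    between = statistics.variance(estimates)
    total_var = within + (1 + 1 / m) * between
    return q_bar, total_var


# Hypothetical cell: 200 subjects, 100 verified, 40 of those diseased.
estimate, variance = impute_cell_prevalence(200, 100, 40)
print(round(estimate, 3), round(variance ** 0.5, 3))
```

The total variance exceeds the complete-data variance because the between-imputation term carries the uncertainty about the unverified subjects.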

23 II. Verification Bias: Subjects Negative on Both Tests (cont.)

|           | $T_1$ Pos | $T_1$ Neg |
|-----------|-----------|-----------|
| $T_2$ Pos | A         | B         |
| $T_2$ Neg | C         | D         |

Total: N

Usually, according to the study protocol, all subjects in subsets A, B, and C should have a verified disease status, and the verification bias relates only to the subjects for whom both test results are negative.

In practice, some subjects in subsets A, B, and C may not comply with disease verification (percent of each subset verified):

|           | $T_1$ Pos | $T_1$ Neg |
|-----------|-----------|-----------|
| $T_2$ Pos | A: 70%    | B: 50%    |
| $T_2$ Neg | C: 30%    | D         |

Verification bias!

24 III. Different Types of Missingness In order to correctly adjust for verification bias, the type of missingness should be investigated. Missing data mechanisms: Missing Completely At Random (MCAR) – missingness is unrelated to the values of any variables (whether the disease status or observed variables); Missing At Random (MAR) – missingness is unrelated to the disease status but may be related to the observed values of other variables. For details, see Little, R.J.A and Rubin, D. (1987) Statistical Analysis with Missing Data. New York: John Wiley.

25 III. Different Types of Missingness

Example: prospective study for prostate cancer. 5,000 men were screened with a digital rectal exam (DRE) and a prostate specific antigen (PSA) assay. Results of DRE are Positive or Negative. PSA, a quantitative test, is dichotomized at a threshold of 4 ng/mL: Positive (PSA > 4), Negative (PSA ≤ 4). D+ = prostate cancer; D- = no prostate cancer (reference standard = biopsy).

|      | DRE+                    | DRE-                    |
|------|-------------------------|-------------------------|
| PSA+ | 150 (105 biopsies, 70%) | 750 (375 biopsies, 50%) |
| PSA- | 250 (75 biopsies, 30%)  | 3,850 (no biopsies)     |

Total: 5,000

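26 Subjects with Verified Disease Status

All subjects:

|      | DRE+                    | DRE-                    |
|------|-------------------------|-------------------------|
| PSA+ | 150 (105 biopsies, 70%) | 750 (375 biopsies, 50%) |
| PSA- | 250 (75 biopsies, 30%)  | 3,850 (no biopsies)     |

D+ (positive biopsy):

|      | DRE+ | DRE- |
|------|------|------|
| PSA+ | 60   | 110  |
| PSA- | 25   | n/a  |

D- (negative biopsy):

|      | DRE+ | DRE- |
|------|------|------|
| PSA+ | 45   | 265  |
| PSA- | 50   | n/a  |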

27 III. Different Types of Missingness (cont.)

Do the subjects without biopsies differ from the subjects with biopsies?

Propensity score = conditional probability that a subject underwent verification of disease (biopsy in this example) given a collection of observed covariates (the quantitative value of the PSA test, age, race, and so on).

Membership in the group of verified subjects is modeled by logistic regression:
- outcome: underwent verification (biopsy), yes or no;
- predictors: quantitative PSA value and the other covariates.

28 III. Different Types of Missingness (cont.)

|      | DRE+                    | DRE-                    |
|------|-------------------------|-------------------------|
| PSA+ | 150 (105 biopsies, 70%) | 750 (375 biopsies, 50%) |
| PSA- | 250 (75 biopsies, 30%)  | 3,850 (no biopsies)     |

Total: 5,000

For subgroup A (PSA+, DRE+), the probability that a subject has a missing biopsy does not appear to depend on either the PSA value or the observed covariates (age, race). Type of missingness: Missing Completely At Random. The same holds for subgroup B (PSA+, DRE-).

29 III. Different Types of Missingness (cont.)

|      | DRE+                    | DRE-                    |
|------|-------------------------|-------------------------|
| PSA+ | 150 (105 biopsies, 70%) | 750 (375 biopsies, 50%) |
| PSA- | 250 (75 biopsies, 30%)  | 3,850 (no biopsies)     |

Total: 5,000

For subgroup C (PSA-, DRE+), the probability that a subject has a missing biopsy does depend on the quantitative value of PSA. The PSA value is a significant predictor of biopsy missingness in this subgroup (the larger the PSA value, the lower the probability of a missing biopsy). Type of missingness: Missing At Random.

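30 III. Different Types of Missingness (cont.)

Two adjustments for verification are compared. Values marked "biased" come from adjustment without proper investigation of the type of missingness (biased estimates). The unmarked values come from adjustment that takes the different types of missingness into account (unbiased estimates).

D+:

|      | DRE+                     | DRE- |
|------|--------------------------|------|
| PSA+ | 86                       | 220  |
| PSA- | 50 (83 when biased)      | n/a  |

D-:

|      | DRE+                     | DRE- |
|------|--------------------------|------|
| PSA+ | 64                       | 530  |
| PSA- | 200 (167 when biased)    | n/a  |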
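The counts in subgroups A and B, and the biased counts in subgroup C, can be reproduced by scaling each cell's verified disease fraction up to the full cell, which is valid only if biopsies within the cell are missing completely at random. The sketch below uses the verified counts from slide 26; the function name `extrapolate_mcar` is an assumption. The unbiased values for subgroup C (50 and 200) depend on the PSA-based model of missingness, which the slides do not spell out, so they are not recomputed here.

```python
def extrapolate_mcar(total, verified, diseased):
    """Scale verified results up to the whole cell, assuming MCAR.

    Returns the estimated (D+, D-) counts among all `total` subjects.
    """
    d_plus = round(total * diseased / verified)
    return d_plus, total - d_plus


# (cell total, number biopsied, positive biopsies) from slides 25 and 26.
cells = {
    "A (PSA+, DRE+)": (150, 105, 60),
    "B (PSA+, DRE-)": (750, 375, 110),
    "C (PSA-, DRE+)": (250, 75, 25),  # MCAR does not hold in this cell
}
for name, (total, verified, diseased) in cells.items():
    print(name, extrapolate_mcar(total, verified, diseased))
```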

31 III. Different Types of Missingness (cont.)

Correct adjustment for verification bias produces estimates showing that the increase in the FP rate for the new test (PSA) is about the same as the increase in the TP rate, whereas incorrect adjustment showed that the increase in the FP rate was larger than the increase in the TP rate.

So naïve estimation of the risk in subgroup C, based on the assumption that the missing biopsy results were Missing Completely At Random, produces a biased estimate of the performance of the new PSA test (an underestimate of its performance).

For proper adjustment, information on the distribution of test results in the subjects who were not selected for verification should be available.
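The statement above can be verified from the adjusted counts on slide 30 by computing the ratio of TP rates and the ratio of FP rates of PSA relative to DRE under each adjustment. This is a sketch in which the function name is an assumption and the counts are those shown on slide 30.

```python
def psa_vs_dre_ratios(a1, b1, c1, a0, b0, c0):
    """Ratios of TP and FP rates of PSA relative to DRE.

    a: PSA+ and DRE+, b: PSA+ and DRE-, c: PSA- and DRE+;
    suffix 1 = D+ (cancer), suffix 0 = D- (no cancer).
    """
    tp_ratio = (a1 + b1) / (a1 + c1)
    fp_ratio = (a0 + b0) / (a0 + c0)
    return tp_ratio, fp_ratio


# Adjustment that takes the type of missingness into account.
correct = psa_vs_dre_ratios(a1=86, b1=220, c1=50, a0=64, b0=530, c0=200)
# Adjustment assuming MCAR in every subgroup (biased in subgroup C).
naive = psa_vs_dre_ratios(a1=86, b1=220, c1=83, a0=64, b0=530, c0=167)

print("correct: TP ratio %.2f, FP ratio %.2f" % correct)
print("naive:   TP ratio %.2f, FP ratio %.2f" % naive)
```

Under the correct adjustment both ratios are 2.25; under the naive adjustment the TP ratio drops to about 1.81 while the FP ratio rises to about 2.57, which makes PSA look worse than it is.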

32 Summary
- In most practical situations, estimating only the ratios of true positive and false positive rates does not allow one to draw conclusions about the effectiveness of the test.
- The absence of disease status can be considered a missing data problem. Multiple imputation can be used to correct for verification bias.
- Information on the distribution of test results in the subjects who were not selected for verification should be available.
- The type of missingness should be investigated in order to obtain unbiased estimates of the performance of medical tests. All subsets of subjects should be checked for missing disease status.
- The precision of the estimated diagnostic accuracies depends primarily on the number of verified cases available for statistical analysis.

33 References
1. Begg, C.B. and Greenes, R.A. (1983) Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics, 39, 207-215.
2. Biggerstaff, B.J. (2000) Comparing diagnostic tests: a simple graphic using likelihood ratios. Statistics in Medicine, 19, 649-663.
3. Hawkins, D.M., Garrett, J.A. and Stephenson, B. (2001) Some issues in resolution of diagnostic tests using an imperfect gold standard. Statistics in Medicine, 20, 1987-2001.
4. Kondratovich, M.V. (2003) Verification bias in the evaluation of diagnostic tests. Proceedings of the 2003 Joint Statistical Meeting, Biopharmaceutical Section, San Francisco, CA.
5. Ransohoff, D.F. and Feinstein, A.R. (1978) Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. New England Journal of Medicine, 299, 926-930.
6. Schatzkin, A., Connor, R.J., Taylor, P.R. and Bunnag, B. (1987) Comparing new and old screening tests when a reference procedure cannot be performed on all screeners. American Journal of Epidemiology, 125(4), 672-678.
7. Zhou, X. (1994) Effect of verification bias on positive and negative predictive values. Statistics in Medicine, 13, 1737-1745.
8. Zhou, X. (1998) Correcting for verification bias in studies of a diagnostic test's accuracy. Statistical Methods in Medical Research, 7, 337-353.
9. http://www.fda.gov/cdrh/pdf/p930027s004b.pdf
