
1 1 Radiologist → Pathologist Evaluation of Reader Performance and Variability. Brandon Gallas, FDA/CDRH, Office of Science and Engineering Labs, Division of Imaging and Applied Mathematics.

2 2 Paradox: Approve WSI (whole slide imaging) to replace OM (optical microscopy). The intended use is BROAD and clinical practice is BROAD, but measurements are FOCUSED and clinical studies/trials are FOCUSED.

3 3 Surgical Pathology. What do pathologists do? –Look at macroscopic features –Look at microscopic features –Consider the patient chart –Consider prior experience and the literature –Report results. How can we measure pathologist performance? Some of these inputs are not relevant, too complicated, or always changing, so focus on what the imaging shows.

4 4 Narrowing the Focus. Sign-out diagnosis: Breast Cancer; Tumor Grade: Nottingham Score; Mitotic Count; Mitotic Cell or Normal Cell. Sort of ordered: broader to narrower, harder to easier. This talk is not about narrowing the focus. This talk is about measuring pathologist performance.

5 5 Outline Measurement Scales for Data Sensitivity/Specificity –Binary, multiple types ROC Sources of Variability: Identify, Reduce, Measure –Pathologists –Cases (Patients, Slides, Regions of Interest=ROI) Study Designs Generalize Truth –Multi-level –Location specific (search and detect)

6 6 Pathologist results Sign-out diagnosis: Breast Cancer Tumor Grade: Nottingham Score Mitotic Count Mitotic Cell or Normal Cell Objective –not an opinion –truth is not known Clinical? Pre-clinical? Pathologist-in-the-loop Compare results to a reference

7 7 Reference Sign-out diagnosis: Breast Cancer Tumor Grade: Nottingham Score Mitotic Count Mitotic Cell or Normal Cell Clinical Outcome –survival –events/symptoms Expert Panel –Optical Microscopy –(Q for today’s FDA panel) Different performance measures for different kinds of results

8 8 Measurement Scales. Sign-out diagnosis: Breast Cancer; Tumor Grade: Nottingham Score; Mitotic Count; Mitotic Cell or Normal Cell. Nominal, Ordinal, Quantitative. Binning turns "stronger" scales into "weaker" scales; going from "weak" to "strong" means increased information and increased statistical power.

9 9 Measurement Scales Sign-out diagnosis: Breast Cancer Tumor Grade: Nottingham Score Mitotic Count Mitotic Cell or Normal Cell Nominal Ordinal Quantitative → Ordinal CAP checklist bins mitotic counts. Results are “1”, “2”, or “3”. Why? Simplify inputs to Nottingham score Otherwise, not recommended

10 10 Measurement Scales Sign-out diagnosis: Breast Cancer Tumor Grade: Nottingham Score Mitotic Count Mitotic Cell or Normal Cell Nominal→ Ordinal Ordinal Quantitative Could make these “stronger” Split positives Split negatives Levels of confidence Levels of likelihood Side by side comparisons Sort cases

11 11 Nominal. Single Pathologist, or Panel Consensus: –Majority –Unanimous. Which cells are mitotic? (Tsuda, Jpn J Cancer Res 2000; figure: example cell images labeled YES or NO.)

12 12 Which cells are mitotic? (Tsuda, Jpn J Cancer Res 2000; figure: each cell image labeled definitely NO, borderline NO, borderline YES, or definitely YES.) Single pathologist, ordinal, 4 levels: split the yesses and the no's. Ask at the start, or revisit cases.

13 13 Single Pathologist: split more, rank, sort, compare pairs. Ask at the start, or revisit cases. Which cells are mitotic? (Tsuda, Jpn J Cancer Res 2000; figure: per-image panel scores ranging from 0 to 28.)

14 14 More levels from a panel? (Tsuda, Jpn J Cancer Res 2000.) Example is panel data: –14 pathologists –20 images –fully-crossed: every pathologist reads every case. Measurement scale: 3-level ordinal –Mitotic = 2 –Unknown = 1 –Not Mitotic = 0 (I made these assignments). With 14 M.D.s per case, the summed score runs from a maximum of 28 to a minimum of 0.

15 15 Aggregate Panel Results. Example thresholds on the summed panel score: Very High, 27.5; High, 26.5; Low, 6.5; Very Low, 0.5. Threshold variability: different thresholds. (Tsuda, Jpn J Cancer Res 2000; figure: per-image summed panel scores ranging from 0 to 28.)

16 16 Panel Variability. Analysis requested for a PMA: the R2 ImageChecker® computer-aided detection system, designed to assist radiologists in the detection of solid pulmonary nodules during review of multidetector CT (MDCT) scans of the chest. Miller, D. P. et al., Medical Imaging 2004: Image Perception, Observer Performance and Technology Assessment (2004). This reference and many others are not the focus of today's panel; they are resources for the record.

17 17 How do we evaluate the reporting of clinical findings? The ultimate task in surgical pathology is to diagnose and classify: a multi-class task built on binary tasks –Edwards et al., IEEE-TMI (2004) –He et al., IEEE-TMI (2006). Fundamental performance measures: –Sensitivity and Specificity –ROC curve –Best of both worlds: measure both.

18 18 Sensitivity, Specificity. Binary task: binary reference, binary decision. The 2x2 table of outcome fractions, normalized by reference (caution: the Stat Guidance puts the reference in columns):
Reference Cancer, Decision Positive: TPF (Sensitivity)
Reference Cancer, Decision Negative: FNF = 1-TPF
Reference Non-Cancer, Decision Negative: TNF (Specificity)
Reference Non-Cancer, Decision Positive: FPF = 1-TNF
TPF: True Positive Fraction; FPF: False Positive Fraction; TNF: True Negative Fraction; FNF: False Negative Fraction.
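Since the slides give no code, here is a minimal sketch (Python, with illustrative names and a made-up example) of the outcome fractions in the table above, assuming binary reference labels (1 = cancer) and binary decisions (1 = positive call):

```python
import numpy as np

def outcome_fractions(reference, decision):
    """Outcome fractions normalized by the reference truth state.
    reference, decision: arrays of 0/1 (1 = cancer / positive call)."""
    reference = np.asarray(reference, dtype=bool)
    decision = np.asarray(decision, dtype=bool)
    n_pos = reference.sum()            # number of reference-positive (cancer) cases
    n_neg = (~reference).sum()         # number of reference-negative cases
    tpf = (decision & reference).sum() / n_pos      # sensitivity
    fnf = 1.0 - tpf
    tnf = (~decision & ~reference).sum() / n_neg    # specificity
    fpf = 1.0 - tnf
    return {"TPF (Se)": tpf, "FNF": fnf, "TNF (Sp)": tnf, "FPF": fpf}

# Made-up example: 4 cancers, 4 non-cancers
print(outcome_fractions([1, 1, 1, 1, 0, 0, 0, 0],
                        [1, 1, 1, 0, 0, 0, 1, 0]))
```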

19 19 Type 1 Sensitivity, Type 2 Sensitivity. Binary task: binary reference, binary decision. Same 2x2 table with the reference classes relabeled Type 1 and Type 2: the fraction of reference Type 1 cases called Type 1 is the Type 1 Sensitivity, and the fraction of reference Type 2 cases called Type 2 is the Type 2 Sensitivity. Only the language changes.

20 20 Multi-class Task: multi-type reference, multi-type decision. How does this data get analyzed? (Figure: 4x4 table of Decision type 1-4 versus Reference type 1-4.)

21 21 Multi-class Task: multi-type reference, multi-type decision. Per-type sensitivity: Type 1 Sensitivity. (Figure: 4x4 table with the Reference Type 1 cells and the correct-decision cell highlighted.)

22 22 Multi-class Task: multi-type reference, multi-type decision. Per-type sensitivity: Type 2 Sensitivity. (Figure: same table, Type 2 highlighted.)

23 23 Multi-class Task: multi-type reference, multi-type decision. Per-type sensitivity: Type 3 Sensitivity. (Figure: same table, Type 3 highlighted.)

24 24 Multi-class Task: multi-type reference, multi-type decision. Per-type sensitivity: Type 4 Sensitivity. (Figure: same table, Type 4 highlighted.)
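As a companion to the per-type sensitivities above, a minimal sketch assuming the counts are stored with reference type in rows and decision in columns; the layout, function name, and counts are illustrative, not from the talk:

```python
import numpy as np

def per_type_sensitivity(confusion):
    """Per-type sensitivity from a K x K table of counts.
    Assumed layout: confusion[i, j] = number of cases of reference type i
    that received decision type j (reference in rows, for illustration only)."""
    confusion = np.asarray(confusion, dtype=float)
    cases_per_type = confusion.sum(axis=1)          # cases of each reference type
    return np.diag(confusion) / cases_per_type      # correct calls / cases of that type

# Made-up 4-class table of counts
table = np.array([[18,  1,  1,  0],
                  [ 2, 15,  2,  1],
                  [ 1,  3, 14,  2],
                  [ 0,  1,  2, 17]])
print(per_type_sensitivity(table))   # Type 1..4 sensitivities
```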

25 25 Multi-class Task: multi-type reference, multi-type decision. Sample size requirements: –proportional to the number of categories; rare outcomes may need enrichment –increase when accounting for multiple hypotheses –increase for low and high sensitivities –increase for small comparative effects.

26 26 ROC Refresher. Metz, C. E. (1978), 'Basic Principles of ROC Analysis', Semin Nucl Med 8(4), 283-298. Wagner, R. F. (2006), 'Toward a Strategy for Consensus Development on a Quantitative Approach to Medical Imaging. Guest Editorial', Acad Radiol 13(2), 137-139. ICRU (2008), 'Report 79: Receiver Operating Characteristic Analysis in Medical Imaging', Technical report, International Commission on Radiation Units and Measurements, Bethesda, Md.

27 27 Building the ROC curve from Sensitivity and Specificity. Rank the negative calls and rank the positive calls: split the cells in two, –during the first pass –or by revisiting the cases. This data has added two possible thresholds. (Figure: 2x2 table of Decision 1/2 versus Reference Normal/Disease.)

28 28 Building the ROC curve from Sensitivity and Specificity. Rank the negative calls and rank the positive calls: split the cells in two, –during the first pass –or by revisiting the cases. This data has added two possible thresholds. (Figure: 4x2 table of Decision 1.0/1.1/2.0/2.1 versus Reference Normal/Disease; a higher decision means the reader believes disease is more likely.)

29 29 Building the ROC curve from Sensitivity and Specificity. This data has two added possible thresholds: between 1.0 and 1.1 and between 2.0 and 2.1, in addition to the original threshold between 1 and 2. (Figure: same table with the original and new thresholds marked.)

30 30 Building the ROC curve from Sensitivity and Specificity. The original operating point (Sp,Se). More aggressive: –more positive calls –higher sensitivity –lower specificity. Less aggressive: –fewer positive calls –lower sensitivity –higher specificity. (Figure: table of Decision 1.0/1.1/2.0/2.1 versus Normal/Disease, with the original threshold marked and its Sp and Se labeled.)

31 31 Building the ROC curve from Sensitivity and Specificity. The original operating point (Sp,Se). More aggressive (Sp,Se): –more positive calls –higher sensitivity –lower specificity. Less aggressive: –fewer positive calls –lower sensitivity –higher specificity. (Figure: same table with the more aggressive threshold marked and its Sp and Se labeled.)

32 32 Building the ROC curve from Sensitivity and Specificity. The original operating point (Sp,Se). More aggressive (Sp,Se): –more positive calls –higher sensitivity –lower specificity. Less aggressive (Sp,Se): –fewer positive calls –lower sensitivity –higher specificity. (Figure: same table with the less aggressive threshold marked and its Sp and Se labeled.)

33 33 Building the ROC curve from Sensitivity and Specificity: the ROC curve. The three operating points plotted on (FPF = 1-specificity, TPF = sensitivity) axes: Less aggressive (Sp,Se) = (95,50); Original (85,85); More aggressive (50,95). RE: measurement scale, can you get more resolution? Splitting, rank scores, relative comparisons.

34 34 Building the ROC curve from Sensitivity and Specificity: the ROC curve. Sweeping the threshold traces out the curve of (FPF = 1-specificity, TPF = sensitivity) points: the original operating point (Sp,Se), more aggressive (more positive calls, higher sensitivity, lower specificity), and less aggressive (fewer positive calls, lower sensitivity, higher specificity). RE: measurement scale, can you get more resolution? Splitting, rank scores, relative comparisons.
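A short sketch of the threshold-sweeping idea on slides 27-34, assuming ordinal scores where a higher value means "disease more likely"; the function and the tiny data set are illustrative only:

```python
import numpy as np

def roc_points(scores, truth):
    """Operating points (FPF, TPF) obtained by sweeping a threshold over
    ordinal scores (higher score = more suspicious). truth: 1 = disease."""
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(truth, dtype=bool)
    points = [(0.0, 0.0)]
    # Each distinct score value is a possible threshold: "call positive if score >= t"
    for t in sorted(np.unique(scores))[::-1]:
        call_pos = scores >= t
        tpf = (call_pos & truth).sum() / truth.sum()
        fpf = (call_pos & ~truth).sum() / (~truth).sum()
        points.append((fpf, tpf))
    return points   # ends at (1, 1) when the lowest threshold calls everything positive

# Splitting "1"/"2" calls into 1.0/1.1/2.0/2.1 adds thresholds, i.e. more points
scores = [1.0, 1.1, 1.0, 2.0, 1.1, 2.1, 2.0, 2.1]
truth  = [0,   0,   0,   0,   1,   1,   1,   1]
print(roc_points(scores, truth))
```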

35 35 AUC: Area Under the ROC Curve. A summary statistic; a common, relevant scale for comparing diagnostics; measures the ability to separate. (Figure: ROC curve on FPF = 1-specificity versus TPF = sensitivity axes, traced out by sweeping the threshold.)

36 36 AUC Measures the Ability to Separate. (Figure: overlapping distributions of cancer scores and non-cancer scores, and the corresponding ROC curve with AUC = 0.85.)

37 37 AUC Measures the Ability to Separate. (Figure: ROC curves with AUC = 0.98 and AUC = 0.85.)

38 38 AUC Measures the Ability to Separate. (Figure: ROC curves with AUC = 0.50, 0.85, and 0.98.)

39 39 AUC Measures the Ability to Separate: diagnostic performance, or pathologist skill. The chance line corresponds to AUC = 0.50. (Figure: ROC curves with AUC = 0.50, 0.85, and 0.98.)

40 40 What is AUC, and how can you estimate it? AUC is the probability that a disease score > a normal score. Compare pairs: success = disease score > normal score. Use all N0 x N1 comparisons. (Doing research here: pair each cancer with a single non-cancer, N1 comparisons.)
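A sketch of that pairwise estimate (the Wilcoxon/Mann-Whitney form), counting ties as one half; the names and toy scores are illustrative:

```python
import numpy as np

def auc_pairwise(disease_scores, normal_scores):
    """Empirical AUC: P(disease score > normal score), ties counted as 1/2,
    averaged over all N1 x N0 disease/normal pairs."""
    d = np.asarray(disease_scores, dtype=float)[:, None]   # N1 x 1
    n = np.asarray(normal_scores, dtype=float)[None, :]    # 1 x N0
    successes = (d > n) + 0.5 * (d == n)                   # one entry per comparison
    return successes.mean()

print(auc_pairwise([3, 4, 4, 5], [1, 2, 3, 4]))   # 0.84375 for this toy data
```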

41 41 The ROC curve can clear up ambiguous (Sp,Se) comparisons. Sens/Spec: Modality A, 75%/92%; Modality B, 90%/81%. Which modality is better? (Figure: the two operating points on FPF = 1-specificity versus TPF = sensitivity axes; example pairings include without CAD versus with CAD, or CAD V1 versus CAD V1.5.)

42 42 The ROC curve can clear up ambiguous (Sp,Se) comparisons. One scenario: Modality B is better, with higher sensitivity at the same specificity and higher specificity at the same sensitivity. (Figure: Modality B's ROC curve lies above Modality A's.)

43 43 The ROC curve can clear up ambiguous (Sp,Se) comparisons. Another scenario: Modality A is better, with higher sensitivity at the same specificity and higher specificity at the same sensitivity. (Figure: Modality A's ROC curve lies above Modality B's.)

44 44 The ROC curve can clear up ambiguous (Sp,Se) comparisons. Yet another scenario: Modality A = Modality B, with the same sensitivity at the same specificities and the same specificity at the same sensitivities. Which operating point is best? Risk-benefit analysis. (Figure: both operating points lie on a single shared ROC curve.)

45 45 Expected Utility Equation Risk-Benefit Analysis Inputs –Sensitivity and Specificity –Prevalence –Utilities/Costs to Outcomes Costs are Abstract, Controversial Wagner, Med Decis Making (2004)
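The slide lists the inputs but not the formula. One common way to write an expected utility built from these inputs is sketched below; the notation (p for prevalence, U_TP etc. for the outcome utilities) is mine and is not claimed to be the exact form in Wagner, Med Decis Making (2004):

```latex
% Expected utility of operating at (Se, Sp) with disease prevalence p
% and utilities U_TP, U_FN, U_TN, U_FP assigned to the four outcomes
\mathrm{EU} \;=\; p \bigl[\, Se \, U_{TP} + (1 - Se)\, U_{FN} \,\bigr]
          \;+\; (1 - p) \bigl[\, Sp \, U_{TN} + (1 - Sp)\, U_{FP} \,\bigr]
```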

46 46 ROC Criticism: pathologists must make decisions (cancer or not). Clinical trial vs. clinical practice: –enrichment –patient care not impacted –little experience/training with the "new" device –blinded to other patient info. AUC can mitigate these biases. Is the clinical trial (Sp,Se) predictive of the clinical practice (Sp,Se)?

47 47 ROC Criticism: not all operating points are relevant. Partial area under the ROC curve: –not quite AUC, not quite Se/Sp –still an average. Focus on a certain region of ROC space: –high sensitivity (high risk) –high specificity (screening).

48 48 Partial Area Under the ROC Curve: interested in high-specificity decisions (specificity > 0.8). (Figure: ROC plot of TPF versus FPF, with the area over FPF < 0.2 shaded.)
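A sketch of one way to compute that partial area, restricting the trapezoidal area to FPF <= 0.2 (specificity >= 0.8); the interpolation at the cutoff and the example points are my own illustration:

```python
import numpy as np

def partial_auc(fpf, tpf, fpf_max=0.2):
    """Trapezoidal area under an empirical ROC curve restricted to FPF <= fpf_max.
    fpf, tpf: operating points sorted by increasing FPF, starting at (0, 0)."""
    fpf = np.asarray(fpf, dtype=float)
    tpf = np.asarray(tpf, dtype=float)
    # Interpolate the TPF at the cutoff so the region ends exactly at fpf_max
    tpf_cut = np.interp(fpf_max, fpf, tpf)
    keep = fpf <= fpf_max
    x = np.append(fpf[keep], fpf_max)
    y = np.append(tpf[keep], tpf_cut)
    # Trapezoid rule written out by hand
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))

# Made-up operating points (FPF, TPF), including the (0,0) and (1,1) corners
fpf = [0.0, 0.05, 0.15, 0.30, 1.0]
tpf = [0.0, 0.50, 0.85, 0.95, 1.0]
print(partial_auc(fpf, tpf))   # area over 0 <= FPF <= 0.2 only
```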

49 49 Radiologist Variability Example Beam et al., Arch Intern Med 1996 108 US Radiologists 79 mammograms: –34 normal/benign –45 breast cancer Fully-crossed data –Every radiologist read every case

50 50 Measurement Scale BIRADS: Breast Imaging-Reporting and Data System (Ordinal) –1, negative –2, no evidence of malignancy –3, probably benign findings; short-interval follow-up –4, suspicious abnormality; biopsy should be considered –5, high probability of cancer; biopsy recommended Radiologist Variability Example Beam et al., Arch Intern Med 1996

51 51 Radiologist Variability Example (Beam et al., Arch Intern Med 1996). Measurement Scale: BIRADS (ordinal) –1, negative –2, no evidence of malignancy –3, probably benign findings; short-interval follow-up –4, suspicious abnormality; biopsy should be considered –5, high probability of cancer; biopsy recommended. (Figure: a decision threshold marked on the scale.)

52 52 Radiologist Variability Example (Beam et al., Arch Intern Med 1996). Mean (Min, Max) and Range across the 108 radiologists:
Sensitivity (N=45): 80% (47, 100), range 53
Specificity (N=34): 90% (35, 99), range 63
AUC (N_normal=34, N_cancer=45): 85% (74, 94), range 21

53 53 Decision Thresholds. Variable/random: –within a pathologist (non-reproducible) –across pathologists. Non-random, systematic changes because of: –experience and training –patient risk factors –treatment resources –treatment effectiveness –evolution in clinical practice. Can be mitigated by AUC. PSA example: FDA approved 4 ng/ml; NCI FactSheet: "Thus, there is no specific normal or abnormal PSA level." Now throw in the pathologist.

54 54 MRMC Variance Analysis: Multi-Reader, Multi-Case, components-of-variance. Helps separate the sources of variability: reader, case, the reader-case interaction, and internal noise (reader jitter). Case = patient, tumor, block, section, region of interest (ROI); reader = pathologist.
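To make "components of variance" concrete, here is a toy sketch: a classical two-way random-effects (method-of-moments) decomposition of a fully-crossed readers-by-cases score table into reader, case, and reader-case pieces. This is only an illustration of the idea, not the MRMC estimators in the references on the later slides:

```python
import numpy as np

def two_way_components(scores):
    """Reader, case, and reader-by-case variance components from a fully-crossed
    readers-by-cases score matrix (no replicates), using the usual two-way
    random-effects ANOVA method-of-moments equations. Without replicates, the
    reader-case interaction and internal noise (jitter) are lumped together."""
    R, C = scores.shape
    grand = scores.mean()
    reader_means = scores.mean(axis=1)          # one mean per reader
    case_means = scores.mean(axis=0)            # one mean per case
    # Mean squares from the two-way ANOVA decomposition
    ms_reader = C * np.sum((reader_means - grand) ** 2) / (R - 1)
    ms_case = R * np.sum((case_means - grand) ** 2) / (C - 1)
    resid = scores - reader_means[:, None] - case_means[None, :] + grand
    ms_inter = np.sum(resid ** 2) / ((R - 1) * (C - 1))
    # Method-of-moments solutions, clipped at zero
    var_rc = ms_inter
    var_reader = max((ms_reader - ms_inter) / C, 0.0)
    var_case = max((ms_case - ms_inter) / R, 0.0)
    return var_reader, var_case, var_rc

# Made-up example: 4 readers x 6 cases of per-case scores
rng = np.random.default_rng(0)
print(two_way_components(rng.normal(size=(4, 6))))
```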

55 55 Fully-Crossed Study Design: every reader (M.D. #1 through M.D. #6) reads every case (patients, tumors, blocks, ...). Most statistical power for a given number of pathologists and a given number of cases (with verified truth); least demanding of these resources. (Figure: readers-by-cases grid, completely filled with data.)

56 56 Doctor-Patient Study Design: each reader reads only his or her own cases (patients, tumors, blocks, ...). Need more readers and cases for the same error bars as the fully-crossed design. Can still do MRMC variance analysis; the equation framework is the same (R, C, RC), with mild changes in interpreting the terms. (Figure: readers-by-cases grid, block-diagonal, with no data off the diagonal blocks.)

57 57 General Study Design: anywhere between the extremes: 1. Fully-crossed, 2. *General*, 3. Doctor-Patient. (Figure: readers-by-cases grid, partially filled, with some reader-case combinations having no data.)

58 58 Comparing Two Modalities Pairing Pathologists Builds Correlations Pairing Cases Builds Correlations These correlations add power to comparisons

59 59 First Digital Mammography PMA: supporting data (Hendrick, R.; Lewin, J.; D'Orsi, C. et al. (2001), IWDM 2000). 44 cancers in a total of 625 women; prospective with enrichment. 5 M.D.s, MQSA-qualified physicians, BIRADS scoring. Fully-crossed; readers and cases paired across the film and digital modalities.

60 60 First Digital Mammography PMA: MRMC ROC results: –Mean ROC area, film = 0.77 –Mean ROC area, digital = 0.76 –Mean difference = 0.01 –95% C.I. on the difference = +/- 0.064. This raw data gives the components of variance to size a new trial. Think: pilot study to size a pivotal trial. Are the error bars small enough?

61 61 Pilot Study to Pivotal Trial. Say the goal for the 95% C.I. is +/- 0.03: –200 cancers, 6 M.D.s –100 cancers, 20 M.D.s –78 cancers, 100 M.D.s –can't get there with 5 readers –can't get there with 44 cancers. Mammography is very difficult.

62 62 MRMC Literature Fully-Crossed Data jackknife/ANOVA (2 way: R,C) –Dorfman, Berbaum, and Metz, Invest Radiol (1992) –Obuchowski and Rockette, Commun Stat B-Simul (1995) The Bootstrap –Beiden, Wagner, and Campbell, Acad Radiol (2000) U-statistics –Gallas, Acad Radiol (2006) Framework unifying all the above –Gallas, Commun Stat A-Theor (2009) Ordinal Regression –Toledano and Gatsonis, Stat Med (1996) Bayesian –Johnson & Johnson, Stat Med (2006)

63 63 MRMC Literature Doctor Patient and General Study Designs U-statistics –Binary Data: Gallas et al., J Opt Soc Am A (2007). –AUC: Gallas & Brown, Neural Networks (2008). Balance case loads Some minimum number for each reader

64 64 Multi-Level Truth Generalizing ROC, Concordance

65 65 Aggregate Panel Results as Multi-Level Truth. (Tsuda, Jpn J Cancer Res 2000; figure: per-image summed panel scores ranging from 0 to 28.)

66 66 Multi-Level Truth: generalizing ROC, concordance. Concordance is a probability: –consider the pathologist's rank for a pair of cases –compare to the panel rank –if the order is the same for both → concordant. –Harrell et al., JAMA (1982) –Pencina & D'Agostino, Stat Med (2004) –Obuchowski, Stat Med (2006). The MRMC tools for AUC generalize.
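A sketch of concordance as a probability, using the panel's summed score as the multi-level reference: over case pairs that the panel orders differently, count agreement with the pathologist's ordering, scoring pathologist ties as one half (one common convention; the next slide discusses other ways to handle ties). Names and numbers are illustrative; with a binary reference this reduces to the pairwise AUC.

```python
import numpy as np

def concordance(path_scores, panel_scores):
    """P(pathologist ordering agrees with panel ordering), estimated over all
    case pairs the panel ranks differently; pathologist ties count as 1/2."""
    x = np.asarray(path_scores, dtype=float)
    y = np.asarray(panel_scores, dtype=float)
    agree, total = 0.0, 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if y[i] == y[j]:
                continue                      # panel tie: pair not used here
            total += 1
            if x[i] == x[j]:
                agree += 0.5                  # pathologist tie
            elif (x[i] > x[j]) == (y[i] > y[j]):
                agree += 1.0                  # same ordering: concordant
    return agree / total

# One pathologist's ordinal scores vs. summed panel scores (0-28) as truth
print(concordance([1, 2, 2, 3, 4], [0, 6, 9, 26, 28]))
```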

67 67 Multi-Level Truth Generalizing ROC, Concordance What to do about ties? –Collect finer data –Collect relative rankings Kendall’s Tau b –Reference and Pathologist ties treated the same –Kendall, Rank Correlation Methods (1962) Prediction Probability: –Smith, Stat Med (1996) –Ties in reference not considered –AUC is special case
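For the Kendall's tau-b option above, SciPy's kendalltau computes the tau-b variant by default, with ties in either variable treated symmetrically; a small illustrative example with made-up scores:

```python
from scipy.stats import kendalltau

# Made-up example: a pathologist's ordinal scores vs. an aggregated panel score
pathologist = [1, 2, 2, 3, 4, 4, 5]
panel       = [0, 4, 6, 9, 20, 26, 28]

tau_b, p_value = kendalltau(pathologist, panel)   # tau-b handles ties in both
print(tau_b, p_value)
```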

68 68 Biological Variability: Tumors, Blocks, Sections (Jannink et al., Histopathology 1996). (Figure: heterogeneity of mitotic activity, including within the same block.)

69 69 Biological Variability Regions of Interest, ROI Tsuda Jpn J Cancer Res (2000)

70 70 Variability from reading different Regions of Interest (ROIs) Gal Int J Surg Pathol (2005). More reproducible User friendly Faster No worry about -- in-between spaces -- double counting -- averaging over 10 HPF -- different sizes for HPF -- magnification

71 71 Variability from reading different Regions of Interest (ROIs) Different ROIs have different biology Evaluate the same biology to reduce/eliminate this variability Clinical Study: –Small specimens (ROIs) –Grids or box outlines on cover slip –On-slide annotations with pen –Even easier with digital system

72 72 Multiple ROIs Per case More data... for free... maybe Diversity in cases Account for possible correlations within a case Clustered Data analysis –ANOVA –Survey Sampling: Rao and Scott, Biometrics (1992) –GEE: Smith and Hadgu, Stat Med (1992) –U-statistics: Obuchowski, Biometrics (1997)

73 73 Location-Specific ROC LROC: Localization Response ROC –one “target” per image or not –score each case for disease –mark the most suspicious location –both must happen for a true positive –same axes as ROC FROC: Free Response ROC –multiple “targets” per image or none –mark and score suspicious locations –1-specificity replaced by false positives per case Swensson, Med Phys (1996). Edwards et al., Med Phys (2002). Chakraborty and Berbaum, Med Phys (2004). Popescu, Med Phys (2007). Gur et al. Acad Radiol (2008).

74 74 Summary Best of both worlds –(Sp,Se) and the whole ROC curve Utilize “Stronger” measurement scales –Quantitative > Ordinal > Nominal Pathologist variability exists –reduce and measure Pilot Study

75 75 References. Review –Metz, C. E. (1978), 'Basic Principles of ROC Analysis', Semin Nucl Med 8(4), 283-298. –Wagner, R. F.; Metz, C. E. & Campbell, G. (2007), 'Assessment of Medical Imaging Systems and Computer Aids: A Tutorial Review', Acad Radiol 14(6), 723-748. –ICRU (2008), 'Report 79: Receiver Operating Characteristic Analysis in Medical Imaging', Technical report, International Commission on Radiation Units and Measurements, Bethesda, Md. Measurement Scales –Conover, W. J. (1971), Practical Nonparametric Statistics, Wiley. Multi-class ROC –Edwards, D. C.; Metz, C. E. & Kupinski, M. A. (2004), 'Ideal Observers and Optimal ROC Hypersurfaces in N-Class Classification', IEEE Trans Med Imaging 23(7), 891-895. –He, X. & Frey, E. C. (2006), 'Three-Class ROC Analysis - The Equal Error Utility Assumption and the Optimality of Three-Class ROC Surface Using the Ideal Observer', IEEE Trans Med Imaging 25(8), 979-986. Counting Mitoses –Tsuda, H.; Akiyama, F.; Kurosumi, M.; Sakamoto, G.; Yamashiro, K.; Oyama, T.; Hasebe, T.; Kameyama, K.; Hasegawa, T.; Umemura, S.; Honma, K.; Ozawa, T.; Sasaki, K.; Morino, H. & Ohsumi, S. (2000), 'Evaluation of the Interobserver Agreement in the Number of Mitotic Figures of Breast Carcinoma as Simulation of Quality Monitoring in the Japan National Surgical Adjuvant Study of Breast Cancer (NSAS-BC) Protocol', Jpn J Cancer Res 91, 451-457. –Jannink, I.; Risberg, B.; Diest, P. J. V. & Baak, J. P. (1996), 'Heterogeneity of Mitotic Activity in Breast Cancer', Histopathology 29(5), 421-428. –Gal, R.; Rath-Wolfson, L.; Rosenblatt, Y.; Halpern, M.; Schwartz, A. & Koren, R. (2005), 'An Improved Technique for Mitosis Counting', Int J Surg Pathol 13(2), 161-165. Cost/Benefit Analysis –Wagner, R. F.; Beam, C. A. & Beiden, S. V. (2004), 'Reader Variability in Mammography and its Implications for Expected Utility over the Population of Readers and Cases', Med Decis Making 24(6), 561-572.

76 76 References Reader Variability –Miller, D. P.; O'Shaughnessy, K. F.; Wood, S. A. & Castellino, R. A. (2004), “Gold Standards and Expert Panels: A Pulmonary Nodule Case Study with Challenges and Solutions,” Medical Imaging 2004: Image Perception, Observer Performance and Technology Assessment, eds. Dev P. Chakraborty & Miguel P. Eckstein, pp. 173-184. –Beam, C.; Layde, P. & Sullivan, D. (1996), 'Variability in the Interpretation of Screening Mammograms by US Radiologists', Arch Intern Med 156(2), 209-213. –Hendrick, R.; Lewin, J.; D'Orsi, C. & et al. (2001), Non-Inferiority Study of FFDM in an Enriched Diagnostic Cohort: Comparison with Screen-Film Mammography in 625 Women., in M. J. Yaffe, ed., 'IWDM 2000: 5th International Workshop on Digital Mammography: Medical Physics', pp. 475-481. MRMC Analysis –Dorfman, D. D.; Berbaum, K. S. & Metz, C. E. (1992), 'Receiver Operating Characteristic Rating Analysis: Generalization to the Population of Readers and Patients with the Jackknife Method', Invest Radiol 27(9), 723-731. –Obuchowski, N. A. & Rockette, H. E. (1995), 'Hypothesis Testing of Diagnostic Accuracy for Multiple Readers and Multiple Tests: An ANOVA Approach with Dependent Observations', Commun Stat B-Simul 24(2), 285-308. –Toledano, A. Y. & Gatsonis, C. (1996), 'Ordinal Regression Methodology for ROC Curves Derived from Correlated Data', Stat Med 15(16), 1807. –Beiden, S. V.; Wagner, R. F. & Campbell, G. (2000), 'Components-of-Variance Models and Multiple-Bootstrap Experiments: An Alternative Method for Random-Effects, Receiver Operating Characteristic Analysis', Acad Radiol 7(5), 341-349. –Gallas, B. D. (2006), 'One-Shot Estimate of MRMC Variance: AUC', Acad Radiol 13(3), 353-362. –Gallas, B. D.; Bandos, A.; Samuelson, F. & Wagner, R. F. (2009), 'A Framework for Random-Effects ROC Analysis: Biases with the Bootstrap and Other Variance Estimators', Commun Stat A-Theory 38(15), 2586-2603. –Johnson, T. D. & Johnson, V. E. (2006), 'A Bayesian Hierarchical Approach to Multirater Correlated ROC Analysis', Stat Med 25(11), 1858-1871. –Song, X. & Zhou, X. (2005), 'A Marginal Model Approach for Analysis of Multi-Reader Multi-Test Receiver Operating Characteristic (ROC) Data', Biostatistics 6(2), 303-312. MRMC Analysis, General Study Design (Missing Data) –Gallas, B. D.; Pennello, G. A. & Myers, K. J. (2007), 'Multi-Reader Multi-Case Variance Analysis for Binary Data', J Opt Soc Am A 24(12), B70-B80. –Gallas, B. D. & Brown, D. G. (2008), 'Reader Studies for Validation of CAD Systems', Neural Networks 21(2-3), 387-397.

77 77 References Concordance –Harrell, F. E.; Califf, R. M.; Pryor, D. B.; Lee, K. L. & Rosati, R. A. (1982), 'Evaluating the Yield of Medical Tests', JAMA 247(18), 2543- 2546. –Pencina, M. J. & D'Agostino, R. B. (2004), 'Overall C as a measure of discrimination in survival analysis: model specific population value and confidence interval estimation.', Stat Med 23(13), 2109--2123. –Obuchowski, N. A. (2006), 'An ROC-Type Measure of Diagnostic Accuracy When the Gold Standard is Continuous-Scale', Stat Med 25(3), 481-493. –Kendall, M. G. (1962), Rank Correlation Methods, Griffin & Co., London. –Smith, W. D.; Dutton, R. C. & Smith, N. T. (1996), 'A Measure of Association for Assessing Prediction Accuracy That is a Generalization of Non-Parametric ROC Area', Stat Med 15(1), 1199-1215. Clustered Data Analysis –Rao, J. N. K. & Scott, A. J. (1992), 'A Simple Method for the Analysis of Clustered Binary Data', Biometrics 48(2), 577-585. –Smith, P. J. & Hadgu, A. (1992), 'Sensitivity and specificity for correlated observations.', Stat Med 11(11), 1503--1509. –Obuchowski, N. A. (1997), 'Nonparametric Analysis of Clustered ROC Curve Data', Biometrics 53(2), 567-578. Location-Specific ROC –Swensson, R. G. (1996), 'Unified Measurement of Observer Performance in Detecting and Localizing Target Objects on Images', Med Phys 23(10), 1709-1725. –Edwards, D. C.; Kupinski, M. A.; Metz, C. E. & Nishikawa, R. M. (2002), 'Maximum Likelihood Fitting of FROC Curves under an Initial- Detection-and-Candidate-Analysis Model', Med Phys 29(12), 2861-2870. –Chakraborty, D. P. & Berbaum, K. S. (2004), 'Observer Studies Involving Detection and Localization: Modeling, Analysis and Validation', Med Phys 31(8), 2313-2330. –Popescu, L. (2007), 'Nonparametric ROC and LROC Analysis', Med Phys 34(5), 1556-1564. –Gur, D. & Rockette, H. E. (2008), 'Performance Assessments of Diagnostic Systems under the FROC Paradigm: Experimental, Analytical, and Results Interpretation Issues', Acad Radiol 15(10), 1312-1315.

