Toward Automated Pre-Biopsy Thyroid Cancer Risk Estimation in Ultrasound
Predictive Analytics for Complex Conditions, S36
Alfiia Galimzianova, Sean M. Siebert, Aya Kamaya, Terry S. Desser and Daniel L. Rubin
Stanford University
Disclosure
- Grant funding from GE Medical Systems under Stanford-GE Blue Sky Initiative
- No other relationships with commercial interests
AMIA 2017 | amia.org
Motivation (1): Thyroid Nodules
- Thyroid nodules are very common
  - Palpable in 4-7% of adults
  - Visible in up to 67% on ultrasound exams
- Commonly detected on routine ultrasound (US)
- Diagnostic concern: cancer
Motivation (2): Cancerous Nodules
- Thyroid cancer:
  - Rare; 11th most common cancer in the United States
  - 5-10% of thyroid nodules
  - Common incidental finding: occult papillary CA in 36% of autopsies
- 5-year survival of early cancers near 100% for most common types
- Aggressive cancers extremely rare
  - However, recognizing these on US is challenging; no reliable diagnostic features
Howlader N et al. (2016) SEER Cancer Statistics Review, 1975-2014, National Cancer Institute. Bethesda, MD, https://seer.cancer.gov/csr/1975_2014/
Motivation (3): Thyroid Cancer Diagnosis
- Incidence of thyroid cancer has tripled
  - Attributable to increasing use of US
- But no change in mortality!
- This is overdiagnosis!
Morris et al. (2016) Changing Trends in the Incidence of Thyroid Cancer in the United States. JAMA Otolaryngol Head Neck Surg. 142(7):709–711. doi:10.1001/jamaoto.2016.0230
Motivation (4): Thyroid CA Overdiagnosis
- Overdiagnosis: Dx of disease that would have remained occult if not for detection
- Leads to aggressive treatment for indolent disease
  - Morbidity of thyroid surgery
  - Medical costs of thyroid CA exceed $1.6 billion/yr in the U.S. and are expected to more than double by 2030
Thyroid nodules pose a diagnostic challenge
- 67% of population has US-detectable thyroid nodules
- Majority of nodules are benign
- Definitive diagnosis involves biopsy or surgery (invasive, costly, morbidity)
- As US proliferates, more nodules are being detected with many unnecessary biopsies
- Need pre-biopsy malignancy risk evaluation tools
Background: Current Procedure
- Suspicion: palpation, incidental imaging finding
- Diagnostic imaging: ultrasound exam, nodule localization, appearance description, measurements
- Cancer risk estimation: benign vs malignant pattern, total score favoring benign or malignant
- Follow-up
- Pathological diagnosis: biopsy, surgery
- Treatment
Background: Cancer Risk Estimation
Classification systems based on qualitative image features:
- Pattern-based classifiers
  - Consensus-based complex patterns of benign and malignant nodules
  - Limitation: Covers only a subset of the variety of lesion appearances
- Scoring-based classifiers
  - Evidence-based features and their weights resulting in total malignancy score
  - Limitation: Oversimplified rules for scoring and management decision, limited number of scores
- Limitation common to both: Inter-reader variation
Background: Cancer Risk Estimation
Classification systems based on qualitative image features:
- Pattern-based classifiers (ATA 2015): US images, then features, then pattern matching
  - Features: punctate echogenic foci, predominantly solid, hypoechoic, lobulated
  - Pattern: Solid hypoechoic nodule or solid hypoechoic component of a partially cystic nodule with one or more of the following features: irregular margins (infiltrative, microlobulated), microcalcifications, taller than wide shape, rim calcifications with small extrusive soft tissue component, evidence of extrathyroidal extension.
  - Suspicion levels: High suspicion, Intermediate, Low suspicion, Very low, Benign
Background: Cancer Risk Estimation
Classification systems based on qualitative image features:
- Pattern-based classifiers (ATA 2015): US images, then features, then pattern matching
  - Features: punctate echogenic foci, predominantly solid, hypoechoic, lobulated
  - Matched pattern: High suspicion: Solid hypoechoic component of a partially cystic nodule with irregular margins (microlobulated) and microcalcifications
- Scoring-based classifiers (ACR 2017): US images, then features, then classification
  - Features: punctate echogenic foci, predominantly solid, hypoechoic, lobulated
  - Malignancy score: sum (∑) of feature points [figure labels: 3 1 2 1 2 3 4 5 points]
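The scoring-based idea on this slide (each observed descriptor contributes points, and the points are summed into a total malignancy score) can be sketched as follows. This is a minimal illustration only: the function name and the point values are assumptions, chosen as placeholders and not verified against ACR TI-RADS or any other published system.

```python
from typing import Dict, Iterable

# Illustrative placeholder point values; NOT verified against ACR TI-RADS
# or any other published scoring system.
ILLUSTRATIVE_POINTS: Dict[str, int] = {
    "punctate echogenic foci": 3,
    "predominantly solid": 1,
    "hypoechoic": 2,
    "lobulated": 1,
}


def total_malignancy_score(observed: Iterable[str],
                           points: Dict[str, int] = ILLUSTRATIVE_POINTS) -> int:
    """Sum the points of every observed descriptor; unknown descriptors add 0."""
    return sum(points.get(name.lower(), 0) for name in observed)


# A nodule described with two of the example descriptors.
score = total_malignancy_score(["Punctate echogenic foci", "Hypoechoic"])
```

A real scoring system would additionally map the total to a risk level and a management recommendation; that mapping is left out here because the slide does not spell it out.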
Hypothesis
Computer-derived image features on thyroid US can distinguish benign and malignant nodules
Advantages:
- Automated assessment
- Reproducible
- Practical to integrate into workflow
Goal: Computerized Decision Support
Strategy: Machine learning classification
- Training: US images w/diagnoses, then quantitative image features, then classification model (ElasticNet)
- Test: US images, then quantitative image features, then classification, then malignancy score (e.g., 95%)
- Quantitative image features: Intensity x N, Texture x M, Shape x K, Edge x L
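The slide names ElasticNet as the classification model without giving its objective. As a hedged reference, one common formulation of an elastic-net-penalized logistic regression is shown below. All symbols are assumptions introduced here: x_i is the feature vector of nodule i, y_i in {0, 1} its benign/malignant label, p_i the predicted malignancy probability, lambda the overall penalty strength, and alpha the mix between the L1 and L2 penalties. The authors' exact parameterization may differ.

```latex
\hat{\beta}_0, \hat{\beta} = \arg\min_{\beta_0, \beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\Bigl[y_i \log p_i + (1-y_i)\log(1-p_i)\Bigr]
  + \lambda\Bigl(\alpha\,\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\Bigr),
\qquad
p_i = \frac{1}{1+\exp\bigl(-(\beta_0 + x_i^{\top}\beta)\bigr)}
```

The L1 term drives many coefficients to exactly zero, which is how a model of this kind can end up using only a subset of the available features.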
Two Classification Approaches
Types of image features:
- Radiologist-observed qualitative features
  - Tailored learning of the complex feature combinations given representative training data
- Quantitative features extracted from images
  - Learning both features and the rules to discriminate benign and malignant nodules
Materials: Qualitative Image Features
ACR TI-RADS* nomenclature descriptors of nodule features:
- Composition {Solid, Predominantly solid, Predominantly cystic, Cystic, Spongiform}
- Echogenicity {Hyperechoic, Isoechoic, Hypoechoic, Very hypoechoic}
- Taller-than-wide shape
- Margins
  - Border: {smooth, irregular, lobulated, ill-defined}
  - Halo
  - Extrathyroidal extension
- Echogenic foci {Punctate echogenic foci, Macrocalcifications, Peripheral calcifications, Comet-tail artifacts}
*Grant et al. (2015). Thyroid Ultrasound Reporting Lexicon: White Paper of the ACR Thyroid Imaging, Reporting and Data System (TIRADS) Committee. Journal of the American College of Radiology, 12(12), 1272-1279.
Materials: Datasets
Nodules from 93 patients:
- 47 pathology-confirmed malignant
- 46 pathology-confirmed benign
- Imaged in transverse and longitudinal planes on US
- ACR TI-RADS qualitative descriptors
- Nodule ROI delineations by expert radiologist
Materials: Expert Annotation
ePad© annotation:
- Done by expert radiologist, blinded to diagnosis
- Nodule delineation and semantic annotation
- http://epad.stanford.edu
[Figure: annotated nodule in longitudinal and transverse views]
Methods: Expert Risk Scoring
Classification systems for cancer risk:
- ACR TI-RADS, Tessler et al., JACR 2017: 19 features (15 nonzero weighted)
- Zayadeen et al., AJR 2016: 14 features
- Kwak et al., KJR 2013: 7 features
- Russ et al., Eur J Endocrinol. 2013: 11 features
- Kwak et al., Radiology 2011: 8 features
- Park et al., Thyroid 2009: 13 features
Methods: Quantitative Features
- Intensity {statistics (x18), histogram (x33), deciles (x9), peak, contrast, edge contrast}: 62 features
- Margin curvature {5-kernel Local Area Integral Invariant statistics (x6)}: 30 features
- Edge sharpness
  - Sigmoid fit window features {statistics (x8), histogram (x33), deciles (x9)}: 50 features
  - Sigmoid fit scale features {statistics (x8), histogram (x33), deciles (x9)}: 50 features
- Texture {GLCM (x16), GLRLM (x11)}: 27 features
Total of 219 features
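To make the intensity feature family more concrete, the sketch below computes a few summary statistics and the nine deciles of the pixel intensities inside a nodule ROI. It is an assumption-laden illustration: the function name, the chosen statistics, and the decile interpolation method are not taken from the slides, which list only the feature counts.

```python
import statistics
from typing import Dict, Sequence


def intensity_features(pixels: Sequence[float]) -> Dict[str, float]:
    """Summary statistics and the 9 deciles of ROI pixel intensities."""
    features = {
        "mean": statistics.fmean(pixels),
        "stdev": statistics.pstdev(pixels),
        "min": min(pixels),
        "max": max(pixels),
    }
    # n=10 yields the 9 cut points that split the data into 10 equal groups.
    deciles = statistics.quantiles(pixels, n=10, method="inclusive")
    for k, value in enumerate(deciles, start=1):
        features[f"decile_{k * 10}"] = value
    return features


# Hypothetical ROI intensities 0..100, used only to exercise the function.
roi_pixels = list(range(101))
features = intensity_features(roi_pixels)
```

In practice the pixel list would come from the radiologist's ROI delineation; the other listed families (margin curvature, edge sharpness, texture) need the 2D image and are not sketched here.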
Experimental design
Implemented classifiers:
- Proposed method: Elastic net classifier with quantitative features ("ML-Quantitative")
- Baseline method: Qualitative features with elastic net classifier ("ML-Qualitative")
- Comparison methods: Six qualitative feature-based scoring systems (TI-RADS variants)
Details:
- Ground truth: Biopsy-proven malignancy/benignity
- Evaluation:
  - Leave-one-out validation (x93 independently trained models)
  - AUC-ROC, potentially spared benign biopsy rate analysis
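Leave-one-out validation, as named on this slide, trains one model per case on all remaining cases and scores only the held-out case. The sketch below shows that loop under stated assumptions: all function names are hypothetical, and a nearest-centroid scorer stands in for the elastic net classifier purely to keep the example dependency-free.

```python
from typing import Callable, Dict, List, Sequence

Row = Sequence[float]
Model = Dict[int, List[float]]


def fit_centroids(X: Sequence[Row], y: Sequence[int]) -> Model:
    """Stand-in model (NOT an elastic net): mean feature vector per class.

    Assumes both classes (0 = benign, 1 = malignant) occur in the training data.
    """
    dims = len(X[0])
    sums = {0: [0.0] * dims, 1: [0.0] * dims}
    counts = {0: 0, 1: 0}
    for row, label in zip(X, y):
        counts[label] += 1
        for j, value in enumerate(row):
            sums[label][j] += value
    return {c: [s / counts[c] for s in sums[c]] for c in (0, 1)}


def score_centroids(model: Model, row: Row) -> float:
    """Higher score means closer to the malignant centroid than to the benign one."""
    def dist2(a: Row, b: Row) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return dist2(row, model[0]) - dist2(row, model[1])


def leave_one_out_scores(X: Sequence[Row], y: Sequence[int],
                         fit: Callable, score: Callable) -> List[float]:
    """Train one model per case on the other n-1 cases; score the held-out case."""
    scores = []
    for i in range(len(X)):
        train_X = [row for k, row in enumerate(X) if k != i]
        train_y = [label for k, label in enumerate(y) if k != i]
        scores.append(score(fit(train_X, train_y), X[i]))
    return scores


# Toy data: two benign and two malignant cases with a single feature.
X = [[0.0], [1.0], [9.0], [10.0]]
y = [0, 0, 1, 1]
loo_scores = leave_one_out_scores(X, y, fit_centroids, score_centroids)
```

With 93 nodules this loop would fit 93 models, matching the "x93 independently trained models" note; any feature selection or hyperparameter tuning would need to happen inside the loop to keep the held-out case unseen.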
Results: Qualitative Image Features
Distribution of qualitative features across cases
[Figure annotation: Typically benign, not biopsied]
Results: Qualitative Image Features
Use of qualitative features by classification systems
Results: Scoring Performance

Classifier              AUROC      95% CI
ML-Quantitative         0.83       (0.72, 0.94)
ML-Qualitative          0.78       (0.67, 0.89)
Kwak et al. 2013        0.81       (0.71, 0.91)
Zayadeen et al. 2016    0.77       (0.66, 0.88)
Russ et al. 2013        0.75       (0.64, 0.86)
ACR TI-RADS 2017        0.73       (0.63, 0.84)
Park et al. 2009        [missing]  (0.61, 0.84)
Kwak et al. 2011        0.72       (0.62, 0.82)

[Figure: ROC curves, true positive rate vs false positive rate; legend: machine learning classifiers (ML-Computational, ML-Semantic), expert classifiers]
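The AUROC reported for each classifier can be read as the probability that a randomly chosen malignant nodule receives a higher score than a randomly chosen benign one. A minimal sketch of that pairwise computation follows; the function name and the toy data are assumptions, and the confidence intervals from the table are not reproduced here.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = malignant, 0 = benign.
example_auc = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The quadratic pair loop is fine at the scale of 93 nodules; a rank-based formulation gives the same value more efficiently for large datasets.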
Results: Scoring Performance
Our motivation: all malignancies biopsied, 57% benign biopsies
[Figure: ROC curves; y-axis: true positive rate (biopsied malignancy rate); x-axis: false positive rate (unnecessary biopsy rate); legend: machine learning classifiers (ML-Computational, ML-Semantic), expert classifiers]
Results: Biopsy Decision
Sparing benign findings (TN) at the cost of missing malignancies (FN)

TNs at given level of FN (columns: FN=0, FN=1, FN=2) [rows with fewer than three values have cells missing in the source; their column positions are not recoverable]
- ML-Quantitative: 20, 26, 28
- ML-Qualitative: 1, 4
- ACR TI-RADS 2017: 8
- Zayadeen et al. 2016: 2, 18
- Kwak et al. 2013: 14
- Russ et al. 2013: 7, 10
- Kwak et al. 2011: 11
- Park et al. 2009:
Annotation: highest number of spared benign nodules at minimal missed malignancy rate

Spared benign findings by decision rules of classification systems (columns: spared benignities = TN at FNA decision, missed malignancies = FN at FNA decision) [rows with a single value have a cell missing in the source]
- ACR TI-RADS 2017: 15, 2
- Zayadeen et al. 2016: 18, 7
- Kwak et al. 2013: 26, 4
- Russ et al. 2013: 1
- Kwak et al. 2011: 3
- Park et al. 2009: 35, 20

TN: true negatives (benign nodules labeled as benign); FN: false negatives (malignant nodules labeled as benign)
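The "TNs at given level of FN" analysis can be sketched as a threshold search: for a classifier that outputs a malignancy score, pick the highest biopsy threshold that misses at most a given number of malignancies, then count the benign nodules that fall below it. The function name, the decision rule (biopsy when the score is at or above the threshold), and the toy data are assumptions for illustration, not the authors' procedure.

```python
from typing import Sequence


def spared_benign_at_fn(scores: Sequence[float], labels: Sequence[int],
                        max_missed: int) -> int:
    """Max number of benign nodules spared (TN) while missing <= max_missed malignancies.

    Decision rule assumed here: biopsy if score >= threshold.
    """
    malignant = sorted(s for s, l in zip(scores, labels) if l == 1)
    benign = [s for s, l in zip(scores, labels) if l == 0]
    if max_missed >= len(malignant):
        return len(benign)  # every case may fall below the threshold
    # Malignancies scoring strictly below this threshold number at most max_missed.
    threshold = malignant[max_missed]
    return sum(1 for s in benign if s < threshold)


# Toy example: 1 = malignant, 0 = benign.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
toy_labels = [0, 0, 1, 0, 1, 1]
spared_at_fn0 = spared_benign_at_fn(toy_scores, toy_labels, max_missed=0)
spared_at_fn1 = spared_benign_at_fn(toy_scores, toy_labels, max_missed=1)
```

Because the number of spared benign nodules can only grow as the threshold rises, the largest admissible threshold gives the maximum TN count at each FN level.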
Results: Feature Importance
- Qualitative features in classification systems: 20 features (13 malignant, 7 benign)
- Qualitative features in machine learning classifier: 17 features (11 malignant, 6 benign)
- Quantitative features in machine learning classifier: 102 features (65 malignant, 37 benign)
Summary
Machine learning-based malignancy classifiers have the potential to be used as thyroid cancer risk estimation tools:
- Comparable performance to the best current classification system-based methods (AUC 83% vs 81%)
- Highest number of benign nodules spared from biopsy at minimal missed malignancy rate (20 at 0 vs 18 at 1)
- Provides rich set of features to describe both benign and malignant appearance
- No inter/intra-rater variability at low cost
Acknowledgements
This research was supported in part by grants from:
- GE Medical Systems (Blue Sky Initiative at Stanford University)
- National Cancer Institute, National Institutes of Health, U01CA142555, 1U01CA190214, and 1U01CA187947
Thank you! Email: dlrubin@stanford.edu