Integration of Radiologists' Feedback into Computer-Aided Diagnosis Systems
Sarah A. Jabon (a), Daniela S. Raicu (b), Jacob D. Furst (b)
(a) Rose-Hulman Institute of Technology, Terre Haute, IN
(b) School of Computing, CDM, DePaul University, Chicago, IL 60604
Overview: Introduction; Related Work; The Data; Methodology (Simple Distance Metrics, Linear Regression, Principal Component Analysis); Results (Simple Distance Metrics, Linear Regression, Principal Component Analysis); Conclusions; Future Work
Introduction: The official 2008 estimates project 215,020 new lung cancer cases and 161,840 deaths. The five-year relative survival rate (1996–2004) is 15.2%. Computer-aided diagnosis (CAD) systems can help improve early detection.
Related Work: El-Naqa et al.: mammography images; neural networks and support vector machines. Muramatsu et al.: mammography images; a three-layer artificial neural network to predict the semantic similarity rating between two nodules. Park et al.: a linear distance-weighted K-nearest-neighbor algorithm to identify similar images.
Related Work (continued): ASSERT (Purdue University): content-based features (co-occurrence, shape, Fourier transforms, global gray-level statistics); radiologists also provide features. BiasMap (Zhou and Huang): relevance feedback with content-based features, analyzed with biased discriminant analysis (BDA).
The Data: Lung Image Database Consortium (LIDC). The 1,989 images were reduced to 149, one per nodule. The radiologists' ratings (up to four per nodule) were summarized into a single vector. Each nodule has 7 semantic-based characteristics and 64 content-based characteristics.
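The slides do not say how the ratings are combined. A minimal sketch, assuming the summary is the per-characteristic mean of the available ratings; the function name `summarize_ratings`, the characteristic order, and the example ratings are hypothetical.

```python
from statistics import mean

# The seven semantic characteristics listed in the slides (order is an assumption).
CHARACTERISTICS = ["lobulation", "malignancy", "margin", "sphericity",
                   "spiculation", "subtlety", "texture"]


def summarize_ratings(ratings):
    """Collapse 1-4 radiologists' rating vectors into a single vector.

    Assumption: each rating vector follows CHARACTERISTICS order, and the
    summary is the per-characteristic mean.
    """
    if not 1 <= len(ratings) <= 4:
        raise ValueError("expected between 1 and 4 rating vectors")
    if any(len(r) != len(CHARACTERISTICS) for r in ratings):
        raise ValueError("each rating vector needs one value per characteristic")
    # zip(*ratings) groups the values given to each characteristic.
    return [mean(values) for values in zip(*ratings)]


# Two hypothetical radiologists rating the same nodule.
print(summarize_ratings([[3, 4, 4, 5, 1, 5, 5],
                         [2, 4, 5, 4, 1, 4, 5]]))
```

A median or mode would be equally plausible summaries for ordinal ratings; the mean is used here only for illustration.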
Methodology
Methodology: Simple Distance Metrics (semantic-based similarity and content-based similarity)
Simple Distance Metrics: content-based similarity values (Euclidean distance); semantic-based similarity values (1 − cosine similarity)
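The two dissimilarity measures can be sketched as follows; `euclidean_distance` and `cosine_distance` are illustrative names, and each input is assumed to be a plain list holding one nodule's content-based features or semantic ratings.

```python
import math


def euclidean_distance(a, b):
    """Content-based dissimilarity: Euclidean distance between feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """Semantic-based dissimilarity: 1 - cosine similarity of rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```

Both functions return 0 for identical inputs and grow as nodules become less alike, so the two values can be compared directly for each nodule pair.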
Methodology: Linear Regression
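A minimal sketch of such a regression, assuming each nodule pair is encoded by the per-feature absolute differences of the two nodules' content-based features and the target is the pair's semantic (1 − cosine) value. The pair encoding and all function names are hypothetical; the fit is ordinary least squares with an intercept, solved through the normal equations.

```python
def pair_features(content_a, content_b):
    """Hypothetical pair encoding: per-feature absolute differences."""
    return [abs(a - b) for a, b in zip(content_a, content_b)]


def fit_least_squares(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations.

    Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented system [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations (collinear features?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for row in range(p):
            if row != col:
                factor = aug[row][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][p] for i in range(p)]


def predict(coefs, features):
    """Predicted semantic dissimilarity for one pair's feature vector."""
    return coefs[0] + sum(c * f for c, f in zip(coefs[1:], features))
```

With 64 strongly correlated content features, the normal equations can become ill-conditioned, which is one reason to reduce the features with PCA before regressing.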
Methodology: Principal Component Analysis
[Figure: correlation matrix of the semantic characteristics (Lobulation, Malignancy, Margin, Sphericity, Spiculation, Subtlety, Texture)]
Content-Based Features: 77 pairs with a correlation > …; … pairs with a correlation > 0.8 or < -0.8
Scree Plots: 5 – 9 Matches
Methodology: Principal Component Analysis. PCA on content-based features: 23 components account for 99% of the variance. PCA on semantic-based characteristics: Method 1, 4 components account for 92% of the variance; Method 2, 6 components account for 98% of the variance.
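Choosing the number of components from a variance target can be sketched as below. The eigenvalues are assumed to come from a PCA computed elsewhere; the example values are made up for illustration and are not the study's data.

```python
def components_for_variance(eigenvalues, target=0.99):
    """Smallest k whose top-k eigenvalues explain at least `target` of the variance."""
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    # Floating-point rounding can leave the ratio just below 1.0; keep everything.
    return len(ordered)


# Hypothetical eigenvalues: cumulative fractions 0.5, 0.8, 0.9, 0.95, 1.0.
print(components_for_variance([5, 3, 1, 0.5, 0.5], target=0.92))  # 4
```

This is the rule behind statements such as "23 components account for 99% of the variance"; a scree plot gives a visual check of the same curve.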
Results: Simple Distance Metric
Matches | Gabor | Markov | Co-Occurrence | Gabor, Markov, and Co-Occurrence | All Features
6 – – –
Matches: Nodule 117
Simple Distance Metrics
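5 – 9 Matches: PCA and Linear Regression
Training and testing sets: the 5 – 9 Matches data set (326 nodule pairs) is split into 218 training pairs and 108 testing pairs.
Linear Regression: fit on the 218 training pairs and applied to the 108 testing pairs → predicted similarity value.
Principal Component Analysis: PCA is applied to the 218 training pairs and the 108 testing pairs; linear regression on the PCA'd training pairs is then applied to the PCA'd testing pairs → predicted similarity value.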
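The split can be sketched as follows, assuming a random shuffle with a fixed seed. The seed, the round-up rule, and the name `split_two_thirds` are assumptions; rounding 2/3 of 326 up gives the 218/108 split shown on the slide.

```python
import math
import random


def split_two_thirds(pairs, seed=0):
    """Shuffle nodule pairs, then return (training, testing) lists.

    The training set gets 2/3 of the pairs rounded up; the seed is arbitrary.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = math.ceil(len(shuffled) * 2 / 3)
    return shuffled[:n_train], shuffled[n_train:]


# 326 nodule pairs, identified here by index only.
train, test = split_two_thirds(range(326))
print(len(train), len(test))  # 218 108
```

Reusing one split for both branches, as the diagram appears to do, lets the linear regression and PCA results be compared on the same 108 testing pairs.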
Results: Linear Regression
Data Set | No. of Nodule Pairs (2/3 Set) | Correlation: Euclidean vs. Semantic | R² | Adj. R² | Feature Set | Distance
Rows: 6 – 9 Matches; … – 9 Matches, dist 3; 5 – 9 Matches; … – 9 Matches, dist 3
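For reference, R² and adjusted R² can be computed as below, where n is the number of nodule pairs and p the number of predictors excluding the intercept; the function names are illustrative.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are constant")
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalize R^2 for the number of predictors p (intercept excluded)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


print(r_squared([1, 2, 3, 4], [1, 2, 3, 5]))
```

The adjustment matters here because many content-based predictors are fit on only a few hundred nodule pairs.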
Results: Linear Regression
Data Set | No. of Nodule Pairs (1/3 Set) | Correlation: Euclidean vs. Semantic | RMSD Euclidean | Correlation: Predicted vs. Semantic | RMSD Predicted | Features
Rows: 6 – 9 Matches; … – 9 Matches; … – 9 Matches; … – 9 Matches
Results: Linear Regression
Results: PCA
Data Set | No. of Nodule Pairs (1/3 Set) | Correlation: Euclidean vs. Semantic | RMSD Euclidean | Correlation: Predicted vs. Semantic | RMSD Predicted | Features
Rows: 6 – 9 Matches; … – 9 Matches; … – 9 Matches; … – 9 Matches
Results: PCA
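RMSD as a Percent of Range
Data Set | Features | Linear Regression, No PCA: Euclidean | Linear Regression, No PCA: Predicted | Linear Regression, PCA: Euclidean | Linear Regression, PCA: Predicted
6 – 9 Matches | … | …% | 17.3% | 30.4% | 6.7%
6 – 9 Matches | … | …% | 12.9% | 30.4% | 12.5%
5 – 9 Matches | … | …% | 9.7% | 26.6% | 10.1%
5 – 9 Matches | … | …% | 11.1% | 26.6% | 11.8%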
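The slides do not state which range is used. Assuming it is the range of the semantic similarity values being predicted, the normalization is a one-liner; `rmsd_percent_of_range` is a hypothetical name.

```python
def rmsd_percent_of_range(rmsd_value, reference_values):
    """Express an RMSD as a percentage of the range of the reference values."""
    span = max(reference_values) - min(reference_values)
    if span == 0:
        raise ValueError("reference values have zero range")
    return 100.0 * rmsd_value / span


# An RMSD of 0.1 on semantic values spanning 0.0 to 1.0 is 10% of the range.
print(rmsd_percent_of_range(0.1, [0.0, 0.5, 1.0]))
```

Normalizing by the range makes RMSD values from differently scaled similarity measures comparable across the columns of the table.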
Example: Nodule 37 and Nodule 38
[Figure: images of Nodule 37 and Nodule 38 with their Euclidean similarity value and PCA similarity value, and a table of ratings by nodule number for Lobulation, Malignancy, Margin, Sphericity, Spiculation, Subtlety, Texture]
Future Work: Perform the analysis only on nodules on which all three radiologists agree. To address the small size of the data set, perform the analysis using a leave-one-out technique (instead of 2/3 training and 1/3 testing). Incorporate relevance feedback into the system.
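A minimal leave-one-out sketch, using a one-variable least-squares line as a stand-in model (the study's regression uses many features; `fit_line` and `leave_one_out_rmsd` are hypothetical names). Each nodule pair is held out once, the model is refit on the rest, and the held-out errors are pooled into one RMSD. Inputs are assumed to be plain lists.

```python
import math


def fit_line(xs, ys):
    """Stand-in model: least-squares line y = a + b * x. Returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # fails if all training x values are equal
    return mean_y - slope * mean_x, slope


def leave_one_out_rmsd(xs, ys):
    """Hold out each pair once, refit on the rest, and pool the held-out errors."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three paired observations")
    errors = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append(ys[i] - (a + b * xs[i]))
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Every pair serves as a test case exactly once, so all of the small data set contributes to the error estimate instead of only the 1/3 testing portion.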
Questions?