CS 698 | Current Topics in Data Science


1 CS 698 | Current Topics in Data Science
Dr. Usman Roshan
Deep Convolutional Neural Networks for Lung Cancer Detection
Paper Presentation | Spring 2018
Fadi G. Farhat
February 15th, 2018
New Jersey Institute of Technology

2 Authors
Albert Chon • Peter Lu • Niranjan Balachandar
Department of Computer Science, Stanford University

3 Introduction:
Lung cancer is one of the most common and deadliest cancers.
Yearly in the United States: 225,000 cases; 150,000 deaths; $12 billion in healthcare costs.
Only 17% of people in the U.S. diagnosed with lung cancer survive five years after the diagnosis.
Current diagnostic methods include biopsies and imaging, such as CT scans.
Early detection of lung cancer significantly improves the chances of survival, but it is difficult because early-stage disease produces fewer symptoms.

4 Objective:
Binary classification problem: detect the presence of lung cancer in patient CT scans of lungs with and without early-stage lung cancer.
Build an accurate classifier using 2D and 3D convolutional neural networks.
A classifier could speed up lung cancer screening and reduce its cost, allow early detection, and improve survival.
The computer-aided diagnosis (CAD) system takes patient chest CT scans as input and outputs whether the patient has (early-stage) lung cancer or is likely to develop it.

5 Challenges:
The CAD system must detect the presence of a tiny nodule (less than 10 mm in diameter for early stage) in a large 3D lung CT scan (around 200 mm x 400 mm x 400 mm).
Figure: example of an early-stage lung cancer nodule (~5 mm).
The CT scan is filled with noise from surrounding tissues, bone, air, water, and blood.

6 Data:
Primary dataset: patient lung CT scan dataset from Kaggle’s Data Science Bowl 2017.
Labeled data for 2101 patients, divided into a training set of 1261, a validation set of 420, and a test set of 420.
Data consists of CT scan data (100 to 400 2D slice images per patient) and a label (0 for no cancer, 1 for cancer).
The Kaggle dataset does not have labeled nodules!

7 Data (cont.):
Secondary dataset: patient lung CT scan data with labeled nodules from the LUng Nodule Analysis 2016 (LUNA16) Challenge.
The LUNA16 dataset has labeled data for 888 patients, divided into a training set of 710 and a validation set of 178.
Data consists of CT scan data and a nodule label (list of nodule center coordinates and diameters).

8 Approach:
Preprocess the 3D CT scans using segmentation, normalization, down-sampling, and zero-centering.
Train a U-Net (2D Convolutional Networks for Biomedical Image Segmentation) for nodule candidate detection.
Input regions around the nodule candidates detected by the U-Net into 3D CNNs to classify the CT scans as positive or negative for lung cancer.

9 Preprocessing & Segmentation:
Convert the pixel values in each image to Hounsfield units (HU), a measurement of radiodensity, then stack the 2D slices into a single 3D image.
Figure: HU values for bone and tissue (not preserved in this transcript).
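
The conversion and stacking step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the usual linear rescale (HU = raw value x slope + intercept), and the function name `to_hounsfield` and the default slope and intercept are placeholders. In practice the slope and intercept would be read from each scan's metadata.

```python
import numpy as np

def to_hounsfield(slices, slope=1.0, intercept=-1024.0):
    """Stack 2D slices into one 3D volume and map raw values to HU.

    slope and intercept are assumed defaults for illustration only.
    """
    volume = np.stack(slices).astype(np.float32)  # shape: (depth, height, width)
    return volume * slope + intercept

# Three synthetic 4x4 slices whose raw value 1024 maps to 0 HU (water).
raw_slices = [np.full((4, 4), 1024, dtype=np.int16) for _ in range(3)]
volume_hu = to_hounsfield(raw_slices)
print(volume_hu.shape, volume_hu[0, 0, 0])
```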

10 Preprocessing & Segmentation:
Use segmentation to mask out the bone, outside air, and other substances that would make the data noisy; retain only lung tissue information.
Watershed and thresholding segmentation were both tested; thresholding was used.
Figure panels: original, thresholding, watershed.
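
A rough sketch of the thresholding idea is shown below. The cutoff of -400 HU and the fill value are assumptions chosen for illustration (the slides do not state them), and the helper names are hypothetical. The sketch covers only the thresholding step: air outside the body also falls below the cutoff, so a full pipeline would need an extra step to discard it, which is omitted here.

```python
import numpy as np

LUNG_THRESHOLD_HU = -400.0  # assumed cutoff; not given in the slides
FILL_HU = 0.0               # assumed value written into masked-out voxels

def threshold_mask(volume_hu, threshold=LUNG_THRESHOLD_HU):
    """Boolean mask of low-density voxels (air-filled lung lies far below bone and soft tissue)."""
    return volume_hu < threshold

def apply_mask(volume_hu, mask, fill=FILL_HU):
    """Keep voxels inside the mask and overwrite everything else with `fill`."""
    return np.where(mask, volume_hu, fill)

# Tiny synthetic volume: two lung-like voxels, one soft-tissue voxel, one bone-like voxel.
volume_hu = np.array([[[-800.0, 40.0], [700.0, -600.0]]])
mask = threshold_mask(volume_hu)
segmented = apply_mask(volume_hu, mask)
```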

11 Preprocessing & Segmentation:
Normalize the 3D image by applying linear scaling.
Down-sample each 3D image by a scale of 0.5 in each of the three dimensions.
Zero-center the data by subtracting the mean of all the images from the training set.
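
One way these three steps could look in NumPy is sketched below. Several details are assumptions, since the slides do not specify them: the HU window used for linear scaling, down-sampling by averaging 2x2x2 blocks (interpolation would be another option), and treating the training mean as a single scalar rather than a per-voxel mean image.

```python
import numpy as np

HU_MIN, HU_MAX = -1000.0, 400.0  # assumed scaling window, not from the slides

def normalize(volume_hu):
    """Linearly scale HU values into [0, 1], clipping values outside the window."""
    scaled = (volume_hu - HU_MIN) / (HU_MAX - HU_MIN)
    return np.clip(scaled, 0.0, 1.0)

def downsample_half(volume):
    """Halve each dimension by averaging non-overlapping 2x2x2 blocks."""
    d, h, w = (s - s % 2 for s in volume.shape)  # trim odd trailing edges
    v = volume[:d, :h, :w]
    return v.reshape(d // 2, 2, h // 2, 2, w // 2, 2).mean(axis=(1, 3, 5))

def zero_center(volumes, train_mean):
    """Subtract the training-set mean (a scalar here) from every volume."""
    return [v - train_mean for v in volumes]

rng = np.random.default_rng(0)
train = [downsample_half(normalize(rng.uniform(-1200, 600, (8, 8, 8)))) for _ in range(4)]
train_mean = np.mean([v.mean() for v in train])
centered = zero_center(train, train_mean)
```

The same `train_mean` would be reused for validation and test volumes so that no statistics leak from those sets.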

12 U-Net for Nodule Detection:
Find small boxes containing the top cancerous nodule candidates.
Train a modified version of the U-Net on the LUNA16 data.
The model is trained to output images (256x256) where each output pixel has a value between 0 and 1 indicating the probability that the pixel belongs to a nodule.
The trained U-Net is then applied to the segmented Kaggle CT scan slices to generate nodule candidates.

13 U-Net for Nodule Detection:

14 U-Net for Nodule Detection:

15 U-Net for Nodule Detection:
The U-Net produces a strong signal for the actual nodule, but also produces a lot of false positives.
Figure panels: U-Net labeled input, U-Net predicted output; the true nodule location is marked.

16 U-Net for Nodule Detection:
Solution: Locate the top 8 (most active) nodule candidates (32x32x32 volumes) and save them.
The top sectors are not permitted to overlap, to prevent them from simply being clustered in the brightest region of the image.
Combine these sectors into a single 64x64x64 volume and use it as input to the classifiers, which assign a label (cancer or not cancer).
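
The selection and tiling can be illustrated with the sketch below. It makes two simplifying assumptions that are not from the slides: candidates are restricted to a grid of non-overlapping blocks (which guarantees no overlap but is coarser than a free search), and the eight cubes are arranged in a 2x2x2 layout. `prob` stands for a hypothetical stack of U-Net output probabilities.

```python
import numpy as np

def top_candidates(prob, cube=32, k=8):
    """Return the k grid-aligned cubes with the highest summed activation."""
    gd, gh, gw = (s // cube for s in prob.shape)
    trimmed = prob[:gd * cube, :gh * cube, :gw * cube]
    # Split into a (gd, gh, gw) grid of cube-sized blocks.
    blocks = trimmed.reshape(gd, cube, gh, cube, gw, cube).transpose(0, 2, 4, 1, 3, 5)
    scores = blocks.sum(axis=(3, 4, 5)).ravel()
    order = np.argsort(scores)[::-1][:k]  # highest score first
    return blocks.reshape(-1, cube, cube, cube)[order]

def combine(cubes):
    """Tile 8 cubes of side c into one volume of side 2c (assumed 2x2x2 layout)."""
    c = cubes.shape[1]
    return cubes.reshape(2, 2, 2, c, c, c).transpose(0, 3, 1, 4, 2, 5).reshape(2 * c, 2 * c, 2 * c)

# Synthetic probability volume with a single strong activation.
prob = np.zeros((64, 128, 128), dtype=np.float32)
prob[40, 100, 20] = 1.0
cubes = top_candidates(prob)
volume = combine(cubes)
print(cubes.shape, volume.shape)
```

Eight 32x32x32 cubes hold exactly as many voxels as one 64x64x64 volume (8 x 32^3 = 64^3), which is why the tiling needs no padding.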

17 Malignancy Classifiers:
A linear classifier was used as a baseline, then a vanilla 3D CNN and a GoogleNet-based 3D CNN were applied.
Each classifier used a weighted loss (the weight for a label is the inverse of the frequency of the label in the training set).
The CNNs use ReLU activation and dropout after each convolutional layer during training.
ReLU = Rectified Linear Unit
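
The inverse-frequency weighting can be made concrete with a small NumPy sketch. The choice of binary cross-entropy as the underlying loss is an assumption (the slides only say "weighted loss"), and the function names are illustrative.

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Weight for each class = 1 / (fraction of training samples with that class)."""
    labels = np.asarray(labels)
    freq = np.bincount(labels, minlength=2) / labels.size
    return 1.0 / freq  # assumes both classes occur at least once

def weighted_cross_entropy(probs, labels, weights, eps=1e-7):
    """Mean binary cross-entropy with each sample scaled by its class weight."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    labels = np.asarray(labels)
    per_sample = -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
    return float(np.mean(weights[labels] * per_sample))

train_labels = [0, 0, 0, 1]                 # 75% negative, 25% positive
weights = inverse_frequency_weights(train_labels)
loss = weighted_cross_entropy([0.2, 0.1, 0.3, 0.6], train_labels, weights)
```

With these labels the rarer positive class gets a weight of 4, three times the weight of the negative class, so misclassifying a cancer case costs more.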

18 Malignancy Classifiers (cont.):
Figure: vanilla 3D CNN (left) and GoogleNet 3D CNN (right) architectures.

19 Results:
Kaggle test set accuracy, sensitivity, specificity, and AUC of ROC
Sensitivity: true positive rate
Specificity: true negative rate
AUC: Area Under the ROC Curve
ROC: Receiver Operating Characteristic, FPR vs. TPR for different cutoff points
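
For reference, these metrics can be computed as in the sketch below. The data is a toy example, not the paper's results, and the AUC uses the pairwise-ranking formulation (the probability that a random positive scores higher than a random negative, ties counted as half), which equals the area under the ROC curve.

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    """Return (true positive rate, true negative rate) for binary labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in scores]   # one particular cutoff point
sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(sens, spec, auc)  # 0.5 1.0 0.75
```

Sensitivity and specificity depend on the chosen cutoff (0.5 here), while the AUC summarizes performance over all cutoffs.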

20 Results (cont.): Observation: activations showing that cancerous nodule presence (and location) is detected in some outputs

21 Conclusions:
The deep 3D CNN models, and in particular the GoogleNet-based model, performed the best on the test set.
The state-of-the-art AUC of 0.83 was not achieved, but the models performed well considering that less labeled data was used than in most state-of-the-art CAD systems.
The current model could be extended to determine the exact location of the cancerous nodules, and not only whether or not the patient has cancer (slide 20).

22 Future Work:
Use the watershed method instead of thresholding for the initial lung segmentation.
Make the networks deeper.
Perform more extensive hyper-parameter tuning.
Generalize: extend the models to 3D images for other cancers.

