(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 8, No. 8, 2017

Lung Cancer Detection and Classification with 3D Convolutional Neural Network (3D-CNN)

Wafaa Alakwaa, Faculty of Computers & Info., Cairo University, Egypt
Mohammad Nassef, Faculty of Computers & Info., Cairo University, Egypt
Amr Badr, Faculty of Computers & Info., Cairo University, Egypt

Abstract—This paper demonstrates a computer-aided diagnosis (CAD) system for lung cancer classification of CT scans with unmarked nodules, using a dataset from the Kaggle Data Science Bowl 2017. Thresholding was used as an initial segmentation approach to segment out lung tissue from the rest of the CT scan. Thresholding produced the next best lung segmentation. The initial approach was to feed the segmented CT scans directly into 3D CNNs for classification, but this proved to be inadequate. Instead, a modified U-Net trained on LUNA16 data (CT scans with labeled nodules) was used to first detect nodule candidates in the Kaggle CT scans. Because the U-Net nodule detection produced many false positives, the regions of the segmented-lung CT scans where the U-Net output located the most likely nodule candidates were fed into 3D convolutional neural networks (CNNs), which ultimately classify each CT scan as positive or negative for lung cancer. The 3D CNNs produced a test set accuracy of 86.6%. Our CAD system outperforms current CAD systems in the literature, which have several training and testing phases that each require a lot of labeled data, whereas our CAD system has only three major phases (segmentation, nodule candidate detection, and malignancy classification), allowing more efficient training and detection and better generalizability to other cancers.

Keywords—Lung cancer; computed tomography; deep learning; convolutional neural networks; segmentation

I.
INTRODUCTION

Lung cancer is one of the most common cancers, accounting for over 225,000 cases, 150,000 deaths, and $12 billion in health care costs yearly in the U.S. [1]. It is also one of the deadliest cancers; overall, only 17% of people in the U.S. diagnosed with lung cancer survive five years after the diagnosis, and the survival rate is lower in developing countries. The stage of a cancer refers to how extensively it has metastasized. Stages 1 and 2 refer to cancers localized to the lungs, and later stages refer to cancers that have spread to other organs. Current diagnostic methods include biopsies and imaging, such as CT scans. Early detection of lung cancer (detection during the earlier stages) significantly improves the chances of survival, but early-stage lung cancer is also more difficult to detect because it causes fewer symptoms [1].

Our task is a binary classification problem: to detect the presence of lung cancer in CT scans of the lungs of patients with and without early-stage lung cancer. We aim to use methods from computer vision and deep learning, particularly 2D and 3D convolutional neural networks, to build an accurate classifier. An accurate lung cancer classifier could speed up lung cancer screening and reduce its cost, allowing for more widespread early detection and improved survival. The goal is to construct a computer-aided diagnosis (CAD) system that takes patient chest CT scans as input and outputs whether or not the patient has lung cancer [2].

Though this task seems straightforward, it is actually a needle-in-a-haystack problem. To determine whether or not a patient has early-stage cancer, the CAD system would have to detect the presence of a tiny nodule (< 10 mm in diameter for early-stage cancers) in a large 3D lung CT scan (typically around 200 mm × 400 mm × 400 mm). An example of an early-stage lung cancer nodule within a 2D slice of a CT scan is given in Fig. 1.
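To make the scale of this search concrete, the following minimal sketch estimates what fraction of the scanned volume a single nodule occupies. It is an illustration only, under assumptions that are not stated in the text: the nodule is treated as a perfect sphere, the scan as a solid box with the typical dimensions quoted above, and the function name `nodule_volume_fraction` is hypothetical.

```python
import math


def nodule_volume_fraction(nodule_diameter_mm, scan_dims_mm):
    """Return the fraction of the scan volume occupied by one nodule.

    Assumptions (illustrative only): the nodule is a perfect sphere and
    the scan is a solid box with the given side lengths in millimetres.
    """
    radius = nodule_diameter_mm / 2.0
    nodule_volume = (4.0 / 3.0) * math.pi * radius ** 3
    depth, height, width = scan_dims_mm
    scan_volume = depth * height * width
    return nodule_volume / scan_volume


# Typical scan extent quoted in the text: 200 mm x 400 mm x 400 mm.
SCAN_DIMS_MM = (200.0, 400.0, 400.0)

# Upper bound on early-stage nodule diameter quoted in the text (10 mm).
fraction_10mm = nodule_volume_fraction(10.0, SCAN_DIMS_MM)

# The 5 mm nodule shown in Fig. 1.
fraction_5mm = nodule_volume_fraction(5.0, SCAN_DIMS_MM)

print(f"10 mm nodule: {fraction_10mm:.2e} of the scan volume")
print(f" 5 mm nodule: {fraction_5mm:.2e} of the scan volume")
```

Under these assumptions, even a nodule at the 10 mm upper bound fills only about 1.6 × 10^-5 of the scanned volume, and halving the diameter reduces that fraction eightfold, which is why the search space needs to be narrowed before classification.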
Furthermore, a CT scan is filled with noise from surrounding tissues, bone, and air, so for the CAD system's search to be efficient, the scan would first have to be preprocessed to remove this noise. Hence, our classification pipeline consists of image preprocessing, nodule candidate detection, and malignancy classification.

In this paper, we apply extensive preprocessing techniques to localize nodules accurately and thereby improve the accuracy of lung cancer detection. Moreover, we train the CNN end to end from scratch in order to realize the full potential of the neural network, i.e., to learn discriminative features. Extensive experimental evaluations are performed on a dataset comprising lung nodules from more than 1390 low-dose CT scans.

Figure 1: 2D CT scan slice containing a small (5 mm) early-stage lung cancer nodule.

The remainder of the paper is organized as follows. Related work is summarized briefly in Section II. The dataset used in this paper is described in Section III. The methods for segmentation are presented in Section IV. Nodule segmentation based on the U-Net architecture is introduced in Section V. Section VI presents the 3D Convolutional Neural Network for nodule classification and
www.ijacsa.thesai.org 409 | Page