Semantic Segmentation of Kidney Tumor using Convolutional Neural Networks Laura Daza, Catalina G´ omez, and Pablo Arbel´ aez Universidad de los Andes, Bogot´ a, Colombia {la.daza10, c.gomez10, pa.arbelaez}@uniandes.edu.co Abstract. We present a fully automatic method for segmentation of kidney tumors in CT volumetric data based on DeepLab v3+, the state- of-the-art model in semantic segmentation in natural images. We adapt the architecture to process medical data and reduce the computational complexity to allow training 3D models. We evaluate our approach on the Kidney Tumor Segmentation Challenge 2019 dataset, and deﬁne a validation set to experiment with the model’s parameters. In our valida- tion set, we report a dice score of XX for the kidney class and YY for the tumor class. Keywords: Kidney tumor · Semantic Segmentation · CT scans. 1 Introduction Kidney cancer is the 12th most common cancer worldwide, with over 400,000 new cases diagnosed in 2018 [1]. Although it is one of the most common cause of death from cancer, the survival rates are relatively high in developed countries, but low in lower income countries where cancer is often detected at later stages. The standard tests to diagnose kidney cancer are ultrasound scans, cystoscopy or CT scans of the urinary system (CT urogram) [2]. The treatment options depend on certain factors including the size of the tumor and where it is located in the kidney and if has spread to another part of the body. The automatic segmentation and localization of tumors in CT scans can contribute to the di- agnosis, and thus, to select the most appropriate treatment during early stages of the tumor. Deep learning models, such as Fully Convolutional Networks [3], have been widely used for segmentation tasks in the medical image domain. For instance, U-Net [4] is an extension of the FCN, designed for cell segmentation in light microscopy images. An important modiﬁcation of U-Net is that the expansive (upsampling) path has a large number of feature channels (similar to the con- tracting path), and they merge high resolution features from the contracting path with the upsampled outputs to reﬁne the ﬁnal prediction. The drawbacks of traditional segmentation methods include the reduction of the original resolution at the encoding phase and objects at diﬀerent scales. DeepLab [5] addresses these challenges by replacing pooling operations with