Removing Background with Semantic Segmentation Based on Ensemble Learning

Junhong Xu, Hanqing Guo, Aaron Kageza, Saeed AlQarni, Shaoen Wu
{jxu7, hguo, agkageza, saalqarni, swu}@bsu.edu

Abstract. This paper presents a deep learning approach to the Kaggle Carvana Image Masking Competition, which aims to extract cars from high-quality images by removing the background. We formulate background removal as an image segmentation problem. For this challenge, we evaluated different U-Net architectures and explored two techniques for combining encoder downsampling features with decoder upsampling features. In addition, we experimented with substituting different pre-trained networks to accelerate training. Finally, we trained different models at different image scales and produced the final prediction with an ensemble method. Our final method placed us in the top 4% of the challenge and achieved a Dice coefficient of 0.99694.

Keywords: Background removal, deep learning, semantic segmentation

1 Introduction

Background subtraction is an important technique in surveillance, video analysis, and many other applications. The problem can be formulated as a semantic segmentation problem in which each pixel is classified as either object or background. Figure 1 shows an example of image segmentation in the car masking challenge. Most past approaches to image segmentation are based on conditional random fields (CRFs), which leverage the relationships between pixels [6]. In recent years, deep learning has dominated computer vision, speech recognition, and many other areas. One popular architecture is the convolutional neural network (CNN), which can learn discriminative features from image data automatically [9]. Many network architectures have been proposed to improve learning ability, such as VGG [15], the Inception network [17], and ResNet [4].
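
To make the per-pixel formulation and the evaluation metric concrete, the following is a minimal sketch of how a Dice coefficient between a predicted mask and a ground-truth mask could be computed. It is an illustration only, not the competition's scoring code or the authors' implementation: the function names `threshold_mask` and `dice_coefficient`, the 0.5 threshold, and the convention that two empty masks score 1.0 are all assumptions introduced here.

```python
from typing import List, Sequence


def threshold_mask(probs: Sequence[Sequence[float]],
                   threshold: float = 0.5) -> List[List[int]]:
    """Turn per-pixel object probabilities into a binary mask.

    1 marks an object (car) pixel, 0 marks background.
    The 0.5 threshold is an assumption, not a value from the paper.
    """
    return [[1 if p > threshold else 0 for p in row] for row in probs]


def dice_coefficient(pred: Sequence[Sequence[int]],
                     truth: Sequence[Sequence[int]]) -> float:
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) for two binary masks."""
    intersection = 0
    pred_area = 0
    truth_area = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            pred_area += p
            truth_area += t
            intersection += p * t  # counts pixels set in both masks
    if pred_area + truth_area == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    return 2.0 * intersection / (pred_area + truth_area)


# Toy 2x3 example: three predicted object pixels, four true object pixels.
probs = [[0.9, 0.8, 0.1],
         [0.7, 0.2, 0.0]]
truth = [[1, 1, 0],
         [1, 1, 0]]
pred = threshold_mask(probs)
score = dice_coefficient(pred, truth)
```

In this toy case the masks overlap on three pixels, so the score is 2·3 / (3 + 4) = 6/7. A score close to 1, such as the 0.99694 reported in the abstract, means the predicted and true masks agree on nearly every object pixel.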
Many existing works address image segmentation using variants of these network architectures, and they have drastically improved segmentation results compared with traditional methods. This paper focuses on utilizing and modifying these existing architectures to tackle the Kaggle Carvana Image Masking Competition, which requires extracting cars from the background in high-resolution images. This competition provides several challenges

MOBIMEDIA 2018, June 21-22, Qingdao, People's Republic of China
Copyright © 2018 ACM
DOI 10.4108/eai.21-6-2018.2276586