Removing Background with Semantic Segmentation
Based on Ensemble Learning
Junhong Xu, Hanqing Guo, Aaron Kageza, Saeed AlQarni, Shaoen Wu
{jxu7, hguo, agkageza, saalqarni, swu}@bsu.edu
Abstract. This paper presents a deep learning approach to the Kaggle Carvana Image
Masking Competition, which aims to extract cars from high-quality images
with the background removed. We formulate the background removal problem as an
image segmentation problem. For this challenge, we evaluated different U-Net
architectures and explored two techniques for combining encoder
downsampling features with decoder upsampling features. In addition, we
experimented with substituting different pre-trained networks to accelerate the training process.
Finally, we trained different models at different image scales and predicted the final
result with an ensemble method. Our final method placed us in the top 4% of the
challenge and achieved a Dice coefficient score of 0.99694.
Keywords: Background removal, deep learning, semantic segmentation
1 Introduction
Background subtraction is an important technique in surveillance, video analysis, and
many other applications. This problem can be formulated as a semantic segmentation problem
in which each pixel is classified as either object or background. Figure 1 shows an example
of image segmentation in the car masking challenge.
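To make the per-pixel formulation concrete, the following is a minimal sketch, not the implementation used in this work. It assumes a model has already produced a per-pixel object probability map. It thresholds that map into a binary mask and scores the mask against a ground-truth mask with the Dice coefficient, 2|X ∩ Y| / (|X| + |Y|). The function names, the 0.5 threshold, and the toy arrays are illustrative assumptions.

```python
import numpy as np


def to_mask(prob, threshold=0.5):
    """Classify each pixel: 1 = object (car), 0 = background.

    The 0.5 threshold is an assumed default, not a value from the paper.
    """
    return (prob >= threshold).astype(np.uint8)


def dice_coefficient(pred, target):
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) between two binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Both masks are empty; treat them as a perfect match (a convention).
        return 1.0
    return 2.0 * intersection / total


# Toy 3x3 example: hypothetical model output and ground-truth mask.
prob = np.array([[0.9, 0.8, 0.1],
                 [0.7, 0.4, 0.2],
                 [0.1, 0.0, 0.3]])
truth = np.array([[1, 1, 0],
                  [1, 1, 0],
                  [0, 0, 0]], dtype=np.uint8)

mask = to_mask(prob)
score = dice_coefficient(mask, truth)
print(mask)
print(f"Dice: {score:.4f}")
```

In this toy case the prediction misses one of the four object pixels, so the score is 2·3 / (3 + 4) ≈ 0.857. A score near 1 means the predicted mask and the ground truth overlap almost exactly.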
Most past approaches to image segmentation are based on conditional random
fields (CRFs) that leverage the relationships between pixels [6]. In recent years, deep learning has
dominated computer vision, speech recognition, and many other areas. One popular network
architecture is the convolutional neural network (CNN), which can learn discriminative features from
image data automatically [9]. Many network architectures have been proposed to improve
learning ability, such as VGG [15], the Inception network [17], and ResNet [4]. Many
existing works address image segmentation using variants of these network architectures
and have drastically improved segmentation results compared to traditional methods. This
paper focuses on utilizing and modifying these existing network architectures to tackle the Kaggle
Carvana Image Masking Competition, which requires extracting cars from the background in
high-resolution images. This competition provides several challenges
MOBIMEDIA 2018, June 21-22, Qingdao, People's Republic of China
Copyright © 2018 ACM
DOI 10.4108/eai.21-6-2018.2276586