Nordic Machine Intelligence, MedAI 2021
https://doi.org/10.5617/nmi.9131
Employing GRU to combine
feature maps in DeeplabV3 for a
better segmentation model
Mahmood Haithami¹, Amr Ahmed¹, Iman Yi Liao¹, Hamid Jalab²
1. University of Nottingham Malaysia, 43500 Semenyih, Malaysia
2. Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
Abstract
In this paper, we aim to enhance the segmentation capabilities
of DeeplabV3 by employing a Gated Recurrent Unit
(GRU). The 1-by-1 convolution in DeeplabV3 was replaced with a
GRU after the Atrous Spatial Pyramid Pooling (ASPP) layer
to combine the input feature maps. The convolution and the
GRU have sharable parameters; however, the latter has gates
that enable/disable the contribution of each input feature
map. Experiments on unseen test sets demonstrate that
employing a GRU instead of the convolution produces better
segmentation results. The datasets used are public datasets
provided by the MedAI competition.
Keywords: Segmentation; deep learning; GRU; Polyp;
Instrument
Introduction
Colon polyp segmentation is considered a challenging
task as a polyp could have various forms and does not
have clear borders in some cases [1]. Furthermore,
the lack of large and representative datasets in the
endoscopy domain is a persistent challenge that still
exists today. Different techniques have been proposed
in the literature to address such challenges by applying
image augmentations, transfer learning, and ensemble
learning [2]. However, very few works have studied the
application of Recurrent Neural Network (RNN) models
[2, 3]. In this paper, we attempt to employ a Gated
Recurrent Unit (GRU) (i.e., a gated variant of the RNN
model) in DeeplabV3 [4] to enhance its segmentation
capabilities.
Method
Figure 1 shows the difference between DeeplabV3 [4] and
the proposed model. We employ a GRU instead of
the 1-by-1 convolution to project the five
feature maps produced by the Atrous Spatial Pyramid
Pooling (ASPP) layer [4], as illustrated in Figure 2.
Mathematically, this mapping can be expressed as follows:
G : R^(5C×H×W) → R^(C×H×W)    (1)
where C, H, and W refer to the number of channels, the
height, and the width of the feature map, respectively. The
motivation for using a GRU is to retain weight sharing (as
in the convolution method) while employing gates to
regulate the flow of information across the different feature
maps. Hence, it provides a better method than the
1-by-1 convolution used in the original DeeplabV3 for
combining/projecting the input feature maps into one
feature map.
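To make the idea concrete, the following is a minimal NumPy sketch of combining five ASPP feature maps with a GRU whose gates act per channel, mimicking 1-by-1 convolutions. The function and weight names are ours for illustration; this is not the authors' exact implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_combine(feature_maps, Wz, Uz, Wr, Ur, Wh, Uh):
    """Combine a sequence of feature maps with a GRU.

    Treats the five ASPP outputs as a 5-step sequence; each step
    updates a hidden state of shape (C, H, W) via per-channel GRU
    gates, so the gates regulate how much each feature map
    contributes. Each weight is a (C, C) matrix applied across the
    channel axis, mimicking a 1-by-1 convolution.
    """
    C, H, W = feature_maps[0].shape
    h = np.zeros((C, H, W))
    for x in feature_maps:  # iterate over the five maps
        # einsum 'ij,jhw->ihw' mixes channels like a 1x1 convolution
        z = sigmoid(np.einsum('ij,jhw->ihw', Wz, x) +
                    np.einsum('ij,jhw->ihw', Uz, h))       # update gate
        r = sigmoid(np.einsum('ij,jhw->ihw', Wr, x) +
                    np.einsum('ij,jhw->ihw', Ur, h))       # reset gate
        h_tilde = np.tanh(np.einsum('ij,jhw->ihw', Wh, x) +
                          np.einsum('ij,jhw->ihw', Uh, r * h))
        h = (1 - z) * h + z * h_tilde
    return h  # one combined map of shape (C, H, W)

# Tiny demo: five C=4 feature maps of size 8x8 projected to one map,
# i.e., the mapping R^(5CxHxW) -> R^(CxHxW) of Equation (1).
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
maps = [rng.standard_normal((C, H, W)) for _ in range(5)]
weights = [rng.standard_normal((C, C)) * 0.1 for _ in range(6)]
out = gru_combine(maps, *weights)
print(out.shape)  # (4, 8, 8)
```

Note that, unlike a single 1-by-1 convolution over the concatenated maps, the gates z and r are recomputed for every input map, which is what lets the model suppress or pass through individual feature maps.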
The experiments were conducted as follows. 1)
All images were resized to 265-by-300 to reduce the
computational complexity. 2) The Instrument and
Polyp datasets were each shuffled and divided
into training, validation, and testing subsets of
80%, 10%, and 10%, respectively. 3)
The evaluation metrics are mean Intersection-over-Union
(mIoU), Dice, and Accuracy. The code for the proposed
model can be found at: https://github.com/mss3331/
Proposed-model-for-MedAI21
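The split protocol and the overlap-based metrics above can be sketched as follows (a minimal illustration with dummy data; the function names and the seed are our own assumptions, not taken from the authors' code):

```python
import random
import numpy as np

def split_dataset(items, train=0.8, val=0.1, seed=42):
    """Shuffle sample ids and split them 80/10/10 into
    train/val/test subsets; the remainder forms the test set."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train, n_val = int(len(items) * train), int(len(items) * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

def dice_and_iou(pred, target, eps=1e-7):
    """Dice and IoU for binary segmentation masks (0/1 arrays)."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum() + eps)
    iou = inter / (union + eps)
    return dice, iou

# 1000 dummy sample ids split per the 80/10/10 protocol.
train_ids, val_ids, test_ids = split_dataset(range(1000))
print(len(train_ids), len(val_ids), len(test_ids))  # 800 100 100

# Toy masks: predicted top half vs. ground-truth left half of a 4x4 grid.
pred = np.zeros((4, 4), dtype=int); pred[:2, :] = 1
target = np.zeros((4, 4), dtype=int); target[:, :2] = 1
dice, iou = dice_and_iou(pred, target)
print(round(dice, 3), round(iou, 3))  # 0.5 0.333
```

In practice, mIoU averages the IoU over classes (here, foreground and background), and Accuracy is the fraction of correctly labeled pixels.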
Results
To demonstrate the performance of the proposed model,
we employed two different datasets, i.e., Kvasir-Seg [5]
and Kvasir-Instrument [6], for polyp segmentation and
instrument segmentation, respectively [7]. Both datasets
were provided in the MedAI competition as training
datasets [7]. The performance of the proposed model
versus the state-of-the-art SegNet and DeeplabV3 models
is summarized in Table 1 and Table 2.
We observe that the proposed model performed better
than SegNet and DeeplabV3 in all experiments except
on the Instrument validation set. For the test sets,
© 2021 Author(s). This is an open access article licensed under the Creative Commons Attribution License 4.0.
(http://creativecommons.org/licenses/by/4.0/).