Nordic Machine Intelligence, MedAI 2021
https://doi.org/10.5617/nmi.9131
Employing GRU to combine
feature maps in DeeplabV3 for a
better segmentation model
Mahmood Haithami¹, Amr Ahmed¹, Iman Yi Liao¹, Hamid Jalab²
1. University of Nottingham Malaysia, 43500 Semenyih, Malaysia
2. Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
Abstract
In this paper, we aim to enhance the segmentation capabilities
of DeeplabV3 by employing a Gated Recurrent Unit
(GRU). The 1-by-1 convolution in DeeplabV3 was replaced with a
GRU after the Atrous Spatial Pyramid Pooling (ASPP) layer
to combine the input feature maps. The convolution and the
GRU have sharable parameters; however, the latter has gates
that enable/disable the contribution of each input feature
map. Experiments on unseen test sets demonstrate that
employing a GRU instead of the convolution produces better
segmentation results. The datasets used are public datasets
provided by the MedAI competition.
Keywords: Segmentation; deep learning; GRU; Polyp;
Instrument
Introduction
Colon polyp segmentation is considered a challenging
task as a polyp could have various forms and does not
have clear borders in some cases [1]. Furthermore,
the lack of large and representative datasets in the
endoscopy domain is a persistent challenge that still
exists today. Different techniques have been proposed
in the literature to address such challenges by applying
image augmentations, transfer learning, and ensemble
learning [2]. However, very few works have studied the
application of Recurrent Neural Network (RNN) models
[2, 3]. In this paper, we attempt to employ a Gated
Recurrent Unit (GRU) (i.e., a gated variant of the RNN
model) in DeeplabV3 [4] to enhance its segmentation
capabilities.
Method
Figure 1 shows the difference between DeeplabV3 [4] and
the proposed model. We employ a GRU instead of
the 1-by-1 convolution to project the five
feature maps produced by the Atrous Spatial Pyramid
Pooling (ASPP) layer [4], as illustrated in Figure 2.
Mathematically, this mapping can be expressed as follows:
G : R^(5C×H×W) → R^(C×H×W)    (1)
where C, H, and W refer to the number of channels, the
height, and the width of the feature map, respectively. The
motivation for using a GRU is to retain weight sharing (as
in the convolution method) while employing gates to
regulate the flow of information across the different feature
maps. Hence, it provides a better method than the
1-by-1 convolution used in the original DeeplabV3 for
combining/projecting the input feature maps into one
feature map.
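To make the idea concrete, the following is a minimal NumPy sketch of combining five ASPP feature maps with a GRU whose gates act per channel, mimicking 1-by-1 convolutions. The function and weight names are ours for illustration; this is not the authors' exact implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_combine(feature_maps, Wz, Uz, Wr, Ur, Wh, Uh):
    """Combine a sequence of feature maps with a GRU.

    Treats the five ASPP outputs as a 5-step sequence; each step
    updates a hidden state of shape (C, H, W) via per-channel GRU
    gates, so the gates regulate how much each feature map
    contributes. Each weight is a (C, C) matrix applied across the
    channel axis, mimicking a 1-by-1 convolution.
    """
    C, H, W = feature_maps[0].shape
    h = np.zeros((C, H, W))
    for x in feature_maps:  # iterate over the five maps
        # einsum 'ij,jhw->ihw' mixes channels like a 1x1 convolution
        z = sigmoid(np.einsum('ij,jhw->ihw', Wz, x) +
                    np.einsum('ij,jhw->ihw', Uz, h))       # update gate
        r = sigmoid(np.einsum('ij,jhw->ihw', Wr, x) +
                    np.einsum('ij,jhw->ihw', Ur, h))       # reset gate
        h_tilde = np.tanh(np.einsum('ij,jhw->ihw', Wh, x) +
                          np.einsum('ij,jhw->ihw', Uh, r * h))
        h = (1 - z) * h + z * h_tilde
    return h  # one combined map of shape (C, H, W)

# Tiny demo: five C=4 feature maps of size 8x8 projected to one map,
# i.e., the mapping R^(5CxHxW) -> R^(CxHxW) of Equation (1).
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
maps = [rng.standard_normal((C, H, W)) for _ in range(5)]
weights = [rng.standard_normal((C, C)) * 0.1 for _ in range(6)]
out = gru_combine(maps, *weights)
print(out.shape)  # (4, 8, 8)
```

Note that, unlike a single 1-by-1 convolution over the concatenated maps, the gates z and r are recomputed for every input map, which is what lets the model suppress or pass through individual feature maps.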
The experiments were conducted as follows. 1)
All images were resized to 265-by-300 to reduce the
computational complexity. 2) The Instrument and
Polyp datasets were each shuffled and divided
into training, validation, and testing subsets of
80%, 10%, and 10%, respectively. 3)
The evaluation metrics are mean Intersection-over-Union
(mIoU), Dice, and Accuracy. The code for the proposed
model can be found at: https://github.com/mss3331/
Proposed-model-for-MedAI21
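The split protocol and the overlap-based metrics above can be sketched as follows (a minimal illustration with dummy data; the function names and the seed are our own assumptions, not taken from the authors' code):

```python
import random
import numpy as np

def split_dataset(items, train=0.8, val=0.1, seed=42):
    """Shuffle sample ids and split them 80/10/10 into
    train/val/test subsets; the remainder forms the test set."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train, n_val = int(len(items) * train), int(len(items) * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

def dice_and_iou(pred, target, eps=1e-7):
    """Dice and IoU for binary segmentation masks (0/1 arrays)."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum() + eps)
    iou = inter / (union + eps)
    return dice, iou

# 1000 dummy sample ids split per the 80/10/10 protocol.
train_ids, val_ids, test_ids = split_dataset(range(1000))
print(len(train_ids), len(val_ids), len(test_ids))  # 800 100 100

# Toy masks: predicted top half vs. ground-truth left half of a 4x4 grid.
pred = np.zeros((4, 4), dtype=int); pred[:2, :] = 1
target = np.zeros((4, 4), dtype=int); target[:, :2] = 1
dice, iou = dice_and_iou(pred, target)
print(round(dice, 3), round(iou, 3))  # 0.5 0.333
```

In practice, mIoU averages the IoU over classes (here, foreground and background), and Accuracy is the fraction of correctly labeled pixels.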
Results
To demonstrate the performance of the proposed model,
we employed two different datasets, i.e., Kvasir-Seg [5]
and Kvasir-Instrument [6], for polyp segmentation and
instrument segmentation, respectively [7]. Both datasets
were provided in the MedAI competition as training
datasets [7]. The performance of the proposed model
versus the state-of-the-art SegNet and DeeplabV3 models
is summarized in Table 1 and Table 2.
We observe that the proposed model performed better
than SegNet and DeeplabV3 in all experiments except
on the Instrument validation set. For the test sets,
© 2021 Author(s). This is an open access article licensed under the Creative Commons Attribution License 4.0.
(http://creativecommons.org/licenses/by/4.0/).