JBHI-01972-2021
Comparison of Kidney Segmentation Under Attention U-Net Architectures
Marcia Hon, Vasileios Alevizos
Ryerson University, University of Aegean

Abstract—One of the most prominent advantages of machine learning in the medical industry is the early detection of disease. Automatic kidney detection is of great importance for rapid diagnosis and treatment, given that related diseases accounted for over 73,750 new cases in the US in 2020 [1]. Today, diagnosis is performed by highly trained radiologists. However, the kidney's complex structures contribute to speckle noise and inhomogeneous intensity profiles in ultrasound images, which makes them difficult to interpret. There is therefore a need to automate the segmentation of kidney ultrasound images using U-Net deep learning architectures, an innovative solution for medical image analysis. In this research, we compare Attention U-Net with different backbones: VGG19, ResNet152V2, and EfficientNetB7. This comparison is intended as a survey that helps future researchers decide more effectively which Attention U-Net architecture to use for their segmentation projects.

Index Terms—kidney, segmentation, U-Net, CNN.

I. INTRODUCTION
In recent years, intensive research on medical imaging and pattern recognition, with performance equal to or even better than manual inspection by humans, has grown exponentially, albeit not without criticism and controversy. However, medical applications demand high accuracy in detecting convoluted geometric shapes. Traditionally, the architectures used to highlight these shapes were either non-standard or very complex to use. Accordingly, U-Net was introduced in 2015 for automated medical image segmentation. Its deep learning architecture resembles a “U”: an encoder (contracting path) followed by a decoder (expanding path), with skip connections linking the two.
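To make the encoder-decoder structure concrete, the following minimal Python sketch traces feature-map shapes through a U-Net-style network. It performs shape bookkeeping only, with no weights or convolutions. It assumes "same"-padded convolutions (the original U-Net used unpadded ones, which shrink each map and require cropping before concatenation), 2×2 pooling and up-sampling, and the original U-Net's channel doubling starting from 64 filters. The function name `unet_shapes` and the 256×256 single-channel input are hypothetical choices for illustration, not details taken from this paper.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (height, width, channels)


def unet_shapes(height: int, width: int, in_channels: int = 1,
                base_filters: int = 64, depth: int = 4) -> List[Tuple[str, Shape]]:
    """Trace feature-map shapes through a U-Net-style encoder-decoder.

    Shape bookkeeping only: no weights or convolutions are computed.
    Assumes 'same'-padded convolutions, 2x2 pooling/up-sampling, and
    channel counts that double at each encoder level.
    """
    if height % 2 ** depth or width % 2 ** depth:
        raise ValueError("height and width must be divisible by 2**depth")

    trace: List[Tuple[str, Shape]] = [("input", (height, width, in_channels))]
    skips: List[Shape] = []
    h, w = height, width

    # Encoder (contracting path): conv block, keep its output as a skip, then 2x2 max pool.
    for level in range(depth):
        c = base_filters * 2 ** level
        skips.append((h, w, c))
        trace.append((f"enc{level}", (h, w, c)))
        h, w = h // 2, w // 2

    # Bottleneck: the lowest-resolution, highest-channel feature map.
    c = base_filters * 2 ** depth
    trace.append(("bottleneck", (h, w, c)))

    # Decoder (expanding path): a 2x2 up-convolution doubles resolution and halves
    # channels, the matching skip is concatenated, and a conv block restores the
    # skip's channel count.
    for level in reversed(range(depth)):
        h, w = h * 2, w * 2
        skip_h, skip_w, skip_c = skips[level]
        if (h, w) != (skip_h, skip_w):
            raise RuntimeError("skip connection and decoder map differ in size")
        trace.append((f"concat{level}", (h, w, c // 2 + skip_c)))
        c = skip_c
        trace.append((f"dec{level}", (h, w, c)))

    # Final 1x1 convolution: one channel, e.g. a per-pixel kidney probability map.
    trace.append(("output", (h, w, 1)))
    return trace


if __name__ == "__main__":
    for name, shape in unet_shapes(256, 256):
        print(f"{name:>10}: {shape}")
```

For a hypothetical 256×256 grayscale ultrasound, the trace reaches a 16×16×1024 bottleneck, and each concatenation stacks the up-sampled features on the matching skip features (for example, 512 + 512 = 1024 channels at 32×32). In backbone-based variants such as those compared here, the encoder path would instead come from the pretrained VGG19, ResNet152V2, or EfficientNetB7 feature extractor, so the channel counts and number of levels would differ from this sketch.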
The goal of this research is to use kidney detection as a proof of concept for comparing the performance of different U-Net models, in the hope of helping future researchers decide on the best U-Net segmentation algorithm to use. To the best of our knowledge, no paper has compared Attention U-Net across the VGG19, ResNet152V2, and EfficientNetB7 backbones specifically in the context of kidney segmentation. We provide recommendations on which architectures are best to use.

Manuscript received August 30, 2021. Marcia Hon is a PhD candidate at Ryerson University, Toronto, Canada (e-mail: Marcia.hon.29@ryerson.ca). Vasileios Alevizos graduated from the University of Aegean, Karlovasi, Samos, 83200 Greece (e-mail: vasileios.alevizos@pm.me).

II. MOTIVATION
This contribution aims to demonstrate a U-Net-based segmentation system that addresses the elusive challenges hindering the use of complex deep networks in medical diagnosis. Comparisons of different backbones are also scarce, particularly for recent encoders and backbones originally developed for classification. Another question that sparked our curiosity was how a U-Net architecture performs when tuned with the latest backbones, for which no comparison had previously been made. The main design goal of these recent backbones is to improve efficiency by reducing unnecessary computation.

A. Previous Work
Related research has been conducted on kidney datasets using U-Net architectures, notably 3D U-Net [2]. However, none of these studies explores the potential advantages of backbones such as VGG19, ResNet152V2, and EfficientNetB7. These backbones are CNN (Convolutional Neural Network) architectures that achieved leading results on the ImageNet benchmark, in which more than a million images are categorized into 1,000 classes, such as dogs and cats. Moreover, Seum et al. [3] suggested incorporating segmentation as the first step of the COVID-19 diagnosis pipeline.
Their rationale was that segmentation refines what is sent to the CNN for classification. For instance, kidney segmentation could be used to locate the kidney before the image is sent to a CNN, which would then indicate whether a disease such as a tumor or stone is present within the annotated kidney region, thereby improving the CNN's performance. Z. Wang et al. [4], on the other hand, proposed a new U-Net called “RAR-U-Net”, which stands for “Residual encoder to Attention decoder by Residual connections framework for medical image segmentation under noisy labels”. Our investigation deals with ordinary ultrasound images of healthy kidneys that are of relatively good quality; thus, noisy labels are not a concern for us. If we were to expand our project to accommodate more complex images, RAR-U-Net could be implemented. Li et al. [5] introduced “ANU-Net”, a new U-Net designed to be more robust and to annotate medical images more accurately using an attention mechanism. For our project, we are only considering U-Nets