Scientific Journal of Astana IT University
ISSN (P): 2707-9031 ISSN (E): 2707-904X
VOLUME 13, MARCH 2023
Copyright © 2023, Authors. This is an open access article under the Creative Commons CC BY license
Nam Diana
Master of Tech. Sci., PhD Student
di_nam@kbtu.kz, orcid.org/0000-0002-9356-3114
School of Information Technology and Engineering,
Kazakh British Technical University, Almaty, Kazakhstan
Pak Alexandr Alexandrovich
Candidate of tech. sciences, Professor
a.pak@kbtu.kz, orcid.org/0000-0002-8685-9355
School of Information Technology and Engineering,
Kazakh British Technical University, Almaty, Kazakhstan
DOI: 10.37943/13BKBF2003
OVERVIEW OF TRANSFORMER-BASED MODELS
FOR MEDICAL IMAGE SEGMENTATION
Abstract: Premedical diagnostics is the process of examining survey results. Correct
premedical diagnostics can improve the process of patient management and reduce the
burden on the medical sector. Diagnostics based on medical images such as computed tomography
and X-ray is an obligatory step before further treatment. However, the shortage of clinicians
causes delays in this step. We reviewed two state-of-the-art algorithms proposed for medical
image segmentation: TransUnet and Swin-Unet. We conducted a theoretical comparison of the
algorithms in terms of their applicability to pre-hospital diagnostics, judged by quality and
speed of training. The comparison is based on the original source code provided by the
authors of the original articles. We chose these two algorithms because they have a similar
U-shaped architecture, are highly cited, and show competitive DICE scores on images
of various human organs. Some architectural features were also important. Both models
inherit key elements of U-net. TransUnet is a hybrid Transformer and CNN model. It consists
of a Transformer encoder and a convolutional decoder. Some additional computations are
required in the bottleneck. Swin-Unet is a fully Transformer-based model. These architectural
differences give rise to a difference in the number of trainable parameters. Deeper
architectures with a larger number of parameters usually show better performance; however,
according to our review, Swin-Unet has a smaller number of parameters and shows better DICE
and Hausdorff Distance values. It should be noted that the balance between false positive and
false negative predictions is important in medical image processing. It is crucial to avoid
overloading the medical sector while also not missing any sick patients. Precision and recall
can be used to evaluate the ratio of incorrect predictions. Therefore, we also reviewed the
results of caries segmentation where precision and DICE were provided. In this specific case,
TransUnet shows better DICE and recall values but worse precision.
Keywords: Computer vision, Transformers, Image processing, Premedical diagnostics,
Segmentation
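
The overlap metrics named in the abstract (DICE, precision, and recall) can be illustrated with a minimal sketch. It assumes binary masks given as nested lists of 0/1 values. The function names and the conventions for empty masks are illustrative assumptions and are not taken from the TransUnet or Swin-Unet source code.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives and false negatives
    for two binary masks of identical shape (nested lists of 0/1)."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1      # predicted foreground, truly foreground
            elif p and not t:
                fp += 1      # predicted foreground, truly background
            elif t and not p:
                fn += 1      # missed foreground pixel
    return tp, fp, fn


def segmentation_scores(pred, target):
    """Return DICE, precision and recall for one pair of binary masks.

    Assumed conventions (not from the reviewed papers): DICE is 1.0
    when both masks are empty; precision and recall are 0.0 when
    their denominators are zero.
    """
    tp, fp, fn = confusion_counts(pred, target)
    dice_den = 2 * tp + fp + fn
    dice = 2 * tp / dice_den if dice_den else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"dice": dice, "precision": precision, "recall": recall}


if __name__ == "__main__":
    # Toy 2x3 masks: TP = 2, FP = 1, FN = 1.
    predicted = [[1, 1, 0],
                 [0, 1, 0]]
    reference = [[1, 0, 0],
                 [0, 1, 1]]
    print(segmentation_scores(predicted, reference))
```

In this formulation DICE penalizes false positives and false negatives equally, whereas precision reacts only to false positives and recall only to false negatives. This is why a model can score a higher DICE and recall yet a lower precision, as reported for TransUnet in the caries case.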
Introduction
There is currently a serious shortage of clinicians, which complicates patient
management. Patients at a more advanced stage of disease cannot receive timely