Scientific Journal of Astana IT University
ISSN (P): 2707-9031 ISSN (E): 2707-904X
VOLUME 13, MARCH 2023

Copyright © 2023, Authors. This is an open access article under the Creative Commons CC BY license

Nam Diana
Master of Tech. Sci., PhD Student
di_nam@kbtu.kz, orcid.org/0000-0002-9356-3114
School of Information Technology and Engineering, Kazakh British Technical University, Almaty, Kazakhstan

Pak Alexandr Alexandrovich
Candidate of Tech. Sciences, Professor
a.pak@kbtu.kz, orcid.org/0000-0002-8685-9355
School of Information Technology and Engineering, Kazakh British Technical University, Almaty, Kazakhstan

DOI: 10.37943/13BKBF2003

OVERVIEW OF TRANSFORMER-BASED MODELS FOR MEDICAL IMAGE SEGMENTATION

Abstract: Premedical diagnostics is the process of examining survey results. Correct premedical diagnostics can improve patient management and reduce the burden on the medical sector. Diagnostics based on medical images, such as computed tomography and X-ray, is an obligatory step before further treatment. However, the shortage of clinicians causes delays at this step. We reviewed two state-of-the-art algorithms proposed for medical image segmentation: TransUnet and Swin-Unet. We conducted a theoretical comparison of the algorithms in terms of their applicability to pre-hospital diagnostics, considering segmentation quality and training speed. The comparison is based on the original source code provided by the authors of the original articles. We chose these two algorithms because they have similar U-shaped architectures, are highly cited, and show competitive DICE scores on images of various human organs. Some architectural features were also important. Both models inherit key elements of U-net. TransUnet is a hybrid Transformer and CNN model. It consists of a Transformer encoder and a convolutional decoder. Some additional computations are required in the bottleneck. Swin-Unet is a fully Transformer-based model.
These architectural differences give rise to a difference in the number of trainable parameters. Deeper architectures with more parameters generally perform better; however, according to our review, Swin-Unet has fewer parameters and shows better DICE and Hausdorff Distance values. It should be noted that the balance between false positive and false negative predictions is important in medical image processing. It is crucial to avoid overloading the medical sector while also not missing any sick patients. Precision and recall can be used to evaluate the ratio of incorrect predictions. Therefore, we also examined the results of caries segmentation, for which precision and DICE were reported. In this specific case, TransUnet shows better DICE and recall values but worse precision.

Keywords: Computer Vision, Transformers, Image processing, premedical diagnostics, Segmentation

Introduction

In the modern world, the shortage of clinicians is a serious problem that complicates patient management. Patients at a more advanced stage of disease cannot receive timely