Research Article
Self-Attention-Based Edge Computing Model for Synthesis
Image to Text through Next-Generation AI Mechanism
Hamdan Ali Alshehri,1 N. Junath,2 Poonam Panwar,3 Kirti Shukla,4 Saima Ahmed Rahin,5 and R. John Martin6

1Faculty of Computer Science and Information Technology, Jazan University, Jizan, Saudi Arabia
2Information Technology, University of Technology and Applied Science, Ibri, Oman
3Chitkara University Institute of Engineering and Technology, Chitkara University, Chandigarh, Punjab, India
4Galgotias University, Noida, India
5United International University, Dhaka, Bangladesh
6Faculty of Computer Science and Information Technology, Jazan University, Jizan, Saudi Arabia
Correspondence should be addressed to Saima Ahmed Rahin; srahin213012@mscse.uiu.ac.bd
Received 5 April 2022; Revised 8 May 2022; Accepted 14 May 2022; Published 10 June 2022
Academic Editor: Vijay Kumar
Copyright © 2022 Hamdan Ali Alshehri et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
Image synthesis based on natural language description has become a research hotspot in edge computing for artificial intelligence. With the help of generative adversarial edge computing networks, the field has made great strides in high-resolution image synthesis. However, the realism of synthesized single-target images still shows defects; for example, synthesized bird images may exhibit abnormalities such as "multiple heads" and "multiple mouths." To address such problems, SA-AttnGAN, a single-target text-to-image model based on a self-attention mechanism, is proposed. SA-AttnGAN extends AttnGAN (Attentional Generative Adversarial Network): it refines text features into word features and sentence features to improve the semantic alignment between text and images; in the initialization stage of AttnGAN, the self-attention mechanism is used to improve the stability of the text-to-image model; and multistage GAN networks are stacked to finally synthesize high-resolution images. Experimental data show that SA-AttnGAN outperforms other comparable models in terms of Inception Score and Fréchet Inception Distance; analysis of the synthesized images shows that the model learns background and colour information and correctly captures the structural information of bird heads, mouths, and other body parts, whereas the AttnGAN model generates incorrect images such as "multiple heads" and "multiple mouths." Furthermore, SA-AttnGAN is successfully applied to description-based clothing image synthesis with good generalization ability.
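As a concrete illustration of the self-attention mechanism mentioned above, the sketch below shows a SAGAN-style self-attention block of the kind that could be placed in the initial low-resolution stage of an AttnGAN-like generator. The class name, channel-reduction factor, and residual gating are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """SAGAN-style self-attention over feature maps (illustrative sketch)."""

    def __init__(self, in_channels):
        super().__init__()
        # 1x1 convolutions project features into query/key/value spaces.
        self.query = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.key = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        # Learned residual weight, initialized to zero.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.size()
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)  # B x HW x C'
        k = self.key(x).view(b, -1, h * w)                      # B x C' x HW
        attn = F.softmax(torch.bmm(q, k), dim=-1)               # B x HW x HW
        v = self.value(x).view(b, -1, h * w)                    # B x C x HW
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x  # residual connection
```

Initializing the residual weight gamma at zero lets the block behave like the original generator stage at the start of training and gradually blend in long-range attention, which is one common way to keep early training stable.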
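The abstract also evaluates synthesized images with Inception Score and Fréchet Inception Distance; the snippet below shows one way these two metrics can be computed with the torchmetrics library. The batch shape, image size, and uint8 value range are placeholder assumptions and do not reproduce the paper's exact evaluation protocol.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.inception import InceptionScore

# Requires torchmetrics with the image extras (torch-fidelity backend).
fid = FrechetInceptionDistance(feature=2048)
inception = InceptionScore()

# Placeholder batches; in practice these come from the test set and the generator.
real_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print("FID:", fid.compute().item())

inception.update(fake_images)
is_mean, is_std = inception.compute()
print("Inception Score:", is_mean.item(), "+/-", is_std.item())
```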
1. Introduction
Image synthesis based on text description (text to image, t2i) covers technologies such as computer vision and natural language processing and is an interdisciplinary, cross-modal task [1]. Given an input natural language description, the model should synthesize images that are consistent with the described content and carry complete semantic information. This task requires the computer to understand the semantic information of the text and convert it into pixels to generate a high-resolution, high-fidelity image, which is very challenging. The task has a wide range of potential applications, such as computer-aided design and criminal investigation portrait generation.
The rapid development of deep learning has brought significant theoretical and technical advances in computer vision and natural language processing and has pushed text-based image synthesis towards higher resolution, higher fidelity, and greater controllability. Ref. [2] used generative adversarial networks (GANs) [3], extracting sentence features from textual descriptions with a character-level recurrent neural network and feeding them, together with noise, into a cGAN network [3]. To reduce the difficulty of