Research Article
Self-Attention-Based Edge Computing Model for Synthesis
Image to Text through Next-Generation AI Mechanism
Hamdan Ali Alshehri,1 N. Junath,2 Poonam Panwar,3 Kirti Shukla,4 Saima Ahmed Rahin,5 and R. John Martin6

1Faculty of Computer Science and Information Technology, Jazan University, Jizan, Saudi Arabia
2Information Technology, University of Technology and Applied Science, Ibri, Oman
3Chitkara University Institute of Engineering and Technology, Chitkara University, Chandigarh, Punjab, India
4Galgotias University, Noida, India
5United International University, Dhaka, Bangladesh
6Faculty of Computer Science and Information Technology, Jazan University, Jizan, Saudi Arabia
Correspondence should be addressed to Saima Ahmed Rahin; srahin213012@mscse.uiu.ac.bd
Received 5 April 2022; Revised 8 May 2022; Accepted 14 May 2022; Published 10 June 2022
Academic Editor: Vijay Kumar
Copyright © 2022 Hamdan Ali Alshehri et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
Image synthesis based on natural language description has become a research hotspot in edge computing for artificial intelligence. With the help of generative adversarial edge computing networks, the field has made great strides in high-resolution image synthesis. However, the realism of synthesized single-target images still shows defects; for example, synthesized bird images may exhibit abnormalities such as "multiple heads" and "multiple mouths." To address such problems, SA-AttnGAN, a single-target text-to-image model based on a self-attention mechanism, is proposed. SA-AttnGAN extends AttnGAN (Attentional Generative Adversarial Network): it refines text features into word features and sentence features to improve the semantic alignment between text and images; in the initialization stage of AttnGAN, the self-attention mechanism is used to improve the stability of the text-to-image model; and multistage GAN networks are stacked to finally synthesize high-resolution images. Experimental data show that SA-AttnGAN outperforms other comparable models in terms of Inception Score and Fréchet Inception Distance; analysis of the synthesized images shows that the model learns background and colour information and correctly captures the structural information of bird heads, mouths, and other body parts, whereas the AttnGAN model generates incorrect images such as "multiple heads" and "multiple mouths." Furthermore, SA-AttnGAN is successfully applied to description-based clothing image synthesis with good generalization ability.
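As a concrete illustration of the self-attention mechanism mentioned above, the sketch below shows a SAGAN-style self-attention block of the kind that could be placed in the initial low-resolution stage of an AttnGAN-like generator. The class name, channel-reduction factor, and residual gating are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """SAGAN-style self-attention over feature maps (illustrative sketch)."""

    def __init__(self, in_channels):
        super().__init__()
        # 1x1 convolutions project features into query/key/value spaces.
        self.query = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.key = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        # Learned residual weight, initialized to zero.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.size()
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)  # B x HW x C'
        k = self.key(x).view(b, -1, h * w)                      # B x C' x HW
        attn = F.softmax(torch.bmm(q, k), dim=-1)               # B x HW x HW
        v = self.value(x).view(b, -1, h * w)                    # B x C x HW
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x  # residual connection
```

Initializing the residual weight gamma at zero lets the block behave like the original generator stage at the start of training and gradually blend in long-range attention, which is one common way to keep early training stable.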
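The abstract also evaluates synthesized images with Inception Score and Fréchet Inception Distance; the snippet below shows one way these two metrics can be computed with the torchmetrics library. The batch shape, image size, and uint8 value range are placeholder assumptions and do not reproduce the paper's exact evaluation protocol.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.inception import InceptionScore

# Requires torchmetrics with the image extras (torch-fidelity backend).
fid = FrechetInceptionDistance(feature=2048)
inception = InceptionScore()

# Placeholder batches; in practice these come from the test set and the generator.
real_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print("FID:", fid.compute().item())

inception.update(fake_images)
is_mean, is_std = inception.compute()
print("Inception Score:", is_mean.item(), "+/-", is_std.item())
```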
1. Introduction
Image synthesis based on text description (text to image, t2i) covers technologies such as computer vision and natural language processing and is an interdisciplinary, cross-modal task [1]. Given an input natural language description, the model should synthesize images that are consistent with the described content and carry complete semantic information. This task requires the computer to understand the semantic information of the text and convert it into pixels to generate a high-resolution, high-fidelity image, which is very challenging. The task has a wide range of potential applications, such as computer-aided design and criminal investigation portrait generation.
The rapid development of deep learning has brought significant theoretical and technical advances in computer vision and natural language processing and has pushed text-based image synthesis towards higher resolution, higher fidelity, and greater controllability. Ref. [2] used generative adversarial networks (GANs) [3], extracting sentence features from textual descriptions with a character-level recurrent neural network and feeding them, together with noise, into a cGAN network [3]. To reduce the difficulty of