(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 13, No. 5, 2022

Transformer-based Models for Arabic Online Handwriting Recognition

Fakhraddin Alwajih 1,2,*, Eman Badr 1,3, and Sherif Abdou 1

1 Department of Information Technology, Cairo University, Giza, Egypt
2 Department of Computer Science and Information Technology, Ibb University, Ibb, Yemen
3 University of Science and Technology, Zewail City of Science, Technology and Innovation, Giza, Egypt

Abstract—Transformer neural networks have increasingly become the neural network design of choice, having recently been shown to outperform state-of-the-art end-to-end (E2E) recurrent neural networks (RNNs). Transformers use a self-attention mechanism to relate input frames to one another and extract more expressive sequence representations. Compared with RNNs, transformers also allow parallel computation and can capture long-range dependencies in context. This work introduces a transformer-based model for the online handwriting recognition (OnHWR) task. As the transformer follows an encoder-decoder architecture, we investigated the self-attention encoder (SAE) with two different decoders: a self-attention decoder (SAD) and a connectionist temporal classification (CTC) decoder. The proposed models can recognize complete sentences without the need to integrate with external language modules. We evaluated the proposed models on two Arabic online handwriting datasets: Online-KHATT and CHAW. In this evaluation, the SAE-SAD architecture outperformed the SAE-CTC architecture. The SAE-SAD model achieved a 5% character error rate (CER) and an 18% word error rate (WER) on the CHAW dataset, and a 22% CER and a 56% WER on the Online-KHATT dataset. The SAE-SAD model showed significant improvements over existing Arabic OnHWR models.
Keywords—Self-attention; transformer; deep learning; connectionist temporal classification; convolutional neural networks; Arabic online handwriting recognition

* Corresponding authors.

I. INTRODUCTION

OnHWR is the task of converting digitally captured handwriting into digital text. Handwriting recognition can be classified into two main categories based on the input data: online and offline handwriting recognition. In online handwriting recognition, data is represented as a series of points; the availability and precision of additional information, such as timestamps, depend on the capabilities of the input device. In offline handwriting recognition, data is represented as images scanned from documents.

In recent years, OnHWR has gained importance alongside rapid developments in related hardware and software. Most current communication software supports note-taking and writing on boards, using online handwriting both as a communication medium and as a vehicle for computer-aided education. In emerging markets, greater access to computing devices has allowed ever-increasing populations to connect across the internet, with many depending solely on mobile devices with touchscreens. Handheld devices with styluses are becoming more widely available and are used in many domains. At the same time, there have been tremendous advances in deep learning and natural language processing (NLP) algorithms. Such advances have led, in turn, to considerable progress in the field of OnHWR.

The Arabic language is spoken by around half a billion people around the world. A number of other languages, including Urdu, Persian, Kurdish, and Pashto, have adopted the Arabic script. Arabic is written from right to left. Its script consists of 28 letters and 10 digits, as well as a number of punctuation marks. Each Arabic letter has up to four contextual forms, depending on its position in a word: isolated, beginning, middle, and end forms, as shown in Fig. 1.
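The contextual forms described above can be illustrated with Unicode, whose Arabic Presentation Forms-B block assigns a separate code point to each positional glyph. The following is a minimal sketch for illustration only and is not part of the proposed system; the choice of the letter beh and the helper name `contextual_forms` are assumptions.

```python
import unicodedata

# Presentation-form code points for the letter BEH (U+0628), taken from the
# Unicode "Arabic Presentation Forms-B" block. The letter is an arbitrary
# illustrative choice; the keys follow the position names used in the text.
BEH_FORMS = {
    "isolated": "\uFE8F",
    "end": "\uFE90",
    "beginning": "\uFE91",
    "middle": "\uFE92",
}


def contextual_forms(forms):
    """Return {position: (glyph, Unicode name, base letter)} for each form."""
    result = {}
    for position, glyph in forms.items():
        # NFKC normalization folds a positional glyph back to its base letter.
        base = unicodedata.normalize("NFKC", glyph)
        result[position] = (glyph, unicodedata.name(glyph), base)
    return result


if __name__ == "__main__":
    for position, (glyph, name, base) in contextual_forms(BEH_FORMS).items():
        print(f"{position:>9}: {glyph}  {name}  (base letter U+{ord(base):04X})")
```

All four glyphs normalize to the same base letter, which is why a recognizer can emit one label per letter and leave the positional shaping to the text renderer.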
Arabic OnHWR is a challenging problem for multiple reasons. One reason is the wide range of variation in handwriting styles, due in part to the existence of multiple calligraphies in Arabic. There are eight basic calligraphies in Arabic script [1]. Writers tend to combine these calligraphies, which further compounds the variation in writing styles and adds to the challenges facing developers of Arabic script recognition systems. Compared with Latin, Chinese, and other scripts, published work in the Arabic OnHWR field has to date been fairly limited.

OnHWR is a sequence-to-sequence (S2S) classification task. Input frames are fed into the S2S model, which in turn generates text. Recent advances in S2S models have shown that they can reliably solve complex NLP tasks such as translation [2] and automatic speech recognition (ASR) [3]. Additionally, the performance of OnHWR systems has improved with the advent of deep learning models, including convolutional neural networks (CNNs) [4] and long short-term memory (LSTM) networks [5], [6].

Recently, E2E OnHWR systems have achieved remarkable performance, with input handwriting features being mapped directly to an output sequence of letters or tokens. In E2E systems, all components are trained and optimized jointly, which reduces the complexity of the system and minimizes error propagation between components compared with conventional hybrid systems. E2E modeling has been applied to handwriting recognition tasks using CTC, and to mathematical expression recognition tasks using attention-based encoder-decoder systems [7], [8]. Moreover, E2E models have been combined with external language models (LMs), effectively boosting performance [5]. In general, the competitive performance of E2E models and their simplicity facilitate the building of state-of-the-art OnHWR systems.
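For readers unfamiliar with CTC, the following minimal sketch shows the standard CTC collapsing rule applied at decoding time: merge runs of repeated labels, then drop blanks. It is a generic illustration rather than the decoder used in this work; the blank symbol `-`, the function name `ctc_collapse`, and the sample frame labels are assumptions.

```python
from itertools import groupby

BLANK = "-"  # assumed blank symbol; real systems usually reserve a label index


def ctc_collapse(frame_labels, blank=BLANK):
    """Apply the CTC collapsing rule to a per-frame label sequence."""
    # Step 1: merge each run of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove the blank labels that separated the runs.
    return "".join(label for label in merged if label != blank)


if __name__ == "__main__":
    # Ten hypothetical per-frame labels collapse to a five-letter word.
    print(ctc_collapse("hh-e-ll-lo"))
```

The blank between the two runs of `l` is what allows a doubled character to survive the merge step; without it, the two runs would collapse into a single label.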
In this work, we explore building an E2E OnHWR system based on self-