Deep Learning for Arabic Error Detection and Correction
MANAR ALKHATIB, British University in Dubai
AZZA ABDEL MONEM, Ain Shams University
KHALED SHAALAN, British University in Dubai
Research on tools for automating the proofreading of Arabic text has received much attention in recent years.
There is an increasing demand for applications that can detect and correct Arabic spelling and grammatical
errors to improve the quality of Arabic text content and application input. Our review of previous studies
indicates that few Arabic spell-checking research efforts appropriately address the detection and correction
of ill-formed words that do not conform to the Arabic morphology system. Even fewer systems address the
detection and correction of erroneous well-formed Arabic words that are either contextually or semantically
inconsistent within the text. We introduce an approach that investigates employing deep neural network
technology for error detection in Arabic text. We have developed a systematic framework for spelling and
grammar error detection, as well as correction at the word level, based on a bidirectional long short-term
memory mechanism and word embedding, in which a polynomial network classifier is at the top of the system.
To get conclusive results, we have developed the most significant gold standard annotated corpus to
date, containing 15 million fully inflected Arabic words. The data were collected from diverse text sources
and genres, in which every erroneous and ill-formed word has been annotated, validated, and manually revised
by Arabic specialists. This valuable asset is available for the Arabic natural language processing research
community. The experimental results confirm that our proposed system significantly outperforms
Microsoft Word 2013 and Open Office Ayaspell 3.4, which have been used in the literature for
evaluating similar research.
CCS Concepts: • Computing methodologies → Machine learning; Machine learning approaches;
Neural networks;
Additional Key Words and Phrases: Error detection, error correction, bidirectional long short-term memory,
word embedding, polynomial network classifier
ACM Reference format:
Manar Alkhatib, Azza Abdel Monem, and Khaled Shaalan. 2020. Deep Learning for Arabic Error Detection
and Correction. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 19, 5, Article 71 (August 2020), 13 pages.
https://doi.org/10.1145/3373266
1 INTRODUCTION
The Arabic language is strongly structured and is classified as one of the most highly inflected and
derivational languages. Arabic has various morphological features that make the tasks of error
diagnosis and feedback challenging [1, 2].
Authors' addresses: M. Alkhatib and K. Shaalan, Faculty of Engineering & IT, The British University in Dubai, P.O. Box
345015, Dubai, UAE; emails: Manaralkhatib09@gmail.com, Khaled.Shaalan@buid.ac.ae; A. A. Monem, Faculty of Computers
and Information, Ain Shams University, P.O. Box 11566, Cairo, Egypt; email: azza_monem@hotmail.com.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be
honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM.
2375-4699/2020/08-ART71 $15.00
ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 19, No. 5, Article 71. Publication date: August 2020.