Deep Learning for Arabic Error Detection and Correction

MANAR ALKHATIB, British University in Dubai
AZZA ABDEL MONEM, Ain Shams University
KHALED SHAALAN, British University in Dubai

Research on tools for automating the proofreading of Arabic text has received much attention in recent years. There is an increasing demand for applications that can detect and correct Arabic spelling and grammatical errors to improve the quality of Arabic text content and application input. Our review of previous studies indicates that few Arabic spell-checking research efforts appropriately address the detection and correction of ill-formed words that do not conform to the Arabic morphology system. Even fewer systems address the detection and correction of erroneous well-formed Arabic words that are either contextually or semantically inconsistent within the text. We introduce an approach that investigates employing deep neural network technology for error detection in Arabic text. We have developed a systematic framework for spelling and grammar error detection, as well as correction at the word level, based on a bidirectional long short-term memory mechanism and word embedding, in which a polynomial network classifier is at the top of the system. To get conclusive results, we have developed the most significant gold standard annotated corpus to date, containing 15 million fully inflected Arabic words. The data were collected from diverse text sources and genres, in which every erroneous and ill-formed word has been annotated, validated, and manually revised by Arabic specialists. This valuable asset is available for the Arabic natural language processing research community. The experimental results confirm that our proposed system significantly outperforms the performance of Microsoft Word 2013 and Open Office Ayaspell 3.4, which have been used in the literature for evaluating similar research.
CCS Concepts: • Computing methodologies → Machine learning; Machine learning approaches; Neural networks;

Additional Key Words and Phrases: Error detection, error correction, bidirectional long short-term memory, word embedding, polynomial network classifier

ACM Reference format:
Manar Alkhatib, Azza Abdel Monem, and Khaled Shaalan. 2020. Deep Learning for Arabic Error Detection and Correction. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 19, 5, Article 71 (August 2020), 13 pages. https://doi.org/10.1145/3373266

Authors' addresses: M. Alkhatib and K. Shaalan, Faculty of Engineering & IT, The British University in Dubai, P.O. Box 345015, Dubai, UAE; emails: Manaralkhatib09@gmail.com, Khaled.Shaalan@buid.ac.ae; A. A. Monem, Faculty of Computers and Information, Ain Shams University, P.O. Box 11566, Cairo, Egypt; email: azza_monem@hotmail.com.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM.
2375-4699/2020/08-ART71 $15.00
https://doi.org/10.1145/3373266

ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 19, No. 5, Article 71. Publication date: August 2020.

1 INTRODUCTION

The Arabic language is strongly structured and is classified as one of the most highly inflected and derivational languages. Arabic has various morphological features that make the tasks of error diagnosis and feedback challenging [1, 2].