Automatic Speech Correction: A step to Speech Recognition for People with Disabilities

Naim TERBEH 1, Mohamed LABIDI 2, Mounir ZRIGUI 3
Research Laboratory of Technologies of Information and Communication & Electrical Engineering LaTICE (Monastir unit)
Faculty of Science of Monastir, computer science department, Monastir 5000, Tunisia
1 terbehnaim1987@gmail.com
2 labidi8mohamed@gmail.com
3 mounir.zrigui@fsm.rnu.tn

Abstract: This work aims to build an automatic correction system for Arabic continuous speech. The system will be combined with an ASR system for disabled people. For this work, we built a lexicon of 4,000,000 Arabic words, which is used to decide whether a word is correct or not. A corpus of Arabic texts is also required to provide a standard summarizing the rate of occurrence of each two-letter (two-phoneme) sequence in the Arabic language. The results of our system are encouraging and offer an advantage over other work for people with articulatory disabilities.

Keywords: Automatic Speech Correction, Automatic Speech Recognition, Arabic language, Wrong pronunciation.

I. INTRODUCTION

Automatic speech correction has benefited from the computer revolution brought about by the emergence of means of human-machine communication. Today, automatic speech correction is a fertile research area, owing to the diversity of human-computer interaction applications. In this paper, we present our approach to introducing automatic speech correction in order to improve the recognition rate of ASR for people with articulatory problems.

II. CONTEXT OF WORK

Automatic speech correction is a research area that is gradually spreading in the Francophone and Anglophone communities but, to our knowledge, remains almost untouched for Arabic. Thanks to this technology, human-machine communication has become more efficient and profitable, because the machine is given the ability to correct errors caused by speakers' wrong pronunciation.
Several statistics show that a large number of people have pronunciation disabilities: articulatory problems prevent them from producing clear, understandable pronunciation. People with disabilities still need to communicate with others. In addition, the number of disabled people is increasing. We therefore try to correct, as far as possible, the wrong pronunciations that prevent easy and immediate understanding of Arabic dialogue. Hence the need for a tool for automatic Arabic speech correction. This work takes place in the Research Laboratory of Technologies of Information and Communication & Electrical Engineering (LaTICE, Monastir unit, Tunisia). This article is part of work on the automatic processing of spoken Arabic, with the aim of improving its understanding.

III. ARABIC LANGUAGE

Arabic is the language originally spoken by the Arabs. It is a Semitic language (like Akkadian and Hebrew). With 445 million speakers, Arabic ranks fourth in number of speakers and eighth in number of pages on the Internet [1,2]. Because of its morphological and syntactic properties, Arabic is considered a difficult language for automatic language processing [3,4]. Wrong pronunciation increases the difficulty of processing Arabic speech. This difficulty motivates introducing means to facilitate the understanding of spoken Arabic, including the correction of wrong pronunciation.

IV. STANDARD ARABIC

To build our Arabic word correction system, a text base (corpus) must be constructed and must undergo the following treatments:
• Deleting special characters,
• Deleting punctuation,
• Deleting numbers,
• Doubling any geminate letter.

From the text base obtained after this preprocessing, we extract the information needed by the correction algorithm: the probability of occurrence of each two-letter sequence in the Arabic corpus. A pair formed by the last letter of one word and the first letter of the next word is not counted as a two-letter sequence.
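The preprocessing and counting steps described above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation. The function names `is_arabic_letter`, `preprocess` and `bigram_probabilities` are hypothetical. It also assumes that gemination is marked in the text with the shadda diacritic (U+0651), that letters are the code points U+0621 to U+064A (excluding the tatweel, U+0640), and that any other character except a diacritic ends the current word.

```python
from collections import Counter

SHADDA = "\u0651"  # Arabic gemination mark (assumed marker of geminate letters)


def is_arabic_letter(ch):
    # Assumption: letters are U+0621..U+064A, excluding the tatweel (U+0640).
    return "\u0621" <= ch <= "\u064A" and ch != "\u0640"


def preprocess(text):
    """Return the cleaned words of `text`, with geminate letters doubled."""
    words, current = [], []
    for ch in text:
        if is_arabic_letter(ch):
            current.append(ch)
        elif ch == SHADDA:
            if current:
                current.append(current[-1])  # double the geminate letter
        elif "\u064B" <= ch <= "\u0652":
            continue  # other diacritics are dropped without splitting the word
        else:
            # Digits, punctuation, special characters and spaces are deleted
            # and treated as word boundaries.
            if current:
                words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def bigram_probabilities(words):
    """Relative frequency of each two-letter sequence, counted inside words only."""
    counts = Counter()
    for word in words:
        for first, second in zip(word, word[1:]):
            counts[first + second] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bigram: n / total for bigram, n in counts.items()}
```

Because pairs are counted one word at a time, the last letter of a word and the first letter of the next word never form a pair, which matches the rule stated above.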
Arranging these probabilities in a vector of 841 coefficients (841 = 29², since Arabic has 29 letters) forms a standard for the Arabic language. The result has the following form: