On The Use of Decision Trees for Arabic Pronunciation Assessment

Khaled Necibi
University of Annaba, Algeria
LabGED Laboratory, Computer Science Department
BP. 12, Annaba, Algeria
khaled.necibi@univ-annaba.org

Hamza Frihia
University of Annaba, Algeria
LabGED Laboratory, Computer Science Department
BP. 12, Annaba, Algeria
frihiahamza@yahoo.fr

Halima Bahi
University of Annaba, Algeria
LabGED Laboratory, Computer Science Department
BP. 12, Annaba, Algeria
bahi@labged.net

ABSTRACT
In the context of Computer Assisted Pronunciation Teaching (CAPT), and especially for pronunciation evaluation, an Arabic speech recognizer is built and used to produce machine scores, which are then used to assess the pronunciation of young learners of Arabic. Most of the time, empirical thresholds are set to accept or reject a pronunciation. In this paper, we investigate the possibility of using decision trees as a tool to set these thresholds automatically. The aim of this study is to separate young Algerian pupils who may have pronunciation disabilities from those who have normal pronunciation. Because serious pronunciation difficulties can affect a pupil's whole educational career, our aim is to provide pupils with a tool based on speech recognition technology that can diagnose different pronunciation problems.

Categories and Subject Descriptors
K.3.2 [Computer Education]: Computer and Information Science Education, Self Assessment

Keywords
CALL, CAPT, Arabic Pronunciation Assessment, Decision Trees, HMM

1. INTRODUCTION
The need for Computer Assisted Pronunciation Teaching (CAPT) software is growing rapidly, whether it is used as a teaching assistant in class or as a tool for self-directed learning. With the integration of Automatic Speech Recognition (ASR) techniques, CAPT systems have become increasingly successful.
The computer can thus understand what the learner pronounces and react accordingly, which leads to a real-time learning process in which feedback is supplied on the quality of the pronunciation. Advances in CAPT systems can also be used to measure the proficiency of candidates in reading tests.

In the context of pronunciation assessment, the system first needs to “know” what has been said. Thus, building a competitive CAPT system requires a powerful automatic speech recognizer. Usually, speech recognizers are based on Hidden Markov Models (HMMs). The evaluation of the pronunciation can then begin, based on the outputs of the speech recognizer. As suggested in [1], the pronunciation scoring process may be summarized in three main steps:
- The generation of a phonetic segmentation, using an HMM-based speech recognizer.
- The creation of machine pronunciation scores for the different phonetic segments, by comparing the speech of the student to that of native speakers.
- The calibration of the scores, which includes tuning the machine scores and possibly combining several of them. The goal is to develop scores that match the judgment of expert human listeners as closely as possible. To achieve this, it is necessary to collect training data that include pronunciation ratings by expert human raters.

Table 1 lists some CAPT systems that exist in the literature.

In our system, models of the words to be pronounced are built using Hidden Markov Models (HMMs), and pronunciation is assessed at the word level using the likelihood computed by these models. Once the incoming pronunciation has been compared to the existing models of the word to be pronounced, we suggest using a decision tree to decide whether the pronunciation is accepted or not. When the pronunciation is not accepted, the learner may have problems with the articulation of some letters.
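
The accept/reject decision described above can be sketched as a depth-1 decision tree (a decision stump) that learns a single threshold from labeled scores. This is only an illustrative sketch, not the authors' implementation: the score values, the labels, and the function names `gini`, `learn_threshold`, and `accept` are all hypothetical, and it assumes that a higher log-likelihood means a pronunciation closer to the word model.

```python
from typing import List


def gini(labels: List[int]) -> float:
    """Gini impurity of binary labels (1 = accepted, 0 = rejected)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def learn_threshold(scores: List[float], labels: List[int]) -> float:
    """Fit a decision stump on a single feature (the machine score).

    Every midpoint between two consecutive distinct scores is tried as a
    split; the one with the lowest weighted Gini impurity is returned.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    best_threshold, best_impurity = pairs[0][0], float("inf")
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no split is possible between identical scores
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_threshold, best_impurity = (low + high) / 2.0, impurity
    return best_threshold


def accept(score: float, threshold: float) -> bool:
    """Accept a pronunciation whose score reaches the learned threshold.

    Assumption: higher scores indicate better pronunciation.
    """
    return score >= threshold


# Hypothetical word-level log-likelihood scores with human accept/reject
# labels; these numbers are invented for illustration only.
scores = [-82.0, -75.5, -71.0, -66.0, -60.0, -58.5, -55.0, -51.0, -47.5]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1]

threshold = learn_threshold(scores, labels)
print(threshold)                  # -63.0
print(accept(-50.0, threshold))   # True
print(accept(-70.0, threshold))   # False
```

With a single score per word, the tree reduces to one learned threshold that replaces the empirical one. A deeper tree over several machine scores would follow the same principle, choosing at each node the split that best separates accepted from rejected pronunciations.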
The feedback returned by the system will help the learner improve their pronunciation skills.

This work aims to provide young Algerian pupils with a computer assisted pronunciation teaching tool for learning Standard Arabic pronunciation, and in particular to tell them whether their pronunciation is “correct” or “incorrect”, as well as to separate pupils who have pronunciation difficulties from those who have normal pronunciation.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
IPAC '15, November 23 - 25, 2015, Batna, Algeria.
© 2015 ACM. ISBN 978-1-4503-3458-7/15/11…$15.00
DOI: http://dx.doi.org/10.1145/2816839.2816866

Although the native language of young Algerian pupils is dialectal Arabic, Standard Arabic remains a difficult language for them, with sounds that are difficult to master and letters which are similar in their written forms and