ARABIC CONTINUES SPEECH RECOGNITION SYSTEM USING CONTEXT-INDEPENDENT

Ridha EJBALI, REGIM, ridha_ejbali@yahoo.fr
Yassine BENAYED, REGIM, yassine.benayed@gmail.com
Mohamed Adel ALIMI, REGIM, adel.alimi@ieee.org

Abstract
In this work, we study a speech recognition system based on Hidden Markov Models (HMM) and phoneme models in order to follow its behaviour and estimate its best performance in several scenarios. We vary the parameters of the Markov models: the number of estimation iterations, the number of Gaussians describing the acoustic units of the corpus, and the type of coefficients used to build the characteristic vectors of these units, namely Mel frequency cepstral coefficients (MFCC) and Perceptual Linear Prediction (PLP). The system uses Arabic as its training and recognition language, with the phoneme as its linguistic unit.

Keywords: Speech recognition, Hidden Markov models, corpus, MFCC, PLP, Arabic language, characteristic vectors, linguistic unit, phoneme.

1. INTRODUCTION
Language is the primary means of human communication; it allows people to communicate their thoughts, feelings and desires [7]. Given its importance, this means of communication has become a research subject, with the aim of integrating it into human-machine interfaces [3]. To reach this stage of human-machine communication, several areas have been addressed in order to build speech recognition systems. As indicated above, our objective is to implement and monitor a speech recognition system based on the Arabic language. To achieve this, we began by studying the Arabic language and chose a corpus that meets our needs in terms of phonetic balance; we then moved on to estimating the performance of this corpus on a speech recognition system.

2. ARABIC LANGUAGE
Arabic is a Semitic language composed of 29 fundamental letters (28 if we exclude the hamza (ء), which behaves either as a separate letter or as a diacritic).
Among these letters, 26 (غ, ع, ظ, ط, ض, ص, ش, س, ز, ر, ذ, د, خ, ح, ج, ب, ء, ن, م, ل, ك, ق, ف, ث, ت) are consonants and 2 (و, ي) are either consonants or long vowels, depending on the context in which they appear in a word (for example, the letter و behaves as a consonant in وردة /wardatun/ "flower" and as a long vowel in دودة /doudatun/ "worm"). Arabic is composed of 40 phonemes: 28 consonants, 6 simple vowels and 6 emphatic vowels. To this list we add the silence phoneme (s#) to characterize the absence of signal. To build a well-balanced corpus, we relied on the work described in [4].

3. SYSTEM DESCRIPTION

3.1. General structure
In the present work, we are interested in building an Arabic speech recognition system. To begin, we chose a probabilistic approach. A dictionary of Markov models is built by constructing a Markov model of each entity. We picked out