The Impact of the Speaker’s State on Speech Recognition

Neziha JAOUADI, Ridha EJBALI, Mourad ZAIED and Chokri BEN AMAR

Higher Institute of Computer and Multimedia of Gabes, Erriadh City campus, 6075 Zrig - Gabes, Tunisia
Email: neziha_jaouedi@yahoo.fr

REsearch Groups on Intelligent Machines, University of Sfax, Soukra street, B.P. 1173, 3038, Sfax, Tunisia
Emails: {ridha ejbali, mourad.zaied, chokri.benamar}@ieee.org

Abstract: To improve speech recognition systems, we propose to study the state of the speaker at the moment a word is pronounced. Starting from the observation that "a speaker can never utter a word in the same way twice", we study the acoustic space to identify the changes in frequency between different utterances. Our goal is to add information about the speaker’s state in order to improve decision making in the recognition phase. In this phase, we rely on the Beta wavelet network as the technique for modeling the acoustic units of speech.

Keywords: speech recognition; speaker situation; acoustic space; Beta wavelet network.

I. INTRODUCTION

Speech is a means of communication between people. Given its importance, it has been integrated into human-machine interfaces. Speech carries several kinds of information. Among them, the linguistic content of the message is the most important. Yet paralinguistic information, such as the identity and mood of the speaker, also plays a crucial role in oral communication. Indeed, the same voice message delivered by a speaker in two different states (for example, screaming and whispering) does not yield the same recognition rate.

Rabiner and Juang [1] found that human language is composed of a finite number of phonetic units, and that these units can be identified through visible properties of the signal or of its spectrum. The work in [2] compares the overall shape of the word to be recognized with all the words in the reference vocabulary.
Le Viet Bac [3] found that speech can be characterized as a random process whose parameters can be estimated in an appropriate manner.

In this paper, we first propose a new approach to modeling the acoustic unit of speech based on Beta wavelet networks. Second, we show that adding the speaker’s state improves the recognition rate.

II. SPEECH RECOGNITION BY BETA WAVELET NETWORK

Recognizing a speech signal involves three stages: feature extraction, training and recognition. To improve decision making in the recognition phase, we propose to study the state of the speaker at the moment a word is pronounced. For this purpose we use two approaches, a global one and a local one.

The global approach: the test corpus is matched against the whole training corpus.

The local approach: the state of the speaker is added as an additional parameter, and the test is matched only against the part of the training corpus recorded in the same condition as the test signal.

A. Training

During this phase, we build a basis of wavelet networks, each of which is associated with one training signal. The wavelet network used in this application consists of three layers:
- Input layer: receives the ordered positions of the signal obtained after the feature extraction phase.
- Hidden layer: its activation function is a Beta wavelet; the number of wavelets in this layer is variable.
- Output layer.
Values are propagated through the network by the feed-forward algorithm [5][6].

Figure 1: Beta wavelet network for speech recognition (the basis of wavelet networks)
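
The three-layer structure described above can be illustrated with a minimal Python sketch of the feed-forward pass. The text does not give the wavelet formula, so this sketch assumes the Beta function in its commonly cited form and takes its first derivative as the mother wavelet. The function names, the default shape parameters (p, q, x0, x1) and the per-wavelet weights, translations and dilations are hypothetical choices made for illustration. Training, that is, fitting these parameters to a signal, is not shown.

```python
def beta(x, p=2.0, q=2.0, x0=-1.0, x1=1.0):
    """Beta function on [x0, x1], zero outside its support.

    Assumed form (not given in the text):
    ((x - x0) / (xc - x0))**p * ((x1 - x) / (x1 - xc))**q,
    with xc = (p * x1 + q * x0) / (p + q).
    """
    if x <= x0 or x >= x1:
        return 0.0
    xc = (p * x1 + q * x0) / (p + q)  # position of the maximum
    return ((x - x0) / (xc - x0)) ** p * ((x1 - x) / (x1 - xc)) ** q


def beta_wavelet(x, p=2.0, q=2.0, x0=-1.0, x1=1.0):
    """First derivative of beta(), used here as the mother wavelet (assumption)."""
    if x <= x0 or x >= x1:
        return 0.0
    return beta(x, p, q, x0, x1) * (p / (x - x0) - q / (x1 - x))


def forward(t, weights, translations, dilations):
    """Feed-forward pass of a three-layer wavelet network.

    Input layer: a signal position t.
    Hidden layer: one translated and dilated Beta wavelet per unit.
    Output layer: weighted sum of the hidden activations.
    """
    return sum(
        w * beta_wavelet((t - b) / a)
        for w, b, a in zip(weights, translations, dilations)
    )


if __name__ == "__main__":
    # Hypothetical network with three hidden wavelets.
    weights = [0.8, -0.3, 0.5]
    translations = [0.0, 1.0, 2.0]
    dilations = [1.0, 0.5, 2.0]
    for t in (0.0, 0.5, 1.0, 1.5):
        print(t, forward(t, weights, translations, dilations))
```

In this sketch the number of hidden wavelets is simply the length of the three parameter lists, which reflects the variable size of the hidden layer. One such set of parameters would be stored per training signal to form the basis of wavelet networks.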