Handling of true and pseudo wideband speech signals in automatic speech recognition

Mohamed-Ali Ben Salah 1, Jean Monné 1, Denis Jouvet 1 & Régine André-Obrecht 2
1 Orange Labs, 2 avenue Pierre Marzin, 22300 Lannion, France
2 IRIT - Université Paul Sabatier, 118 route de Narbonne, 31062 Toulouse Cedex 9

Abstract: - It is well known that automatic speech recognition systems that process wideband speech perform better than those that process narrowband speech. Nevertheless, in order to take advantage of the wideband benefit, the problem of coexistence of true and pseudo wideband speech signals must be taken into account, where pseudo wideband signals are data sampled at 16 kHz with a bandwidth of 0-4000 Hz only. In this paper, a series of speaker-independent, continuous speech phoneme recognition experiments is carried out on the BREF80 and ESTER French corpora to quantify the problem of coexistence. An approach for solving it is proposed, based on the identification of the signal type. Once the signal type is identified, the system responds with the appropriate speech recognition model. The signal type identification uses a new acoustic description of the 16 kHz sampled signals and reaches a high identification rate.

Key-Words: - Wideband, narrowband, pseudo wideband, speech recognition, Aurora Advanced Front-end

1 Introduction

Because of the emergence of networks, automatic speech recognition (ASR) systems nowadays process speech signals sampled at different rates and with different bandwidths. On the one hand, it is well known that ASR systems that process narrowband (NB) speech, typically sampled at 8 kHz with a bandwidth of 0-4000 Hz only, perform worse than those that process wideband (WB) speech, typically sampled at 16 kHz with a bandwidth of 0-8000 Hz [10]. On the other hand, pseudo wideband (PWB) speech, i.e. speech
sampled at 16 kHz with a bandwidth of 0-4000 Hz only, represents a major source of degradation for system performance, as demonstrated in [1]. In fact, because of the way the network architecture is implemented, no PWB signalling is provided. The recognition platform therefore has no means of knowing the type of the data it processes, and may not respond optimally when PWB speech is presented. As demonstrated in [1], even with a system adapted to PWB speech, performance is not as good as for narrowband speech processed by a narrowband model. As a result, wherever WB and PWB signals may both be used, the deployment of an automatic speech recognition system must take into account the coexistence of true and pseudo wideband speech signals, in order to take advantage of the WB benefits and to guarantee the best response of the system in all cases.

In this paper, the problem of coexistence of true and pseudo wideband speech for the automatic speech recognition task is studied. Firstly, the contribution that the wideband signal brings to automatic speech recognition is measured on the BREF80 [5] and ESTER [4] French corpora. Secondly, the problem of pseudo wideband speech, where data presented to the automatic speech recognition system as WB data are in reality obtained simply by over-sampling an 8 kHz signal to 16 kHz, is addressed. A new approach is proposed, based on the identification of the signal type. Once the type of the signal is identified, the system responds with the appropriate speech recognition model. This leads to the best response of the ASR system for data sampled at 16 kHz, whether true or pseudo wideband signals.

The remainder of this paper is organized as follows. In section 2, the framework is briefly described through a presentation of the databases and of the speech models. In section 3, baseline results are presented and discussed.
In section 4, the proposed solution for the pseudo wideband data problem is detailed. Finally, the conclusions are presented in section 5.

2 Framework

2.1 Databases

Experiments were carried out on two French corpora: BREF80 and ESTER. BREF80 was designed to provide continuous speech data for the evaluation of continuous speech recognition systems, and for the study of phonological variations [5]. The spoken texts were selected from the French newspaper "Le Monde". The aim of the ESTER evaluation campaign was to evaluate

8th WSEAS International Conference on SIGNAL, SPEECH and IMAGE PROCESSING (SSIP '08) Santander, Cantabria, Spain, September 23-25, 2008 ISSN: 1790-5109 39 ISBN: 978-960-6474-008-6