AUTOMATIC CLASSIFICATION OF ENVIRONMENTAL NOISE EVENTS BY HIDDEN MARKOV MODELS

Paul Gaunard, Corine Ginette Mubikangiey, Christophe Couvreur, Vincent Fontaine

Faculté Polytechnique de Mons, 31, Boulevard Dolez, B-7000 Mons, BELGIUM
Tel: ++32 65 374176 - Fax: ++32 65 374129
Email: {couvreur,fontaine}@tcts.fpms.ac.be

ABSTRACT

The automatic classification of environmental noise sources from their acoustic signatures recorded at the microphone of a noise monitoring system (NMS) is an active subject of research nowadays. This paper shows how hidden Markov models (HMM’s) can be used to build an environmental noise recognition system based on a time-frequency analysis of the noise signal. The performance of the proposed HMM-based approach is evaluated experimentally for the classification of five types of noise events (car, truck, moped, aircraft, train). The HMM-based approach is found to outperform previously proposed classifiers based on the average spectrum of the noise event, achieving more than 95% correct classifications. For comparison, a classification test is performed with human listeners on the same data; it shows that the best HMM-based classifier outperforms the “average” human listener, who achieves only 91.8% correct classifications on the same task.

1. INTRODUCTION

The latest generation of noise monitoring systems (NMS’s) is based on digital signal processing technology. They commonly implement such features as the computation and storage of noise levels ($L_{eq}$), one-third-octave spectra, and statistical indices, or the detection of noise events based on thresholds. Since the computational power of signal processors keeps increasing, it is likely that NMS’s will become capable of even more sophisticated treatments of the sound data they record. Consequently, research has been undertaken to develop new measurement features for inclusion in NMS’s.
An area of research that has started to attract much attention recently is automatic noise recognition (ANR). The goal of an ANR system is the automatic (i.e., without human intervention) classification of the noise sources that are present in the acoustic environment from their recordings at the microphone of the NMS.

One particular problem in ANR is the classification of noise events such as car or truck pass-bys, aircraft fly-overs, etc. The ANR systems that have been proposed for that task generally rely on a two-step process: a pre-processor converts the acoustical signal of the noise event into a set of characteristic features, which are then used by a classifier to make a decision on the nature of the source of the noise event. Until now, the pre-processors that have been proposed were based on a “static” approach. That is, the noise event was reduced to a global set of characteristics, which was then used to perform the classification. For instance, the average spectrum of the noise event is a common choice. Various statistical pattern recognition techniques have been suggested for the realization of the classifier acting on that “static” representation.

Christophe Couvreur is a Research Assistant of the Belgian National Fund for Scientific Research (F.N.R.S.). He is also currently a Visiting Scholar with the Coordinated Science Laboratory of the University of Illinois at Urbana-Champaign.

In this paper, a new method for the classification of noise events based on hidden Markov models (HMM’s), a technique that has been widely successful in automatic speech recognition [5, 3], is proposed. HMM-based classifiers use a “dynamic” recognition method that takes directly into account the time-frequency structure of the noise events. As will be seen, the utilization of hidden Markov models can bring significant improvement over previously proposed methodologies for the automatic recognition of noise events.

The remainder of this paper is organized as follows.
In section 2, the choice of the pre-processor for an ANR system based on HMM’s is discussed. The application of HMM’s to ANR is discussed in section 3. Experimental results obtained for the classification of five types of environmental noise events are presented in section 4, together with results of human listeners for the same task. Conclusions are drawn in section 5.

2. PRE-PROCESSING

For the classifier to act directly on the time-frequency structure of the signal, the pre-processor must convert the raw acoustic signal sampled at the microphone into a time-frequency representation. Such a time-frequency representation can be obtained by splitting the signal into short frames (consecutive or possibly overlapping) and computing a set of features characteristic of the spectrum for each frame. The output of the pre-processor will then be a series of spectral components $\{x_t\}_{t=1}^{T}$, where $x_t$ is a set of features representative of the spectrum corresponding to the $t$-th frame of signal. For example, if a one-third-octave filter bank is used and short-time $L_{eq}$’s are computed in $K$ frequency bands, $x_t$ can be the $K$-dimensional vector formed from the one-third-octave levels for the $t$-th integration interval of the $L_{eq}$’s. In this case, the frame length corresponds to the integration length for the $L_{eq}$’s.

Instead of using a filter bank, other types of spectral analysis can be used on the signal frames. In section 4, LPC (Linear Prediction Coding) cepstral analysis will be used [3].

Both the filter-bank method and the LPC-cepstrum method of spectral analysis convert the original acoustic signal into a sequence of continuous-valued vectors $x_t \in \mathbb{R}^{K}$. This sequence of continuous-valued vectors can be converted into a sequence of discrete symbols by a technique called vector quantization (VQ) [4]. VQ allows the utilization of discrete HMM’s.
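
The pre-processing chain described in this section can be illustrated by the following minimal Python sketch. It is not the implementation used in the experiments: equal-width DFT bands stand in for the one-third-octave filter bank (and for the LPC cepstrum), the VQ codebook is written by hand rather than trained, and all function names (split_frames, band_levels, quantize) are hypothetical.

```python
import cmath
import math


def split_frames(signal, frame_len, hop):
    """Split a list of samples into frames of frame_len samples, advancing by hop."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def band_levels(frame, n_bands):
    """Level (dB) in n_bands equal-width bands of the frame's DFT power spectrum.

    Assumption: equal-width bands replace a true one-third-octave filter bank.
    """
    n = len(frame)
    half = n // 2
    power = []
    for k in range(half):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame))
        power.append(abs(s) ** 2 / n)
    width = half // n_bands
    levels = []
    for b in range(n_bands):
        energy = sum(power[b * width:(b + 1) * width])
        levels.append(10.0 * math.log10(energy + 1e-12))  # floor avoids log(0)
    return levels


def quantize(vector, codebook):
    """Vector quantization: index of the nearest codeword (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda j: sum((v - c) ** 2
                                 for v, c in zip(vector, codebook[j])))


# Toy signal: a low-frequency tone followed by a high-frequency tone.
signal = [math.sin(2 * math.pi * 1 * i / 16) for i in range(32)]
signal += [math.sin(2 * math.pi * 6 * i / 16) for i in range(32, 64)]

# Continuous-valued feature vectors, one per (overlapping) frame.
features = [band_levels(f, n_bands=2)
            for f in split_frames(signal, frame_len=16, hop=8)]

# Hand-written two-word codebook (assumption: a real system would train it).
codebook = [[6.0, -120.0], [-120.0, 6.0]]

# Discrete symbol sequence suitable for a discrete HMM.
symbols = [quantize(x, codebook) for x in features]
print(symbols)
```

In this sketch the frames lying entirely within the low tone map to one symbol and those lying entirely within the high tone map to the other, so the symbol sequence retains the temporal structure of the event that a “static” average spectrum would discard.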