IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 48, NO. 5, OCTOBER 1999 1005

Vehicle Sound Signature Recognition by Frequency Vector Principal Component Analysis

Huadong Wu, Mel Siegel, and Pradeep Khosla, Fellow, IEEE

Abstract—The sound of a working vehicle provides an important clue to the vehicle type. In this paper, we introduce the “eigenfaces method,” originally used in human face recognition, to model the sound frequency distribution features. We show that it can be a simple and reliable acoustic identification method if the training samples can be properly chosen and categorized. We treat the frequency spectrum in a 200 ms time interval (a “frame”) as a vector in a high-dimensional frequency feature space. In this space, we study the vector distribution for each kind of vehicle sound produced under similar working conditions. A collection of typical sound samples is used as the training data set. The mean vector and the most important principal component eigenvectors of the covariance matrix of the zero-mean-adjusted samples together characterize its sound signature. When a new zero-mean-adjusted sample is projected into the principal component eigenvector directions, a small residual vector indicates that the unknown vehicle sound can be well characterized in terms of the training data set.

Index Terms—Acoustic identification, frequency analysis, pattern recognition, principal components, sound signature, vehicle sounds.

I. INTRODUCTION

Almost every moving vehicle makes some kind of noise; the noise can come from the vibrations of the running engine, bumping and friction of the vehicle tires with the ground, wind effects, etc. Vehicles of the same kind and working in similar conditions (“class”) will generate similar noises, or have some kind of noise signature. This noise pattern gives a clue for military reconnaissance or a surveillance mission robot to detect a vehicle and recognize its class.
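As a rough illustration of the approach summarized in the abstract, the following Python sketch builds a class signature (mean vector plus leading principal directions) from training spectra and measures the residual of a new sample. The function names `fit_signature` and `residual_norm`, the use of NumPy's SVD to obtain the covariance eigenvectors, and the synthetic data are all assumptions made for illustration; they are not details taken from the paper.

```python
import numpy as np


def fit_signature(frames, n_components):
    """Return (mean, components) for training spectra of shape (n_samples, n_bins)."""
    mean = frames.mean(axis=0)
    centered = frames - mean
    # The right singular vectors of the centered data are the eigenvectors
    # of its covariance matrix, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def residual_norm(spectrum, mean, components):
    """Norm of the part of a zero-mean-adjusted sample outside the principal subspace."""
    centered = spectrum - mean
    coeffs = components @ centered          # projection coefficients
    reconstruction = components.T @ coeffs  # back in spectrum space
    return float(np.linalg.norm(centered - reconstruction))


# Synthetic stand-in for one vehicle class: spectra confined to a 2-D subspace
# of a 16-bin spectrum space (hypothetical data, not real recordings).
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 16))
mean_true = rng.normal(size=16)
training = mean_true + rng.normal(size=(50, 2)) @ basis

mean, components = fit_signature(training, n_components=2)

in_class = mean_true + np.array([0.5, -1.0]) @ basis
off_class = mean_true + 3.0 * rng.normal(size=16)

residual_in = residual_norm(in_class, mean, components)
residual_out = residual_norm(off_class, mean, components)
print("in-class residual:", residual_in)
print("off-class residual:", residual_out)
```

A small residual for the in-class sample and a much larger one for the unrelated sample is the behavior the abstract describes; how many components to keep and where to set the residual threshold would have to be chosen from real training data.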
Our research goal is to characterize noise patterns and use them to recognize whether a newly detected sound is from a vehicle of known type, and if so to classify its type. When travelling at different speeds, under different road conditions, or with different acceleration, a vehicle emits different noise patterns. These noises can be sampled or digitized and grouped in a series of time slices (frames); if the spectrum changes with time, the change can then be described in the frequency domain as the change of the frequency spectrum distribution over frames.

Manuscript received November 18. This work was supported under DARPA contract F04701-97-C-0022.
H. Wu and M. Siegel are with the Robotics Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh PA 15213 USA (e-mail: whd@cs.cmu.edu).
P. Khosla is with the Institute for Complex Engineering Systems, Carnegie Mellon University, Pittsburgh PA 15213 USA (e-mail: pkk@cmu.edu).
Publisher Item Identifier S 0018-9456(99)06680-2.

If we consider a frame’s noise frequency spectrum, with $N$ components, as an $N$-dimensional vector, then each frame can be considered as a point in this $N$-dimensional frequency spectrum space. Noises from the same kind of vehicle and recorded under similar conditions will not be randomly distributed; if the classes are properly defined, samples from the same class should span a convex subregion, and a new sample can be classified according to its location in the frequency spectrum feature space.

To find the features in this high-dimensional space, we adopt and adapt the eigenfaces method used in the vision community to recognize human faces. This method is known as the Karhunen–Loève expansion in pattern recognition, and as factor or principal-component analysis in the statistical literature.

II. SIGNAL PROCESSING

Vehicle noise is a kind of stochastic signal.
A stochastic signal is called stationary if its stochastic features are time-invariant; otherwise it is called nonstationary. A vehicle that is making a noise of interest may be idling, or moving toward or away from the observation point (where the recording microphone is set); meanwhile it may be accelerating or decelerating. Over an extended observation time, the signal will therefore generally not be stationary. However, the recording microphone is usually fixed, and the vehicle’s running conditions usually do not change very often if it is not moving; if it is moving, only a fairly short sound duration can be recorded. A vehicle sound signal can therefore reasonably be treated as stationary, or as a sequence of stationary segments.

Besides the engine’s running conditions, another important effect that must be considered before treating moving-vehicle noise as a piecewise stationary signal is the acoustic Doppler effect. The maximum Doppler effect occurs when the recording microphone is set in the vehicle’s path. Let $\Delta f$ be the Doppler frequency shift, $f$ the original frequency, $v$ the vehicle’s travelling speed, and $c$ the speed of sound propagation; then, for a vehicle approaching the microphone head-on, we have $\Delta f = f \, v / (c - v)$. If the vehicle is travelling at 50 km/h (about 30 mi/h) and the speed of sound is 343.4 m/s, the maximum Doppler effect causes a change of about 4.2% in the frequency component $f$. As vehicle noise generally has a frequency spectrum with large low-frequency components, and the recording microphone is usually set off the road, the resulting Doppler shift, less than 5%, is not very conspicuous compared with the unpredictable changes in recording conditions. Experience shows that treating the sound as a stationary signal is reasonable.
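The Doppler estimate above can be checked numerically. The short Python sketch below assumes the head-on relation Δf/f = v/(c − v) for a source approaching a fixed microphone, which reproduces the quoted 4.2% figure; the function name `doppler_shift_fraction` is hypothetical.

```python
def doppler_shift_fraction(vehicle_speed_mps, sound_speed_mps=343.4):
    """Relative shift (delta f / f) for a source approaching a fixed microphone head-on."""
    if not 0 <= vehicle_speed_mps < sound_speed_mps:
        raise ValueError("vehicle speed must be non-negative and subsonic")
    return vehicle_speed_mps / (sound_speed_mps - vehicle_speed_mps)


speed_mps = 50 * 1000 / 3600  # 50 km/h expressed in m/s
fraction = doppler_shift_fraction(speed_mps)
print(f"Maximum Doppler shift at 50 km/h: {100 * fraction:.1f}%")  # prints 4.2%
```

Because the shift scales roughly linearly with speed at these velocities, a vehicle would need to travel at close to 60 km/h before the worst-case shift reaches the 5% bound mentioned in the text.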