2566 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 45, NO. 10, OCTOBER 1997

An Integrated Hybrid Neural Network and Hidden Markov Model Classifier for Sonar Signals

Amlan Kundu and George C. Chen

Abstract—We present here an integrated hybrid hidden Markov model and neural network (HMM/NN) classifier that combines the time normalization property of the HMM classifier with the superior discriminative ability of the neural net (NN). In the proposed classifier, a left-to-right HMM module is used first to segment the observation sequence of every exemplar into a fixed number of states. Subsequently, all the frames belonging to the same state are replaced by one average frame. Thus, every exemplar, irrespective of its time-scale variation, is transformed into a fixed number of frames, i.e., a static pattern. The multilayer perceptron (MLP) neural net is then used as the classifier for these time-normalized exemplars. Some experimental results using sonar biologic signals are presented to demonstrate the superiority of the hybrid integrated classifier.

Index Terms—HMM, modified Viterbi algorithm, multilayer perceptron, neural net, time normalization.

I. INTRODUCTION

In passive surveillance of naval objects, a number of different types of transient signals including biologics, e.g., sounds emitted by sea-bound animals such as whales, are observed. The transient classification problem deals with the recognition of these transients as belonging to their respective classes. This problem is difficult for a number of reasons.

1) Short duration of the transients makes the classical frequency analysis difficult.
2) There are wide intraclass variations due to large variations in the structures and systems generating the transients.
3) The effects of ambient ocean noise and the presence of merchant ships lead to fuzzy class boundaries.

The most common type of classifier used for this task is the neural net [1], [2], although other classifiers have been studied [2]–[5].
In [5], it is demonstrated that by combining both HMM and NN classifiers in the decision-making stage, the false alarm rate can be reduced by “not classifying” signals that are deemed confusing, but this combination cannot increase the correct recognition rate. The present work, on the other hand, intends to build the foundation of one unique classifier that would incorporate the theoretical and practical advantages of both the HMM and NN classifiers in the classifier itself, i.e., build one classifier that would handle wide temporal variability and provide strong interclass discriminative power. This capability is achieved by cascading an HMM recognizer with a neural net recognizer, where the HMM recognizer does the job of time normalization and the neural net does the job of classification.

Manuscript received May 23, 1995; revised February 18, 1997. The associate editor coordinating the review of this paper and approving it for publication was Dr. Yingbo Hua. A. Kundu is with U.S. West Advanced Technologies, Boulder, CO 80303 USA. G. C. Chen is with the RDT and E Division, NCCOSC, San Diego, CA 92152-5000 USA. Publisher Item Identifier S 1053-587X(97)07347-9.

In [6], a similar hybrid HMM/NN classifier is described using the learning vector quantizer (LVQ) as its neural net component. In our view, the multilayer perceptron neural net (MLP–NN) provides more discriminative power vis-à-vis the LVQ as it uses a hidden layer of nodes to provide nonlinear hyperboundaries separating the classes in the decision space. We have also incorporated the modified Viterbi algorithm (MVA) in the HMM scheme to provide full-state segmentation (described later) of the signals. Full-state segmentation is essential for this type of classifier, yet this issue is not discussed in [6], although some ideas are presented in [7]. These ideas, in our opinion, are not as suitable, both theoretically and practically, as the use of the modified Viterbi algorithm.
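The time-normalization step (every frame assigned to the same HMM state is replaced by one average frame, so that each exemplar becomes a fixed-size static pattern) can be illustrated with a minimal Python sketch. It assumes the state segmentation is already available as one state index per frame, for example from a Viterbi-style alignment; the function name `time_normalize` and its arguments are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def time_normalize(frames: Sequence[Sequence[float]],
                   states: Sequence[int],
                   num_states: int) -> List[List[float]]:
    """Replace all frames assigned to the same state by their average frame.

    frames     : T feature vectors, one per time frame (assumed input).
    states     : T state indices in [0, num_states), one per frame
                 (assumed to come from an HMM segmentation).
    num_states : fixed number of HMM states.
    Returns num_states averaged frames, regardless of T.
    """
    if not frames or len(frames) != len(states):
        raise ValueError("frames and states must be non-empty and equal in length")
    dim = len(frames[0])
    sums = [[0.0] * dim for _ in range(num_states)]
    counts = [0] * num_states
    for frame, state in zip(frames, states):
        counts[state] += 1
        for d, value in enumerate(frame):
            sums[state][d] += value
    # Every state must own at least one frame, otherwise no average exists.
    # This is why a full-state segmentation is required.
    if 0 in counts:
        raise ValueError("segmentation leaves at least one state without frames")
    return [[v / counts[s] for v in sums[s]] for s in range(num_states)]


# Five frames segmented into three states collapse to three average frames.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
pattern = time_normalize(frames, states=[0, 0, 1, 2, 2], num_states=3)
# Flattening gives a fixed-length vector suitable as input to a static classifier.
static_input = [value for frame in pattern for value in frame]
```

The check for empty states shows why the segmentation must visit every state: a skipped state would leave a hole in the fixed-size pattern presented to the neural net.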
The details of the classification algorithm and some experimental results are described next.

II. FEATURE SELECTION

For features, we have chosen the short-time FFT and the quadrature mirror filter (QMF) bank-based coefficients based on past experiences of various researchers, and the intuition that both temporal and frequency resolutions (in the feature space) are needed for good classification.

A. Feature Based on QMF Bank

We are motivated to use QMF bank-based [8] features because of their links to the wavelet transform. Mallat, in [9], has shown that if the resolution step of the wavelet transform is selected as 2, then the wavelet transform can be implemented by the QMF bank. The importance of a wavelet transform stems from the fact that the wavelet transform represents time-varying signals in the transform domain with a good time localization property. Since sonar transients represent time-varying acoustic events, it is only logical to seek feature representation of such signals with good frequency and time resolution.

A QMF bank is a multirate digital filter bank and is composed of analysis banks (decimators) that are used to partition the signal into several consecutive frequency bands and synthesis banks (interpolators) that are used to combine the partitioned signals back to the original signal without loss of information [8], i.e., no distortion due to aliasing, phase, and amplitude. A simple way to create a number of QMF subbands is to use tree structure decomposition of QMF. The design of a one-dimensional (1-D) four-band separable QMF bank can be performed by tree-structure decomposition of a 1-D two-band separable QMF. This process is continued until we have the desired number of bands [10].

After the signal is decomposed into subbands, the root mean square energy in each subband is computed. The two lowest frequency subbands are not used because the signals in these bands are dominated by sonobuoy vibration noise.
Sonobuoys float on the sea surface and are subject to vibration due to waves. The rms energy in the bands is arranged into a feature vector to represent the signal.

In our preferred feature extraction scheme, we use a QMF bank composed of 32 subbands. The band splitting is accomplished using linear-phase FIR filters $H_0(z)$ and $H_1(z)$, where $H_1(z) = H_0(-z)$, and a good choice for the impulse response of $H_0(z)$ is as shown in the following table.

0.105 016 70e-02
0.505 452 60e-02
0.258 975 60e-02
0.276 414 00e-01
0.966 637 60e-02
0.903 922 30e-01
0.977 981 70e-01
0.481 028 040

1053–587X/97$10.00 © 1997 IEEE
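The tree-structured band splitting and the rms-energy features described in this subsection can be sketched in Python as follows. This is only an illustration under stated assumptions: the 2-tap Haar pair is a stand-in for the longer linear-phase filter tabulated in the text, the high-pass filter is taken as the mirror image of the low-pass one (alternating the signs of the taps), and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: Haar low-pass taps used as a simple stand-in filter,
# NOT the filter whose coefficients are tabulated in the text.
HAAR_LOWPASS: Tuple[float, ...] = (1 / math.sqrt(2), 1 / math.sqrt(2))


def fir(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Causal FIR filter: y[n] = sum_k h[k] * x[n - k], same length as x."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def two_band_split(x: Sequence[float],
                   h0: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split x into low and high half-bands, each decimated by 2."""
    # Mirror filter: h1[k] = (-1)^k * h0[k].
    h1 = [c if k % 2 == 0 else -c for k, c in enumerate(h0)]
    return fir(x, h0)[::2], fir(x, h1)[::2]


def qmf_tree(x: Sequence[float], levels: int,
             h0: Sequence[float] = HAAR_LOWPASS) -> List[List[float]]:
    """Full tree decomposition into 2**levels subbands, lowest frequency first."""
    bands = [(list(x), False)]  # (signal, spectrum_is_inverted)
    for _ in range(levels):
        next_bands = []
        for signal, inverted in bands:
            low, high = two_band_split(signal, h0)
            # Decimating a high-pass branch flips its spectrum, so the two
            # children of an inverted band are swapped to keep the output
            # in ascending frequency order.
            first, second = (high, low) if inverted else (low, high)
            next_bands.append((first, False))
            next_bands.append((second, True))
        bands = next_bands
    return [signal for signal, _ in bands]


def rms(x: Sequence[float]) -> float:
    """Root mean square value of a subband signal."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


def qmf_rms_features(x: Sequence[float], levels: int = 5, drop_lowest: int = 2,
                     h0: Sequence[float] = HAAR_LOWPASS) -> List[float]:
    """RMS energy per subband, with the lowest-frequency subbands discarded."""
    return [rms(band) for band in qmf_tree(x, levels, h0)][drop_lowest:]


# Five levels give 32 subbands; dropping the two lowest leaves 30 features.
features = qmf_rms_features([math.sin(0.3 * n) for n in range(256)])
```

Keeping the subbands in frequency order matters here because the two lowest-frequency subbands are the ones discarded; a naive tree traversal would return the bands in a permuted order.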