2566 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 45, NO. 10, OCTOBER 1997

An Integrated Hybrid Neural Network and Hidden Markov Model Classifier for Sonar Signals

Amlan Kundu and George C. Chen

Abstract—We present here an integrated hybrid hidden Markov model and neural network (HMM/NN) classifier that combines the time normalization property of the HMM classifier with the superior discriminative ability of the neural net (NN). In the proposed classifier, a left-to-right HMM module is used first to segment the observation sequence of every exemplar into a fixed number of states. Subsequently, all the frames belonging to the same state are replaced by one average frame. Thus, every exemplar, irrespective of its time-scale variation, is transformed into a fixed number of frames, i.e., a static pattern. The multilayer perceptron (MLP) neural net is then used as the classifier for these time-normalized exemplars. Some experimental results using sonar biologic signals are presented to demonstrate the superiority of the hybrid integrated classifier.

Index Terms—HMM, modified Viterbi algorithm, multilayer perceptron, neural net, time normalization.

I. INTRODUCTION

In passive surveillance of naval objects, a number of different types of transient signals are observed, including biologics, e.g., sounds emitted by sea-bound animals such as whales. The transient classification problem deals with the recognition of these transients as belonging to their respective classes. This problem is difficult for a number of reasons.

1) The short duration of the transients makes classical frequency analysis difficult.
2) There are wide intraclass variations due to large variations in the structures and systems generating the transients.
3) The effects of ambient ocean noise and the presence of merchant ships lead to fuzzy class boundaries.

The most common type of classifier used for this task is the neural net [1], [2], although other classifiers have been studied [2]–[5].
In [5], it is demonstrated that by combining both HMM and NN classifiers in the decision-making stage, the false alarm rate can be reduced by “not classifying” signals that are deemed confusing, but this combination cannot increase the correct recognition rate. The present work, on the other hand, intends to build the foundation of one unique classifier that would incorporate the theoretical and practical advantages of both HMM and NN classifiers in the classifier itself, i.e., build one classifier that would handle wide temporal variability and provide strong interclass discriminative power. This capability is achieved by cascading an HMM recognizer with a neural net recognizer, where the HMM recognizer does the job of time normalization and the neural net does the job of classification. In [6], a similar hybrid HMM/NN classifier is described using the learning vector quantizer (LVQ) as its neural net component. In our view, the multilayer perceptron neural net (MLP–NN) provides more discriminative power vis-à-vis the LVQ, as it uses a hidden layer of nodes to provide nonlinear hyper boundaries separating the classes in the decision space. We have also incorporated the modified Viterbi algorithm (MVA) in the HMM scheme to provide full-state segmentation (described later) of the signals. Full-state segmentation is essential for this type of classifier, yet this issue is not discussed in [6], although some ideas are presented in [7]. These ideas, in our opinion, are not as suitable, both theoretically and practically, as the use of the modified Viterbi algorithm.

Manuscript received May 23, 1995; revised February 18, 1997. The associate editor coordinating the review of this paper and approving it for publication was Dr. Yingbo Hua.
A. Kundu is with U.S. West Advanced Technologies, Boulder, CO 80303 USA.
G. C. Chen is with the RDT and E Division, NCCOSC, San Diego, CA 92152-5000 USA.
Publisher Item Identifier S 1053-587X(97)07347-9.
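The time normalization step of the cascade can be illustrated with a minimal sketch. It assumes that a state label per frame is already available from the HMM segmentation; the function name `time_normalize` and its arguments are hypothetical and do not come from the paper. The sketch also shows why full-state segmentation matters: a state with no frames has no average frame, so the fixed-size pattern cannot be formed.

```python
def time_normalize(frames, states, num_states):
    """Replace all frames assigned to the same state by one average frame.

    frames     : list of feature vectors (lists of floats), one per time step
    states     : list of state indices (0 .. num_states - 1), one per frame,
                 assumed to come from an HMM segmentation (hypothetical input)
    num_states : fixed number of HMM states
    Returns a flat list of num_states * dim values (the static pattern).
    """
    if len(frames) != len(states):
        raise ValueError("need exactly one state label per frame")
    dim = len(frames[0])
    sums = [[0.0] * dim for _ in range(num_states)]
    counts = [0] * num_states
    for frame, state in zip(frames, states):
        counts[state] += 1
        for d, value in enumerate(frame):
            sums[state][d] += value
    # Without full-state segmentation some state may be empty, and the
    # pattern would no longer have a fixed size.
    if 0 in counts:
        raise ValueError("every state needs at least one frame")
    pattern = []
    for state in range(num_states):
        pattern.extend(value / counts[state] for value in sums[state])
    return pattern
```

Two exemplars of different duration, e.g. five frames and eight frames segmented into the same three states, both map to a pattern of 3 × dim values, which is what allows a fixed-input classifier such as an MLP to follow.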
The details of the classification algorithm and some experimental results are described next.

II. FEATURE SELECTION

For features, we have chosen the short-time FFT and the quadrature mirror filter (QMF) bank-based coefficients based on past experiences of various researchers, and the intuition that both temporal and frequency resolutions (in the feature space) are needed for good classification.

A. Feature Based on QMF Bank

We are motivated to use QMF bank-based [8] features because of their links to the wavelet transform. Mallat, in [9], has shown that if the resolution step of the wavelet transform is selected as 2, then the wavelet transform can be implemented by the QMF bank. The importance of a wavelet transform stems from the fact that the wavelet transform represents time-varying signals in the transform domain with a good time localization property. Since sonar transients represent time-varying acoustic events, it is only logical to seek feature representation of such signals with good frequency and time resolution.

A QMF bank is a multirate digital filter bank and is composed of analysis banks (decimators) that are used to partition the signal into several consecutive frequency bands and synthesis banks (interpolators) that are used to combine the partitioned signals back to the original signal without loss of information [8], i.e., no distortion due to aliasing, phase, and amplitude. A simple way to create a number of QMF subbands is to use tree structure decomposition of QMF. The design of a one-dimensional (1-D) four-band separable QMF bank can be performed by tree-structure decomposition of a 1-D two-band separable QMF. This process is continued until we have the desired number of bands [10].

After the signal is decomposed into subbands, the root mean square energy in each subband is computed. The two lowest frequency subbands are not used because the signals in these bands are dominated by sonobuoy vibration noise.
Sonobuoys float on top of the sea surface and are subject to vibration due to waves. The rms energy in the bands is arranged into a feature vector to represent the signal. In our preferred feature extraction scheme, we use a QMF bank composed of 32 subbands. The band splitting is accomplished using linear-phase FIR filters and , where , and a good choice for the impulse response of is as shown in the following table.

0.10501670e-02
0.50545260e-02
0.25897560e-02
0.27641400e-01
0.96663760e-02
0.90392230e-01
0.97798170e-01
0.48102840e+00

1053–587X/97$10.00 © 1997 IEEE
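The subband feature computation described in this section can be sketched as follows. This is only an illustration under stated assumptions: it uses a two-tap Haar pair as a stand-in for the tabulated filter, it assumes the common QMF convention h1(n) = (-1)^n h0(n) for the highpass filter, and it uses circular convolution to avoid edge effects. All function names are hypothetical. The swap of the two outputs for odd-numbered bands keeps the subbands in increasing frequency order, because highpass filtering followed by decimation inverts the spectrum of a band.

```python
import math

# Stand-in two-tap (Haar) lowpass filter; NOT the filter tabulated in the paper.
H0 = [1 / math.sqrt(2), 1 / math.sqrt(2)]
# Assumed QMF convention for the highpass filter: h1(n) = (-1)^n h0(n).
H1 = [((-1) ** n) * c for n, c in enumerate(H0)]


def filter_and_decimate(x, h):
    """Circularly convolve x with h, then keep every second output sample."""
    n = len(x)
    return [sum(h[j] * x[(2 * k - j) % n] for j in range(len(h)))
            for k in range(n // 2)]


def qmf_tree(x, levels):
    """Split x into 2**levels subbands, returned in increasing frequency order."""
    bands = [list(x)]
    for _ in range(levels):
        next_bands = []
        for i, band in enumerate(bands):
            low = filter_and_decimate(band, H0)
            high = filter_and_decimate(band, H1)
            # Odd-indexed bands are spectrally inverted, so their highpass
            # output holds the lower original frequencies: swap the order.
            next_bands.extend([high, low] if i % 2 else [low, high])
        bands = next_bands
    return bands


def rms(band):
    """Root mean square value of one subband signal."""
    return math.sqrt(sum(v * v for v in band) / len(band))


def qmf_features(x, levels=5, skip_lowest=2):
    """RMS energy per subband, with the lowest-frequency subbands dropped."""
    return [rms(b) for b in qmf_tree(x, levels)][skip_lowest:]
```

With `levels=5` the tree yields 32 subbands, and dropping the two lowest leaves a 30-element feature vector per frame. The input length is assumed to be a multiple of 2**levels. A longer linear-phase filter would give much sharper band separation than the Haar pair used here.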