IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 59, NO. 12, DECEMBER 2010

A Deconvolutive Neural Network for Speech Classification With Applications to Home Service Robot

Donglin Wang, Student Member, IEEE, Henry Leung, Ajeesh P. Kurian, Hye-Jin Kim, and Hosub Yoon

Abstract—Reverberation deteriorates the quality and intelligibility of speech, leading to poor performance of classification systems. Room reverberation parameters depend on the locations of the speaker and the microphone and on the room geometry. For mobile robots, the reverberation is constantly changing due to the relative movement of the speaker and the robot, which can affect the spectral properties of the signal and, therefore, the classification accuracy. The contribution of this paper is a new network architecture that uses a neural network constant modulus algorithm (NNCMA)-based equalizer followed by a multilayer perceptron (MLP) classifier. The NNCMA is an MLP trained with a cost function similar to that of the constant modulus algorithm (CMA). With this two-stage structure, the classifier does not have to account for the time-varying nature of the reverberation. The proposed algorithm is applied to speech samples collected by the home service robot WEVER-R2 for speaker classification in a typical home or office environment; we use these samples for age and gender classification. The proposed neural network achieved a classification accuracy of 83.73% for age classification and 88.91% for gender classification, while the standard MLP achieved 71.43% and 72.29%, respectively.

Index Terms—Blind deconvolution, constant modulus algorithm (CMA), multilayer perceptron, neural network, reverberation, robotics, speech classification.

I. INTRODUCTION

SERVICE ROBOTS are becoming popular due to customer demand for applications in household, security, health care, home networking, and entertainment [1]–[3].
Manuscript received April 3, 2009; revised December 29, 2009; accepted January 25, 2010. Date of publication September 7, 2010; date of current version November 10, 2010. This work was supported by the IT R&D program of MKE/IITA [2008-F-037-01, Development of HRI Solutions and Core Chipsets for u-Robot]. The Associate Editor coordinating the review process for this paper was Dr. John Sheppard. D. Wang, H. Leung, and A. P. Kurian are with the Department of Electrical and Computer Engineering, University of Calgary, Calgary, AB T2N 1N4, Canada (e-mail: dowang@ucalgary.ca; leungh@ucalgary.ca; ajeesh@ieee.org). H.-J. Kim and H. Yoon are with the Intelligent Robotics Research Division, Electronics and Telecommunications Research Institute (ETRI), Daejeon 305-700, Korea (e-mail: marisan@etri.re.kr; yoonhs@etri.re.kr). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIM.2010.2047551

To make a home service robot perform well, human-robot interaction (HRI) is indispensable. It is essential for a service robot with HRI capabilities to perform speech source localization, source separation, and classification of the separated sources [4]. The intelligibility of speech collected in home and office environments degrades due to room reverberation [5]. Reverberation and noise can significantly degrade the performance of speech recognizers and speaker classification systems [6]. When the speaker and the microphones are moving, the reverberation parameters change constantly with time, resulting in nonstationary signals at the microphones and further complicating speaker classification. In the past, neural networks have been widely applied to speech classification [7]–[16]. In [8], a hybrid neural network based on a hidden Markov model was developed for relatively complicated speech patterns.
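The nonstationarity described above can be pictured as convolution with a finite impulse response (FIR) room filter whose taps drift as the speaker and robot move. A minimal sketch of that idea follows; the filter values and the linear interpolation between two "positions" are purely illustrative, not measured room responses from the paper.

```python
import numpy as np

def time_varying_reverb(s, h0, h1):
    """Filter s with an FIR room response that interpolates linearly
    from h0 (start of utterance) to h1 (end), mimicking the drifting
    reverberation seen by a moving robot. h0 and h1 must have equal length."""
    n = len(s)
    taps = len(h0)
    y = np.zeros(n)
    for i in range(n):
        a = i / (n - 1)                   # drift fraction: 0 -> 1
        h = (1 - a) * h0 + a * h1         # current room response
        seg = s[max(0, i - taps + 1):i + 1][::-1]  # recent samples, newest first
        y[i] = h[:len(seg)] @ seg
    return y

rng = np.random.default_rng(1)
s = rng.standard_normal(1000)             # stand-in for a speech frame
h0 = np.array([1.0, 0.5, 0.25])           # illustrative early position
h1 = np.array([1.0, -0.3, 0.6])           # illustrative later position
y = time_varying_reverb(s, h0, h1)
```

Because the effective filter differs at every sample, no single fixed inverse filter can undo the distortion exactly, which is why the paper argues for an adaptive blind equalizer in front of the classifier.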
In [11], the authors used Mel-cepstral coefficients as feature vectors for a multilayer perceptron (MLP) network to perform speaker-dependent speech classification. In [12], a nonlinear segmentation technique was used to incorporate the temporal information of the speech, and a Chinese speech recognition system was developed. In addition, hidden-Markov-type neural networks [13]–[16] have been explored for robust speech classification. However, these neural-network-based speech recognizers mainly consider the noise problem and rarely address the channel reverberation effect. Hence, to improve the HRI capabilities of a home service robot, one should mitigate the effects of the channel. Recently, neural networks have been used for the deconvolution of signals affected by multipath transmission [18], [19]. However, these algorithms require knowledge of the channel input to train the network. Since there is relative movement between the robot and the speaker, the channel is always changing. Moreover, there may be multiple speakers in the room, and the room composition may change from time to time. To reduce the effect of these changes in room reverberation, a blind equalization scheme can be used prior to the classifier.

Blind equalization of a channel is a common problem in wireless communications. The constant modulus algorithm (CMA) is widely used for the equalization of wireless channels [17]. As its name implies, CMA assumes that the signal under investigation has a constant modulus. However, even for many nonconstant-modulus signals, such as quadrature amplitude modulation, CMA has been found to be effective [20]. In this paper, a neural network CMA (NNCMA) scheme is introduced before the conventional MLP-based classifier in order to reduce the channel effects on classification (the reverberation of a room is normally represented by a finite impulse response filter).
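To make the CMA idea concrete, the following is a minimal sketch of a classical linear, sample-by-sample CMA equalizer for a real-valued source, not the paper's NNCMA network (which replaces the linear filter with an MLP trained on a CMA-like cost). The function name, tap count, step size, and channel values are all illustrative assumptions.

```python
import numpy as np

def cma_equalize(x, num_taps=11, mu=1e-3, R2=1.0):
    """Blind CMA equalizer (illustrative helper, not the paper's NNCMA).

    x  : received signal distorted by an unknown FIR channel.
    R2 : dispersion constant of the source, E[s^4] / E[s^2].
    Minimizes the CMA cost (y^2 - R2)^2 by stochastic gradient descent."""
    w = np.zeros(num_taps)
    w[num_taps // 2] = 1.0                     # center-spike initialization
    y = np.zeros_like(x)
    for n in range(num_taps - 1, len(x)):
        u = x[n - num_taps + 1:n + 1][::-1]    # regressor, newest sample first
        y[n] = w @ u
        e = y[n] * (y[n] ** 2 - R2)            # gradient factor of the CMA cost
        w -= mu * e * u                        # constant 4 absorbed into mu
    return y

# demo: constant-modulus (+/-1) source through a mild FIR channel
rng = np.random.default_rng(0)
s = rng.choice([-1.0, 1.0], size=20000)
h = np.array([1.0, 0.4, -0.2])                 # illustrative unknown channel
x = np.convolve(s, h)[:len(s)]
y = cma_equalize(x)
# residual dispersion of the equalized tail (smaller is better)
print(np.mean(np.abs(np.abs(y[-5000:]) - 1.0)))
```

Note that no training signal is needed: the update uses only the received samples and the constant-modulus property of the source, which is exactly what makes the scheme "blind" and thus usable while the room channel keeps changing.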
We assume that, for a very short period (on the order of a few milliseconds), the channel is time-invariant. In training, as well as in testing, the NNCMA scheme constantly

0018-9456/$26.00 © 2010 IEEE