IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 59, NO. 12, DECEMBER 2010 3237
A Deconvolutive Neural Network for Speech
Classification With Applications
to Home Service Robot
Donglin Wang, Student Member, IEEE, Henry Leung, Ajeesh P. Kurian, Hye-Jin Kim, and Hosub Yoon
Abstract—Reverberation deteriorates the quality and intelligibility of speech, leading to poor performance of classification systems. Room reverberation parameters depend on the locations of the speaker and the microphone and on the room geometry. For mobile robots, the reverberation is constantly changing due to the relative movement between the speaker and the robot, which can affect the spectral properties of the signal and, therefore, the classification accuracy. The contribution of this paper is a new network architecture that uses a neural network constant modulus algorithm (NNCMA) based equalizer followed by a multilayer perceptron (MLP) classifier. The NNCMA is an MLP trained with a cost function similar to that of the constant modulus algorithm (CMA). With this two-stage structure, the classifier does not have to account for the time-varying nature of the reverberation. The proposed algorithm is applied to speech samples collected by the home service robot WEVER-R2 for age and gender classification in a typical home or office environment. The proposed neural network achieves a classification accuracy of 83.73% for age classification and 88.91% for gender classification, whereas the standard MLP achieves 71.43% and 72.29%, respectively.
Index Terms—Blind deconvolution, constant modulus algo-
rithm (CMA), multilayer perceptron, neural network, reverber-
ation, robotics, speech classification.
I. INTRODUCTION
SERVICE ROBOTS are becoming popular due to customer demand for applications in household, security,
health care, home network, and entertainment [1]–[3]. To make
a home service robot perform well, human-robot interaction
(HRI) is indispensable. It is essential for a service robot
with HRI capabilities to perform speech source localization,
source separation, and classification of the separated sources
[4]. Intelligibility of the speech collected in home and office
environments degrades due to room reverberations [5]. These
Manuscript received April 3, 2009; revised December 29, 2009; accepted
January 25, 2010. Date of publication September 7, 2010; date of current
version November 10, 2010. This work was supported by the IT R&D program
of MKE/IITA [2008-F-037-01, (Development of HRI Solutions and Core
Chipsets for u-Robot)]. The Associate Editor coordinating the review process
for this paper was Dr. John Sheppard.
D. Wang, H. Leung, and A. P. Kurian are with the Department of Electrical
and Computer Engineering, University of Calgary, Calgary, AB T2N 1N4,
Canada (e-mail: dowang@ucalgary.ca; leungh@ucalgary.ca; ajeesh@ieee.org).
H.-J. Kim and H. Yoon are with the Intelligent Robotics Research Division,
Electronics and Telecommunications Research Institute (ETRI), Daejeon 305-
700, Korea (e-mail: marisan@etri.re.kr; yoonhs@etri.re.kr).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIM.2010.2047551
reverberations and noise effects can significantly degrade the performance of speech recognizers and speaker classification systems [6]. When the speaker and the microphones are
moving, the reverberation parameters are constantly changing
with time, resulting in nonstationary signals at the microphones
and adding further complications to speaker classification.
In the past, neural networks have been widely applied to
speech classification [7]–[16]. In [8], a hybrid neural network
based on a hidden Markov model was developed for relatively
complicated speech patterns. In [11], the authors used Mel-cepstral coefficients as feature vectors for a multilayer perceptron (MLP) network to perform speaker-dependent speech classification. A nonlinear segmentation technique was used
to incorporate the temporal information of the speech and a
Chinese speech recognition system was developed in [12]. In
addition, hidden Markov-type neural networks [13]–[16] were
also explored for robust speech classification. However, these neural-network-based speech recognizers mainly consider the noise problem and rarely address the channel reverberation effect. Hence, to improve the HRI capabilities of a home service robot, one should mitigate the effects of the channel. Recently, neural networks have been used for the deconvolution of signals affected by multipath transmission [18], [19]. However, these algorithms need to know the input
to the channel to train the network. Since there are relative
movements between the robot and the speaker, the channel is
always changing. Moreover, there may be multiple speakers
in the room and the room composition may change from
time to time. To reduce the effect of these changes in room
reverberation, a blind equalization scheme can be used prior
to the classifier.
Blind channel equalization is a common problem in wireless communications, and the constant modulus algorithm (CMA) is widely used for the equalization of wireless channels [17]. As its name implies, CMA assumes that the signal under investigation has a constant modulus; nevertheless, it has been found to be effective even for nonconstant-modulus signals such as quadrature amplitude modulation [20].
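For a real-valued source, the CMA-2 cost is J(w) = E[(y_n^2 − R_2)^2], where y_n is the equalizer output and R_2 = E[s_n^4]/E[s_n^2] is the dispersion constant of the source. A minimal stochastic-gradient sketch of a baseline linear FIR CMA equalizer is given below; the function name, tap count, and step size are illustrative choices, not values from this paper:

```python
import numpy as np

def cma_equalize(received, num_taps=8, mu=1e-3, r2=1.0):
    """Adapt FIR equalizer taps under the CMA-2 cost J = E[(y^2 - r2)^2],
    where r2 is the dispersion constant of the (real-valued) source."""
    w = np.zeros(num_taps)
    w[num_taps // 2] = 1.0                            # centre-spike initialization
    out = np.zeros(len(received))
    for n in range(num_taps - 1, len(received)):
        x = received[n - num_taps + 1:n + 1][::-1]    # tap-delay line, x[k] = r[n-k]
        y = w @ x                                     # equalizer output
        e = y * (y ** 2 - r2)                         # instantaneous CMA-2 gradient term
        w -= mu * e * x                               # stochastic-gradient tap update
        out[n] = y
    return out, w
```

For a constant-modulus (±1) input observed through a distortionless channel, the error term y(y² − r2) is identically zero, so the taps remain at their centre-spike initialization and the output is a delayed copy of the input; for reverberant channels, convergence depends on the step size and on the equalizer being long enough to invert the room response. The NNCMA discussed in this paper replaces such a linear filter with an MLP trained on a similar cost.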
In this paper, a neural network CMA (NNCMA) scheme is
introduced prior to the conventional MLP-based classifier, in
order to reduce the channel effects (the reverberation of a room
is normally represented with a finite impulse response filter)
on classification. We assume that, for a very short period (on the order of a few milliseconds), the channel is time-invariant. In
training, as well as in testing, the NNCMA scheme constantly