IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 59, NO. 12, DECEMBER 2010 3237
A Deconvolutive Neural Network for Speech
Classification With Applications
to Home Service Robot
Donglin Wang, Student Member, IEEE, Henry Leung, Ajeesh P. Kurian, Hye-Jin Kim, and Hosub Yoon
Abstract—Reverberation deteriorates the quality and intelligibility of speech, leading to poor performance of classification systems. Room reverberation parameters depend on the locations of the speaker and the microphone and on the room geometry. For mobile robots, the reverberation is constantly changing due to the relative movement between the speaker and the robot, which can affect the spectral properties of the signal and, therefore, the classification accuracy. The contribution of this paper is a new network architecture that uses a neural network constant modulus algorithm (NNCMA) based equalizer followed by a multilayer perceptron (MLP) classifier. The NNCMA is an MLP trained with a cost function similar to that of the constant modulus algorithm (CMA). With this two-stage structure, the classifier does not have to account for the time-varying nature of the reverberation. The proposed algorithm is applied to speech samples collected by the home service robot WEVER-R2 for age and gender classification in a typical home or office environment. The proposed neural network achieves a classification accuracy of 83.73% for age classification and 88.91% for gender classification, whereas the standard MLP achieves 71.43% and 72.29%, respectively.
Index Terms—Blind deconvolution, constant modulus algo-
rithm (CMA), multilayer perceptron, neural network, reverber-
ation, robotics, speech classification.
I. INTRODUCTION
SERVICE ROBOTS are becoming popular due to customer demand for applications in household, security,
health care, home network, and entertainment [1]–[3]. To make
a home service robot perform well, human-robot interaction
(HRI) is indispensable. It is essential for a service robot
with HRI capabilities to perform speech source localization,
source separation, and classification of the separated sources
[4]. Intelligibility of the speech collected in home and office
environments degrades due to room reverberations [5]. These
Manuscript received April 3, 2009; revised December 29, 2009; accepted
January 25, 2010. Date of publication September 7, 2010; date of current
version November 10, 2010. This work was supported by the IT R&D program
of MKE/IITA [2008-F-037-01, (Development of HRI Solutions and Core
Chipsets for u-Robot)]. The Associate Editor coordinating the review process
for this paper was Dr. John Sheppard.
D. Wang, H. Leung, and A. P. Kurian are with the Department of Electrical
and Computer Engineering, University of Calgary, Calgary, AB T2N 1N4,
Canada (e-mail: dowang@ucalgary.ca; leungh@ucalgary.ca; ajeesh@ieee.org).
H.-J. Kim and H. Yoon are with the Intelligent Robotics Research Division,
Electronics and Telecommunications Research Institute (ETRI), Daejeon 305-
700, Korea (e-mail: marisan@etri.re.kr; yoonhs@etri.re.kr).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIM.2010.2047551
reverberations and noise effects can significantly degrade the performance of speech recognizers and speaker classification systems [6]. When the speaker and the microphones are
moving, the reverberation parameters are constantly changing
with time, resulting in nonstationary signals at the microphones
and adding further complications to speaker classification.
In the past, neural networks have been widely applied to
speech classification [7]–[16]. In [8], a hybrid neural network
based on a hidden Markov model was developed for relatively
complicated speech patterns. In [11], the authors used Mel-cepstral coefficients as feature vectors for a multilayer perceptron (MLP) network to perform speaker-dependent speech classification. A nonlinear segmentation technique was used
to incorporate the temporal information of the speech and a
Chinese speech recognition system was developed in [12]. In
addition, hidden Markov-type neural networks [13]–[16] were
also explored for robust speech classification. However, these neural-network-based speech recognizers mainly consider the noise problem and rarely address the channel reverberation effect. Hence, to improve the HRI capabilities of a home service robot, one should mitigate the effects of the channel. Recently, neural networks have been used for the deconvolution of signals affected by multipath transmission [18], [19]. However, these algorithms need to know the input
to the channel to train the network. Since there are relative
movements between the robot and the speaker, the channel is
always changing. Moreover, there may be multiple speakers
in the room and the room composition may change from
time to time. To reduce the effect of these changes in room
reverberation, a blind equalization scheme can be used prior
to the classifier.
Blind channel equalization is a common problem in wireless communications, and the constant modulus algorithm (CMA) is widely used for the equalization of wireless channels [17]. As its name implies, CMA assumes that the signal under investigation has a constant modulus; nevertheless, it has been found to be effective even for nonconstant-modulus signals such as quadrature amplitude modulation [20].
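For a real-valued source, the CMA-2 cost is J(w) = E[(y_n^2 − R_2)^2], where y_n is the equalizer output and R_2 = E[s_n^4]/E[s_n^2] is the dispersion constant of the source. A minimal stochastic-gradient sketch of a baseline linear FIR CMA equalizer is given below; the function name, tap count, and step size are illustrative choices, not values from this paper:

```python
import numpy as np

def cma_equalize(received, num_taps=8, mu=1e-3, r2=1.0):
    """Adapt FIR equalizer taps under the CMA-2 cost J = E[(y^2 - r2)^2],
    where r2 is the dispersion constant of the (real-valued) source."""
    w = np.zeros(num_taps)
    w[num_taps // 2] = 1.0                            # centre-spike initialization
    out = np.zeros(len(received))
    for n in range(num_taps - 1, len(received)):
        x = received[n - num_taps + 1:n + 1][::-1]    # tap-delay line, x[k] = r[n-k]
        y = w @ x                                     # equalizer output
        e = y * (y ** 2 - r2)                         # instantaneous CMA-2 gradient term
        w -= mu * e * x                               # stochastic-gradient tap update
        out[n] = y
    return out, w
```

For a constant-modulus (±1) input observed through a distortionless channel, the error term y(y² − r2) is identically zero, so the taps remain at their centre-spike initialization and the output is a delayed copy of the input; for reverberant channels, convergence depends on the step size and on the equalizer being long enough to invert the room response. The NNCMA discussed in this paper replaces such a linear filter with an MLP trained on a similar cost.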
In this paper, a neural network CMA (NNCMA) scheme is
introduced prior to the conventional MLP-based classifier, in
order to reduce the channel effects (the reverberation of a room
is normally represented with a finite impulse response filter)
on classification. We assume that, for a very short period (on the order of a few milliseconds), the channel is time-invariant. In
training, as well as in testing, the NNCMA scheme constantly