New Wavelet-Based Pitch Detection Method for
Human-Robot Voice Interface
T.H. Tran, Q.P. Ha, and G. Dissanayake
ARC Centre of Excellence for Autonomous Systems (CAS)
Faculty of Engineering
University of Technology, Sydney,
PO Box 123 Broadway NSW 2007 AUSTRALIA
E-mail: {ttran, quangha, gdissa}@eng.uts.edu.au
Abstract— A speech-activated interface between humans and autonomous/semi-autonomous systems requires accurate voice detection and recognition, for which pitch detection and end-point detection are of vital importance. This paper presents a new pitch detection method based on the phase of the continuous wavelet transform. The advantage of the proposed technique is that it serves not only as an accurate pitch detector but also as an efficient solution to the end-point detection problem. Experimental results are provided for the detection of pitch periods and end points in a neural-network-based, voice-enabled wheelchair system.
I. INTRODUCTION
The human-robot interface plays a very important role in the operation of autonomous/semi-autonomous systems that interact with people. These interactions must take place in a setting that is easy to participate in, interesting, and intuitive for ordinary users [1]. Verbal communication is the most natural means of interacting with machines. Human-robot voice communication covers many areas of speech research, such as speech recognition, speech synthesis, and speaker identification and verification [1-3]. Although still in its infancy, the voice-enabled human-robot interface has already been applied successfully in tour-guide robots [1,4].
For semi-autonomous systems such as a voice-enabled wheelchair, however, the requirements on reliability and speaker identification become more important. To recognize a speaker's voice, it is essential to extract features that stay consistent for a given speaker while remaining unique to that speaker, so that an impostor can be rejected. The periodicity of voiced speech, known as pitch, is considered a key feature for reliably identifying the speaker [5]. The pitch period is thus an important parameter for accurate voice detection and speaker identification [3,5,6].
Estimating pitch periods in speech processing is difficult because pitch frequencies can range from 60 Hz to 500 Hz, and the pitch period of the same person may vary with emotional state, accent, and other perceptual variables [7,8]. Several methods are available for pitch period estimation [3,5-10].
Classical methods, based on the autocorrelation function, the average magnitude difference function, or the spectrum, are insensitive to non-stationary variations of the pitch period over the segment length and are hence unsuitable for low-pitched and high-pitched speakers [9]. More recently, methods based on the discrete wavelet transform have been developed and shown to be suitable for a wide range of speakers [6,8,10]. As noted in [11], however, these methods do not perform well in determining the pitch period under severe noise conditions, which is the case for a wheelchair user whose utterances are quite often made against background noise. Voice control of systems such as a wheelchair therefore calls for an accurate method for estimating both the pitch period and the locations of speech end points.
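For reference, the classical autocorrelation approach mentioned above can be sketched as follows. This is a minimal illustration of that baseline, not the method proposed in this paper. It assumes that one frame holds a single, constant pitch and restricts the candidate lags to the 60 Hz to 500 Hz range quoted above; the function name `autocorr_pitch` and its defaults are hypothetical.

```python
import math

def autocorr_pitch(frame, fs, fmin=60.0, fmax=500.0):
    """Estimate the pitch (Hz) of one frame as fs / lag, where lag maximises the
    normalised autocorrelation over the period range 1/fmax .. 1/fmin seconds.
    Returns None for an empty or silent frame, or when no lag correlates positively."""
    if not frame:
        return None
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]              # remove any DC offset
    energy = sum(s * s for s in x)             # autocorrelation at lag 0
    if energy == 0.0:
        return None                            # silent frame: no pitch
    lag_min = max(1, int(fs / fmax))           # shortest admissible period, in samples
    lag_max = min(int(fs / fmin), n - 1)       # longest admissible period that fits in the frame
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # biased autocorrelation estimate, normalised by the frame energy
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return None if best_lag is None else fs / best_lag
```

With fs = 8000 Hz, for example, the search covers lags of 16 to 133 samples. Because a single lag is chosen for the whole frame, any change of the pitch period within the frame is averaged out, which is the stationarity limitation attributed to classical methods in [9].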
In this paper, a new detection method is proposed based on the phase of the continuous wavelet transform (CWT). First, the relationship between the CWT phase and the pitch phase is established. An effective pitch detection algorithm that makes use of the pitch period parameter is then developed. The algorithm is applied to detect the starting and ending points of monosyllabic words with continuous speech waveforms. A neural network (NN) is trained to recognize a number of monosyllabic voice commands from spectrogram parameters. Features extracted by the CWT, together with the pitch period, are used to train the neural network. The results are comparable to those obtained with features extracted by the short-time Fourier transform (STFT).
Owing to its reliable performance, the proposed pitch detection method is applied to the voice control of a wheelchair.
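The introduction states only that the method rests on the phase of the CWT. The sketch below is a generic illustration, under our own assumptions, of why the CWT phase carries pitch information; it is not the authors' relationship between the CWT phase and the pitch phase, nor their algorithm. It assumes a complex Morlet wavelet with centre parameter w0 = 6 and a rough pitch estimate f0 that is used to pick a single scale a = w0/(2π f0). For a periodic input, the phase of the coefficients at that scale then advances by 2π per pitch period, so the spacing between successive phase wraps estimates the period. The names `morlet_phase` and `period_from_phase` are hypothetical.

```python
import cmath
import math

def morlet_phase(x, fs, f0, w0=6.0):
    """Phase of the complex-Morlet CWT of x at the single scale whose centre
    frequency is f0 Hz, at every sample where the truncated wavelet fits inside x."""
    a = w0 / (2.0 * math.pi * f0)            # scale in seconds: centre frequency w0 / (2*pi*a) = f0
    half = int(4.0 * a * fs)                 # truncate the Gaussian envelope at +/- 4 scales
    kernel = []
    for k in range(-half, half + 1):
        u = k / (fs * a)                     # (t - b) / a for an offset of k samples
        # complex conjugate of the Morlet wavelet; constant gains do not change the phase
        kernel.append(cmath.exp(-1j * w0 * u) * math.exp(-0.5 * u * u))
    phases = []
    for b in range(half, len(x) - half):
        coeff = sum(x[b + k] * kernel[k + half] for k in range(-half, half + 1))
        phases.append(cmath.phase(coeff))    # wrapped to (-pi, pi]
    return phases

def period_from_phase(phases, fs):
    """Mean spacing, in seconds, between successive wraps of the phase from +pi to -pi."""
    wraps = [i for i in range(1, len(phases)) if phases[i] - phases[i - 1] < -math.pi]
    if len(wraps) < 2:
        return None
    return (wraps[-1] - wraps[0]) / (len(wraps) - 1) / fs
```

For a 200 Hz tone sampled at 8 kHz, the phase at the matched scale wraps every 40 samples, giving a period of about 5 ms. The scale must be chosen near the true fundamental; a practical detector would have to search over scales or refine f0, which this sketch does not attempt.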
II. PITCH DETECTION USING THE CWT PHASE
In speech processing, the pitch period is an important parameter in many applications, such as speech compression coding, speech analysis and synthesis, speech segmentation, and automatic recognition of monosyllabic words. In a voice-controlled wheelchair, the pitch period is used in the end-point detector and as an extracted feature for NN training and voice recognition.
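How a pitch track can drive an end-point detector is illustrated by the minimal sketch below. It assumes that some frame-wise pitch detector (not shown) returns one pitch estimate in Hz per frame, or None for frames without detectable periodicity. Frames whose pitch falls outside the 60 Hz to 500 Hz range are treated as unvoiced, and voiced runs shorter than `min_run` frames are discarded as clicks. The function name, the default of three frames and the first-run/last-run rule are our assumptions, not the end-point detector described in this paper.

```python
def end_points(pitch_track, hop, min_run=3, fmin=60.0, fmax=500.0):
    """Locate speech start/end from a per-frame pitch track.

    pitch_track: one entry per frame, the estimated pitch in Hz or None when the
    detector found no periodicity.  hop: frame advance in samples.
    Returns (start_sample, end_sample), with the end exclusive, covering the first
    through the last run of at least `min_run` consecutive voiced frames, or None
    when there is no such run.
    """
    # a frame counts as voiced only if its pitch lies in the plausible speech range
    voiced = [p is not None and fmin <= p <= fmax for p in pitch_track]
    runs, start = [], None
    for idx, v in enumerate(voiced + [False]):   # sentinel closes a trailing run
        if v and start is None:
            start = idx
        elif not v and start is not None:
            if idx - start >= min_run:             # ignore short clicks and glitches
                runs.append((start, idx))
            start = None
    if not runs:
        return None
    return runs[0][0] * hop, runs[-1][1] * hop
```

For example, with a hop of 80 samples (10 ms at 8 kHz), a track that is voiced in frames 10 to 19 and 21 to 23, plus a two-frame blip at frames 5 and 6, gives the end points (800, 1920): the blip is ignored and the single unvoiced frame inside the word does not split it.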
The wavelet transform, developed as a branch of applied mathematics in the late 1980s, has become a
0-7803-8463-6/04/$20.00 ©2004 IEEE
Proceedings of 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, September 28 - October 2, 2004, Sendai, Japan