978-1-4673-8564-0/15/$31.00 ©2015 IEEE
Proc. 5th National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics 2015 (NCVPRIPG
2015), Patna, India, Dec. 16-19, 2015, paper ID 88.
Place of Articulation from Direct Imaging for
Validation of Its Estimation from Speech Analysis
for Use in Speech Training
K. S. Nataraj and Prem C. Pandey
Department of Electrical Engineering
Indian Institute of Technology Bombay
Mumbai 400076, India
Email: {natarajks, pcpandey}@ee.iitb.ac.in
Abstract—Place of articulation obtained by analysis of the
speech signal is useful for visual feedback of articulatory efforts
for speech training of hearing-impaired children and for
improving pronunciation by learners of second languages. Its
estimation by direct imaging of the oral cavity is needed for
validating the estimation from the speech signal. For such
applications, an automated technique is presented for estimating
the place of articulation by graphical processing of the upper and
lower contours of the oral cavity image. It iteratively estimates
the axial curve as an axis of symmetry of the oral cavity, such
that the curve approximately bisects the normals to it. The distance
between the contours along the normal to the axial curve gives
the oral cavity opening, and the position of the smallest opening
provides the place of articulation. The values estimated using the
automated technique closely matched those obtained by manual
marking of the visually estimated place of maximum constriction
for the oral cavity images of vowels, stops, and fricatives, from
the XRMB and MRI databases.
Keywords—axial curve; oral cavity opening; place of
articulation; speech training
I. INTRODUCTION
Hearing-impaired children have difficulty acquiring the
ability to control the articulators involved in speech
production because of the lack of auditory feedback. Speech training
aids can provide non-auditory feedback to help the process of
speech acquisition. Visual feedback of articulatory efforts
has been found to be useful in improving vowel articulation by
hearing-impaired children [1], [2]. It can also help a
second-language learner improve pronunciation [3]. Place
of articulation, i.e., the place of maximum constriction in the
oral cavity, is the most significant information for speech
training. It can be estimated using imaging techniques,
acoustic measurements, or analysis of the speech signal.
However, only the estimate obtained by analysis of the
speech signal can be used for speech training. In the
commonly used methods, analysis of the speech signal is
based on linear predictive coding with the oral cavity modeled
as a lossless acoustic tube with plane wave propagation [2],
[4], [5]. Although the lower and upper contours of the oral
cavity have varying curvatures, the cavity is modeled as a tube
with equal-length segments of varying cross-section areas.
Place of articulation estimated by speech analysis needs to be
validated with reference to the value obtained from direct
imaging of the oral cavity during speech production.
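The tube-model idea can be illustrated with a minimal sketch. It is not the method of [2], [4], [5]: it assumes the autocorrelation method of linear prediction, and the helper names (autocorrelation, levinson_reflection, tube_areas), the sign convention of the reflection coefficients, and the direction in which sections are numbered are all assumptions made for illustration. The areas are relative, since the first section is arbitrarily set to 1.

```python
import numpy as np

def autocorrelation(frame, order):
    """Autocorrelation of a Hamming-windowed speech frame for lags 0..order."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    return np.array([np.dot(x[:len(x) - lag], x[lag:]) for lag in range(order + 1)])

def levinson_reflection(r, order):
    """Levinson-Durbin recursion: autocorrelation r[0..order] -> reflection coefficients."""
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)          # predictor polynomial, a[0] = 1
    a[0] = 1.0
    err = r[0]                       # prediction error energy
    k = np.zeros(order)
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k[i - 1] = -acc / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k[i - 1] * prev[i - 1:0:-1]
        a[i] = k[i - 1]
        err *= 1.0 - k[i - 1] ** 2
    return k

def tube_areas(k, first_area=1.0):
    """Relative areas of equal-length tube sections (assumed sign convention)."""
    areas = [first_area]
    for ki in k:
        areas.append(areas[-1] * (1.0 - ki) / (1.0 + ki))
    return np.array(areas)

# Toy autocorrelation sequence (not real speech data), order 2.
r_demo = np.array([1.0, 0.5, 0.25])
k_demo = levinson_reflection(r_demo, 2)
areas_demo = tube_areas(k_demo)
narrowest_section = int(np.argmin(areas_demo))  # index of the smallest section
```

In such a model the index of the narrowest section is only a section count along the tube, which is one reason the estimate needs validation against a distance measured on the imaged oral cavity.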
Direct imaging provides upper and lower contours of the
oral cavity. Irregular shapes of the contours cause difficulties
in locating the maximum constriction and finding its distance
from the lips in a consistent manner. As a solution to this
problem, an automated technique involving graphical
processing of the contours is presented. Acoustic wave
propagation is assumed to be along an axial curve between the
two contours, with the normal to the curve representing the
wavefront. The axial curve is iteratively estimated as an axis
of symmetry, approximately bisecting the normals to it.
The segment of the normal to the axial curve between the two
contours provides an estimate of the oral cavity opening.
Values of oral cavity opening as a function of distance from
the lips are used for estimating the place of articulation.
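The outline above can be sketched in code as follows. This is a rough illustration under stated assumptions, not the procedure detailed in the third section: the contours are taken as 2-D polylines ordered from the lips inward, the initial curve is the index-wise midpoint of the arc-length-resampled contours, a fixed number of iterations replaces any convergence test, and all function names are hypothetical.

```python
import numpy as np

def cross2(u, v):
    """z-component of the 2-D cross product (broadcasts over rows)."""
    return u[..., 0] * v[..., 1] - u[..., 1] * v[..., 0]

def resample(poly, m):
    """Resample a polyline to m points equally spaced in arc length."""
    seg = np.linalg.norm(np.diff(poly, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    u = np.linspace(0.0, s[-1], m)
    return np.column_stack([np.interp(u, s, poly[:, 0]), np.interp(u, s, poly[:, 1])])

def hit(p, n, poly):
    """Nearest intersection of the line p + t*n with a polyline, or None."""
    a, d = poly[:-1], poly[1:] - poly[:-1]
    den = cross2(n, d)
    ok = np.abs(den) > 1e-12                  # skip segments parallel to n
    den_safe = np.where(ok, den, 1.0)
    t = cross2(a - p, d) / den_safe           # position along the normal
    s = cross2(a - p, n) / den_safe           # position along the segment
    ok &= (s >= -1e-9) & (s <= 1.0 + 1e-9)
    if not ok.any():
        return None
    t_best = t[ok][np.argmin(np.abs(t[ok]))]
    return p + t_best * n

def refine(axis, upper, lower):
    """Move each axis point to the midpoint of its normal between the contours."""
    tang = np.gradient(axis, axis=0)
    tang /= np.linalg.norm(tang, axis=1, keepdims=True)
    normals = np.column_stack([-tang[:, 1], tang[:, 0]])
    new_axis = axis.copy()
    opening = np.full(len(axis), np.nan)      # NaN where a normal misses a contour
    for i, (p, n) in enumerate(zip(axis, normals)):
        hu, hl = hit(p, n, upper), hit(p, n, lower)
        if hu is not None and hl is not None:
            new_axis[i] = 0.5 * (hu + hl)
            opening[i] = np.linalg.norm(hu - hl)
    return new_axis, opening

def axial_curve(upper, lower, m=61, iters=10):
    """Iteratively estimate the axial curve and the opening along its normals."""
    axis = 0.5 * (resample(upper, m) + resample(lower, m))
    opening = np.full(m, np.nan)
    for _ in range(iters):
        axis, opening = refine(axis, upper, lower)
    return axis, opening

def place_of_articulation(axis, opening):
    """Distance along the axial curve from its first point to the smallest opening."""
    dist = np.concatenate([[0.0], np.cumsum(np.linalg.norm(np.diff(axis, axis=0), axis=1))])
    i = int(np.nanargmin(opening))
    return dist[i], opening[i]

# Synthetic symmetric cavity (not database data): lips at x = 0, constriction at x = 3.
x = np.linspace(0.0, 6.0, 121)
h = 1.0 - 0.6 * np.exp(-(x - 3.0) ** 2 / 0.5)
upper = np.column_stack([x, h])
lower = np.column_stack([x, -h])
axis, opening = axial_curve(upper, lower)
d, w = place_of_articulation(axis, opening)
```

For this synthetic cavity the smallest opening is found near a distance of 3 from the lip end with a width near 0.8, matching how the contours were constructed. Real contours would additionally need handling of normals that miss a contour near the ends, which this sketch only marks as NaN.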
The second section provides a review of the direct imaging
techniques and some of the earlier methods for estimation of
place of articulation. The proposed automated technique is
presented in the third section. The test results are presented in
the fourth section, followed by the conclusion in the last section.
II. PLACE OF ARTICULATION BY DIRECT IMAGING
Commonly used direct imaging techniques to capture the
movement of the articulators in the mid-sagittal plane (side
view) during speech production are ultrasound imaging,
electropalatography (EPG), X-ray microbeam (XRMB),
electromagnetic articulography (EMA), and magnetic
resonance imaging (MRI) [6]–[11]. Speech production
databases have been developed using the last three techniques.
The XRMB technique [6] uses narrow X-ray beams for
recording articulatory movements in the mid-sagittal plane by
tracking gold pellets glued to the articulators. It permits
simultaneous recording of the speech signal. The database [7],
developed at the University of Wisconsin, provides
articulatory plots consisting of four pellet points (T1-T4) on
the tongue and one each on the upper lip (UL), lower lip (LL),
and incisor (MNi), at 160 frames/s, along with audio recordings for
vowels, vowel-consonant-vowel syllables, sentences, and
paragraphs, from 25 male and 22 female speakers.
The research is supported by the National Program on Perception
Engineering Phase II, sponsored by the Department of Electronics &
Information Technology, MCIT, Government of India.