Available online at www.sciencedirect.com
Medical Engineering & Physics 30 (2008) 419–425
Development of a (silent) speech recognition system for patients following laryngectomy
M.J. Fagan a,∗, S.R. Ell b, J.M. Gilbert a, E. Sarrazin a, P.M. Chapman c
a Department of Engineering, University of Hull, UK
b Department of Otolaryngology, Hull Royal Infirmary, Hull and East Yorkshire Hospitals NHS Trust, UK
c Department of Computer Science, University of Hull, UK
Received 20 November 2006; received in revised form 2 May 2007; accepted 3 May 2007
Abstract
Surgical voice restoration after laryngectomy has a number of limitations and drawbacks. The present gold standard uses a
tracheo-oesophageal fistula (TOF) valve to divert air from the lungs into the throat, which vibrates to produce sound that can be formed into speech.
Not all patients can use these valves, and those who do are susceptible to complications associated with valve failure. Thus there is still a
place for other voice restoration options.
With advances in electronic miniaturization and portable computing power, a computing-intensive solution has been investigated. Magnets
were placed on the lips, teeth and tongue of a volunteer, causing a change in the surrounding magnetic field when the individual mouthed
words. These changes were detected by six dual-axis magnetic sensors incorporated into a pair of special glasses. The resulting
signals were compared with previously recorded training data using a dynamic time warping algorithm implemented by dynamic programming.
When compared against a small-vocabulary database, the patterns were recognised with an accuracy of 97% for words and 94% for
phonemes. On this basis we plan to develop a speech system for patients who have lost laryngeal function.
© 2007 IPEM. Published by Elsevier Ltd. All rights reserved.
Keywords: Speech recognition; Rehabilitation; Laryngectomy; Magnetic sensor; Speech system
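The recognition step summarised in the abstract (matching new sensor signals against previously recorded training examples with dynamic time warping) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the local distance measure, step pattern or decision rule, so the sketch assumes a Euclidean distance between sample frames, the standard symmetric three-predecessor DTW recurrence, and a nearest-template decision. Each frame is assumed to hold the readings of all sensor channels at one instant (12 values for 6 dual-axis sensors); the function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumption: one frame = all sensor channels sampled at one instant
# (e.g. 12 values for 6 dual-axis sensors). Names here are hypothetical.
Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two multichannel frames (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: Sequence[Frame], template: Sequence[Frame]) -> float:
    """Cumulative DTW cost, filled in by dynamic programming over an (n+1) x (m+1) table."""
    n, m = len(query), len(template)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], template[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(
                cost[i - 1][j],      # query advances, template frame repeats
                cost[i][j - 1],      # template advances, query frame repeats
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def recognise(
    query: Sequence[Frame], templates: Dict[str, List[Sequence[Frame]]]
) -> Tuple[str, float]:
    """Assumed nearest-template rule: return the label of the closest training example."""
    best_label, best_cost = "", math.inf
    for label, examples in templates.items():
        for example in examples:
            c = dtw_distance(query, example)
            if c < best_cost:
                best_label, best_cost = label, c
    return best_label, best_cost


if __name__ == "__main__":
    # Toy one-channel data (hypothetical); real frames would carry every sensor channel.
    templates = {
        "yes": [[[0.0], [1.0], [2.0], [1.0], [0.0]]],
        "no": [[[0.0], [-1.0], [-2.0], [-1.0], [0.0]]],
    }
    # A time-stretched utterance of "yes": DTW absorbs the slower articulation.
    query = [[0.0], [1.0], [1.0], [2.0], [2.0], [1.0], [0.0]]
    print(recognise(query, templates))  # ('yes', 0.0)
```

Because the warping path may repeat frames of either sequence, the slower utterance aligns with the "yes" template at zero cost, which is the property that makes DTW suitable when the same word is mouthed at different speeds.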
1. Introduction
Patients with laryngeal cancer whose larynx must be removed inevitably lose their voice. In addition, surgery separates the
viscera involved in swallowing and breathing, so the patient must breathe through the neck via a permanent tracheostomy.
The three main methods currently used to restore vocal function each have problems and limitations. In ‘oesophageal
speech’, sound is created by swallowing air and belching, and this sound is then formed into words; the technique is difficult
to learn, and fluent speech is impossible. An electrolarynx creates sound by vibrating the soft tissues of the throat, and this
sound can be articulated into speech, but the voice is monotonic, ‘Dalek-like’, and can be difficult to understand. The current
‘gold-standard’ method is a small silicone tracheo-oesophageal fistula speech valve that connects the trachea and the
oesophagus [1]. Air from the lungs is diverted through the fistula into the throat, which vibrates, and this sound is formed
into speech. However, although these valves work very well initially, in many patients they rapidly become colonised by
biofilm and fail after an average of only 3–4 months [2–5]. Various modifications have been tried over the years to discourage
biofilm growth (e.g. [6–8]), but to date none of these approaches appears to provide a long-term solution to this problem.

∗ Corresponding author at: Centre for Medical Engineering and Technology, Department of Engineering, University of Hull,
Hull HU6 7RX, UK. Tel.: +44 1482 465058; fax: +44 1482 466664.
E-mail address: m.j.fagan@hull.ac.uk (M.J. Fagan).
Thus there is a need for a fundamental improvement in the current methods for the restoration of speech after laryngectomy.
Digital (voiced) speech recognition systems have been the subject of research for a number of years, based on measurement
of sound emitted by the speaker [9] and a variety of
1350-4533/$ – see front matter © 2007 IPEM. Published by Elsevier Ltd. All rights reserved.
doi:10.1016/j.medengphy.2007.05.003