Computers 2022, 11, 34. https://doi.org/10.3390/computers11030034 www.mdpi.com/journal/computers
Article
Multimodal Lip-Reading for Tracheostomy Patients in the Greek Language
Yorghos Voutos 1, Georgios Drakopoulos 1, Georgios Chrysovitsiotis 2, Zoi Zachou 2, Dimitris Kikidis 2,
Efthymios Kyrodimos 2 and Themis Exarchos 1,*
1 Department of Informatics, Ionian University, 49 100 Corfu, Greece; c16vout@ionio.gr (Y.V.);
  c16drak@ionio.gr (G.D.)
2 Voice Clinic, Medical School, National and Kapodistrian University of Athens, 115 27 Athens, Greece;
  chrysovi@gmail.com (G.C.); zoizachou@gmail.com (Z.Z.); dimitriskikidis@yahoo.com (D.K.);
  timkirodimos@hotmail.com (E.K.)
* Correspondence: exarchos@ionio.gr; Tel.: +30-2661087855
Abstract: Voice loss is a serious disorder that is strongly associated with social isolation.
The use of multimodal information sources, such as audiovisual information, is crucial since it can
lead to the development of straightforward personalized word prediction models which can reproduce
the patient's original voice. In this work, we designed a multimodal approach based on audiovisual
information recorded from patients before voice loss to develop a system for automated lip-reading
in the Greek language. Data pre-processing methods, such as lip segmentation and frame-level
sampling techniques, were used to enhance the quality of the imaging data. Audio information was
incorporated in the model to automatically annotate sets of frames as words. Recurrent neural
networks were trained on four different video recordings to develop a robust word prediction model.
The model was able to correctly identify test words in different time frames with 95% accuracy. To
our knowledge, this is the first word prediction model that is trained to recognize words from video
recordings in the Greek language.
Keywords: tracheostomy; lip reading; deep learning; multimodal interfaces
1. Introduction
The human voice is a fundamental characteristic of human communication and expression.
It is produced by an organ named the larynx, which tunes the vocal cords during
exhalation. Each person has a unique and recognizable voice, the loss of which constitutes
a significant disorder. Tracheostomy is one of the leading causes of partial or complete
voice loss. Other causes of voice loss include neurological diseases as well as laryngeal
and thyroid cancer. The increase in the age and number of people with severe
disabilities who must live with permanent tracheostomies has contributed to the growth of
the population experiencing difficulties in voice communication. In the U.S., the number
of patients who undergo a tracheostomy is over 100,000 per year [1]. Hospitalization
rates in intensive care units and the number of patients at risk of speech loss
have increased by 11% over the last 10 years [2], since 40% of the hospitalized patients
are tracheostomy candidates. In Greece, the increased smoking rate (more than double the
European average) has led to an increase in the total number of laryngectomies due to
laryngeal cancer (95% in smokers [3]).
In most voice loss cases, the situation does not improve over time and the available
voice restoration methods often have poor results. As a result, individuals with partial or
total voice loss are socially isolated, often exhibiting derivative disorders such as depression
and cognitive function impairment, which in turn lower their quality of life. The currently
available surgical solutions for partial speech restoration are often not satisfactory
Citation: Voutos, Y.; Drakopoulos, G.; Chrysovitsiotis, G.; Zachou, Z.; Kikidis, D.;
Kyrodimos, E.; Exarchos, T. Multimodal Lip-Reading for Tracheostomy Patients in the
Greek Language. Computers 2022, 11, 34. https://doi.org/10.3390/computers11030034
Academic Editor: Fernando Bobillo
Received: 13 January 2022
Accepted: 25 February 2022
Published: 28 February 2022
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions
of the Creative Commons Attribution (CC BY) license
(https://creativecommons.org/licenses/by/4.0/).