Computers 2022, 11, 34. https://doi.org/10.3390/computers11030034 www.mdpi.com/journal/computers

Article

Multimodal Lip-Reading for Tracheostomy Patients in the Greek Language

Yorghos Voutos 1, Georgios Drakopoulos 1, Georgios Chrysovitsiotis 2, Zoi Zachou 2, Dimitris Kikidis 2, Efthymios Kyrodimos 2 and Themis Exarchos 1,*

1 Department of Informatics, Ionian University, 49 100 Corfu, Greece; c16vout@ionio.gr (Y.V.); c16drak@ionio.gr (G.D.)
2 Voice Clinic, Medical School, National and Kapodistrian University of Athens, 115 27 Athens, Greece; chrysovi@gmail.com (G.C.); zoizachou@gmail.com (Z.Z.); dimitriskikidis@yahoo.com (D.K.); timkirodimos@hotmail.com (E.K.)
* Correspondence: exarchos@ionio.gr; Tel.: +302661087855

Abstract: Voice loss is a serious disorder that is strongly associated with social isolation. Multimodal information sources, such as audiovisual information, are valuable because they can support the development of straightforward, personalized word prediction models that can reproduce the patient’s original voice. In this work, we designed a multimodal approach based on audiovisual information recorded from patients before loss of voice to develop a system for automated lip-reading in the Greek language. Data preprocessing methods, such as lip segmentation and frame-level sampling, were used to enhance the quality of the imaging data. Audio information was incorporated in the model to automatically annotate sets of frames as words. Recurrent neural networks were trained on four different video recordings to develop a robust word prediction model. The model was able to correctly identify test words in different time frames with 95% accuracy. To our knowledge, this is the first word prediction model trained to recognize words from video recordings in the Greek language.

Keywords: tracheostomy; lip reading; deep learning; multimodal interfaces

Citation: Voutos, Y.; Drakopoulos, G.; Chrysovitsiotis, G.; Zachou, Z.; Kikidis, D.; Kyrodimos, E.; Exarchos, T. Multimodal Lip-Reading for Tracheostomy Patients in the Greek Language. Computers 2022, 11, 34. https://doi.org/10.3390/computers11030034

Academic Editor: Fernando Bobillo

Received: 13 January 2022
Accepted: 25 February 2022
Published: 28 February 2022

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

The human voice is a fundamental characteristic of human communication and expression. It is produced by the larynx, an organ that tunes the vocal cords during exhalation. Each person has a unique and recognizable voice, the loss of which constitutes a significant disorder. Tracheostomy is one of the leading causes of partial or complete voice loss. Other causes include neurological diseases as well as laryngeal and thyroid cancer. Rising age and a growing number of people with severe disabilities who must live with a permanent tracheostomy have increased the population experiencing difficulties in voice communication. In the U.S., more than 100,000 patients undergo a tracheostomy each year [1]. Hospitalization rates in intensive care units, and with them the number of patients at risk of speech loss, have increased by 11% over the last 10 years [2], since 40% of hospitalized patients are tracheostomy candidates. In Greece, the high smoking rate (more than double the European average) has led to an increase in the total number of laryngectomies due to laryngeal cancer (95% in smokers [3]).

In most cases of voice loss, the situation does not improve over time, and the available voice restoration methods often have poor results. As a result, individuals with partial or total voice loss become socially isolated and often exhibit derivative disorders, such as depression and cognitive function impairment, which in turn lower their quality of life. The currently available surgical solutions for partial speech restoration are often not satisfactory