A Noise-Aware Methodology for a Mobile Voice Screening Application

Laura Verde*, Giuseppe De Pietro*, Pierangelo Veltri† and Giovanna Sannino*

* Institute of High Performance Computing and Networking (ICAR), National Research Council of Italy (CNR), Naples, Italy 80131
e-mail: {laura.verde, giuseppe.depietro, giovanna.sannino}@na.icar.cnr.it
† University Magna Graecia of Catanzaro, Catanzaro, Italy
e-mail: veltri@unicz.it

Abstract—Dysphonia is a qualitative and quantitative alteration of the voice due to a structural or functional modification of one or more organs involved in voice production. Voice disorders are prevalent in certain occupational categories, particularly teachers, singers and actors. The state of health of a voice can be evaluated through acoustic analysis of the speech signal, which provides information about the presence of dysphonia by calculating specific parameters, such as the Fundamental Frequency (F0). In this paper we present a methodology to estimate F0, embedded in an m-health application able to perform simple and fast voice screening. The app acquires a user's vocal signal, then processes and analyses it, distinguishing between a pathological and a healthy voice. Unfortunately, noise present during signal acquisition can alter the F0 estimation, introducing possible errors in the acoustic analysis and therefore increasing the potential number of false-positive diagnoses of voice disorders. For this reason, the methodology presented is also able to reduce the impact of any additional noise accidentally added during the acquisition of the user's vocal signal.

Keywords—Voice Disorders, Dysphonia, Mobile Voice Screening, Fundamental Frequency Estimation, Noise Evaluation.

I. INTRODUCTION

Voice disorders affect about 29% (lifetime prevalence) of the population [1]. Early signs of deterioration of the voice due to vocal malfunctioning are normally breathiness and hoarseness.
The medical term for disorders of the voice is "dysphonia". The principal causes of these disorders are unhealthy lifestyles or vocal abuse. Because of the extended and often intense use of their voice, the most affected category is professional voice users, such as singers, actors or teachers. Indeed, between 20% and 80% of teachers have reported that they have suffered from various voice disorders which have had an impact on their professional and social life [2].

The expert, in this case the speech therapist, evaluates the patient's voice through a laryngoscopy, an anamnestic evaluation, and a subjective self-assessment of the user's voice, according to the SIFEL protocol [3] developed by the Italian Society of Logopedics and Phoniatrics, following the instructions of the Committee for Phoniatrics of the European Society of Laryngology. Additionally, an acoustic analysis of the speech signal is performed, a useful and non-invasive procedure that constitutes a fast indicator of possible voice problems. This technique consists of a quantitative evaluation of characteristic parameters calculated on a recording of the vowel /a/ lasting a minimum of five seconds. One of the most important of these parameters is the Fundamental Frequency (F0), which quantifies the vibratory frequency of the vocal folds. However, speech signals are not exactly periodic, and in particular the F0 is always changing and fluctuating. This aperiodicity of the vocal fold vibration is an index of vocal pathologies. We also need to consider the presence of noise, which can introduce the risk of false-positive diagnoses of vocal fold pathologies.
Typical sources of interference are:
• Background noise added to the speech signal, for example environmental noise or engine noise when talking on a mobile phone;
• Acoustic or audio feedback, which occurs when the microphone in the mobile phone captures the actual speech of another person together with the useful voice signal.

Because numerous sources of interference influence the speech signal, it is necessary to reduce the contribution of additional noise so that the F0 estimation is distorted as little as possible. Noise that occurs over a wide frequency band with a random distribution, like environmental noise, music, and engine noise, is much harder to separate from the signal and suppress, since it appears in the same frequency range as speech.

In this work we present a mobile health (m-Health) application that is able to discriminate healthy voices from pathological ones, providing a fast instrument useful for the prevention and monitoring of the state of health of the voice, using a smartphone or tablet. The methodology, implemented in a mobile application, is able to estimate the F0 from the user's recording of the vowel /a/, even in the presence of noise, obtaining good discrimination capabilities in terms of screening accuracy.
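To make the notion of F0 estimation in the presence of noise concrete, the following minimal sketch estimates F0 from a synthetic sustained sound by searching for the peak of its autocorrelation. It is only an illustration and not the methodology proposed in this paper: the plain autocorrelation peak search, the function names synth_vowel and estimate_f0, the 75-500 Hz search range, the 8 kHz sampling rate and the three-harmonic test signal are all assumptions chosen for the example.

```python
import math
import random


def synth_vowel(f0_hz, fs_hz, duration_s, noise_std=0.0, seed=0):
    """Hypothetical test signal: three harmonics of f0 plus Gaussian noise."""
    rng = random.Random(seed)
    n = int(fs_hz * duration_s)
    samples = []
    for i in range(n):
        t = i / fs_hz
        # Harmonic amplitudes 1, 1/2, 1/3 crudely mimic a voiced sound.
        s = sum(math.sin(2 * math.pi * k * f0_hz * t) / k for k in (1, 2, 3))
        samples.append(s + rng.gauss(0.0, noise_std))
    return samples


def estimate_f0(samples, fs_hz, f_min=75.0, f_max=500.0):
    """Return fs / lag for the lag that maximises the autocorrelation.

    The search range [f_min, f_max] is an assumed plausible range for
    speech, not a value taken from the paper.
    """
    mean = sum(samples) / len(samples)
    x = [s - mean for s in samples]  # remove any DC offset
    lag_min = int(fs_hz / f_max)
    lag_max = int(fs_hz / f_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalised (biased) autocorrelation: fewer overlapping samples
        # at long lags slightly favours the shortest matching period.
        score = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return fs_hz / best_lag


if __name__ == "__main__":
    fs = 8000
    clean = synth_vowel(200.0, fs, 0.2)
    noisy = synth_vowel(200.0, fs, 0.2, noise_std=0.2, seed=1)
    print("clean estimate (Hz):", estimate_f0(clean, fs))
    print("noisy estimate (Hz):", estimate_f0(noisy, fs))
```

Because the estimate is quantised to integer lags, its resolution near 200 Hz at 8 kHz is roughly 5 Hz. Broadband noise perturbs every lag of the autocorrelation, which is one way to see why such noise can shift the detected peak and bias the F0 estimate.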