Biomedical Signal Processing and Control 34 (2017) 25–35
Journal homepage: www.elsevier.com/locate/bspc

Synthesizing the motion of the vocal folds using optical flow based techniques

Gustavo Andrade-Miranda a,*, Nathalie Henrich Bernardoni b,c, Juan I. Godino-Llorente a

a Center for Biomedical Technology, Univ. Politécnica de Madrid, Campus de Montegancedo, Crta. M40 km, 38, 28223 Madrid, Spain
b Univ. Grenoble Alpes, GIPSA-Lab, F-38000 Grenoble, France
c CNRS, GIPSA-Lab, F-38000 Grenoble, France

Article info
Article history: Received 14 October 2016; Received in revised form 1 December 2016; Accepted 6 January 2017
Keywords: Glottal dynamics; High-speed videoendoscopy; Motion field; Optical flow; Facilitative playbacks; Motion synthesis

Abstract
Different playbacks have been proposed to synthesize the dynamical information of the vocal folds. However, most of them rely on delineating the glottal gap with segmentation techniques, which is a complex task and usually requires manual supervision. To address this issue, three new playbacks based on optical flow computation are presented. Two of them, called Optical Flow Glottovibrogram and Glottal Optical Flow Waveform, analyze the global dynamics; the remaining one, called Optical Flow Kymogram, analyzes the local dynamics. The reliability of the proposed playbacks is evaluated by comparison with traditional representations. The new playbacks show a strong correlation in shape with the traditional ones and allow the identification of the most important instants of time, such as closed states and maximal opening. In addition, they provide information complementary to the common spatio-temporal representations, although the new playbacks are slightly blurred.
© 2017 Elsevier Ltd. All rights reserved.

A preliminary version of this work has been reported in INTERSPEECH 2015 and MAVEBA 2015 [1,2].
* Corresponding author.
E-mail addresses: gxandrade@ics.upm.es (G. Andrade-Miranda), Nathalie.Henrich@gipsa-lab.fr (N. Henrich Bernardoni), ignacio.godino@upm.es (J.I. Godino-Llorente).

1. Introduction

High-speed videoendoscopy (HSV) has revolutionized laryngeal imaging, allowing us to better understand glottal dynamics during phonation. The HSV technique is capable of acquiring the true intra-cycle vibratory behavior, which permits the study of cycle-to-cycle glottal variations. HSV makes it possible to characterize laryngeal tissue dynamics and vocal-fold vibratory features that cannot be assessed (visualized) with common videoendoscopic and stroboscopic techniques [3–5].

HSV records thousands of frames per second, which makes manual analysis of such an amount of information impossible. Image processing techniques are therefore needed to synthesize the time-varying data into a few static images.

The literature reports several proposals to represent the HSV information in a simpler way. These representations improve the quantification accuracy, facilitate visual perception, and increase the reliability of visual rating while preserving the most relevant characteristics of glottal vibratory patterns. They are known as facilitative playbacks [6]. The most widespread and successful playbacks, used by both clinicians and researchers, are: Digital Kymograms (DKG) [7], Mucosal Wave Kymogram (MKG) [6], Glottal Area Waveform (GAW) [8], Phonovibrogram (PVG) [9], and Glottovibrogram (GVG) [10]. Depending on the way they assess glottal dynamics, they can be grouped into local- or global-dynamics playbacks.

Local-dynamics playbacks analyze the vocal-fold behavior along a single line perpendicular to the main glottal axis.
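The local-dynamics idea described above, one scan line followed over time, can be illustrated with a minimal sketch. This is not the DKG implementation of the cited works: the function names, the synthetic test video, and the assumption that the glottal axis runs vertically in the image (so that the perpendicular scan line is an image row) are all hypothetical choices made for illustration.

```python
import numpy as np


def digital_kymogram(frames: np.ndarray, row: int) -> np.ndarray:
    """Stack one horizontal scan line from every frame into a 2D image.

    frames: grayscale video of shape (n_frames, height, width).
    row: index of the scan line; assumed perpendicular to a glottal
         axis that runs vertically in the image (an assumption here).
    Returns an array of shape (width, n_frames): position along the
    scan line on the vertical axis, time on the horizontal axis.
    """
    if frames.ndim != 3:
        raise ValueError("expected a (n_frames, height, width) array")
    if not 0 <= row < frames.shape[1]:
        raise IndexError("scan line outside the image")
    return frames[:, row, :].T


def synthetic_glottis(n_frames=40, height=32, width=32, period=10):
    """Toy video: a dark vertical gap whose half-width oscillates in time."""
    frames = np.full((n_frames, height, width), 200, dtype=np.uint8)
    centre = width // 2
    for t in range(n_frames):
        # Half-width varies between 0 (closed) and 4 pixels (maximal opening).
        half = int(round(2 * (1 - np.cos(2 * np.pi * t / period))))
        frames[t, :, centre - half:centre + half] = 20
    return frames


frames = synthetic_glottis()
kymo = digital_kymogram(frames, row=16)
# Number of dark pixels per column: the gap width on this line, per frame.
gap_width = (kymo < 100).sum(axis=0)
```

In the toy video, the columns of `kymo` with no dark pixels correspond to closed states and the widest dark columns to maximal opening. A real recording would first need the glottal axis to be located and, if necessary, the image rotated so that the scan line is perpendicular to it.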
DKG is the most widespread method in this category, and it has been successfully applied to demonstrate changes in glottal dynamics in cases of damaged tissue, such as lesions, scars, and discoloration of the vocal folds, and in voice disorders [11,12].

http://dx.doi.org/10.1016/j.bspc.2017.01.002
1746-8094/© 2017 Elsevier Ltd. All rights reserved.

On the other hand, global-dynamics playbacks analyze the vocal-fold behavior along the whole glottal length; GAW, PVG and GVG are the most widespread methods. These three playbacks focus on vocal-fold edge motion, obtained by means of glottal segmentation algorithms. For instance, GAW uses the glottal segmentation to compute a glottal area function over time, from which several parameters can be estimated [13]. In contrast, PVG and GVG playbacks are 2D representations of vocal-fold vibratory patterns as a function of time, for which glottal-edge movements along the anterior–posterior axis are summarized into a time-varying image