SeTA: Semiautomatic Tool for Annotation of Eye Tracking Images

Andoni Larumbe-Bergera, Public University of Navarre, Pamplona, Spain, andoni.larumbe@unavarra.es
Sonia Porta, Public University of Navarre, Pamplona, Spain, sporta@unavarra.es
Rafael Cabeza, Public University of Navarre, Pamplona, Spain, rcabeza@unavarra.es
Arantxa Villanueva, Public University of Navarre, Pamplona, Spain, avilla@unavarra.es

ABSTRACT

The availability of large-scale tagged datasets is a must in the field of deep learning applied to the eye tracking challenge. In this paper, the potential of the Supervised-Descent-Method (SDM) as a semiautomatic labelling tool for eye tracking images is shown. The objective of the paper is to show how the human effort needed to manually label large eye tracking datasets can be radically reduced by the use of cascaded regressors. Different applications are provided in the fields of high and low resolution systems. Iris/pupil center labelling is shown as an example for low resolution images, while pupil contour point detection is demonstrated in high resolution. In both cases manual annotation requirements are drastically reduced.

CCS CONCEPTS

· Applied computing → Annotation; · Computing methodologies → Supervised learning by regression.

KEYWORDS

image annotation, eye tracking, Supervised-Descent-Method

ACM Reference Format:
Andoni Larumbe-Bergera, Sonia Porta, Rafael Cabeza, and Arantxa Villanueva. 2019. SeTA: Semiautomatic Tool for Annotation of Eye Tracking Images. In 2019 Symposium on Eye Tracking Research and Applications (ETRA ’19), June 25–28, 2019, Denver, CO, USA. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3314111.3319830

1 INTRODUCTION

Typically, the problem of estimating gaze has been divided into two issues, namely, eye tracking and gaze estimation. Eye tracking is related to the algorithms focused on processing the acquired eye image to obtain image features (e.g. iris or pupil center, glints, eyelids, etc.),
while gaze estimation covers the challenge of finding gaze from the image. As in many other computer vision problems, deep learning techniques can be used to solve both problems, as demonstrated in some of the works published in the last few years [Krafka et al. 2016] [Zhang et al. 2018] [Park et al. 2018].

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
ETRA ’19, June 25–28, 2019, Denver, CO, USA
© 2019 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-6709-7/19/06.
https://doi.org/10.1145/3314111.3319830

One of the basic requirements for any deep learning procedure is the availability of large-scale labelled datasets to be used during the training stage. The procedure to obtain these datasets is not trivial; moreover, the problem of getting the images labelled is not completely solved yet. In the case of gaze estimation methods, the labelling procedure consists of tagging each image with gaze data, e.g. a 2D PoR value or a 3D LoS. This operation can be carried out by employing previously agreed gaze points or gaze directions, according to the case. Basically, the user is asked to gaze at known grids of points. It is assumed that the subject gazes at the corresponding point for a dwell time. In this manner, the images are tagged automatically, provided that a synchronization procedure is established between the displayed points and the image recording thread. In the case of eye tracking methods, labelling is a broader problem, since the required marks vary depending on the algorithm, ranging from pupil or iris center to iris contour, eye corners or eyelids, among others.
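As an illustration of the automatic gaze tagging described above, the following minimal Python sketch assigns to each recorded frame the grid point that was on screen when the frame was captured. It is not taken from the paper: the names `Stimulus` and `tag_frames`, the shared clock between display and camera, and the `settle` margin (frames captured just after a point appears are discarded, on the assumption that the eye is still moving towards it) are all hypothetical.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional, Sequence, Tuple

Point = Tuple[float, float]


class Stimulus(NamedTuple):
    """One displayed calibration point (hypothetical structure)."""
    onset: float   # time the point appeared on screen, in seconds
    offset: float  # time the point disappeared, in seconds
    point: Point   # 2D gaze label in screen coordinates


def tag_frames(frame_times: Sequence[float],
               stimuli: Sequence[Stimulus],
               settle: float = 0.0) -> List[Optional[Point]]:
    """Label each frame timestamp with the point displayed at that time.

    Assumes frame timestamps and stimulus times share one clock, and that
    `stimuli` is sorted by onset with non-overlapping intervals.
    Frames outside any interval, or within `settle` seconds after an
    onset, are left unlabelled (None).
    """
    onsets = [s.onset for s in stimuli]
    labels: List[Optional[Point]] = []
    for t in frame_times:
        # Index of the last stimulus whose onset is <= t.
        i = bisect_right(onsets, t) - 1
        if i >= 0 and stimuli[i].onset + settle <= t < stimuli[i].offset:
            labels.append(stimuli[i].point)
        else:
            labels.append(None)
    return labels


if __name__ == "__main__":
    grid = [Stimulus(0.0, 1.0, (100.0, 100.0)),
            Stimulus(1.0, 2.0, (500.0, 100.0))]
    print(tag_frames([0.1, 0.5, 1.05, 1.6, 2.3], grid, settle=0.2))
```

In this sketch the only per-frame information needed is a timestamp, which is why no manual intervention is required once display and recording are synchronized.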
Moreover, the labelling procedure is not straightforward. The most obvious but tedious way to solve the labelling problem is to carry out a manual marking procedure, for which dedicated tools can be designed [Fuhl et al. 2017]. Considering the required size of the datasets for deep learning, this is not a practical solution. In fact, there are companies devoted to image tagging tasks, a business that has emerged as a result of the higher demand for labelled datasets in the deep/machine learning fields. In this manner, human effort is translated into dollars. More practical proposals based on using synthesized images are found in the bibliography [Sugano et al. 2014]. One of them is data augmentation, defined as the process of increasing the number of data/images in a dataset by means, generally, of artificial techniques. This can involve changes such as rotating the image, introducing lighting variations or varying the noise conditions, generating different sub-samples from the same original image. Moreover, simulators can be employed in which the camera, user, gazed points and light sources are simulated. Thus, the image is artificially generated and the labels corresponding to image features are known by construction. Examples devoted to high and low resolution eye tracking can be found in the literature [Świrski and Dodgson 2014] [Wood et al. 2016]. Finally, employing image processing techniques to partially automate the annotation process has also been proposed in the bibliography [Tonsen et al. 2016].

In this paper a Semiautomatic Tool for Annotation of Eye Tracking images, SeTA, based on the Supervised-Descent-Method (SDM) [Xiong and la Torre 2014] is proposed. SDM can be used as a feature detection algorithm based on a training stage and has demonstrated