SeTA: Semiautomatic Tool for Annotation of Eye Tracking Images
Andoni Larumbe-Bergera
Public University of Navarre
Pamplona, Spain
andoni.larumbe@unavarra.es
Sonia Porta
Public University of Navarre
Pamplona, Spain
sporta@unavarra.es
Rafael Cabeza
Public University of Navarre
Pamplona, Spain
rcabeza@unavarra.es
Arantxa Villanueva
Public University of Navarre
Pamplona, Spain
avilla@unavarra.es
ABSTRACT
The availability of large-scale labelled datasets is essential for
deep learning applied to eye tracking. This paper shows the
potential of the Supervised-Descent-Method (SDM) as a semiautomatic
labelling tool for eye tracking images. Its objective is to show
how the human effort needed to manually label large eye tracking
datasets can be radically reduced by using cascaded regressors.
Applications are presented for both high and low resolution systems:
iris/pupil center labelling is shown as an example for low resolution
images, while pupil contour point detection is demonstrated for high
resolution images. In both cases, manual annotation requirements are
drastically reduced.
CCS CONCEPTS
· Applied computing → Annotation; · Computing methodologies →
Supervised learning by regression.
KEYWORDS
image annotation, eye tracking, Supervised-Descent-Method
ACM Reference Format:
Andoni Larumbe-Bergera, Sonia Porta, Rafael Cabeza, and Arantxa Villanueva.
2019. SeTA: Semiautomatic Tool for Annotation of Eye Tracking
Images. In 2019 Symposium on Eye Tracking Research and Applications (ETRA
’19), June 25–28, 2019, Denver, CO, USA. ACM, New York, NY, USA, 5 pages.
https://doi.org/10.1145/3314111.3319830
1 INTRODUCTION
Typically, the problem of estimating gaze has been divided into two
sub-problems, namely eye tracking and gaze estimation. Eye tracking
covers the algorithms that process the acquired eye image to obtain
image features, e.g. iris or pupil center, glints, eyelids, etc.,
while gaze estimation covers the challenge of finding gaze from the
image. As in many other computer vision problems, deep learning
techniques can be used to solve both problems, as demonstrated in
some of the works published in the last few years [Krafka
et al. 2016] [Zhang et al. 2018] [Park et al. 2018].
Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for third-party components of this work must be honored.
For all other uses, contact the owner/author(s).
ETRA ’19, June 25–28, 2019, Denver, CO, USA
© 2019 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-6709-7/19/06.
https://doi.org/10.1145/3314111.3319830
One of the basic requirements of any deep learning procedure
is the availability of large-scale labelled datasets for the
training stage. Obtaining these datasets is not trivial, and the
problem of labelling the images is not completely solved yet.
In the case of gaze estimation methods, the labelling procedure
consists in tagging each image with gaze data, e.g. a 2D PoR value
or a 3D LoS. This can be done by employing previously agreed gaze
points or gaze directions, depending on the case. Basically, the
user is asked to gaze at known grids of points, and it is assumed
that the subject gazes at the corresponding point for a dwell time.
In this manner, the images are tagged automatically, provided that
a synchronization procedure is established between the displayed
points and the image recording thread.
In the case of eye tracking methods, labelling is
a broader problem since the required marks vary depending on
the algorithm, ranging from the pupil or iris center to the iris
contour, eye corners or eyelids, among others. Moreover, the
labelling procedure is not straightforward. The most obvious but
tedious way to solve the labelling problem is manual marking,
for which dedicated tools can be designed [Fuhl et al. 2017].
Considering the dataset sizes required for deep learning, this is
not a practical solution. In fact, there are companies devoted to
image tagging, a business that has emerged as a result of the
growing demand for labelled datasets in the deep/machine
learning fields. In this manner, human effort is translated into
dollars. More practical proposals based on using synthesized images
are found in the bibliography [Sugano et al. 2014]. One of them
is data augmentation, defined as the process of increasing the
number of data/images in a dataset, generally by means of
artificial techniques. This can involve changes such as rotating
the image, introducing lighting variations or varying the noise
level, which generate different sub-samples from the same original
image. Moreover, simulators can be employed in which the camera,
user, gazed points and light sources are simulated. Thus, the image
is artificially generated and the labels corresponding to image
features are known by construction. Examples devoted to high and
low resolution eye tracking can be found in the literature
[Świrski and Dodgson 2014] [Wood et al. 2016]. Finally, employing
image processing techniques to partially automate the annotation
process has also been proposed in the
bibliography [Tonsen et al. 2016].
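As an illustration of the synchronization-based gaze tagging described above, the following minimal Python sketch assigns to each recorded frame the grid point that was displayed when the frame was captured. It is not part of SeTA: the function name `label_frames`, the schedule format and the `settle` margin (used to skip frames captured while the eye may still be moving to a new point) are assumptions introduced for illustration only.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Point = Tuple[float, float]             # 2D PoR target in screen coordinates
Stimulus = Tuple[float, float, Point]   # (onset, dwell, target), times in seconds


def label_frames(frame_times: List[float],
                 schedule: List[Stimulus],
                 settle: float = 0.0) -> List[Optional[Point]]:
    """Tag each frame with the grid point shown when it was captured.

    Hypothetical helper: frames outside every dwell window, or inside the
    first `settle` seconds of a window, are left unlabelled (None).
    Assumes frame timestamps and stimulus onsets share the same clock.
    """
    schedule = sorted(schedule, key=lambda s: s[0])
    onsets = [s[0] for s in schedule]
    labels: List[Optional[Point]] = []
    for t in frame_times:
        # Index of the last stimulus that started at or before time t.
        i = bisect_right(onsets, t) - 1
        label = None
        if i >= 0:
            onset, dwell, target = schedule[i]
            if onset + settle <= t < onset + dwell:
                label = target
        labels.append(label)
    return labels


if __name__ == "__main__":
    # Two grid points, each displayed for 1 s, with a 0.5 s gap in between.
    schedule = [(0.0, 1.0, (100.0, 100.0)), (1.5, 1.0, (500.0, 100.0))]
    frames = [0.1, 0.6, 1.2, 1.6, 2.4, 2.6]
    for t, label in zip(frames, label_frames(frames, schedule, settle=0.2)):
        print(t, label)
```

The sketch only works if the display and the camera share a common time base, which is the synchronization requirement stated above. Frames left as `None` would simply be discarded from the labelled set.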
In this paper, a Semiautomatic Tool for Annotation of Eye Tracking
images, SeTA, based on the Supervised-Descent-Method (SDM) [Xiong
and la Torre 2014] is proposed. SDM can be used as a feature
detection algorithm based on a training stage and has demonstrated