A Regression-based User Calibration Framework for Real-time Gaze Estimation

Nuri Murat Arar, Student Member, IEEE, Hua Gao, and Jean-Philippe Thiran, Senior Member, IEEE

N. M. Arar, H. Gao, and J.-P. Thiran are with the Signal Processing Lab (LTS5), École Polytechnique Fédérale de Lausanne, Switzerland, e-mail: (see http://people.epfl.ch/name.surname).

Abstract—Eye movements play a significant role in human-computer interaction (HCI): they are natural and fast, and they carry important cues about human cognitive state and visual attention. Over the last two decades, many techniques have been proposed to estimate gaze accurately. Among these, video-based remote eye trackers have attracted much interest since they enable non-intrusive gaze estimation. For remote systems to achieve high estimation accuracy, user calibration is unavoidable, as it compensates for the estimation bias caused by person-specific eye parameters. Although several explicit and implicit user calibration methods have been proposed to ease the calibration burden, the procedure remains cumbersome and needs further improvement. In this paper, we present a comprehensive analysis of regression-based user calibration techniques. We propose a novel weighted least squares regression-based user calibration method together with a real-time cross-ratio based gaze estimation framework. The proposed system achieves high estimation accuracy with minimal user effort, leading to user-friendly HCI applications. Results from both simulations and user experiments show that our framework significantly outperforms state-of-the-art user calibration methods when only a few points are available for calibration.

I. INTRODUCTION

Gaze is considered an essential modality for HCI because it carries crucial cues indicating visual attention, cognitive processes, emotional states, and interpersonal interactions [1]. Moreover, eye movements are natural and fast, which makes them highly suitable for interacting with computer vision systems. Consequently, robust gaze estimation with high precision and accuracy is of great interest for the development of many diagnostic and HCI applications, such as human attention and cognitive state analysis, usability testing, market research, aids for the disabled, gaze-based interactive user interfaces, mouse cursor positioning, page scrolling, gaze-based map navigation, gaze-based gaming, and many other gaze-controlled computer functionalities. Recently, gaze estimation systems with a variety of applications have been introduced, and promising advances have been made by both industry and the scientific community [2]–[6]. However, there is still room for further research to improve the robustness and convenience of these systems.

Gaze-based interfaces aim to map user gaze accurately to screen coordinates. For interactive applications, remote gaze trackers are mostly preferred even though their accuracies are lower than those of head- or eye-mounted gaze trackers, which provide an unnatural and invasive experience for users due to their intrusiveness. Our focus is therefore on remote video-based gaze tracking, where the user's eyes are captured non-intrusively by one or more cameras and the gaze is estimated through image processing and computer vision methods.

Remote video-based gaze tracking methods fall mainly into two groups, as described in a recent survey [7]: appearance-based methods [8]–[12] and model-based methods [13]–[19]. Model-based methods mostly estimate the three-dimensional (3D) gaze direction by modeling the eye in 3D; the point of regard (PoR) is then computed as the intersection of the gaze direction with the scene geometry. Appearance-based methods, on the other hand, simply map image features to gaze points. Their system and hardware requirements tend to be simpler than those of model-based methods: they require only an ordinary camera, and neither camera nor geometric calibration is necessary. However, they are restricted to particular applications because of their limited estimation accuracy and sensitivity to head movements. Although a few recent works (e.g., [11], [12]) attempt to improve their accuracy and their robustness to head pose and movement, further advances are needed before they can be used for precise eye tracking applications. Model-based methods, on the contrary, offer greater freedom of movement and high estimation accuracy (≤1°). Their biggest disadvantage is that they require more complex setups, since camera and geometric calibration are needed to obtain 3D information. Because they rely on accurate 3D modeling of the user's eye, user calibration is crucial for estimating individual-specific eye parameters.
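Although the details of our weighted least squares formulation appear later in the paper, the regression idea underlying such calibration can be sketched briefly. The following Python snippet fits a second-order polynomial mapping from a 2-D eye feature (e.g., a pupil-center-to-glint vector) to screen coordinates via weighted least squares; the feature choice, polynomial order, weights, and all names here are illustrative assumptions, not the exact formulation used in this work.

```python
import numpy as np

def poly_features(v):
    # Second-order polynomial expansion of a 2-D eye feature
    # (illustrative choice; other bases are possible).
    x, y = v
    return np.array([1.0, x, y, x * y, x * x, y * y])

def fit_wls(features, targets, weights):
    # Weighted least squares: solve (X^T W X) B = X^T W Y,
    # with one coefficient column per screen axis.
    X = np.array([poly_features(v) for v in features])  # N x 6 design matrix
    W = np.diag(weights)                                # per-sample confidence
    XtW = X.T @ W
    return np.linalg.solve(XtW @ X, XtW @ targets)      # 6 x 2 coefficients

def predict(B, v):
    # Map a new eye feature to an estimated point of regard.
    return poly_features(v) @ B

# Toy calibration data: screen targets (pixels) and hypothetical
# eye features observed while the user fixates each target.
feats = np.array([[0.1, 0.2], [0.4, 0.1], [0.2, 0.5],
                  [0.6, 0.6], [0.8, 0.3], [0.5, 0.8]])
targets = np.array([[100., 200.], [400., 150.], [250., 500.],
                    [600., 600.], [800., 350.], [500., 750.]])
weights = np.ones(len(feats))  # uniform here; a weighted scheme would
                               # down-weight noisy calibration samples
B = fit_wls(feats, targets, weights)
print(predict(B, np.array([0.3, 0.4])))
```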
Recently, there have been interesting calibration efforts aimed at more convenient and natural HCI. For instance, Sun et al. [17] propose a real-time gaze estimation system with online calibration. Instead of displaying a fixed number of calibration points, they update the eye parameters after each new point; the calibration process is completed as soon as the eye parameter updates converge. They report that the system adapts to a new user through online calibration within 3 minutes and achieves an accuracy error of ∼2°. Chen and Ji [19] suggest a user-friendly implicit calibration in which they estimate the probability distributions of the eye parameters and the gaze. They display several images with salient objects to the user, and the method adapts to the user over time. They report an estimation accuracy error of <3°.

In addition to appearance-based and 3D model-based methods, there also exists another group of methods which are