Article

Highly Accurate and Fully Automatic 3–D Head Pose Estimation and Eye Gaze Estimation Using RGB–D Sensors and 3D Morphable Models

Reza Shoja Ghiass 1,† and Denis Laurendeau 2,†

1 Affiliation 1; reza.shoja@gmail.com
2 Affiliation 2; Denis.Laurendeau@gel.ulaval.ca
* Correspondence: Denis.Laurendeau@gel.ulaval.ca; Tel.: +1-(418) 656-2131, ext. 2979
† Current address: 1665 Rue de l'Universite, Universite Laval, Quebec, QC, Canada, G1V 0A6

Version October 15, 2018 submitted to Sensors

Abstract: This work addresses the problem of automatic head pose estimation and its application to 3D gaze estimation using low-quality RGB–D sensors, without any subject cooperation or manual intervention. Previous work on 3D head pose estimation using RGB–D sensors requires either an offline step for supervised learning or the construction of a 3D head model, which may demand manual intervention or subject cooperation for complete head model reconstruction. In this paper, we propose a 3D pose estimator based on low-quality depth data that is not limited by any of these steps. Instead, the proposed technique relies on modeling the subject's face in 3–D rather than the complete head, which in turn relaxes all of the constraints of the previous works. The proposed method is robust, highly accurate and fully automatic, and it does not need any offline step. Unlike some of the previous works, the method uses only depth data for pose estimation. The experimental results on the Biwi head pose database confirm the efficiency of our algorithm in handling large pose variations and partial occlusion. We also evaluate the performance of our algorithm on the IDIAP database for 3D head pose and eye gaze estimation.

Keywords: 3–D Morphable Models; 3–D Head Pose Estimation; 3–D Eye Gaze Estimation; Iterative Closest Point; RGB–D Sensors
1. Introduction

Head pose estimation is a key step in understanding human behavior and can have different interpretations depending on the context. From the computer vision point of view, head pose estimation is the task of inferring the orientation of the head, relative to the imaging sensor coordinate system, from digital images or range data. In the literature, the head is assumed to be a rigid object with three degrees of freedom, i.e., the head pose is expressed in terms of yaw, roll and pitch. Generally, previous work on head pose estimation can be divided into two categories: (i) methods based on 2D images, and (ii) methods based on depth data [1]. Pose estimators based on 2D images generally require some pre-processing steps to translate the pixel-based representation of the head into direction cues. Several challenges, such as camera distortion, projective geometry, lighting and changes in facial expression, affect 2D image-based head pose estimators. A comprehensive study of pose estimation is given in [1], and the reader can refer to this reference for more details on the literature.

Unlike 2D pose estimators, systems based on 3D range data, or on its combination with 2D images, have demonstrated very good performance in the literature [2–7]. While most of the work on 3D pose estimation in the literature is based on non-consumer-level sensors [8–10], recent advances in the production of consumer-level RGB–D sensors such as the Microsoft Kinect or the Asus Xtion have

Submitted to Sensors, pages 1–14. www.mdpi.com/journal/sensors
Preprint (www.preprints.org) | NOT PEER-REVIEWED | Posted: 15 October 2018 | doi:10.20944/preprints201810.0309.v1. © 2018 by the author(s). Distributed under a Creative Commons CC BY license.
Peer-reviewed version available at Sensors 2018, 18, 4280; doi:10.3390/s18124280.
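The three-degree-of-freedom parameterization mentioned in the introduction (yaw, roll and pitch of a rigid head) can be made concrete with a small sketch. The snippet below composes the three angles into a 3x3 rotation matrix with NumPy. Note that the paper does not specify a rotation order or axis convention; the intrinsic Z (roll), X (pitch), Y (yaw) composition and the axis assignments used here are illustrative assumptions, not the authors' definition.

```python
import numpy as np

def head_pose_rotation(yaw, pitch, roll):
    """Compose a head-pose rotation matrix from Euler angles in radians.

    Assumed (illustrative) convention: yaw about the vertical (Y) axis,
    pitch about the lateral (X) axis, roll about the frontal (Z) axis,
    composed as R = Rz(roll) @ Rx(pitch) @ Ry(yaw).
    """
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)

    Ry = np.array([[cy, 0.0, sy],          # yaw: rotation about Y
                   [0.0, 1.0, 0.0],
                   [-sy, 0.0, cy]])
    Rx = np.array([[1.0, 0.0, 0.0],        # pitch: rotation about X
                   [0.0, cp, -sp],
                   [0.0, sp, cp]])
    Rz = np.array([[cr, -sr, 0.0],         # roll: rotation about Z
                   [sr, cr, 0.0],
                   [0.0, 0.0, 1.0]])
    return Rz @ Rx @ Ry

# Zero angles give the identity; any angles give a proper rotation
# (orthonormal, determinant +1).
R0 = head_pose_rotation(0.0, 0.0, 0.0)
R = head_pose_rotation(0.3, -0.1, 0.2)
```

Whichever convention a given system uses, reporting yaw/pitch/roll errors (as done on the Biwi benchmark) requires that the estimator and the ground truth agree on the same axis order.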