978-1-4244-2154-1/08/$25.00 ©2008 IEEE

Abstract

In this work, we present a system that estimates a user's focus of attention in front of a computer screen with a web camera, based on detection and tracking of the user's head position and eye movements. Utilizing machine learning concepts, the system gives real-time feedback on the user's attention by combining information coming from eye gaze, head pose, and distance from the screen. The system is completely non-intrusive and no special hardware (such as infrared cameras or wearable devices) is needed. Furthermore, it adjusts to every user, requires no initial calibration, and works under real, unconstrained lighting conditions.

1. Introduction

Existing approaches to head pose estimation either use more than one camera or extra equipment [1], rely on facial feature detection [2] and suffer from robustness problems, or estimate the head pose from face bounding boxes [3], requiring the detected region to be aligned with the training set. Appearance-based gaze trackers employ computer vision techniques to find the eyes in the input image and then determine the orientation of the irises [4]. Some existing eye gaze techniques [4] estimate the iris contours (circles or ellipses) on the image plane and detect the outer iris boundaries using edge operators. Little work has been done towards combining eye gaze and head pose information in a non-intrusive setting; the proposed method uses a combination of the two inputs.

2. Description of our method

The objective of the current work is the development of a system that works in real time, under normal lighting, with a simple web camera as the only hardware requirement. One application of our method is in e-learning environments.
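The per-frame fusion of the three cues could be sketched as follows. This is a simple rule-based stand-in for the learned, per-user decision stage described in the paper; the `FrameCues` structure, the threshold values, and the interpretation of a large distance ratio as leaning in are all illustrative assumptions, not values from the original work:

```python
from dataclasses import dataclass

@dataclass
class FrameCues:
    """Per-frame measurements produced by the tracking stage (hypothetical names)."""
    pose_magnitude: float   # length of the head-pose vector, in inter-ocular units
    gaze_offset: float      # eye-centre offset from the eye-area centre, normalized
    distance_ratio: float   # current inter-ocular distance / first-frame distance

def infer_user_state(cues: FrameCues) -> str:
    """Rule-based fusion of head pose, eye gaze and distance cues.

    Thresholds are illustrative placeholders; the paper adapts to each
    user rather than using fixed constants.
    """
    if cues.pose_magnitude > 0.5 or cues.gaze_offset > 0.4:
        return "non-attention"      # head or eyes turned away from the screen
    if cues.distance_ratio > 1.3:
        return "struggle to read"   # face much larger: user leaning in to the monitor
    return "attention"
```

A caller would fill `FrameCues` from the tracking output of Sections 2.1-2.2 once per frame and drive, e.g., word highlighting in an e-learning interface from the returned state.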
More particularly, learning procedures for children with learning difficulties employ word highlighting, font resizing, sounds, etc., which can be adjusted according to the visual attention of the child. Figure 1 gives an overview of the steps employed in our method for inferring the user's state (attention / non-attention / struggle to read).

2.1. Face and Facial Feature Localization

For face detection, the Boosted Cascade method described in [5] is employed, as it gives robust, real-time results, together with a post-processing step [6]. For eye-centre localization, an approach based on [6] is used, and for the detection of the eye corners (left, right, upper and lower), a technique similar to that described in [7]. After the points of interest have been detected, they are tracked with a pyramidal Lucas-Kanade tracking algorithm using three pyramid levels.

2.2. Head Pose, Eye Gaze and User-Monitor Distance Changes

For head pose estimation, the displacement of a reference point on the face (here, the middle of the inter-ocular line) is taken into account. At each frame, the displacement of this point with respect to its position at the frontal view (the pose vector) is normalized by the inter-ocular distance to cater for different scales. Since, in real conditions, tracking might fail under some circumstances, a series of rules is imposed to handle, e.g., rapid rotations, which hamper tracking of the visible features. In such cases, when the user returns to a frontal view, the pose vector shrinks in length and stays fixed for as long as the user is looking at the monitor; the algorithm can then reinitialize by re-detecting the face and the facial features and restarting tracking.

The idea behind eye gaze estimation is to estimate relative changes of the eye-centre position with regard to the actual centre of the eye area (Figure 2). Tracking the eye centres also gives the user's inter-ocular distance in pixels at each frame.
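The geometry of Section 2.2 can be sketched with NumPy. The function and argument names below are assumptions made for illustration; the conventions (normalization by the inter-ocular distance, ratios against the first frame) follow the text:

```python
import numpy as np

def pose_vector(ref_point, frontal_ref_point, left_eye, right_eye):
    """Head-pose vector: displacement of the inter-ocular midpoint with
    respect to its frontal-view position, normalized by the inter-ocular
    distance so the measure is invariant to scale."""
    interocular = np.linalg.norm(np.asarray(right_eye) - np.asarray(left_eye))
    return (np.asarray(ref_point) - np.asarray(frontal_ref_point)) / interocular

def gaze_offset(eye_centre, eye_area_centre, interocular):
    """Eye-gaze cue: relative shift of the detected eye centre with respect
    to the geometric centre of the eye area, normalized for scale."""
    return (np.asarray(eye_centre) - np.asarray(eye_area_centre)) / interocular

def distance_ratio(interocular_now, interocular_first):
    """Ratio of the current inter-ocular distance (in pixels) to that at the
    first frame: > 1 means the face appears larger, i.e. the user has moved
    closer to the monitor; < 1 means further away."""
    return interocular_now / interocular_first
```

For example, with eyes at (80, 100) and (120, 100) in the frontal view (inter-ocular distance 40 px), a midpoint displaced from (100, 100) to (110, 100) yields a pose vector of (0.25, 0), regardless of how close the user sits to the camera.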
Fractions of the inter-ocular distance at the current frame with respect to that at the first frame indicate changes in the user's distance from the monitor.

A non-intrusive method for user focus of attention estimation in front of a computer monitor
Stylianos Asteriadis, Paraskevi Tzouveli, Kostas Karpouzis, Stefanos Kollias
Image, Video and Multimedia Systems Laboratory, National Technical University of Athens
GR-157 80 Zographou, Greece
{stiast, tpar, kkarpou}@image.ntua.gr, stefanos@cs.ntua.gr