978-1-4244-2154-1/08/$25.00 ©2008 IEEE
Abstract
In this work, we present a system that estimates a user’s
focus of attention in front of a computer screen using a
web camera, based on detection and tracking of the user’s
head position and eye movements. Utilizing machine
learning concepts, the system gives real-time feedback on
the user’s attention by combining information from eye
gaze, head pose, and distance from the screen. The system
is completely non-intrusive: no special hardware (such as
infrared cameras or wearable devices) is needed.
Furthermore, it adapts to each user without requiring
initial calibration, and it works under real, unconstrained
lighting conditions.
1. Introduction
Existing head pose estimation techniques either use more
than one camera or extra equipment [1], rely on facial
feature detection [2], which suffers from robustness
problems, or estimate the head pose from face bounding
boxes [3], which requires the detected region to be aligned
with the training set. Appearance-based gaze trackers
employ computer vision techniques to locate the eyes in
the input image and then determine the orientation of the
irises [4]. Some existing eye gaze techniques [4] estimate
the iris contours (circles or ellipses) on the image plane
and detect the outer iris boundaries using edge operators.
Little work has been done on combining eye gaze and
head pose information in a non-intrusive setting; the
proposed method uses a combination of the two inputs.
2. Description of our method
The objective of the current work is the development
of a system that works in real time, under normal lighting,
with a simple web camera as the only hardware
requirement. One application of our method is in
e-learning environments. In particular, learning
procedures for children with learning difficulties employ
word highlighting, font resizing, sounds, etc., which can
be adjusted according to the visual attention of the child.
Figure 1 gives an overview of the steps employed in our
method for inferring the user’s state
(attention/non-attention/struggle to read).
2.1. Face and Facial Feature Localization
For face detection, the Boosted Cascade method
described in [5] is employed, as it gives robust, real-time
results, together with a post-processing step [6]. For eye
centre localization, an approach based on [6] was used.
For the detection of the eye corners (left, right, upper and
lower), a technique similar to that described in [7] is used.
After the points of interest are detected, they are tracked
using a three-level pyramidal Lucas-Kanade algorithm.
2.2. Head Pose, Eye Gaze and User-Monitor Distance Changes
For head pose estimation, the displacement of a
reference point on the face (here, the midpoint of the
inter-ocular line) is taken into account. At each frame, the
displacement of this point with respect to its position at
the frontal view (the pose vector) is normalized by the
inter-ocular distance to account for different scales. Since
tracking might fail under real conditions, a series of rules
handles events such as rapid rotations, which hamper
tracking of the visible features. In such cases, when the
user returns to a frontal view, the pose vector shrinks and
remains fixed for as long as the user is looking at the
monitor; the algorithm can then reinitialize by re-detecting
the face and the facial features and restarting tracking.
The idea behind eye gaze estimation is to measure relative
changes of the eye centre position with respect to the
centre of the eye area (Figure 2). Tracking the eye centres
also gives the user’s inter-ocular distance in pixels at each
frame; the ratio of the inter-ocular distance at the current
frame to that at the first frame indicates changes in the
user’s distance from the monitor.
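The three normalized measures above can be sketched as follows; the coordinates and helper names are our own illustration, not code from the paper:

```python
import numpy as np

def pose_vector(ref_point, frontal_ref, inter_ocular):
    """Displacement of the reference point (midpoint of the inter-ocular line)
    from its frontal-view position, normalized by the inter-ocular distance
    so that the measure is scale-invariant."""
    return (np.asarray(ref_point, float) - np.asarray(frontal_ref, float)) / inter_ocular

def gaze_offset(eye_centre, eye_area_centre, inter_ocular):
    """Normalized offset of the detected eye (iris) centre from the
    geometric centre of the eye area."""
    return (np.asarray(eye_centre, float) - np.asarray(eye_area_centre, float)) / inter_ocular

def distance_change(inter_ocular_now, inter_ocular_first):
    """Ratio > 1 means the user has moved closer to the monitor
    (the face, hence the inter-ocular distance, appears larger)."""
    return inter_ocular_now / inter_ocular_first

# Hypothetical pixel coordinates for illustration:
left_eye, right_eye = np.array([100.0, 120.0]), np.array([160.0, 120.0])
iod = np.linalg.norm(right_eye - left_eye)   # inter-ocular distance: 60 px
mid = (left_eye + right_eye) / 2             # reference point: (130, 120)

v = pose_vector(mid, frontal_ref=(130.0, 110.0), inter_ocular=iod)
g = gaze_offset((104.0, 120.0), (100.0, 120.0), iod)  # iris shifted right
r = distance_change(72.0, 60.0)              # face appears 20% larger
```

Because all three quantities are normalized by the inter-ocular distance (or expressed as a ratio of it), they remain comparable across users sitting at different distances from the camera, which is what allows the system to skip per-user calibration.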
A non-intrusive method for user focus of attention estimation in front of a
computer monitor
Stylianos Asteriadis, Paraskevi Tzouveli, Kostas Karpouzis, Stefanos Kollias
Image, Video and Multimedia Systems Laboratory, National Technical University of Athens
GR-157 80 Zographou, Greece
{stiast, tpar, kkarpou}@image.ntua.gr, stefanos@cs.ntua.gr