International Conference on Computer Systems and Technologies - CompSysTech’09

A Vision-Based Attentive User Interface with (Semi-)Automatic Parameter Calibration

Fabio Fosso, Marco Porta

Abstract: In this paper we present a vision-based perceptive interface able to recognize some basic “user states” (presence, absence, head orientation and phoning). This recognition module can be used to implement attentive interfaces that act differently according to user behaviour. To work efficiently in real time, the system exploits skin-colour detection, reliably identifying face and hands within captured frames. Colour-based techniques, however, require precise calibration, which is usually a tedious and delicate task. We therefore propose a (semi-)automatic calibration procedure that frees the user from this burden, either completely or, if necessary, through a guided wizard. Our experiments show that this solution works well and greatly improves users’ willingness to employ a vision-based perceptive interface.

Key words: vision-based perceptive interfaces, attentive interfaces, implicit communication, calibration, activity recognition.

INTRODUCTION

Perceptive User Interfaces provide the computer with perceptive capabilities to acquire information about users and their environment. Vision-Based Interfaces (VBIs), in particular, are perceptive interfaces that exploit vision as a communication channel from the user to the computer. Thanks to the decreasing cost of cameras (especially webcams), VBIs can now become a commercial reality.

In this paper we describe a VBI able to distinguish among some basic “user states”, namely presence, absence, head orientation and phoning. Although few, such conditions can be exploited for a variety of purposes.
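As a purely illustrative sketch of the kind of colour-based recognition involved (using a commonly cited explicit RGB skin rule from the literature and an arbitrary coverage threshold as assumed defaults, not this system's calibrated parameters), a coarse presence state could be inferred from the fraction of skin-coloured pixels in a webcam frame:

```python
import numpy as np

def is_skin(frame):
    """Per-pixel skin classification with an explicit RGB rule.

    The thresholds below are illustrative defaults from the literature;
    fixed values like these are exactly what illumination changes break,
    motivating (semi-)automatic calibration.
    """
    # Cast to int to avoid uint8 wrap-around in the subtractions below.
    r = frame[..., 0].astype(int)
    g = frame[..., 1].astype(int)
    b = frame[..., 2].astype(int)
    maxc = np.maximum(np.maximum(r, g), b)
    minc = np.minimum(np.minimum(r, g), b)
    return (
        (r > 95) & (g > 40) & (b > 20)
        & (maxc - minc > 15)
        & (np.abs(r - g) > 15)
        & (r > g) & (r > b)
    )

def user_state(frame, presence_threshold=0.02):
    """Map the skin-pixel fraction of an RGB frame to present/absent."""
    skin_fraction = is_skin(frame).mean()
    return "present" if skin_fraction >= presence_threshold else "absent"
```

A real system would of course add spatial grouping of skin regions to tell face from hands; the sketch only shows why the thresholds themselves are the calibration-sensitive part.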
For example, if the user moves away, the PC could enter standby mode and resume when he or she comes back; if the user is not looking at the screen when an error occurs (or a new email message arrives), a sound alert could be played; if the user answers the phone while listening to music, the volume could be turned down automatically; and so on.

For the recognition process, we exploit two-dimensional appearance-based approaches. In particular, we rely on skin-colour detection to identify the face and hands. While usually computationally inexpensive, colour-based approaches suffer from a main drawback: changes in illumination conditions may affect the recognition procedure, and when this occurs, re-calibration of the system parameters is necessary. Since such a process may be perceived badly by the user, we have implemented a semi-automatic calibration solution that greatly eases this task. To our knowledge, no existing appearance- and colour-based VBI provides truly automatic or semi-automatic calibration.

RELATED WORK

The problem of monitoring the user’s postures or activities while using the personal computer has been considered by several authors in the past, in different forms. The purpose is usually to implement attentive interfaces, both to improve human-computer interaction and for self-reporting or interruption management (e.g. [1]). Since we are only interested in non-invasive approaches, however, which do not require the user to wear special devices, the relevant works are very few. Moreover, we do not consider complex (and expensive) systems based on eye tracking, but only cheap webcam-based solutions.

As representative examples, in [2] a system is described that allows the mouse pointer to be switched between different monitors by moving the head in a multi-screen setting. To assess the user’s head direction, a 3D stereovision approach is used (exploiting two cameras), based on commercially available face detection software.
In [3], a classification technique is presented that separates video scenes containing office work tasks, tracking both face and hands. Also interesting is the system described in [4],