Computer Methods and Programs in Biomedicine 82 (2006) 73–80
journal homepage: www.intl.elsevierhealth.com/journals/cmpb

A neural approach to extract foreground from human movement images

S. Conforto ∗, M. Schmid, A. Neri, T. D'Alessio

Dept. of Applied Electronics, University Roma TRE, Via della Vasca Navale 84, I-00146 Roma, Italy

Article info

Article history: Received 22 October 2004; received in revised form 9 February 2006; accepted 10 February 2006

Keywords: Segmentation; Human movement analysis; Quality assessment; Neural networks

Abstract

In recent years many approaches to foreground extraction from images related to human movement have been presented. Foreground extraction is a pre-processing procedure to be implemented in a system for capturing human movement in order to facilitate the tracking of anatomical landmarks on human bodies. In this work, an approach based on an unsupervised neural network has been studied: a Kohonen map has been designed to recognize and separate structures characterizing foreground and background. The proposed technique is fully automatic, and its performance has been compared with that of two further approaches based on differences between foreground and background images. In order to quantify the segmentation quality, an already validated, objective, and automatic criterion has been used. The obtained results are adequate for the final aim of the application and show the feasibility of the proposed approach.

© 2006 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

The capture of human movement is a process dealing with the large-scale movements of a subject at different resolutions (that is, entire body, limbs, single fingers).
This is a hot topic for several applications, such as surveillance, control, and analysis [1]: the first deals with monitoring movement in order to interpret and classify the subjects' actions; the second exploits the results of the movement capture to implement control functionalities; the third concerns the analysis of captured data, to be used either for sport performance assessment or for diagnostic and rehabilitation purposes.

From a functional point of view, a system for the capture of movement can be subdivided into four logical blocks: initialization, tracking, pose estimation, and interpretation. The first is related to all the procedures needed to correctly interpret the scene wherein the subject moves; the second is responsible for the tracking procedure; the third uses the tracking results to determine the pose of body segments over time; the fourth, which is the final goal of the system, interprets the global action performed through the movement.

∗ Corresponding author. Fax: +39 06 55177026. E-mail address: conforto@uniroma3.it (S. Conforto).

The efficiency of tracking can be improved by the use of pre-processing techniques, such as those allowing the extraction of regions of interest from video sequences, generally referred to as image segmentation [2]. This low-level process deals with the subdivision of a scene into regions of interest on the basis of different coherence criteria such as color [3], texture [4], edge [5], or any combination of these.

In the framework of human movement analysis, the segmentation process often consists of separating the moving subject (i.e. the foreground) from the background. The foreground/background separation can be achieved by techniques based on either temporal or spatial data.

Temporal data can be used in two different ways: subtraction and flow.
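As a rough illustration of the subtraction idea in its simplest setting, where a frame showing only the background is available, the sketch below marks as foreground every pixel whose intensity differs from that reference by more than a threshold. It is not the method proposed in this paper: the function name `subtract_background`, the nested-list grayscale representation, and the threshold value of 30 are all assumptions made for the example.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of 0-255 intensities.
Frame = List[List[int]]


def subtract_background(
    frame: Frame, background: Frame, threshold: int = 30
) -> List[List[bool]]:
    """Return a mask that is True where the frame differs from the background.

    The threshold is an illustrative value, not one taken from the paper.
    """
    same_shape = len(frame) == len(background) and all(
        len(f_row) == len(b_row) for f_row, b_row in zip(frame, background)
    )
    if not same_shape:
        raise ValueError("frame and background must have the same shape")
    # A pixel is foreground when its absolute difference exceeds the threshold.
    return [
        [abs(f - b) > threshold for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(frame, background)
    ]


# Toy 3x4 example: a uniform background and a brighter 2x2 "subject".
background = [[10, 10, 10, 10] for _ in range(3)]
frame = [
    [10, 12, 11, 10],
    [10, 200, 210, 10],
    [10, 205, 198, 10],
]
mask = subtract_background(frame, background)
foreground_pixels = sum(cell for row in mask for cell in row)
```

In this toy case only the four bright pixels exceed the threshold, while the small fluctuations in the first row are treated as background. On real footage the choice of threshold trades off missed foreground against noise, which is one reason an objective quality criterion is useful for comparing segmentations.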
Subtraction techniques perform the segmentation by processing one or more inter-frame differences [6]; a particularly simple situation occurs when in at least one frame only the background scene is present [7,8].

0169-2607/$ – see front matter. doi:10.1016/j.cmpb.2006.02.005

The optical flow