Human-Computer Interaction through Time-of-Flight and RGB Cameras

Piercarlo Dondi, Luca Lombardi, and Marco Porta

Department of Computer Engineering and Systems Science, University of Pavia, Via Ferrata 1, 27100 Pavia, Italy
{piercarlo.dondi,luca.lombardi,marco.porta}@unipv.it

Abstract. The number of systems exploiting Time-of-Flight (ToF) cameras for gesture recognition has increased greatly in recent years, confirming a very positive trend for this technology in the field of Human-Computer Interaction. In this work we present a new application for interacting with a virtual keyboard, based on an ordinary RGB webcam and a ToF camera. Our approach is subdivided into two steps: first, the entire body of the user is segmented using only the ToF data; then, hands and head are extracted by applying color information to the retrieved clusters. The final tracking step, based on the Kalman filter, is able to recognize the chosen hand even in the presence of a second hand or the head. Tests carried out with users of different ages showed interesting results and a quick learning curve.

Keywords: Time-of-Flight camera, human-computer interaction, hand recognition.

1 Introduction

Time-of-Flight (ToF) cameras are able to measure depth in real time using a single compact sensor, unlike earlier multi-camera systems such as stereo cameras. In recent years, research has shown great interest in these devices in many fields related to computer vision and computer graphics, such as 3D modeling, scene reconstruction, user interaction, and segmentation and tracking of moving people [1].
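As outlined in the abstract, the final tracking step is based on a Kalman filter and keeps following the chosen hand when a second hand or the head is also visible. The state model, noise parameters and data-association rule are not specified at this point, so the following is only a minimal sketch under stated assumptions: a constant-velocity model applied independently to each image axis, hypothetical noise values (`q`, `r`), and a hypothetical `HandTracker` helper that picks, among the candidate blobs, the one closest to the predicted position within a gating radius (`gate`). Nearest-neighbour gating is an assumed association strategy, not necessarily the one used by the authors.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis (state: position, velocity).

    Assumption: x and y are filtered independently with identical parameters.
    """
    pos: float
    vel: float = 0.0
    # State covariance P = [[p00, p01], [p10, p11]]; large initial uncertainty.
    P: list = field(default_factory=lambda: [[100.0, 0.0], [0.0, 100.0]])
    q: float = 1.0  # process noise added to each diagonal term (hypothetical value)
    r: float = 4.0  # measurement noise variance in pixels^2 (hypothetical value)

    def predict(self, dt: float = 1.0) -> float:
        # State transition x' = F x with F = [[1, dt], [0, 1]].
        self.pos += self.vel * dt
        (p00, p01), (p10, p11) = self.P
        # Covariance propagation P' = F P F^T + Q, with Q = q * I.
        self.P = [[p00 + dt * (p10 + p01) + dt * dt * p11 + self.q, p01 + dt * p11],
                  [p10 + dt * p11, p11 + self.q]]
        return self.pos

    def update(self, z: float) -> None:
        # Measurement model H = [1, 0]: only the position is observed.
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r           # innovation covariance
        k0, k1 = p00 / s, p10 / s  # Kalman gain
        y = z - self.pos           # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        # Covariance update P = (I - K H) P.
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]


class HandTracker:
    """Hypothetical helper: follows one chosen hand among several candidate blobs."""

    def __init__(self, x: float, y: float, gate: float = 50.0):
        self.kx = AxisKalman(pos=x)
        self.ky = AxisKalman(pos=y)
        self.gate = gate  # max distance (pixels) between prediction and accepted blob

    def step(self, candidates):
        """candidates: list of (x, y) blob centroids (e.g. hands and head)."""
        px, py = self.kx.predict(), self.ky.predict()
        # Associate the blob nearest to the predicted position (assumed strategy).
        best = min(candidates, key=lambda c: math.hypot(c[0] - px, c[1] - py),
                   default=None)
        if best is not None and math.hypot(best[0] - px, best[1] - py) <= self.gate:
            self.kx.update(best[0])
            self.ky.update(best[1])
        # With no blob inside the gate, the filter simply coasts on its prediction.
        return self.kx.pos, self.ky.pos


if __name__ == "__main__":
    tracker = HandTracker(100.0, 100.0)
    for i in range(1, 11):
        chosen_hand = (100.0 + 5 * i, 100.0)  # moves right by 5 px per frame
        estimate = tracker.step([(300.0, 100.0), chosen_hand, (200.0, 20.0)])
    print(estimate)
```

In this toy run the second hand at (300, 100) and the head at (200, 20) are never associated, because they stay outside the gate around the predicted position of the moving hand. Using the prediction rather than the last measurement to gate the candidates is what lets the filter keep the chosen hand when other skin-colored blobs appear.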
In some cases, ToF data are combined with color information supplied by a traditional RGB camera to achieve more complex or more precise results: for instance, in [2] depth and color data are used to create a 3D environment for mixed reality; in [3] depth information is exploited to select the best input area for a color-based segmentation algorithm (SIOX); and in [4] color and depth data are fused in a new segmentation and tracking method that compensates for the respective weaknesses of the two kinds of sensors. In this paper we focus on the combined use of an RGB camera and a ToF camera for Human-Computer Interaction (HCI), presenting a new application that allows the user to control virtual on-screen keyboards (in particular, a QWERTY keyboard and a numeric pad). The ToF stream is used for the initial search of the

G. Maino and G.L. Foresti (Eds.): ICIAP 2011, Part II, LNCS 6979, pp. 89–98, 2011.
© Springer-Verlag Berlin Heidelberg 2011