Indoor head detection and tracking on RGBD images

Katarzyna Niżalowska, Lukasz Burdka, Urszula Markowska-Kaczmar
Institute of Informatics, Wroclaw University of Technology
Wyb. Wyspiańskiego 27, 50-370 Wroclaw, Poland
Email: urszula.markowska-kaczmar@pwr.wroc.pl

Abstract—A real-time human head detection and tracking method for a fall detection system is presented. It uses RGBD images to obtain the head position in three-dimensional space. The proposed method is designed to be insensitive to body orientation and requires no initial calibration for the tracked person. The evaluation was performed on annotated videos of realistic, non-studio, indoor everyday activities and falls. The proposed method outperforms the head tracking provided by the skeleton tracking of the Microsoft Kinect SDK.

I. INTRODUCTION

Tracking of a human head is an important aspect of any system that aims to monitor human behavior or health condition. Many conclusions can be drawn from information about the position and orientation of the head. One application that can benefit from reliable information about the head position is fall detection in a monitoring system for elderly people.

Existing solutions focus mostly on detecting or tracking a human face rather than a head in general. Most of them also assume a constant vertical orientation of the human body. In the vast majority of situations such an approach is sufficient, but no existing solution is suitable in the context of a fall detection system.

The aim of our research was to develop a robust head detection and tracking method that is capable of tracking a human head regardless of its orientation and independently of the tracked person. The method uses combined color, motion and depth data to perform this task effectively. The introduced method maintains its performance when the head position and orientation change rapidly, such as during a fall.

The content of this paper is organized as follows.
In Section II, related works in the field of head tracking are introduced. In Section III we formulate the research problem that our method is designed to solve. In Section IV we describe the proposed solution. Section V is dedicated to the evaluation of our method and contains the description of the experiment and the dataset, followed by test results compared to the Kinect SDK head tracking [1]. In Section VI we conclude our work and propose directions for future work.

II. RELATED WORKS

The head tracking problem has been widely studied over the past few years. In the literature, definitions of this problem describe different tasks. The majority of papers identify the problem of head tracking with face tracking. They only tackle situations in which the face is clearly visible in the video image and assume that it is located near the camera, as in [2], [3], [4], [5], [6], [7]. The two most common applications of head tracking defined in this way are obtaining certain facial features [6], [2], [3], [4] and approximating the spatial head orientation [5], [8]. In this paper the problem of head tracking refers to determining the position of a head regardless of its rotation around the vertical axis.

Since information about the position and orientation of a human head can be used in a vast number of applications, there are many different approaches to this problem. In this paper we focus on vision systems as the most versatile ones. The highest performance can be achieved using a thermal camera [9] as a data source, because a human head is easily distinguishable in thermal images. This solution, however, cannot be widely applied due to the high cost of thermal cameras. A common approach is to use a video camera as a data source, as in the methods described in [2], [3], [4], [10], [11], [6].
The recent appearance of affordable sensors containing both a video camera and a depth camera has opened new possibilities in the field of image processing. A widely used device integrating a depth sensor and a color camera is the Microsoft Kinect. It is used for the head tracking task in [5] and [8]. Additionally, Microsoft released an SDK for the Kinect [1] that provides skeleton tracking functionality. With this solution, if the skeleton is recognized properly by the Kinect sensor, information about the head position can be easily obtained; however, as shown in this paper, it lacks robustness.

Among vision systems using different data sources, there are various methods for solving the head detection problem. The method presented in [12] uses background subtraction to detect a moving silhouette and treats its highest point as the head. In [11] background subtraction is also used to find interest points. Then, a classifier is applied. In [10] each tracked head must be initially introduced to the tracking system from four directions. In [3] and [7] only a face is detected using a generic Haar cascade face detector [13]. In this case, the face needs to be visible at a satisfactory resolution.

Proceedings of the 2014 Federated Conference on Computer Science and Information Systems, pp. 679–686. DOI: 10.15439/2014F195. ACSIS, Vol. 2. 978-83-60810-58-3/$25.00 © 2014, IEEE

After the head is detected, the tracking process can be initiated. Most methods assume an invariant orientation of the head during tracking. Therefore, a template is captured