Long Term Person Re-Identification from Depth Cameras using Facial and Skeleton Data

Enrico Bondi, Pietro Pala, Lorenzo Seidenari, Stefano Berretti, Alberto Del Bimbo

Media Integration and Communication Center - MICC
University of Florence, Florence, Italy
{enrico.bondi,pietro.pala,lorenzo.seidenari,stefano.berretti,alberto.delbimbo}@unifi.it
http://www.micc.unifi.it/

Abstract. Depth cameras enable long-term re-identification by exploiting 3D information that captures biometric cues such as the face and characteristic lengths of the body. In the typical approach, person re-identification is performed using appearance, which rules out any application in which a person may change clothes between acquisitions. Home patient monitoring is one such scenario. Unfortunately, the quality of face and skeleton data is not always sufficient to guarantee correct recognition from depth data: both features are affected by the pose of the subject and the distance from the camera. We propose a model that combines a robust skeleton representation with a highly discriminative face feature, weighting samples by their quality. Our method improves rank-1 accuracy, especially on short realistic sequences.

1 Introduction

Advances in 3D scanning technologies make it possible to capture geometric and visual data of an observed scene and its dynamics across time. The availability of registered depth and RGB frames across time boosts the potential of automatic analysis modules, which can now easily detect and track people and their body parts as they move in the scene. However, the technologies employed in current 3D dynamic scanning devices limit their field of view to a distance of a few meters, with the quality of the sensed data already degrading at a distance of 2 meters.
As a consequence, the tracking libraries released with such devices can track a target only if it is visible and sufficiently close to the sensor: if the moving target gets too far from the sensor or leaves its field of view, tracking is not possible. The ultimate result is that when a target observed in the past re-enters the field of view of the camera, it is treated as a new one, losing any relation between the two intervals of observation. As a concrete application scenario, let us consider the monitoring of a patient in a domestic environment, as can be the case of elderly