View-independent prediction of body dimensions
in crowded environments
Tony Scoleri
Image Analysis and Exploitation Group
Defence Science and Technology Organisation
Edinburgh 5111, Australia
Email: tony.scoleri@dsto.defence.gov.au
Maciej Henneberg
Biological Anthropology & Comparative Anatomy Research Unit
University of Adelaide
Adelaide 5005, Australia
Email: maciej.henneberg@adelaide.edu.au
Abstract—This paper considers the problem of inferring the
dimensions of non-visible body parts from images of incomplete
bodies. This situation often occurs in CCTV videos of crowded
scenes where people are mostly occluded. The approach we
present relies on the ability to measure an observable body
part which correlates to a missing body part. Anthropometric
regression equations are then used to predict the dimension of
the sought body part from the observable one. The example
application of the paper considers acquiring a person’s head
height to infer their stature. It is shown how a judicious selection
of anthropometric points enables computation of the head height
from any perspective images taken in uncontrolled environments
with uncooperative subjects. Two regression models are proposed
to infer stature from head height. Three real-life case studies
have been chosen to assess the performance of our method on
subjects observed in low resolution images and under various
poses. Results show that the proposed method can yield stature
estimates with accuracy comparable to ground truth and to two
geometric methods.
I. INTRODUCTION
In the last decade, security systems have been developed to
perform automatic tracking of moving objects, often pedes-
trians, across networks of cameras. The ability to follow or
identify an individual in different imagery sources, or distinguish
between two or more candidates, relies on producing an accurate
human signature. Research in gait analysis and
soft biometry retrieval from videos has aimed at providing
precise descriptive characteristics of individuals, for instance
a person’s stride, cadence and stature [1], [2]. In recent years,
the problem of person re-identification has reached a whole
new level: the use of anthropometric measurements, or
anthropomeasures, has enabled the estimation of further
human descriptors (e.g. shoulder breadth and weight [3], [4])
and the comparison of very diverse human traits [5], and has
even led to a breakthrough in camera calibration, where a
reference length is no longer needed to recover the scene's
absolute scale [6].
Despite theoretical and engineering progress, a large number
of surveillance cameras capture images with resolution too
low for the reliable identification of faces [7]. High-resolution
images have gained more popularity in anthropometric research;
however, the angular viewing position is restricted to a frontal
body pose and subjects must be cooperative [6].
Other methods can recover various anthropomeasures from
images and even reconstruct complete 3D human pose but
the environment conditions are very controlled [8], [9]. In
addition, reliable anthropometric data are either difficult to
collect (because the process must involve a large number of
participants and adequate logistics) or expensive to purchase
[10], [11]. Sometimes imagery experts have had to derive unknown
anthropometric ratios via trial and error [3]; this leaves room
for scientific improvement.
The main contribution of this paper is to propose a generic
framework to obtain the dimensions of non-observable human
body parts¹ from a single image. In particular, an
anthropometry-driven model is developed to predict the dimen-
sions of missing body parts from a part that can be measured
in the image. This may be seen as an extension of traditional
geometric methods such as [12], which require that the whole
body part be visible in order to prescribe a measurement.
Importantly, the camera must either be calibrated, or be initially
uncalibrated but amenable to calibration. Section IV
presents a technique specifically designed for recovering com-
plete Euclidean calibration under difficult imaging conditions.
The proposed model is applied to infer a person’s stature
from their head height. Stature prediction has been chosen
because it is often one of the valuable physical descriptors used
for identification and search in surveillance videos. A second
contribution of this work is therefore to offer an alternative,
view-invariant method to estimate the human stature from
an image. Specifically, a judicious choice of anthropometric
points is given to robustly delineate a person's head. These
points are identifiable in arbitrary perspective camera views
and low resolution images, which is ideal for CCTV videos
of uncontrolled environments with uncooperative subjects.
One of our three case studies shows their application on
robbers wearing masks. Section III describes a procedure
and key considerations about the way to select these points
among a multitude of other potential candidates. This work
has led us to investigate relationships that are typically not
examined in anthropometry but satisfy requirements of the
computer vision realm. We believe this is the first application
of our particular choice of anthropometric points in the imagery
context. Through image modelling and a final manual step, the
procedure yields an estimate of the head height of a person.
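As a toy illustration of the regression step described above, stature can be predicted from head height with a simple least-squares linear fit. The paired measurements and the resulting coefficients below are hypothetical placeholders, not the survey data or regression models used in this paper.

```python
import numpy as np

# Hypothetical paired measurements in cm (head height, stature).
# A real application would fit on a large anthropometric survey;
# these values are illustrative only.
head_height = np.array([21.0, 22.0, 22.5, 23.0, 23.5, 24.0, 24.5])
stature = np.array([158.0, 165.0, 169.0, 172.0, 176.0, 180.0, 184.0])

# Least-squares fit of the linear regression: stature = a * head + b.
a, b = np.polyfit(head_height, stature, 1)

def predict_stature(h):
    """Predict stature (cm) from a head height (cm) measured in the image."""
    return a * h + b

print(predict_stature(23.0))  # roughly 172-173 cm for this toy data
```

The same pattern applies to any pair of correlated body parts: fit the regression on reference anthropometric data, then evaluate it on the dimension measured in the image.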
¹ A prediction is equally possible for observable but non-measurable body
parts. For instance, if a person has their arm pointing towards the camera,
the arm can be observed in the image but not measured.
978-1-4673-2181-5/12/$31.00 ©2012 Crown