View-independent prediction of body dimensions
in crowded environments
Tony Scoleri
Image Analysis and Exploitation Group
Defence Science and Technology Organisation
Edinburgh 5111, Australia
Email: tony.scoleri@dsto.defence.gov.au
Maciej Henneberg
Biological Anthropology & Comparative Anatomy Research Unit
University of Adelaide
Adelaide 5005, Australia
Email: maciej.henneberg@adelaide.edu.au
Abstract—This paper considers the problem of inferring the
dimensions of non-visible body parts from images of incomplete
bodies. This situation often occurs in CCTV videos of crowded
scenes where people are mostly occluded. The approach we
present relies on the ability to measure an observable body
part which correlates to a missing body part. Anthropometric
regression equations are then used to predict the dimension of
the sought body part from the observable one. The example
application of the paper considers acquiring a person’s head
height to infer their stature. It is shown how a judicious selection
of anthropometric points enables computation of the head height
from any perspective images taken in uncontrolled environments
with uncooperative subjects. Two regression models are proposed
to infer stature from head height. Three real-life case studies
have been chosen to assess the performance of our method on
subjects observed in low resolution images and under various
poses. Results show that the proposed method can yield stature
estimates with accuracy comparable to ground truth and to two
geometric methods.
I. INTRODUCTION
In the last decade, security systems have been developed to
perform automatic tracking of moving objects, often pedes-
trians, across networks of cameras. The ability to follow or
identify an individual in different imagery sources, or distinguish
between two or more candidates, relies on producing an accurate
human signature. Research in gait analysis and
soft biometry retrieval from videos has aimed at providing
precise descriptive characteristics of individuals, for instance
a person’s stride, cadence and stature [1], [2]. In recent years,
the problem of person re-identification has reached a whole
new level: the use of anthropometric measurements, or
anthropomeasures, has enabled the estimation of further
human descriptors (e.g. shoulder breadth and weight [3], [4])
and the comparison of very diverse human traits [5], and has
even led to a breakthrough in camera calibration, where a
reference length is no longer needed to recover the scene's
absolute scale [6].
Despite theoretical and engineering progress, a large number
of surveillance cameras capture images with resolution too
low for the reliable identification of faces [7]. High-resolution
images have gained more popularity in anthropometric research;
however, the angular viewing position is restricted to a frontal
body pose and subjects must be cooperative [6].
Other methods can recover various anthropomeasures from
images and even reconstruct complete 3D human pose but
the environment conditions are very controlled [8], [9]. In
addition, reliable anthropometric data are either difficult to
collect (because the process must involve a large number of
participants and adequate logistics) or expensive to purchase
[10], [11]. Sometimes imagery experts have had to derive unknown
anthropometric ratios via trial and error [3]; this leaves room
for scientific improvement.
The main contribution of this paper is to propose a generic
framework to obtain the dimensions of non-observable human
body parts¹ from a single image. In particular, an
anthropometry-driven model is developed to predict the dimen-
sions of missing body parts from a part that can be measured
in the image. This may be seen as an extension of traditional
geometric methods such as [12], which require that the whole
body part be visible in order to prescribe a measurement.
Importantly, the camera must either be calibrated, or be initially
uncalibrated but amenable to calibration. Section IV
presents a technique specifically designed for recovering com-
plete Euclidean calibration under difficult imaging conditions.
The proposed model is applied to infer a person’s stature
from their head height. Stature prediction has been chosen
because it is often one of the valuable physical descriptors used
for identification and search in surveillance videos. A second
contribution of this work is therefore to offer an alternative,
view-invariant method to estimate the human stature from
an image. Specifically, a judicious choice of anthropometric
points is given to robustly delineate a person's head. These
points are identifiable in arbitrary perspective camera views
and low resolution images, which is ideal for CCTV videos
of uncontrolled environments with uncooperative subjects.
One of our three case studies shows their application on
robbers wearing masks. Section III describes a procedure
and key considerations about the way to select these points
among a multitude of other potential candidates. This work
has led us to investigate relationships that are typically not
examined in anthropometry but satisfy requirements of the
computer vision realm. We believe this is the first application
of our particular choice of anthropometric points in the imagery
context. Through image modelling and a final manual step, the
procedure yields an estimate of the head height of a person.
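As a toy illustration of the regression step described above, stature can be predicted from head height with a simple least-squares linear fit. The paired measurements and the resulting coefficients below are hypothetical placeholders, not the survey data or regression models used in this paper.

```python
import numpy as np

# Hypothetical paired measurements in cm (head height, stature).
# A real application would fit on a large anthropometric survey;
# these values are illustrative only.
head_height = np.array([21.0, 22.0, 22.5, 23.0, 23.5, 24.0, 24.5])
stature = np.array([158.0, 165.0, 169.0, 172.0, 176.0, 180.0, 184.0])

# Least-squares fit of the linear regression: stature = a * head + b.
a, b = np.polyfit(head_height, stature, 1)

def predict_stature(h):
    """Predict stature (cm) from a head height (cm) measured in the image."""
    return a * h + b

print(predict_stature(23.0))  # roughly 172-173 cm for this toy data
```

The same pattern applies to any pair of correlated body parts: fit the regression on reference anthropometric data, then evaluate it on the dimension measured in the image.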
¹ A prediction is equally possible for observable but non-measurable body
parts. For instance, if a person has their arm pointing towards the camera,
the arm can be observed in the image but not measured.
978-1-4673-2181-5/12/$31.00 ©2012 Crown