Hindawi Publishing Corporation
International Journal of Digital Multimedia Broadcasting
Volume 2010, Article ID 920121, 9 pages
doi:10.1155/2010/920121
Research Article
Flexible Human Behavior Analysis Framework for
Video Surveillance Applications
Weilun Lao,1,2 Jungong Han,1 and Peter H. N. de With1,3
1 Eindhoven University of Technology, Den Dolech 2, 5600MB Eindhoven, The Netherlands
2 Guangdong Power Grid Company, 510620 Guangzhou, China
3 Cyclomedia, 4180BB Waardenburg, The Netherlands
Correspondence should be addressed to Weilun Lao, w.lao@tue.nl
Received 5 October 2009; Accepted 9 January 2010
Academic Editor: Ling Shao
Copyright © 2010 Weilun Lao et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
We study a flexible framework for semantic analysis of human motion from surveillance video. Successful trajectory estimation
and human-body modeling facilitate the semantic analysis of human activities in video sequences. Although human motion is
widely investigated, we extend such research in three aspects. First, by adding a second camera, we not only enable more reliable
behavior analysis, but also make it possible to map ongoing scene events onto a 3D setting to facilitate further semantic analysis.
The second contribution is the introduction of a 3D reconstruction scheme for scene understanding. Third, we present a fast scheme
to detect different body parts and generate a fitting skeleton model, without the explicit assumption of an upright body posture.
The extension to multiple-view fusion improves the event-based semantic analysis by 15%–30%. Our proposed framework proves
its effectiveness, as it achieves near real-time performance: 13–15 frames/second for monocular and 6–8 frames/second for
two-view video sequences.
1. Introduction
Visual surveillance for human-behavior analysis has been
investigated worldwide as an active research topic [1]. For
automatic surveillance to be accepted by a large community,
the system must offer sufficiently high accuracy, and its
computational complexity should permit real-time
performance. In video-based surveillance applications,
knowing the motion of persons alone is not sufficient
to describe their posture. However, the postures of
persons provide important clues for understanding
their activities. Therefore, accurate detection and recognition
of various human postures both contribute to scene
understanding. The accuracy of the system is hampered by
the use of a single camera in complex situations, where
several people undertake actions in the same scene. Often,
the posture of a person is occluded, so that the behavior cannot
be recognized with high accuracy. In this paper, we contribute to
improving the analysis accuracy by exploiting a second
camera and mapping the events into a 3D scene model, which
enables analysis of the behavior in the 3D domain. Let us now
discuss related work from the literature.
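To give an impression of how observations from two cameras can be lifted into a 3D scene model, the following is a minimal sketch of linear (DLT) triangulation of one corresponding image point from two calibrated views. The projection matrices, point coordinates, and function name here are illustrative assumptions, not the paper's actual calibration or reconstruction scheme.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.
    P1, P2: 3x4 camera projection matrices; x1, x2: 2D image points."""
    # Each view contributes two linear constraints on the homogeneous 3D point.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of A with smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

# Two hypothetical calibrated cameras: one at the origin, one shifted by 1 unit.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# Project a known 3D point into both views, then recover it.
X_true = np.array([0.5, 0.2, 4.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]

X_est = triangulate(P1, P2, x1, x2)
print(X_est)  # with noise-free correspondences, this recovers X_true
```

In a real surveillance setting the correspondences (e.g., a person's head point seen in both views) are noisy, so such a linear estimate is typically only a starting point for further refinement.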
1.1. Related Work. Most surveillance systems have focused
on understanding the events through the study of trajectories
and positions of persons using a priori knowledge about
the scene. The Pfinder [2] system was developed to describe
a moving person in an indoor environment. It tracks a
single nonoccluded person in complex scenes. The VSAM
[3] system can monitor activities over various scenarios,
using multiple cameras that are connected as a network. It
can detect and track multiple persons and vehicles within
cluttered scenes and manage their activities over a long
period of time. The real-time visual surveillance system
W4 [4] employs the combined techniques of shape analysis
and body tracking, and models different appearances of a
person. This single-camera system detects and tracks groups
of people and monitors their behaviors, even in the presence
of partial occlusion and in outdoor environments. However,