Hindawi Publishing Corporation
International Journal of Digital Multimedia Broadcasting
Volume 2010, Article ID 920121, 9 pages
doi:10.1155/2010/920121
Research Article
Flexible Human Behavior Analysis Framework for
Video Surveillance Applications
Weilun Lao,1,2 Jungong Han,1 and Peter H. N. de With1,3
1 Eindhoven University of Technology, Den Dolech 2, 5600MB Eindhoven, The Netherlands
2 Guangdong Power Grid Company, 510620 Guangzhou, China
3 Cyclomedia, 4180BB Waardenburg, The Netherlands
Correspondence should be addressed to Weilun Lao, w.lao@tue.nl
Received 5 October 2009; Accepted 9 January 2010
Academic Editor: Ling Shao
Copyright © 2010 Weilun Lao et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
We study a flexible framework for semantic analysis of human motion from surveillance video. Successful trajectory estimation
and human-body modeling facilitate the semantic analysis of human activities in video sequences. Although human motion is
widely investigated, we extend such research in three aspects. First, by adding a second camera, we not only enable more reliable
behavior analysis, but also make it possible to map ongoing scene events onto a 3D setting to facilitate further semantic analysis.
The second contribution is the introduction of a 3D reconstruction scheme for scene understanding. Third, we present a fast scheme
to detect different body parts and generate a fitting skeleton model, without the explicit assumption of an upright body posture.
The extension to multiple-view fusion improves the event-based semantic analysis by 15%–30%. Our proposed framework proves
its effectiveness, as it achieves near real-time performance: 13–15 frames/second for monocular and 6–8 frames/second for
two-view video sequences.
1. Introduction
Visual surveillance for human-behavior analysis has been
investigated worldwide as an active research topic [1]. For
automatic surveillance to be accepted by a large community,
the system must offer sufficiently high accuracy, and its
computational complexity should permit real-time
performance. In video-based surveillance applications,
knowing the motion of persons alone is not sufficient
to describe their posture. However, the postures of
persons provide important clues for understanding
their activities. Therefore, accurate detection and recognition
of various human postures both contribute to scene
understanding. The accuracy of the system is hampered by
the use of a single camera in complex situations, where
several people undertake actions in the same scene. Often,
the posture of a person is occluded, so that the behavior cannot
be recognized with high accuracy. In this paper, we contribute to
improving the analysis accuracy by exploiting a second
camera and mapping the events into a 3D scene model, which
enables analysis of the behavior in the 3D domain. Let us now
discuss related work from the literature.
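To give an impression of how observations from two cameras can be lifted into a 3D scene model, the following is a minimal sketch of linear (DLT) triangulation of one corresponding image point from two calibrated views. The projection matrices, point coordinates, and function name here are illustrative assumptions, not the paper's actual calibration or reconstruction scheme.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.
    P1, P2: 3x4 camera projection matrices; x1, x2: 2D image points."""
    # Each view contributes two linear constraints on the homogeneous 3D point.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of A with smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

# Two hypothetical calibrated cameras: one at the origin, one shifted by 1 unit.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# Project a known 3D point into both views, then recover it.
X_true = np.array([0.5, 0.2, 4.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]

X_est = triangulate(P1, P2, x1, x2)
print(X_est)  # with noise-free correspondences, this recovers X_true
```

In a real surveillance setting the correspondences (e.g., a person's head point seen in both views) are noisy, so such a linear estimate is typically only a starting point for further refinement.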
1.1. Related Work. Most surveillance systems have focused
on understanding the events through the study of trajectories
and positions of persons using a priori knowledge about
the scene. The Pfinder [2] system was developed to describe
a moving person in an indoor environment. It tracks a
single nonoccluded person in complex scenes. The VSAM
[3] system can monitor activities over various scenarios,
using multiple cameras that are connected as a network. It
can detect and track multiple persons and vehicles within
cluttered scenes and manage their activities over a long
period of time. The real-time visual surveillance system
W4 [4] employs the combined techniques of shape analysis
and body tracking, and models different appearances of a
person. This single-camera system detects and tracks groups
of people and monitors their behaviors, even in the presence
of partial occlusion and in outdoor environments. However,