A PROTOTYPE FUZZY SYSTEM FOR SURVEILLANCE PICTURE UNDERSTANDING

HELMAN STERN, URI KARTOUN*, ARMIN SHMILOVICI
Department of Industrial Engineering and Management, Ben-Gurion University
P.O.Box 56, Be’er-Sheva 84105, ISRAEL
Fax: +972-8-6472958; Tel: +972-8-6461434
E-mail: (helman, kartoun, armin)@bgumail.bgu.ac.il

ABSTRACT

The last stage of any type of automatic surveillance system is the interpretation of the information acquired from its sensors. This work focuses on the interpretation of motion pictures taken from a surveillance camera, i.e., image understanding. A prototype of a fuzzy expert system is presented which can describe, in a natural-language-like manner, simple human activity in the field of view of a surveillance camera. The system comprises three components: a pre-processing module for image segmentation and feature extraction, an object identification fuzzy expert system (static model), and an action identification fuzzy expert system (dynamic temporal model). The system was tested on a video segment of a pedestrian passageway taken by a surveillance camera.

Keywords: image understanding, picture segmentation, fuzzy expert systems, surveillance video

1. INTRODUCTION

With the continuous decline in the price of imaging technology, there is a surge in the use of automatic surveillance systems and closed circuit TV (CCTV). Banks, ATMs, schools, hospitals, and transport walkways employ automatic video recording of their surrounding environments. There appears to be little human inspection (in real-time or otherwise) of these surveillance videos, and thus such systems are relegated to a simple deterrence function (mainly against possible felonies). However, in many environments it is necessary to understand the contents of the video for subsequent event detection, storage and retrieval. Extracting the desired events requires a high semantic level of understanding and, if done by people, a prohibitive amount of human processing.
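To make the three-component architecture summarized in the abstract more concrete, the following minimal Python sketch shows how a static (object) model and a dynamic (action) model might be chained on features produced by a pre-processing step. It is an illustration under stated assumptions, not the system described in this paper: the `Blob` feature set, the triangular membership functions, the area thresholds, the labels, and the crisp motion rule are all hypothetical and chosen only for readability.

```python
from dataclasses import dataclass


@dataclass
class Blob:
    """Stand-in for pre-processing output (hypothetical feature set)."""
    x: float     # horizontal centre of the segmented region, in pixels
    area: float  # size of the segmented region, in pixels


def triangular(v, a, b, c):
    """Triangular fuzzy membership: 0 at a and c, rising to 1 at b."""
    if v <= a or v >= c:
        return 0.0
    return (v - a) / (b - a) if v <= b else (c - v) / (c - b)


def identify_object(blob):
    """Static model: pick the label with the highest membership degree.

    The area breakpoints are illustrative assumptions, not measured values.
    """
    degrees = {
        "single person": triangular(blob.area, 200, 600, 1000),
        "group of people": triangular(blob.area, 800, 1600, 2400),
    }
    return max(degrees, key=degrees.get)


def identify_action(prev, curr, tol=2.0):
    """Dynamic model, simplified to a crisp rule on horizontal displacement."""
    dx = curr.x - prev.x
    if abs(dx) <= tol:
        return "standing still"
    return "moving right" if dx > 0 else "moving left"


def describe(prev, curr):
    """Combine both models into a natural-language-like description."""
    return f"{identify_object(curr)} {identify_action(prev, curr)}"
```

For example, `describe(Blob(100, 600), Blob(120, 620))` combines the object label and the action label for one region tracked across two consecutive frames. A fuller sketch would replace the crisp motion rule with fuzzy rules over displacement and would handle the case where no label has a non-zero membership degree.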
Automatic processing of surveillance videos introduces several practical problems. Recording, storing, and managing large volumes of data is expensive, and cost-effective solutions are not yet available for cases where most of the data is useless. Also, if someone wants to inspect the contents of the video, a great deal of work is involved in watching all the recorded segments. Thus, there is a need for an effective means by which the content of the data can be automatically characterized, organized, indexed, and retrieved, doing away with the slow, labor-intensive manual search task. Understanding the contents of an image, in the context of its importance to the operator of the surveillance system, is the key to efficient storage and retrieval of video segments.

The problem of automatic image understanding is a difficult one. There are two possible paradigms for this problem [1,2]: computational feature-based semantic analysis (the detection of features based on elaborate computational models) and human cognitive perception of high-level semantics (the subjective user interpretation of features in the image). With the computational paradigm, it is technically difficult to identify the contents of an image correctly, and in a reasonable amount of time, in all possible circumstances (e.g., to identify an object from all possible angles). Doing so would require a model of the image features under all possible circumstances.

Human cognitive perception, on the other hand, starts with simple object segmentation, that is, segmenting 2D-plane images into physically meaningful objects. Image understanding then concerns the relations between the objects in the picture and the context in which they appear. This, in general, is very difficult to formulate as a computational problem. Yet people, even children, can learn to do it with ease.
IASTED International Conference on Visualization, Imaging, and Image Processing (VIIP 2001), Marbella, Spain, September 3-5, 2001.

The problem of image understanding can be facilitated [3,4] if we can restrict the type of objects to be identified (e.g., humans), the quality of the identification (e.g., contours only), the possible relations between objects (e.g., approaching each other), and the context in which they operate (e.g., a closed passageway). In this work, a prototype of a fuzzy expert system is presented which can describe, in a natural-language-like way, simple human activity in the field of view of a