Demo Abstract: Perception vs. Reality - Never Believe in What You See

Yunfeng Huang, Fang-Jing Wu, Christian Hakert, Georg von der Brüggen, Kuan-Hsun Chen, Jian-Jia Chen, Patrick Böcker, Petr Chernikov, Luis Cruz, Zeyi Duan, Ahmed Gheith, Yantao Gong, Anand Gopalan, Karthik Prakash, Ammar Tauqir, Yue Wang
yunfeng.huang@tu-dortmund.de
Communication Network Institute & Design Automation for Embedded Systems Group, Dortmund

ABSTRACT
The increasing availability of heterogeneous ambient sensing systems challenges the corresponding information processing systems to analyse and compare a variety of different systems in a single scenario. For instance, localization of objects can be performed by image processing systems as well as by radio-based localization. If such systems are utilized to localize the same objects, synergizing their outputs is important to enable comparable and meaningful analysis. This demo showcases the practical deployment and challenges of such an example system.

CCS CONCEPTS
• Computer systems organization → Sensor networks.

KEYWORDS
data fusion, localization, computer vision, radio perception

1 INTRODUCTION
Nowadays, heterogeneous ambient sensing systems, such as Bluetooth or WiFi, are exploited to perceive information about the environment. As a consequence, a partial view of the reality is captured, which may be error prone. To tackle this problem, recent research proposes to synergize multiple sensing systems [2]. As the outputs from these systems are highly dependent on the system itself, they require advanced effort for processing and for making different systems comparable. An image processing system, for instance, can detect objects in an image within a bounding box. The use of a stereo camera further allows collecting relative 3-D coordinates of single pixels in the image. Transforming this information into the absolute 3-D coordinates of the detected object requires specific processing, which is showcased in this demo.
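As a concrete illustration of this processing step, a pixel with a stereo depth value can be back-projected into camera-relative 3-D coordinates with the standard pinhole model and then mapped into the fixed world frame by a rigid transform. The following is a minimal sketch in Python/NumPy; the intrinsics (fx, fy, cx, cy) and the camera pose (R, t) are hypothetical placeholders, not parameters of the actual demo setup.

```python
import numpy as np

def pixel_to_camera_3d(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into
    camera-relative 3-D coordinates (pinhole model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

def camera_to_world(p_cam, R, t):
    """Map a camera-relative point into the fixed world frame,
    where R and t describe the camera's known mounting pose."""
    return R @ p_cam + t

# Hypothetical example: camera mounted 2 m high, looking along the world x-axis.
R = np.array([[0.0, 0.0, 1.0],    # camera z (viewing direction) -> world x
              [-1.0, 0.0, 0.0],   # camera x (image right) -> world -y
              [0.0, -1.0, 0.0]])  # camera y (image down) -> world -z
t = np.array([0.0, 0.0, 2.0])
p_cam = pixel_to_camera_3d(u=800, v=450, depth=2.5,
                           fx=700.0, fy=700.0, cx=640.0, cy=360.0)
p_world = camera_to_world(p_cam, R, t)
```

With a fixed and known camera mounting, R and t are determined once at setup time, so the per-frame cost of this transformation is negligible.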
In addition to the stereo-camera-based image processing system, we deploy a Bluetooth Low Energy (BLE) sniffing infrastructure in our lab setup. This infrastructure captures the signal strength of Bluetooth beacons, attached to the objects, at each sniffer node. A subsequent estimation of the distances, followed by trilateration, also allows determining absolute 3-D coordinates of the objects. The final coordinates of both systems are compared in a dedicated merging algorithm to create a highly accurate position estimate. Additionally, a mismatch between the single systems can be detected.

2 DEMO SETUP
We equip a plain 3-D space with a camera stand and a stereo camera for the image processing system. At the corners of the space, we mount Raspberry Pi nodes as BLE sniffers for the BLE identification system. Any person moving within the 3-D area is equipped with a Bluetooth beacon, such that both systems recognize the person. The final data processing is done at a computer workstation, where an attached display shows a user interface, detailed in Section 4. The communication between all nodes is realized through a cabled network to reduce the interference due to WiFi signals. Visitors of the demo can inspect the technical setup in detail. The user interface provides internal data from the single systems, i.e., the accuracy of the processing in both systems can be observed. Furthermore, the final output of the data merging strategy can be observed for different scenarios.

3 SENSOR SYSTEMS AND DATA MERGING
This demo incorporates two sensor systems, each of which delivers one perception of a certain reality. On top, the system includes a fusion strategy, overlaying both perceptions to estimate accurate position information and detect data mismatches. This section details the required algorithms and the tackled problems in processing both systems to deliver comparable and mergeable data.
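The BLE pipeline sketched above (signal strength to distance, then trilateration) and a simple merge with mismatch detection can be illustrated as follows. This is a minimal Python/NumPy sketch, not the demo's actual algorithm: the path-loss parameters, the equal-weight average, and the mismatch threshold are assumptions for illustration.

```python
import numpy as np

def rssi_to_distance(rssi, rssi_at_1m=-59.0, path_loss_exp=2.0):
    """Log-distance path-loss model: d = 10 ** ((P_1m - RSSI) / (10 * n)).
    Both model parameters are placeholder values."""
    return 10.0 ** ((rssi_at_1m - rssi) / (10.0 * path_loss_exp))

def trilaterate(anchors, distances):
    """Least-squares position from >= 4 sniffer positions and distances,
    linearized by subtracting the first anchor's sphere equation."""
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(distances, dtype=float)
    p0, d0 = anchors[0], d[0]
    A = 2.0 * (anchors[1:] - p0)
    b = (d0 ** 2 - d[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1) - np.sum(p0 ** 2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

def merge(p_vision, p_ble, threshold=1.0):
    """Equal-weight average of both position estimates, plus a mismatch
    flag when the two perceptions disagree by more than `threshold` metres."""
    mismatch = np.linalg.norm(p_vision - p_ble) > threshold
    return 0.5 * (p_vision + p_ble), mismatch
```

In practice, the weighting in the merge step would reflect the relative accuracy of the two systems, which is exactly the kind of internal data the demo's user interface exposes.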
3.1 Image Identification and 3-D Localization
The computer vision system in this demo is a visual perception system, which consists of a ZED stereo camera mounted at a fixed and known location. To perform localization with this camera, the depth information of the frame captured by the camera is taken into account. In this demo, first the YOLOv3 deep-learning classifier [1] is used to detect and classify objects of interest inside the captured RGB frame. After detection and classification, the trained classifier outputs the coordinates of the bounding box as the result. In addition, to ensure the continuity of visual perception and to detect the objects in real time, object tracking is required to speed up the object detection and to deal with momentary occlusion of objects. The outputs of the detection module are the coordinates of the bounding boxes, which are forwarded to the post-processing. The post-processing performs background subtraction using the MOG2 background subtractor [3], through which the foreground pixels can be distinguished from the background pixels. The geometric mean of the detected foreground pixels is considered the center of each object. The camera-relative 3-D coordinates of the estimated centers are considered the objects' coordinates. Afterwards, the extracted coordinates are transformed to the fixed world coordinates, which is the localization result of the visual perception.

2020 19th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), 978-1-7281-5497-8/20/$31.00 ©2020 IEEE, DOI 10.1109/IPSN48710.2020.000-5