Demo Abstract: Perception vs. Reality - Never Believe in What
You See
Yunfeng Huang, Fang-Jing Wu
Christian Hakert, Georg von der Brüggen, Kuan-Hsun Chen, Jian-Jia Chen
Patrick Böcker, Petr Chernikov, Luis Cruz, Zeyi Duan, Ahmed Gheith, Yantao Gong, Anand
Gopalan, Karthik Prakash, Ammar Tauqir, Yue Wang
yunfeng.huang@tu-dortmund.de
Communication Network Institute & Design Automation for Embedded Systems Group, Dortmund
ABSTRACT
The increasing availability of heterogeneous ambient sensing sys-
tems challenges the corresponding information processing systems
to analyse and compare a variety of different systems in a single
scenario. For instance, localization of objects can be performed by
image processing systems as well as by radio-based localization. If
such systems are utilized to localize the same objects, synergizing
their outputs is important to enable comparable and meaningful
analysis. This demo showcases the practical deployment and the
challenges of such an example system.
CCS CONCEPTS
• Computer systems organization → Sensor networks.
KEYWORDS
data fusion, localization, computer vision, radio perception
1 INTRODUCTION
Nowadays, heterogeneous ambient sensing systems, such as Blue-
tooth or Wi-Fi, are exploited to perceive information about the envi-
ronment. As a consequence, a partial view of reality is captured,
which may be error-prone. To tackle this problem, recent research
proposes to synergize multiple sensing systems [2]. As the outputs
of these systems are highly dependent on the system itself, consid-
erable processing effort is required to make different systems
comparable. An image processing system, for instance, can detect
objects in an image within a bounding box. Using a stereo camera
further allows collecting relative 3-D coordinates of single pixels
in the image. Transforming this information into the absolute 3-D
coordinate of the detected object requires specific processing,
which is showcased in this demo.
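The transformation from camera-relative to absolute 3-D coordinates is a rigid-body transform determined by the camera's pose. The following is a minimal sketch in Python/NumPy, assuming the camera's rotation `R` and position `t` in the world frame are known from calibration; the pose values in the example are illustrative, not the demo's actual extrinsics:

```python
import numpy as np

def camera_to_world(p_cam, R, t):
    """Map a camera-relative 3-D point into world coordinates.

    p_cam : (3,) point in the camera frame (metres)
    R     : (3, 3) rotation of the camera frame w.r.t. the world frame
    t     : (3,) position of the camera origin in world coordinates
    """
    return R @ np.asarray(p_cam, dtype=float) + np.asarray(t, dtype=float)

# Example pose: camera mounted 2 m high, rotated 90 degrees about the
# vertical axis (placeholder values, not the calibrated extrinsics).
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.0, 0.0, 2.0])

p_world = camera_to_world([1.0, 0.0, 0.0], R, t)
```

With this pose, a point one metre in front of the camera lands at world coordinates (0, 1, 2).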
In addition to the stereo-camera-based image processing system,
we deploy a Bluetooth Low Energy (BLE) sniffing infrastructure
in our lab setup. This infrastructure captures, at each sniffer node,
the signal strength of Bluetooth beacons attached to the objects.
A subsequent estimation of the distances, followed by trilateration,
also allows determining absolute 3-D coordinates of the objects.
The final coordinates of both systems are compared in a dedicated
merging algorithm to create a highly accurate position estimate.
Additionally, a mismatch between the single systems can be detected.
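The distance estimation and trilateration step can be sketched as follows: a log-distance path-loss model maps each RSSI reading to a distance, and a linear least-squares solve recovers the position from the sniffer locations and distances. This is a generic sketch, not the demo's exact implementation; the reference RSSI at 1 m and the path-loss exponent are assumed, environment-dependent calibration values:

```python
import numpy as np

def rssi_to_distance(rssi, rssi_ref=-59.0, n=2.0):
    """Log-distance path-loss model: distance in metres from an RSSI
    reading. rssi_ref is the RSSI at 1 m and n the path-loss exponent;
    both must be calibrated per deployment."""
    return 10.0 ** ((rssi_ref - rssi) / (10.0 * n))

def trilaterate(anchors, distances):
    """Least-squares 3-D position from >= 4 anchor positions and
    distances. Linearises the sphere equations by subtracting the
    last equation from the others."""
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(distances, dtype=float)
    ref = anchors[-1]
    A = 2.0 * (anchors[:-1] - ref)
    b = (d[-1] ** 2 - d[:-1] ** 2
         + np.sum(anchors[:-1] ** 2, axis=1) - np.sum(ref ** 2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

# Example: sniffers at four corners of a 4 m space, beacon at (1, 2, 1)
anchors = [[0, 0, 0], [4, 0, 0], [0, 4, 0], [0, 0, 4]]
true_pos = np.array([1.0, 2.0, 1.0])
dists = [np.linalg.norm(true_pos - np.array(a)) for a in anchors]
est = trilaterate(anchors, dists)
```

With noise-free distances the estimate recovers the true position exactly; with real RSSI-derived distances the least-squares solution spreads the error across all sniffers.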
2 DEMO SETUP
We equip a plain 3-D space with a camera stand and a stereo camera
for the image processing system. At the corners of the space, we
mount Raspberry Pi nodes as BLE sniffers for the BLE identification
system. Any person moving within the 3-D area is equipped with
a Bluetooth beacon, such that both systems recognize the person.
The final data processing is done at a computer workstation, where
an attached display shows a user interface, detailed in Section 4.
The communication between all nodes is realized through a cabled
network to reduce the interference caused by Wi-Fi signals.
Visitors of the demo can inspect the technical setup in detail. The
user interface provides internal data from the single systems, i.e.,
the processing accuracy of both systems can be observed. Further-
more, the final output of the data merging strategy can be observed
for different scenarios.
3 SENSOR SYSTEMS AND DATA MERGING
This demo incorporates two sensor systems, each of which delivers
one perception of a certain reality. In addition, the system includes a
fusion strategy that overlays both perceptions to estimate accurate
position information and detect data mismatches. This section details
the required algorithms and the tackled problems for processing
both systems' outputs to deliver comparable and mergeable data.
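The fusion step can be illustrated with an inverse-variance weighted average plus a distance threshold for mismatch detection. This is a hypothetical sketch of such a strategy, not necessarily the demo's actual merging algorithm; the per-system variances and the mismatch threshold are assumed inputs:

```python
import numpy as np

def fuse_positions(p_cam, var_cam, p_ble, var_ble, mismatch_thresh=1.0):
    """Inverse-variance weighted fusion of two 3-D position estimates.

    Returns the fused position and a flag indicating whether the two
    perceptions disagree by more than mismatch_thresh metres.
    """
    p_cam = np.asarray(p_cam, dtype=float)
    p_ble = np.asarray(p_ble, dtype=float)
    w_cam, w_ble = 1.0 / var_cam, 1.0 / var_ble
    # The more accurate system (smaller variance) dominates the result.
    fused = (w_cam * p_cam + w_ble * p_ble) / (w_cam + w_ble)
    mismatch = bool(np.linalg.norm(p_cam - p_ble) > mismatch_thresh)
    return fused, mismatch

# Camera estimate (accurate) vs. BLE estimate (noisier), 1 m apart
fused, mismatch = fuse_positions([1.0, 1.0, 1.0], 0.01,
                                 [1.0, 1.0, 2.0], 0.04)
```

In this example the fused position lies closer to the camera estimate because its variance is four times smaller; lowering the threshold below the 1 m disagreement would raise the mismatch flag instead.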
3.1 Image Identification and 3-D Localization
The computer vision system in this demo is a visual perception
system, which consists of a ZED stereo camera mounted at a fixed
and known location. To perform localization using this camera,
the depth information of the frames captured by the camera is
taken into account.
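Combining a pixel position with its depth value amounts to back-projection through the pinhole camera model. The following is a minimal sketch assuming known camera intrinsics (focal lengths `fx`, `fy` and principal point `cx`, `cy`); the example values are placeholders, not the ZED's actual calibration:

```python
import numpy as np

def pixel_to_camera_xyz(u, v, depth, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with depth (metres) into
    camera-relative 3-D coordinates via the pinhole model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Illustrative intrinsics for a 1280x720 image (placeholder values)
p = pixel_to_camera_xyz(u=960, v=540, depth=3.0,
                        fx=700.0, fy=700.0, cx=640.0, cy=360.0)
```

A pixel to the right of and below the principal point thus maps to positive camera-frame x and y, scaled linearly with depth.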
In this demo, first the YOLOv3 deep-learning classifier [1] is
used to detect and classify objects of interest inside the captured
RGB frame. After detection and classification, the trained classifier
outputs the coordinates of the bounding box as the result. In ad-
dition, to ensure the continuity of visual perception and to detect
the objects in real time, object tracking is required to speed up
the object detection and to deal with momentary occlusions of
objects. The outputs of the detection module are the coordinates
of the bounding boxes, which are forwarded to the post-processing.
The post-processing performs background subtraction using the
MOG2 background subtractor [3], by which the foreground pixels
can be distinguished from the background pixels. The geometric
mean of the detected foreground pixels is considered as the center of
each object. The camera-relative 3-D coordinates of the estimated
centers are taken as the objects' coordinates. Afterwards, the
extracted coordinates are transformed into the fixed world coordi-
nates, which yields the localization result of the visual perception. The
2020 19th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN)
978-1-7281-5497-8/20/$31.00 ©2020 IEEE
DOI 10.1109/IPSN48710.2020.000-5