Citation: Li, Y.-L.; Li, H.-T.; Chiang, C.-K. Multi-Camera Vehicle Tracking Based on Deep Tracklet
Similarity Network. Electronics 2022, 11, 1008. https://doi.org/10.3390/electronics11071008
Academic Editors: John Ball and Ning Wang
Received: 7 December 2021
Accepted: 22 March 2022
Published: 24 March 2022
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).
Article
Multi-Camera Vehicle Tracking Based on Deep Tracklet
Similarity Network
Yun-Lun Li, Hao-Ting Li and Chen-Kuo Chiang *
Advanced Institute of Manufacturing with High-Tech Innovations, Center for Innovative Research on Aging
Society (CIRAS) and Department of Computer Science and Information Engineering, National Chung Cheng
University, Minhsiung, Chiayi 621301, Taiwan; xu3mp6xjp6@gmail.com (Y.-L.L.); remidream@gmail.com (H.-T.L.)
* Correspondence: ckchiang@cs.ccu.edu.tw; Tel.: +886-5-272-9111
Abstract: Multi-camera vehicle tracking at the city scale has received considerable attention in
recent years. The task is challenging because of large scale variations, frequent occlusion, and
appearance differences caused by varying viewing angles. In this research, we propose the Tracklet
Similarity Network (TSN) for a multi-target multi-camera (MTMC) vehicle tracking system based on
the evaluation of the similarity between vehicle tracklets. In addition, a novel component, the
Candidates Intersection Ratio (CIR), is proposed to refine the similarity. It provides an association
scheme that builds the multi-camera tracking results as a tree structure. Based on these components,
an end-to-end vehicle tracking system is proposed. The experimental results demonstrate an 11%
improvement in the evaluation score over the conventional similarity baseline.
Keywords: vehicle tracking; multiple camera; tracklet similarity; deep learning
1. Introduction
With recent advances in computer vision, city-scale automatic traffic management is now
possible. Real-time multi-target multi-camera (MTMC) vehicle tracking can
be improved by techniques for automatic traffic monitoring and management [1–6]. Automatic
video analytics over the pervasively deployed traffic cameras can enhance traffic infrastructure
design and congestion handling.
Real-time multi-target multi-camera tracking is one of the crucial tasks in traffic
management. Its purpose is to support better traffic design and traffic flow optimization
by tracking many vehicles across a network of surveillance cameras, as shown in
Figure 1. Most MTMC approaches follow the tracking-by-detection pipeline. First, a
detector is applied to obtain all vehicle detections. Next, a single-camera tracker links
the detections of the same vehicle in each view into vehicle tracklets. Finally, these vehicle
tracklets are associated across cameras.
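The three stages of the tracking-by-detection pipeline can be illustrated with a minimal sketch. It is not the system proposed in this paper: the names `Detection`, `Tracklet`, `track_single_camera`, `associate_across_cameras`, and `run_pipeline` are hypothetical, the detector output is assumed to be given, the single-camera tracker is a simple greedy box-overlap linker, and the cross-camera stage merges tracklets with any user-supplied similarity function and threshold.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


@dataclass
class Detection:  # hypothetical container for one detector output (stage 1)
    camera_id: str
    frame: int
    box: Box


@dataclass
class Tracklet:  # hypothetical container for one single-camera track
    camera_id: str
    detections: List[Detection] = field(default_factory=list)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def track_single_camera(dets: List[Detection], min_iou: float = 0.3) -> List[Tracklet]:
    """Stage 2 (placeholder): greedily link detections of one camera by box overlap."""
    tracklets: List[Tracklet] = []
    for det in sorted(dets, key=lambda d: d.frame):
        best, best_iou = None, min_iou
        for t in tracklets:
            last = t.detections[-1]
            if last.frame < det.frame:  # never add two boxes of one frame to a tracklet
                overlap = iou(last.box, det.box)
                if overlap >= best_iou:
                    best, best_iou = t, overlap
        if best is None:  # no sufficiently overlapping tracklet: start a new one
            best = Tracklet(det.camera_id)
            tracklets.append(best)
        best.detections.append(det)
    return tracklets


def associate_across_cameras(
    tracklets: List[Tracklet],
    similarity: Callable[[Tracklet, Tracklet], float],
    threshold: float,
) -> List[int]:
    """Stage 3 (placeholder): merge tracklets from different cameras whose
    similarity reaches the threshold; returns one global ID per tracklet."""
    parent = list(range(len(tracklets)))

    def find(i: int) -> int:  # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(tracklets)):
        for j in range(i + 1, len(tracklets)):
            a, b = tracklets[i], tracklets[j]
            if a.camera_id != b.camera_id and similarity(a, b) >= threshold:
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(tracklets))]


def run_pipeline(detections, similarity, threshold):
    """Group detections by camera, track per camera, then associate across cameras."""
    by_camera: Dict[str, List[Detection]] = {}
    for det in detections:
        by_camera.setdefault(det.camera_id, []).append(det)
    tracklets = [t for dets in by_camera.values() for t in track_single_camera(dets)]
    return tracklets, associate_across_cameras(tracklets, similarity, threshold)
```

The similarity function is deliberately left as a parameter, since the quality of this score determines how well tracklets are linked across views. Note that the transitive merging used here is a simplification and can, in principle, join two tracklets from the same camera through a third.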
The MTMC task also poses several difficulties. Eliminating unreliable vehicle tracklets
and dealing with view variations are significant problems in this task.
Large-scale automatic video analytic systems must handle a large variability of vehicle
types and appearances to meet real-world accuracy and reliability requirements.
For applications such as vehicle re-identification, large view variations pose a significant
challenge when matching vehicles across views. Similarly, how best to perform space–time
vehicle tracklet association across views is important for vehicle counting and traffic
analysis. In addition, because images are captured by different cameras, the same vehicle may
appear under different poses and illumination conditions, which changes its apparent color.
Different weather conditions, such as rain or haze, make vehicle tracking even
more challenging.
Existing works [7,8] evaluate the connectivity between tracklets across cameras using
simple Euclidean distance and cosine similarity. However, these metrics are not robust
enough to measure the connectivity between tracklets. Moreover, when one tracklet is associated