REVIEW

Multi-scale attention vehicle re-identification

Aihua Zheng 1 • Xianmin Lin 1 • Jiacheng Dong 1 • Wenzhong Wang 1 • Jin Tang 1 • Bin Luo 1

Received: 18 June 2019 / Accepted: 5 June 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract
Vehicle re-identification (Re-ID) aims to match vehicle images of the same identity captured by non-overlapping surveillance cameras. Most existing vehicle Re-ID methods focus on effective deep network architectures that extract discriminative features from single-scale images. However, these methods ignore the complementary information available at different scales, which is a crucial factor in computer vision tasks. The attention mechanism, a commonly used technique in recognition and detection tasks, can selectively focus on discriminative local cues in an image. In this work, we propose a multi-scale attention framework that jointly exploits the multi-scale mechanism and the attention technique for vehicle Re-ID. Specifically, we apply the multi-scale mechanism to feature maps, which yields more comprehensive representations by fusing global and local cues. Meanwhile, we place attention blocks on each scale subnetwork to mine complementary and discriminative information. We conduct extensive experiments on three vehicle datasets: VeRi-776, VehicleID and PKU-VD. The promising results demonstrate the effectiveness of the proposed method and yield a new state of the art for vehicle Re-ID.

Keywords Vehicle re-identification · Multi-scale · Attention

1 Introduction

Vehicle re-identification (Re-ID) aims to determine whether a vehicle captured by one camera appears in other non-overlapping cameras. It is of increasing importance in computer vision due to its wide range of potential applications, such as cross-camera tracking, intelligent monitoring and urban surveillance.
Although license plates are unique identifiers for vehicles, their use in uncontrolled urban surveillance is limited, since current license plate recognition (LPR) techniques struggle in such complex environments, where low-quality images, arbitrary viewpoints, motion blur and poor lighting conditions are pervasive. Therefore, vehicle Re-ID approaches mainly focus on exploiting vehicle appearance information. Similar to person Re-ID, vehicle Re-ID faces many challenges caused by viewpoint changes, illumination changes and occlusion, which bring large appearance variations for the same identity across different cameras, as shown in the top three rows of Fig. 1. Furthermore, vehicle Re-ID has its own particular challenge: different identities may have similar or even identical appearance, especially vehicles of the same model from the same manufacturer, as shown in the bottom row of Fig. 1.

Xianmin Lin and Jiacheng Dong contributed equally to this paper.

✉ Wenzhong Wang, wenzhong@ahu.edu.cn
Aihua Zheng, ahzheng214@ahu.edu.cn
Xianmin Lin, xmlin1995@gmail.com
Jiacheng Dong, jiachengdong@foxmail.com
Jin Tang, tj@ahu.edu.cn
Bin Luo, luobin@ahu.edu.cn

1 School of Computer Science and Technology, Anhui University, Hefei 230601, China

Neural Computing and Applications
https://doi.org/10.1007/s00521-020-05108-x

Recently, deep learning has been applied to numerous computer vision problems, such as object detection [5], object recognition [3, 43] and data representation [10]. Many CNN-based vehicle Re-ID methods [19, 44, 47] have been developed recently. They mainly focus on either designing new network architectures to learn more discriminative features or introducing extra information to boost the performance of the vehicle Re-ID