DVHN: A Deep Hashing Framework for Large-scale Vehicle Re-identification

Yongbiao Chen*, Sheng Zhang†, Fangxin Liu*, Chenggang Wu*, Kaicheng Guo*, and Zhengwei Qi*
*School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
†University of Southern California, Los Angeles, USA

Abstract—Vehicle re-identification, which seeks to match query vehicle images against a very large set of gallery images, has been gaining momentum. Conventional methods generally perform re-identification by representing vehicle images as real-valued feature vectors and then ranking the gallery images by their Euclidean distances to the query. Despite achieving remarkable retrieval accuracy, these methods require tremendous memory and computation when the gallery set is large, making them impractical in real-world scenarios. In light of this limitation, in this paper, we make the very first attempt to investigate the integration of deep hash learning with vehicle re-identification. We propose a deep hash-based vehicle re-identification framework, dubbed DVHN, which substantially reduces memory usage and improves retrieval efficiency while preserving nearest neighbor search accuracy. Concretely, DVHN directly learns discrete compact binary hash codes for each image by jointly optimizing the feature learning network and the hash code generating module. Specifically, we directly constrain the output from the convolutional neural network to be discrete binary codes and ensure the learned binary codes are optimal for classification. To optimize the deep discrete hashing framework, we further propose an alternating minimization method for learning binary similarity-preserved hashing codes. Extensive experiments on two widely-studied vehicle re-identification datasets, VehicleID and VeRi, have demonstrated the superiority of our method against the state-of-the-art deep hash methods.
With 2048-bit codes, DVHN achieves improvements of 13.94% and 10.21% in mAP and Rank@1, respectively, on the VehicleID (800) dataset. On VeRi, it achieves gains of 35.45% and 32.72% in Rank@1 and mAP, respectively.

Index Terms—deep hashing, vehicle re-identification, approximate nearest neighbor search, deep learning

I. INTRODUCTION

Vehicle re-identification (vehicle ReID)[1][2][3][4][5][6] has been receiving growing attention in the computer vision research community. It aims to retrieve the images in the gallery set that correspond to a given query image. The general re-identification framework consists of two modules: feature learning and metric learning. 1) The feature learning module is responsible for extracting a discriminative feature embedding from the vehicle image. 2) The distance metric learning module [7] is responsible for preserving the distances between the original images in the embedding space. Previous vehicle re-identification methods have achieved strong performance on widely-studied research datasets. Nonetheless, it is not feasible to apply these techniques directly in a real-world scenario, where the gallery image set normally contains an astronomical number of images. The main reasons are as follows. First, since existing methods learn a real-valued feature vector for each image, the memory storage cost could be exceedingly high when there is a large number of images. For instance, storing a 2048-dimensional feature vector of data type 'float64' takes up 16 kilobytes. For a gallery set of 10 million vehicle images, the total memory storage cost could be roughly 150 gigabytes. Second, directly computing the similarity between two 2048-dimensional feature vectors is inefficient[8] and costly, making it undesirable when query speed is a critical concern.
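The storage figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original work: the helper name `real_valued_storage_bytes` is hypothetical, each float64 component is assumed to occupy 8 bytes, and sizes are reported in binary units (1 KiB = 1024 bytes), which is an assumption about how the quoted figures were rounded.

```python
FLOAT64_BYTES = 8  # assumption: one float64 component occupies 8 bytes


def real_valued_storage_bytes(num_images: int, dim: int = 2048) -> int:
    """Bytes needed to store one float64 feature vector per image.

    Hypothetical helper for illustration only; it ignores any index or
    container overhead a real retrieval system would add.
    """
    return num_images * dim * FLOAT64_BYTES


per_image = real_valued_storage_bytes(1)
gallery = real_valued_storage_bytes(10_000_000)

print(f"per image:  {per_image / 1024:.0f} KiB")      # 16 KiB
print(f"10M images: {gallery / 1024**3:.1f} GiB")      # 152.6 GiB
```

Under these assumptions the gallery occupies about 152.6 GiB, in line with the order of magnitude stated in the text.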
Recently, substantial research efforts have been devoted to deep learning-based hash methods owing to their low storage cost and high retrieval efficiency. The goal of deep hashing is to learn a hash function that embeds images into compact binary hash codes in the Hamming space while preserving their similarity in the original space[9][10][11][8]. Since a 2048-bit Hamming code only takes up 256 bytes in memory, the total storage for a dataset of 10 million images is less than 2.4 gigabytes, saving up to 147 gigabytes of memory compared to using real-valued feature vectors. Further, as stated in [12], the computation of the Hamming distance between binary hash codes can be accelerated by using the built-in CPU hardware instruction XOR. Generally, the Hamming distance computation can be completed with several machine instructions, significantly faster than computing its Euclidean distance counterpart.

Given these benefits, one may naturally consider directly applying off-the-shelf deep hash techniques to the large-scale vehicle ReID problem. Although researchers have successfully applied deep hashing to image retrieval, the unique features of the ReID task make it non-trivial to apply these methods directly, and doing so usually leads to notable performance drops. The degraded performance can be ascribed to the fact that general-purpose deep hash methods fail to learn robust and discriminative features on vehicle ReID datasets. For instance, canonical deep hash methods[11][13] adopt a convolutional neural network to learn features and a pairwise loss module to guide the generation of Hamming codes. In a scenario where there are thousands or even millions of vehicle IDs, and where viewpoint variation for each vehicle identity is considerable, these off-the-shelf deep hash methods can only learn sub-optimal feature representations, leading to sub-optimal hash codes.
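The storage and distance-computation claims made for binary codes can be illustrated with a small sketch. This is not the authors' implementation: Python integers stand in for packed 2048-bit codes, `hamming_distance` and `rank_gallery` are hypothetical helper names, and the gallery is random data. Counting set bits in the XOR of two codes is the software analogue of the XOR-plus-bit-count instruction sequence a CPU would execute.

```python
import random
from typing import List, Sequence

CODE_BITS = 2048  # code length used in the text


def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of differing bits: XOR the codes, then count the set bits."""
    return bin(code_a ^ code_b).count("1")


def rank_gallery(query: int, gallery: Sequence[int]) -> List[int]:
    """Gallery indices sorted by increasing Hamming distance to the query."""
    return sorted(range(len(gallery)),
                  key=lambda i: hamming_distance(query, gallery[i]))


# Toy gallery of random codes (illustrative data, not a real dataset).
rng = random.Random(0)
gallery = [rng.getrandbits(CODE_BITS) for _ in range(5)]

# A query that differs from gallery item 3 in exactly three bits.
query = gallery[3] ^ 0b111
ranking = rank_gallery(query, gallery)
print("nearest gallery index:", ranking[0])

# Storage for 10 million codes, in binary units (1 GiB = 1024**3 bytes).
bytes_per_code = CODE_BITS // 8
total_gib = 10_000_000 * bytes_per_code / 1024**3
print(f"{bytes_per_code} bytes per code, {total_gib:.2f} GiB for 10M images")
```

In this toy setting the query is closest to the item it was derived from, and 10 million codes occupy about 2.38 GiB, consistent with the "less than 2.4 gigabytes" figure in the text.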
arXiv:2112.04937v1 [cs.CV] 9 Dec 2021

In this paper, we introduce a novel deep hash-based framework for efficient large-scale vehicle ReID, dubbed DVHN,