1051-8215 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCSVT.2019.2963721, IEEE Transactions on Circuits and Systems for Video Technology.

SDL: Spectrum-Disentangled Representation Learning for Visible-Infrared Person Re-identification

Kajal Kansal, Member, IEEE, A.V. Subramanyam, Member, IEEE, Zheng Wang, Member, IEEE, Shin’ichi Satoh, Member, IEEE

Abstract: Visible-infrared person re-identification (RGB-IR ReID) is extremely important for surveillance applications under poor illumination conditions. The task is challenging in practice because the differences in feature representations arise not only from variations in a person's pose, viewpoint, or illumination, but also from the large spectrum discrepancy. Existing RGB-IR ReID models focus on bridging the gap between RGB and IR images through shared feature embedding, subspace learning, or adversarial learning. However, these methods do not explicitly discard the spectrum information, which is irrelevant for ReID. Further, adversarial learning methods have less promising convergence. This motivates us to design a non-adversarial and fast disentanglement method that disentangles the spectrum information while learning identity-discriminative features. To extract these features, we propose a novel network with a disentanglement loss that distills identity features and dispels spectrum features. Our network has two branches: a spectrum dispelling branch and a spectrum distilling branch.
On the spectrum dispelling branch, we apply an identification loss to learn identity-related, spectrum-disentangled features. On the spectrum distilling branch, we apply an identity-dispeller loss to fool the identity classifier so that the branch primarily learns spectrum-related information. The entire network is trained end-to-end, which minimizes spectrum information and maximizes invariant, identity-relevant information at the spectrum dispelling branch. Extensive experiments on existing datasets demonstrate the superior performance of our approach compared to the state of the art.

Index Terms: Disentanglement, Person Re-identification, Surveillance

I. INTRODUCTION

Person re-identification (ReID) addresses the problem of matching people across disjoint camera views. It has gained much attention in the recent past due to its importance in surveillance-related tasks. Most current ReID methods focus on RGB images, where both probe and gallery samples are RGB images (called RGB-RGB ReID) [1], [2], [3], [4], [5], [6]. In RGB-RGB ReID works, colour information is one of the most important appearance cues for re-identifying a person.

This work was supported in part by the NII International Internship Program, in part by DST, Govt. of India, under Grant ECR/2018/002449, in part by Grant-in-Aid for JSPS Fellows under Grant 18F18378, and in part by JST CREST under Grant JPMJCR1686. (Corresponding author: Zheng Wang)
Kajal Kansal and A.V. Subramanyam are with the Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Delhi, India. (e-mail: kajal@iiitd.ac.in; subramanyam@iiitd.ac.in)
Z. Wang and S. Satoh are with the Digital Content and Media Sciences Research Division, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan. (e-mail: wangz@nii.ac.jp; satoh@nii.ac.jp)

[Figure 1 panel labels: RGB, IR, shared subspace, baseline method, our method]
Figure 1. Problem formulation and the goal of Spectrum Disentangled representation learning (SDL).
$x_{RGB}$ and $x_{IR}$ represent the shared features, $v_{RGB}$ denotes the RGB spectrum features, $v_{IR}$ denotes the IR spectrum features, and $u$ denotes the spectrum-disentangled representation. The goal is to remove the spectrum-related information $v_{RGB}$ and $v_{IR}$ from $x_{RGB}$ and $x_{IR}$, respectively, to learn $u$.

Because it relies on colour, RGB-RGB ReID can be limited in surveillance and may not capture reliable appearance information under poor illumination conditions. In such cases, imaging devices that do not rely on visible light, such as infrared cameras, should be deployed. This requires performing visible-to-infrared (RGB-IR) ReID, which is more complicated because, in addition to the challenges of RGB-RGB ReID, the spectrum discrepancy must also be addressed. Only a few ReID works deal with this kind of cross-modality problem. Recent works [7], [8], [9], [10] evaluate one-stream and two-stream neural networks for the RGB-IR ReID problem. A common approach is to learn a new feature subspace in which features of the same identity, both within and across modalities, are close to each other, whereas features of different identities are farther apart. In [7], the authors propose an adversarial learning approach. However, the method converges very slowly. This motivates us to design a fast disentanglement method to address the cross-modality challenges. Our proposed method does not use adversarial learning techniques, converges quickly, and obtains strongly discriminative identity-related features while disentangling the spectrum information. To better understand the RGB-IR ReID problem and the goal