Pattern Recognition 109 (2021) 107579
Journal homepage: www.elsevier.com/locate/patcog

Iterative local re-ranking with attribute guided synthesis for face sketch recognition

Decheng Liu a, Xinbo Gao a,d,∗, Nannan Wang b,∗, Chunlei Peng c, Jie Li a

a State Key Laboratory of Integrated Services Networks, School of Electronic Engineering, Xidian University, Xi’an 710071, Shaanxi, P. R. China
b State Key Laboratory of Integrated Services Networks, School of Telecommunications Engineering, Xidian University, Xi’an 710071, Shaanxi, P. R. China
c State Key Laboratory of Integrated Services Networks, School of Cyber Engineering, Xidian University, Xi’an 710071, Shaanxi, P. R. China
d Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Article info
Article history: Received 2 August 2019; Revised 25 March 2020; Accepted 4 August 2020; Available online 5 August 2020
Keywords: Face sketch recognition; Facial attribute; Re-ranking

Abstract
Because of the large texture and spatial structure discrepancies between face sketches and photos, face sketch recognition is a challenging problem in the face recognition community. For example, in law enforcement and security, the specific face sketch generation process can introduce inevitable biases, which result in poor face sketch recognition performance. To mimic the modality gap introduced by the biases during the face sketch creation process, a novel iterative local re-ranking method with attribute guided synthesis is proposed for face sketch recognition, which does not require any extra manual annotation or human interaction. Face attribute clues are used to generate images with varying local characteristics from probe sketches, which could help eliminate the unavoidable biases.
Considering the special property of face sketches, the iterative local re-ranking algorithm is designed to encode contextual information integrated with local invariant discriminative information for matching sketches with photos. Experimental results on multiple face sketch databases demonstrate that the proposed method achieves superior performance compared with state-of-the-art methods.
© 2020 Elsevier Ltd. All rights reserved.

∗ Corresponding authors.
E-mail addresses: dcliu.xidian@gmail.com (D. Liu), gaoxb@cqupt.edu.cn (X. Gao), nnwang@xidian.edu.cn (N. Wang), clpeng@xidian.edu.cn (C. Peng), leejie@mail.xidian.edu.cn (J. Li).

1. Introduction

Face recognition is a challenging and important application in computer vision. Great progress has been achieved recently; however, many challenging scenarios remain in the real world. Especially in law enforcement, there are many cases where the mug shot of the suspect is not available or only poor-quality images are captured by video surveillance. Due to the lack of suspects’ photographs, law enforcement agencies have started to generate face sketches according to descriptions provided by eyewitnesses or to blurry surveillance videos. With technological improvement, more forensic artists have begun to use generation software to produce composite sketches as a replacement for hand-drawn sketches. However, due to the complex and special generation process of face sketches, there always exist shape exaggerations and distortions in face sketches. More importantly, in law enforcement face sketches are used to determine the identity of criminals when only the description of eyewitnesses is available. Works in forensic psychology [1] show that face sketch recognition is affected by the forgotten memory of eyewitnesses and by imperfect communication.
In summary, the sketch-photo modality difference, the inaccurate memory of eyewitnesses and the biased communication of memory all bring about unavoidable biases in the generated face sketches. Thus, face sketch recognition remains a difficult and challenging task in real-world scenarios.

Existing face sketch recognition methods mostly focus on three aspects: 1) extracting modality invariant features [2–4], which contain the identity discriminative information; 2) projecting face images of different modalities into a latent common space [5–8], where face sketches and gallery photos can be matched directly; 3) transforming images from one modality to another [9–12], which places the images in a homogeneous setting so that traditional homogeneous face recognition methods can be applied directly. However, face sketches in real-world scenes differ from gallery photos because of the unavoidable perceptual bias, descriptive bias and generating bias [13]. These inevitable biases, introduced during the generation procedure, could enlarge the gap between the modalities. Merely transforming sketches and photos into a homogeneous setting is not enough to eliminate these biases when matching face sketches.

https://doi.org/10.1016/j.patcog.2020.107579
0031-3203/© 2020 Elsevier Ltd. All rights reserved.

To mimic the modality gap mentioned be-