SN Computer Science (2020) 1:200
https://doi.org/10.1007/s42979-020-00204-0

ORIGINAL RESEARCH

Saliency from High-Level Semantic Image Features

Aymen Azaza 1,2 · Joost van de Weijer 2 · Ali Douik 1 · Javad Zolfaghari 2 · Marc Masana 2

Received: 10 April 2020 / Accepted: 22 May 2020 / Published online: 11 June 2020
© Springer Nature Singapore Pte Ltd 2020

Abstract
Top-down semantic information is known to play an important role in assigning saliency. Recently, large strides have been made in improving state-of-the-art semantic image understanding in the fields of object detection and semantic segmentation. Since these methods have now reached a high level of maturity, evaluating the impact of high-level image understanding on saliency estimation is now feasible. We propose several saliency features which are computed from object detection and semantic segmentation results. We combine these features with a standard baseline method for saliency detection to evaluate their importance. Experiments demonstrate that the proposed features derived from object detection and semantic segmentation improve saliency estimation significantly. Moreover, they show that our method obtains state-of-the-art results on three datasets (FT, ImgSal, and SOD) and competitive results on four other datasets (ECSSD, PASCAL-S, MSRA-B, and HKU-IS).

Keywords Saliency · Object detection · Semantic segmentation

Introduction
Saliency is the quality of objects that makes them stand out with respect to others, thereby grabbing the attention of the viewer. Computational saliency can be roughly divided into three main research branches. Firstly, it was originally defined as the task of predicting eye fixations on images [11]. Secondly, researchers use the term to refer to salient object estimation or salient region detection [6, 35, 65].
Here, the task is extended to identifying the region containing the salient object, which is a binary segmentation task for salient object extraction. Thirdly, and more recently, researchers working on convolutional neural networks have also used the term saliency map to refer to the activations of certain intermediate layers of the network. The focus of this paper is on salient object estimation; we neither perform fixation map prediction nor study the activation maps of neural networks.

Computational salient object detection aims to detect the most attractive objects in the image in a manner which is coherent with the perception of the human visual system. Visual saliency has a wide range of applications such as image retargeting [15], image compression [51], and image retrieval [61].

Initially, most saliency models were bottom-up approaches based on low-level features, which are merged using linear and nonlinear filtering to obtain the final saliency map [6, 9]. Itti et al. [22] propose one of the first models for computational visual saliency, which is based on the feature integration theory of Treisman [52] and uses several low-level bottom-up features including color, orientation, and intensity. Even though this method has been surpassed on popular baselines by many approaches, a recent study which optimized all its parameters found that it could still obtain results comparable to the state of the art [17]. Yang et al. [58] improve low-level features by considering their contrast with respect to the boundary of the image. Here, the boundary is used to model the background. The saliency map is then computed using graph-based manifold ranking. Perazzi et al.
* Aymen Azaza aymen.azaza@cvc.uab.es
Joost van de Weijer joost@cvc.uab.es
Ali Douik alidouik@gmail.com
Javad Zolfaghari jzolfaghari@cvc.uab.es
Marc Masana mmasana@cvc.uab.es

1 National Engineering School of Sousse, University of Sousse, Pole technologique de Sousse, Sousse, Tunisia
2 Computer Vision Center, Barcelona, Spain