Vol.:(0123456789)
SN Computer Science (2020) 1:200
https://doi.org/10.1007/s42979-020-00204-0
SN Computer Science
ORIGINAL RESEARCH
Saliency from High‑Level Semantic Image Features
Aymen Azaza
1,2
· Joost van de Weijer
2
· Ali Douik
1
· Javad Zolfaghari
2
· Marc Masana
2
Received: 10 April 2020 / Accepted: 22 May 2020 / Published online: 11 June 2020
© Springer Nature Singapore Pte Ltd 2020
Abstract
Top-down semantic information is known to play an important role in assigning saliency. Recently, large strides have been
made in improving state-of-the-art semantic image understanding in the felds of object detection and semantic segmenta-
tion. Therefore, since these methods have now reached a high-level of maturity, evaluation of the impact of high-level image
understanding on saliency estimation is now feasible. We propose several saliency features which are computed from object
detection and semantic segmentation results. We combine these features with a standard baseline method for saliency detec-
tion to evaluate their importance. Experiments demonstrate that the proposed features derived from object detection and
semantic segmentation improve saliency estimation signifcantly. Moreover, they show that our method obtains state-of-the-
art results on (FT, ImgSal, and SOD datasets) and obtains competitive results on four other datasets (ECSSD, PASCAL-S,
MSRA-B, and HKU-IS).
Keywords Saliency · Object detection · Semantic segmentation
Introduction
Saliency is the quality of objects that makes them stand out
with respect to others, thereby grabbing the attention of the
viewer. Computational saliency can be roughly divided in
three main research branches. Firstly, it is originally defned
as a task of predicting eye-fxations on images [11]. Sec-
ondly, researchers use the term to refer to salient object
estimation or salient region detection [6, 35, 65]. Here, the
task is extended to identify the region, containing the sali-
ent object, which is a binary segmentation task for salient
object extraction. Thirdly, more recently researchers on
convolutional neural networks have also used the term of
saliency map to refer to the activations of certain intermedi-
ate layers of the network. The focus in this paper is on sali-
ent object estimation, and we do not perform fxation map
prediction, nor study the activation maps of neural networks.
Computational salient object detection aims to detect the
most attractive objects in the image in a manner which is
coherent with the perception of the human visual system.
Visual saliency has a wide range of applications such as
image retargeting [15], image compression [51], and image
retrieval [61].
Initially, most saliency models were bottom-up
approaches which are based on low-level features which are
merged using linear and nonlinear fltering to get the fnal
saliency map [6, 9]. Itti et al. [22] propose one of the frst
models for computational visual saliency which is based on
the integration theory of Treisman [52] and uses several low-
level bottom-up features including color, orientation, and
intensity. Even though this method has been surpassed on
popular baselines by many approaches, a recent study which
optimized all its parameters found that it could still obtain
results comparable to state-of-the-art [17]. Yang et al. [58]
improve low-level features by considering their contrast with
respect to the boundary of the image. Here, the boundary is
used to model the background. Then, the saliency map is
computed using graph-based manifold ranking. Perazzi et al.
* Aymen Azaza
aymen.azaza@cvc.uab.es
Joost van de Weijer
joost@cvc.uab.es
Ali Douik
alidouik@gmail.com
Javad Zolfaghari
jzolfaghari@cvc.uab.es
Marc Masana
mmasana@cvc.uab.es
1
National Engineering School of Sousse, University
of Sousse, Pole technologique de Sousse, Sousse, Tunisia
2
Computer Vision Center, Barcelona, Spain