1070-986X (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of IEEE MultiMedia, but has not been fully edited; content may change prior to final publication. Citation information: DOI 10.1109/MMUL.2019.2915078.

Modification of Gradient Vector Flow using Directional Contrast for Salient Object Detection

Gargi Srivastava and Rajeev Srivastava

Abstract—Scene analysis is a relevant research field owing to its many applications in computer vision. This paper analyzes the scene information present in an image by augmenting salient object information with background information. The salient object is initially identified using Minimum Directional Contrast (MDC). The underlying assumption behind this method is that salient pixels have a higher minimum directional contrast than non-salient pixels. Computing MDC provides a raw saliency metric. The Gradient Vector Flow (GVF) model of image segmentation incorporates this raw saliency information: the gradient of MDC is calculated and added to the data term of the GVF energy functional, so that contour formation utilizes not only edge information but also saliency information. The result yields not only the salient object but also added background information. Three public datasets are used to evaluate the results. A comparative study of the proposed method against other state-of-the-art salient object detection methods from the literature is presented in terms of precision, recall, and F1-score.

Index Terms—Salient object, contrast, gradient vector flow.

I. INTRODUCTION

SALIENT object detection is the procedure of locating important objects in an image.
Humans require no extra effort, as this is a straightforward task for them; for machines, however, it is difficult. There is a wide variety of object categories, and different patterns lie within each category. Modeling the human attention mechanism is one way to understand how machines can be built that recognize salient pixels in an image. Salient object detection can be used for better content-based image retrieval and better image tagging [2]. Researchers rely upon priors for generating saliency maps; these include local and global contrast priors, the edge prior, the backgroundness prior, the center prior, and the focusness prior [3]. In recent years, researchers have proposed finding objects using contours by first exploiting random forests to find patch rarities and the similarities among them. Eye-fixation prediction and superpixel methods have also been used to detect salient objects. Some authors find salient structures in an image and then narrow these structures down to salient objects using contours; salient object detection studies also include the use of hierarchical contours. Recently, various deep learning methods have been applied to salient object detection. Liu et al. [4] infer local contrast, global contrast, and top-down visual factors as saliency cues by training a multi-resolution convolutional neural network on eye-fixation data. In [5], Hou et al. modify the Holistically-Nested Edge Detector with short connections so that it can take advantage of multi-level and multi-scale features for appropriate object segmentation.

G. Srivastava is with the Department of Computer Science and Engineering, Indian Institute of Technology (B.H.U.), Varanasi, Uttar Pradesh, 221005 India (e-mail: gargis.rs.cse16@iitbhu.ac.in). R. Srivastava is with the Indian Institute of Technology (B.H.U.), Varanasi. Manuscript received April 16, 2019.
Liu and Han [6] introduce a long short-term memory model and scene context into a deep saliency network along with local cues. Wang and Shen [7] propose fusing hierarchical saliency information with global saliency information and local responses using a skip-layer convolutional neural network. In this paper, an attempt is made to utilize directional contrast to provide a raw saliency metric. These saliency cues are fused into the energy functional of gradient vector flow to obtain the final result, which contains the salient object. This work emphasizes the use of the color contrast information present in the image. Global contrast computes contrast information with respect to the complete image. Huang and Zhang [1] utilized the spatial information contained in the contrast cues. Their paper observes that the background surrounds a foreground object from all directions; hence, a foreground object has high contrast in all directions, whereas a background object has relatively low contrast in at least one direction, since it must connect to the background. Thus, the minimum directional contrast of a foreground object is higher than that of the background. This observation is used to generate the raw saliency metric. The novelty of our work lies in exploiting the gradient information present in this map to help gradient vector flow [8] segment out salient objects. After obtaining the raw saliency map and its gradient components, the next step is to use this information in the gradient vector flow model to carve out the required salient object. Gradient vector flows are used in image segmentation because they provide better segmentation in terms of convergence [10]. This work uses MDC for generating the raw saliency information because it includes spatial information in its contrast details. The GVF snake method uses a vector field that is distributed spatially over the image.
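The directional-contrast assumption can be sketched in a few lines of Python. The snippet below is a minimal illustration, not the authors' implementation: it contrasts each pixel against the mean color of the pixels lying in its own row and column in each of the four directions, whereas the method of Huang and Zhang [1] accumulates per-pixel color differences over full directional regions using integral images, which is far faster. The function name and the border-handling convention are our own choices.

```python
import numpy as np

def minimum_directional_contrast(img):
    """Raw saliency sketch: a pixel's contrast against the mean color
    of the pixels to its left, right, top, and bottom, with the minimum
    over the four directions taken as the MDC value."""
    h, w, _ = img.shape
    f = img.astype(np.float64)
    mdc = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            p = f[y, x]
            contrasts = []
            # left / right strips of this row, top / bottom strips of this column
            for region in (f[y, :x], f[y, x + 1:], f[:y, x], f[y + 1:, x]):
                if region.size == 0:
                    # a border pixel touches the image boundary, so its
                    # contrast in that direction is zero (background cue)
                    contrasts.append(0.0)
                else:
                    contrasts.append(np.linalg.norm(p - region.mean(axis=0)))
            mdc[y, x] = min(contrasts)
    return mdc
```

On an image with a bright patch centered on a dark background, interior patch pixels receive a large MDC value (high contrast in all four directions) while border pixels receive zero, matching the foreground/background assumption above.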
Thus, the MDC information fuses appropriately throughout the image with the vector field. The calculation of this metric is straightforward and quick, and the basic assumption of MDC is simple to comprehend. These are the major factors that motivated the use of MDC with GVF over other existing saliency methods. The main contributions of this paper are as follows: