An image texture insensitive method for saliency detection

Avik Hati, Subhasis Chaudhuri, Rajbabu Velmurugan
Electrical Engineering Department, Indian Institute of Technology Bombay, Mumbai 400076, India

Article history: Received 19 July 2016; Revised 31 December 2016; Accepted 2 January 2017; Available online 4 January 2017.

Keywords: Saliency; Texture suppression; Total variation; Sparse segmentation; Relevance feedback; Image matting

Abstract

We propose a texture insensitive, region based image saliency detection algorithm, with excellent detection and localization properties, to obtain salient objects. We use a total variation based regularizer to suppress textures in the image and to make the method invariant to textural variations in the scene. This yields an image that contains piecewise constant gray valued regions. This texture-free image is sparsely segmented into a small number of regions using the expectation maximization algorithm under a Gaussian mixture model. We compute three different saliency measures for every region using its intensity and spatial features. We adopt a relevance feedback mechanism to obtain weights for combining the three saliency measures into the final saliency map. Finally, we feed the thresholded saliency map to an image matting technique and extract the salient objects from the image with exact boundaries. Experimental comparisons with existing saliency detection algorithms demonstrate the superiority of the proposed technique.

© 2017 Elsevier Inc. All rights reserved.

1. Introduction

Saliency is a measure of the importance of objects or regions in an image (see Fig. 1), or of important events in a video scene, that capture our attention. The salient regions in an image differ from the rest of the image in certain features (e.g., color or frequency). Salient region detection methods identify important regions in an image so that subsequent operations can be performed only on those regions.
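The texture suppression step described in the abstract can be illustrated with a minimal sketch: gradient descent on the ROF (total variation) energy 0.5*||u - f||^2 + lam*TV(u), which drives the image toward piecewise constant regions. This is not the authors' implementation; the regularization weight `lam`, step size, iteration count, and the smoothing constant `eps` are assumed values chosen for illustration.

```python
import numpy as np


def tv_smooth(f, lam=0.3, step=0.2, iters=200, eps=1e-6):
    """Approximately minimize 0.5*||u - f||^2 + lam*TV(u) by gradient descent.

    A smoothed TV term sqrt(|grad u|^2 + eps) keeps the gradient well
    defined where the image is flat.  Boundaries are treated as periodic
    via np.roll for simplicity.
    """
    u = f.astype(float).copy()
    for _ in range(iters):
        # forward differences of the current estimate
        ux = np.roll(u, -1, axis=1) - u
        uy = np.roll(u, -1, axis=0) - u
        mag = np.sqrt(ux ** 2 + uy ** 2 + eps)
        px, py = ux / mag, uy / mag
        # backward-difference divergence of the normalized gradient field
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        # descent step on the ROF energy gradient
        u -= step * ((u - f) - lam * div)
    return u
```

Running this on a textured or noisy image reduces its total variation while keeping large piecewise constant regions, which is the property the segmentation stage relies on.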
This reduces complexity in many image and vision applications that work with large image databases and long video sequences. For example, saliency detection can aid video summarization, image segmentation, content based image compression, object recognition, image and video quality assessment, and progressive image transmission.

This problem is extremely challenging because the notion of saliency is purely subjective: we try to identify the elements in an image or scene that capture our attention. Humans prefer to look at an image in a broader sense, e.g., we perceive images as a set of objects rather than a set of pixels. Furthermore, we prefer to focus on 'important' objects while giving less attention to the less important ones. The main challenge is to define the term 'important' in a quantitative sense and then to perform the salient object identification accordingly.

A review of some state-of-the-art saliency detection methods is presented in [1,2]; we discuss a few relevant ones here. Frequency domain approaches detect saliency by exploiting the varying spectral components present in the image. The spectral residual (SR) approach [3] combines the phase spectrum with the spectral residual of an image to obtain saliency. The phase Fourier transform (PFT) model [4,5] shows that the spectral residual part contains little information about the image; the inverse Fourier transform of only the phase spectrum alone is taken as the saliency map. This highlights boundaries, but the method is affected by image textures. PFT also fails if the salient region is large or the background is cluttered. The amplitude information, which is neglected in both the SR technique and the PFT model, is used to obtain a better saliency map in [6,7]. The amplitude spectrum of the image is lowpass filtered and combined with the phase spectrum to obtain the saliency map in [6]. The method of Li et al.
[7] modulates the phase spectrum using a learned phase filter. Achanta et al. [8] try to retain the salient object boundaries by retaining most of the frequency components in the image. Multiple differences of Gaussians (DoGs) with several narrow passbands are combined to obtain a filter, and the saliency of a pixel is computed as the difference between the averaged image and the filtered image. Retaining high frequencies may, however, give poor results, since (high frequency) noise is retained as well. Ma et al. [9] compute saliency by combining wavelet transforms of different color channels. Fang et al. [10] compute saliency in the compressed domain using DCT coefficients. A singular value decomposition based approach has been presented in [11] with the assumption that the large singular values correspond

http://dx.doi.org/10.1016/j.jvcir.2017.01.007
1047-3203/© 2017 Elsevier Inc. All rights reserved.
This paper has been recommended for acceptance by M.T. Sun.
Corresponding author. E-mail addresses: avik@ee.iitb.ac.in (A. Hati), sc@ee.iitb.ac.in (S. Chaudhuri), rajbabu@ee.iitb.ac.in (R. Velmurugan).
J. Vis. Commun. Image R. 43 (2017) 212–226