Automatic Interesting Object Extraction From Images Using Complementary Saliency Maps

Haonan Yu (1), Jia Li (2,3), Yonghong Tian (1), Tiejun Huang (1)
(1) National Engineering Laboratory for Video Technology (NELVT), School of EE & CS, Peking University
(2) Key Lab of Intell. Info. Process, Inst. of Comput. Tech., Chinese Academy of Sciences, China
(3) Graduate University of Chinese Academy of Sciences, China
{hnyu,yhtian,tjhuang}@pku.edu.cn, jli@jdl.ac.cn

ABSTRACT
Automatic interesting object extraction is widely used in many image applications. Among the various extraction approaches, saliency-based ones usually perform better because they accord well with human visual perception. However, nearly all existing saliency-based approaches suffer from an integrity problem: the extracted result is either a small part of the object (referred to as sketch-like) or a large region that also contains redundant parts of the background (referred to as envelope-like). In this paper, we propose a novel object extraction approach that integrates two kinds of "complementary" saliency maps, i.e., sketch-like and envelope-like maps. In our approach, the extraction process is decomposed into two sub-processes: one extracts a high-precision result from the sketch-like map, and the other extracts a high-recall result from the envelope-like map. A classification step then extracts the exact object from these two results. By transforming the complex extraction task into an easier classification problem, our approach effectively overcomes the integrity problem. Experimental results show that the proposed approach substantially outperforms six state-of-the-art saliency-based methods in automatic object extraction and is even comparable to some interactive approaches.

Categories and Subject Descriptors
I.4.6 [Image Processing and Computer Vision]: Segmentation – Pixel classification.
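The pipeline summarized in the abstract can be illustrated with a deliberately tiny 1-D toy. This is a sketch under stated assumptions, not the authors' method: the object is assumed brighter than a uniform background, a fine-scale center-surround difference stands in for the sketch-like map, a wide moving window stands in for the envelope-like map, and a nearest-mean rule stands in for the classification step, whose actual form is not specified here. All function names are hypothetical.

```python
import numpy as np


def moving_sum(signal, radius):
    """Sum over a (2*radius + 1)-wide window, with edge padding."""
    padded = np.pad(signal.astype(float), radius, mode="edge")
    n = len(signal)
    return sum(padded[i:i + n] for i in range(2 * radius + 1))


def sketch_like(signal):
    """Hypothetical stand-in for the sketch-like map: fine-scale contrast.

    Keeps pixels brighter than the mean of their two neighbors, so on a wide
    uniform object only the boundary fires (high precision, low recall).
    """
    surround = (moving_sum(signal, 1) - signal) / 2.0
    return (signal - surround) > 0


def envelope_like(signal, radius=3):
    """Hypothetical stand-in for the envelope-like map: coarse-scale blur.

    Keeps any pixel whose wide window touches a bright value, so the mask
    bleeds past the object into the background (high recall, low precision).
    """
    return moving_sum(signal, radius) > 0


def classify(signal, precise, recall):
    """Hypothetical classification step: nearest-mean labeling of the band
    between the high-precision and high-recall masks."""
    if not precise.any() or recall.all():
        raise ValueError("need some sure foreground and some sure background")
    fg_mean = signal[precise].mean()   # model of sure foreground
    bg_mean = signal[~recall].mean()   # model of sure background
    uncertain = recall & ~precise      # pixels still to be decided
    closer_to_fg = np.abs(signal - fg_mean) < np.abs(signal - bg_mean)
    return precise | (uncertain & closer_to_fg)


def precision_recall(pred, truth):
    tp = np.logical_and(pred, truth).sum()
    precision = tp / pred.sum() if pred.any() else 0.0
    recall = tp / truth.sum() if truth.any() else 0.0
    return float(precision), float(recall)


truth = np.zeros(20, dtype=bool)
truth[5:15] = True                     # a wide object brighter than its background
signal = truth.astype(float)
p_mask, r_mask = sketch_like(signal), envelope_like(signal)
print(precision_recall(p_mask, truth))                            # (1.0, 0.2)
print(precision_recall(r_mask, truth))                            # (0.625, 1.0)
print(precision_recall(classify(signal, p_mask, r_mask), truth))  # (1.0, 1.0)
```

On this profile the sketch-like mask catches only the two boundary pixels, the envelope-like mask spills three pixels past each side, and classifying only the band between them recovers the object. Nearest-mean is used purely for brevity; real images need 2-D maps and a stronger pixel classifier, and the toy only shows why the two intermediate results are complementary.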
General Terms
Algorithms, Experimentation

Keywords
Automatic object extraction, Complementary saliency maps, Pixel classification

1. INTRODUCTION
In recent years the number of digital images has grown dramatically. In these images, the truly meaningful parts may make up only a small proportion. The nontrivial contents, usually in the form of interesting objects, are sufficient to represent the semantic meaning in most cases and consequently play an important role in many image applications such as content-based retrieval.

Therefore, many methods have been proposed to automatically extract interesting objects. For example, graph-theoretic approaches formulate extraction as the optimization of an energy function (e.g., [1, 2]), and edge-linking methods such as [3] connect a subset of the fragments produced by edge detection to form a closed contour around the interesting object. Although these approaches work well in some cases, they give little consideration to human visual perception, so they tend to perform poorly under complicated conditions such as cluttered images.

Because visual saliency accords well with human visual perception and can serve as a mechanism for selecting important content, saliency-based approaches have recently been proposed as an alternative for object extraction. For example, Itti et al. [4] combined multiscale features into a single topographical saliency map and adopted a dynamical neural network to select the attended areas that roughly contain the interesting objects. Ma and Zhang [5] generated a contrast-based saliency map and extracted objects by fuzzy growing. Achanta et al. [6] computed a frequency-tuned saliency map and binarized it with an adaptive threshold. Hou and Zhang [7] constructed the saliency map by analyzing the log-spectrum of the image and used a simple threshold to detect proto-objects.
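As a rough illustration of the log-spectrum analysis attributed to Hou and Zhang [7], the NumPy sketch below computes a simplified spectral-residual map: the log-amplitude spectrum minus its 3 × 3 local average (the "residual") is recombined with the original phase, transformed back to the image domain, smoothed, and binarized. It is a sketch under assumptions, not a reproduction of [7]: there is no downsampling, a 5 × 5 box filter stands in for Gaussian smoothing, and the binarization threshold is assumed to be a multiple `k` of the mean saliency (default 3.0, an assumed value). A mean-relative rule of this kind is also one simple way to realize an adaptive threshold like the one in [6]. The function names are hypothetical.

```python
import numpy as np


def box_filter(a, k, mode):
    """k x k moving average; `mode` is passed to np.pad to handle borders."""
    pad = k // 2
    p = np.pad(a, pad, mode=mode)
    out = np.zeros(a.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (k * k)


def spectral_residual_saliency(gray):
    """Hypothetical simplified spectral-residual saliency for a 2-D float image."""
    spectrum = np.fft.fft2(gray)
    log_amp = np.log(np.abs(spectrum) + 1e-12)  # log-amplitude spectrum
    phase = np.angle(spectrum)
    # The DFT is periodic, so wrap around the spectrum's borders when averaging.
    residual = log_amp - box_filter(log_amp, 3, "wrap")
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = box_filter(sal, 5, "edge")  # assumed stand-in for Gaussian smoothing
    return sal / sal.max()            # normalize to [0, 1]


def threshold_map(sal, k=3.0):
    """Binarize: keep pixels whose saliency exceeds k times the mean (k assumed)."""
    return sal > k * sal.mean()
```

For example, `threshold_map(spectral_residual_saliency(img))` on a small grayscale NumPy array returns a boolean mask of the same shape. Wrap padding is used only in the frequency domain, where periodicity is exact; the spatial smoothing uses edge padding instead.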
MM'10, October 25-29, 2010, Firenze, Italy. Copyright 2010 ACM 978-1-60558-933-6/10/10...$10.00.

Although these approaches simulate human visual perception well, their results usually lack integrity and exactness: the result is either a small part of the object or a large region that also contains redundant parts of the background. By the definition of visual saliency, a region with higher contrast to its surroundings is more likely to stand out in the saliency map. On a large object this produces dark central areas and over-highlighted edges (referred to as sketch-like); alternatively, local sudden changes in the background are redundantly detected as highlighted parts (referred to as envelope-like).

To solve this problem, we propose a novel interesting object extraction approach using two saliency maps that complement each other: a sketch-like map and an envelope-like map. We simply decompose the extraction process into two sub-processes, one for each map. The results of the two sub-processes are also somewhat complementary in terms of exactness: one has high precision and the other high recall. We then use the two results as prior knowledge and adopt a simple method for pixel classification. By transferring the complex object extraction task