Structural similarity determines search time and detection probability

Alexander Toet ⇑

TNO Human Factors, P.O. Box 23, 3769 ZG Soesterberg, The Netherlands
Intelligent System Laboratory Amsterdam, Faculty of Science, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands

Article info

Article history:
Received 29 May 2010
Available online 21 September 2010

Keywords: Clutter; Structural similarity index (SSIM); Target structure similarity (TSSIM); Detection time; Detection probability

Abstract

The recently introduced TSSIM clutter metric is currently the best predictor of human visual search performance for natural images (Chang and Zhang [1]). The TSSIM quantifies the similarity of a target to its background in terms of luminance, contrast and structure. It correlates more strongly with experimental mean search times and detection probabilities than other clutter metrics (Chang and Zhang [1,2]). Here we show that it is predominantly the structural similarity component of the TSSIM that determines human visual search performance, whereas the luminance and contrast components of the TSSIM show no relation with human performance. This result agrees with previous reports that human observers mainly rely on structural features to recognize image content. Since the structural similarity component of the TSSIM is equivalent to a matched filter, it appears that matched filtering predicts human visual performance when searching for a known target.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

It is well known that visual targets that are similar to their local background or to details in other parts of the scene are harder to find than targets that are highly distinct. This obscuring effect, which is generally known as clutter, determines human visual search and detection performance in electro-optical imagery to a large extent.
Many attempts have been made to quantify the effects of clutter by means of digital clutter metrics. However, since the concept is inherently elusive, attempts to model clutter have only been partly successful [3–17].

Visual search experiments have shown that detection performance depends mainly on the energy contrast between a target and its local background, whereas recognition depends mainly on the structural dissimilarity between a target and its surround [18,19]. For complex scenes, the spatial relationships (shape and relative location) of features in an image can have a greater effect on detection than the relative luminance of the features [3]. Higher overall contrast may even reduce the amount of perceived clutter because confusing objects are more readily recognized for what they are: nontarget scene elements. An effective clutter metric should account for this type of cognitive screening.

Wang and Bovik introduced the structural image similarity index (SSIM), which measures the similarity between images in terms of luminance, contrast and structure [20–24]. The SSIM has been deployed successfully to model human visual perception of image distortions and modifications in a wide range of imaging applications (for an overview see [22]).

Chang and Zhang [1,2] recently introduced the TSSIM clutter metric, which deploys the SSIM to quantify the similarity of a target to its background in terms of luminance, contrast and structure. They showed that the TSSIM correlates significantly with mean search time and detection probability [1,2]. However, it is not immediately obvious to what extent each of the three TSSIM components contributes to this correlation.
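To make the three-way decomposition concrete, the following Python sketch computes separate luminance, contrast and structure similarity terms for two equally sized grayscale patches. It follows the commonly cited Wang and Bovik form of these terms and is only an illustration: the function name `ssim_components`, the stabilizing constants (`k1`, `k2`, and C3 = C2/2), the assumed 8-bit dynamic range, and the use of unbiased sample estimates are assumptions made here, not details taken from this paper or from Chang and Zhang [1,2].

```python
import math


def ssim_components(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Return (luminance, contrast, structure) similarity of two patches.

    x, y: flat sequences of grayscale pixel values of equal length.
    The constants are an assumption: C1 = (k1*L)^2, C2 = (k2*L)^2, C3 = C2/2.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("patches must have the same size (at least 2 pixels)")

    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    c3 = c2 / 2.0

    # Patch means.
    mu_x = sum(x) / n
    mu_y = sum(y) / n

    # Unbiased sample variances and covariance.
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(var_x)
    sd_y = math.sqrt(var_y)

    # Each term is 1 when the patches agree in that respect.
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (var_x + var_y + c2)
    structure = (cov_xy + c3) / (sd_x * sd_y + c3)
    return luminance, contrast, structure


if __name__ == "__main__":
    target = [0, 255, 0, 255]
    background = [255, 0, 255, 0]  # same mean and spread, inverted pattern
    print(ssim_components(target, background))
```

In this sketch the overall index would be the product of the three returned terms. Keeping them separate is what allows each term to be compared with observer data on its own; in the example call, the two patches match in mean and spread but not in pattern, so only the structure term drops.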
Here we analyze the predictive performance of each of the three TSSIM components, and we show that it is predominantly the structural similarity component that determines human visual search performance, whereas the luminance and contrast components of the TSSIM show no relation with human performance.

The rest of this paper is organized as follows. In Section 2 we show how rewriting the TSSIM in its full form allows the contributions of the luminance, contrast and structural similarity components to the overall clutter metric to be assessed. In Section 3 we describe how the performance of the TSSIM was evaluated by applying it to a set of natural images for which human observer data are available. The results of this experiment are presented in Section 4. Finally, the conclusions of this study are presented in Section 5.

⇑ Address: TNO Human Factors, P.O. Box 23, 3769 ZG Soesterberg, The Netherlands. Tel.: +31 346 356237; fax: +31 346 353977. E-mail address: lex.toet@tno.nl
Infrared Physics & Technology 53 (2010) 464–468. doi:10.1016/j.infrared.2010.09.003

2. Clutter metrics

2.1. The structural similarity (SSIM) index

Let $x = \{x_i \mid i = 1, 2, \ldots, N\}$ and $y = \{y_i \mid i = 1, 2, \ldots, N\}$ represent two discretely sampled grayscale image patches that need to be compared. Let $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$, $\sigma_{xy}$ respectively be the mean of $x$, the