Temporal Spectral Residual: Fast Motion Saliency Detection

Xinyi Cui
Computer Science, Rutgers University
617 Bowser Road, Piscataway NJ, USA 08854
xycui@cs.rutgers.edu

Qingshan Liu
Computer Science, Rutgers University
617 Bowser Road, Piscataway NJ, USA 08854
qsliu@cs.rutgers.edu

Dimitris Metaxas
Computer Science, Rutgers University
617 Bowser Road, Piscataway NJ, USA 08854
dnm@cs.rutgers.edu

ABSTRACT
Saliency detection has attracted much attention in recent years. It aims at locating semantic regions in images for further image understanding. In this paper, we address the issue of motion saliency detection for video content analysis. Inspired by the idea of Spectral Residual for image saliency detection, we propose a new method, Temporal Spectral Residual, which operates on video slices along the X-T and Y-T planes and, with the help of threshold selection and voting schemes, automatically separates foreground motion objects from backgrounds. Unlike conventional background modeling methods that rely on complex mathematical models, the proposed method is based only on Fourier spectrum analysis, so it is simple and fast. The power of the proposed method is demonstrated in experiments on four typical videos with different dynamic backgrounds.

Categories and Subject Descriptors
I.2.10 [Vision and Scene Understanding]: Video analysis

General Terms
Algorithms, Design, Experimentation, Performance

Keywords
Motion Saliency Detection, Temporal Spectral Residual, Video Analysis

1. INTRODUCTION
Saliency detection in static images has attracted much attention in recent years. Unlike conventional image segmentation methods, which separate the whole scene into discrete parts, saliency detection aims at finding semantic regions and rejecting backgrounds.
The idea of saliency detection is also similar to the human visual system, since the first stage of human vision is a fast but simple pre-attentive process. Therefore, saliency detection provides a good preprocessing stage for image understanding. Many algorithms have been proposed. For example, Itti and Koch designed a model that simulates the human visual search process to detect saliency in static images [5] [6] [7]; Hou and Zhang [4] proposed a fast Fourier spectrum residual analysis for image saliency detection. It has been demonstrated in [12] [3] that saliency detection is helpful for visual recognition tasks.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM'09, October 19–24, 2009, Beijing, China. Copyright 2009 ACM 978-1-60558-608-3/09/10 ...$10.00.

The semantics of videos are usually dominated by meaningful foreground motion objects, so locating these motion objects against the background is an important issue for video understanding. Although video data carries more information than static images, various background motions make this a difficult task in practical applications. A popular approach is to first learn a complex background model and then subtract the background to obtain the foreground motion objects. Typical background models include the Gaussian Mixture Model [10], Nonparametric Kernel Density Estimation [2], Adaptive KDE combined with motion information [8], the Bayesian Learning approach [11], the Linear Dynamic Model [9], and the Robust Kalman Filter [13]. These models have achieved good results in some cases, but they incur a high computational cost.
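To make the learn-then-subtract pipeline described above concrete, the following is a minimal Python sketch. It assumes a single running Gaussian per pixel, which is a deliberate simplification and not the Gaussian Mixture Model of [10] or any other cited method. The function name and the parameters alpha, k, init_var and min_var are illustrative choices introduced here.

```python
import numpy as np

def running_gaussian_foreground(frames, alpha=0.05, k=2.5,
                                init_var=25.0, min_var=1.0):
    """Simplified background subtraction with one Gaussian per pixel.

    frames: array of shape (T, H, W) holding grayscale frames.
    Returns a boolean array of shape (T, H, W); True marks foreground.
    All parameter names and defaults are illustrative assumptions.
    """
    frames = np.asarray(frames, dtype=np.float64)
    mean = frames[0].copy()                    # background mean per pixel
    var = np.full(frames[0].shape, init_var)   # background variance per pixel
    masks = np.zeros(frames.shape, dtype=bool)

    for t in range(1, len(frames)):
        diff = frames[t] - mean
        # A pixel is foreground if it lies more than k standard
        # deviations away from the current background mean.
        fg = diff ** 2 > (k ** 2) * var
        masks[t] = fg

        # Update the model only where the pixel matched the background.
        bg = ~fg
        mean[bg] += alpha * diff[bg]
        var[bg] += alpha * (diff[bg] ** 2 - var[bg])
        np.maximum(var, min_var, out=var)      # keep the variance positive

    return masks

# Usage: a static scene with a bright block that appears in frame 5.
demo = np.full((10, 8, 8), 100.0)
demo[5, 2:4, 2:4] = 200.0
demo_masks = running_gaussian_foreground(demo)
```

Even this reduced model keeps and updates two statistics per pixel for every frame. The mixture and kernel-density models listed above maintain considerably more state per pixel, which is where their computational cost comes from.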
In this paper, we propose a fast motion saliency detection method, Temporal Spectral Residual, inspired by Spectral Residual based image saliency detection [4]. Unlike complex background modeling, the proposed method is based only on Fourier spectral analysis and requires no training or initial labeling. As a preprocessing method, it is simple and fast. Our method consists of three steps: 1) Perform Fourier spectral analysis on the temporal slices along both the X-T and Y-T planes, and apply Temporal Spectral Residual to automatically separate the salient regions from the background on each plane. 2) Apply a threshold selection scheme to reject noise. 3) Refine the results from the X-T and Y-T planes by a voting scheme. The power of the proposed method is demonstrated on four typical videos with different background motions. In addition, our algorithm runs at 18 fps for a frame size of 120 × 160 pixels, in Matlab without code optimization, on a 2.4 GHz machine with 8 GB of memory.

2. SPECTRAL RESIDUAL
The Spectral Residual algorithm [4] focuses on exploring the properties of the background by exploiting the power of the log spectrum. It is based on the observation that log