Attending to visual motion

John K. Tsotsos a,b,*, Yueju Liu a,b, Julio C. Martinez-Trujillo c, Marc Pomplun d, Evgueni Simine a,b, Kunhao Zhou a,b

a Department of Computer Science and Engineering, York University, Toronto, Canada
b Centre for Vision Research, York University, Toronto, Ont., Canada M3J 1P3
c Department of Physiology, McGill University, Montreal, Canada
d Department of Computer Science, University of Massachusetts, Boston, MA, USA

Received 6 February 2004; accepted 5 October 2004
Available online 26 July 2005

Abstract

Visual motion analysis has focused on decomposing image sequences into their component features. There has been little success at re-combining those features into moving objects. Here, a novel model of attentive visual motion processing is presented that addresses both decomposition of the signal into constituent features as well as the re-combination, or binding, of those features into wholes. A new feed-forward motion-processing pyramid is presented motivated by the neurobiology of primate motion processes. On this structure the Selective Tuning (ST) model for visual attention is demonstrated. There are three main contributions: (1) a new feed-forward motion processing hierarchy, the first to include a multi-level decomposition with local spatial derivatives of velocity; (2) examples of how ST operates on this hierarchy to attend to motion and to localize and label motion patterns; and (3) a new solution to the feature binding problem sufficient for grouping motion features into coherent object motion. Binding is accomplished using a top-down selection mechanism that does not depend on a single location-based saliency representation.
© 2005 Elsevier Inc. All rights reserved.

Keywords: Attention; Visual motion analysis; Feature binding; Selective tuning; Affine motion

1077-3142/$ - see front matter © 2005 Elsevier Inc. All rights reserved.
doi:10.1016/j.cviu.2004.10.011

* Corresponding author. Fax: +1 416 736 5857.
E-mail address: tsotsos@cs.yorku.ca (J.K. Tsotsos).
URL: http://www.cs.yorku.ca/~tsotsos (J.K. Tsotsos).

www.elsevier.com/locate/cviu
Computer Vision and Image Understanding 100 (2005) 3–40