Pergamon
Neural Networks, Vol. 10, No. 6, pp. 993-1015, 1997
© 1997 Elsevier Science Ltd. All rights reserved
Printed in Great Britain
0893-6080/97 $17.00+.00
PII: S0893-6080(97)00034-8

CONTRIBUTED ARTICLE

SCAN: A Scalable Model of Attentional Selection

ERIC O. POSTMA,1 H. JAAP VAN DEN HERIK1 AND PATRICK T.W. HUDSON1,2

1Department of Computer Science, MATRIKS, Faculty of General Sciences, Universiteit Maastricht and 2Unit of Experimental and Theoretical Psychology, Leiden University

(Received 29 August 1995; accepted 2 December 1996)

Abstract: This paper describes the SCAN (Signal Channeling Attentional Network) model, a scalable neural network model for attentional scanning. The building block of SCAN is a gating lattice, a sparsely-connected neural network defined as a special case of the Ising lattice from statistical mechanics. The process of spatial selection through covert attention is interpreted as a biological solution to the problem of translation-invariant pattern processing. In SCAN, a sequence of pattern translations combines active selection with translation-invariant processing. Selected patterns are channeled through a gating network formed by a hierarchical fractal structure of gating lattices, and mapped onto an output window. We show how the incorporation of an expectation-generating classifier network (e.g. Carpenter and Grossberg's ART network) into SCAN allows attentional selection to be driven by expectation. Simulation studies show the SCAN model to be capable of attending and identifying object patterns that are part of a realistically sized natural image. © 1997 Elsevier Science Ltd.

Keywords: Neural networks, Covert attention, Vision, Translation invariance, Scalable architectures, Brain-inspired modelling, Pattern routing, Adaptive resonance theory networks.

1. INTRODUCTION

Vision is an active process. Observers sample the visual environment dynamically and this activity facilitates interpretation. The way in which this apparently occurs involves overt attention, i.e.
eye movements. As illustrated by Yarbus's (Yarbus, 1967) studies, the oculomotor system scans visual scenes by a series of eye fixations. However, a less apparent selective process also exists; it is not overtly visible and is therefore known as covert attention (Posner, 1980). This process is believed to operate in close concert with the overt-attention system of eye movements by selecting future targets for foveation, i.e. eye fixation (Mackeben & Nakayama, 1993). A widely used metaphor for the covert-attention process is that of a searchlight (Crick, 1984). Covertly attending to part of an image can, according to this metaphor, be likened to a searchlight illuminating that part. The searchlight's movements may proceed independently of the eye movements, as can be easily verified by inspection of Figure 1 (after Anstis, 1974). Fixating the eyes on the centre of the figure effectively prevents eye movements. Nevertheless, by modulating the attentional searchlight, individual characters can be selected at will, rendering them perfectly perceivable (see Figure 2).

We believe that the covert-attention process plays an important role in any effective visual system. Therefore, we aim to reach a formal specification of the process so that it can be applied in the construction of parallel-distributed vision machines. In this context, two central problems can be identified: the binding problem and the scaling problem. Before turning to a description of the model, we discuss both problems and indicate how we deal with them.

Acknowledgements: We thank Jaap Murre and two anonymous referees for their helpful comments.
Requests for reprints should be sent to Eric O. Postma, Department of Computer Science, Faculty of General Sciences, Universiteit Maastricht, P.O. Box 616, 6200 MD Maastricht, The Netherlands; Tel: 31 43 388 3493; Fax: 31 43 3252392; e-mail: POSTMA@CS.UNIMAAS.NL.

1.1.
The Binding Problem

Invariant object perception represents one of the most difficult computational tasks for artificial-vision systems. The major problem is how to extract or compute object representations that are invariant to changes in position (i.e. translation invariance). A common approach to this problem involves the decomposition of the visual image into local features (Fukushima, 1980; Von der Malsburg, 1988; Mozer, 1991). Decomposition into features allows for their integration over large parts of the image so that