The Encoding of Complex Visual Stimuli by a Canonical Model of the Primary Visual Cortex: Temporal Population Code for Face Recognition on the iCub Robot

Andre Luvizotto, César Rennó-Costa, Ugo Pattacini and Paul Verschure

Abstract— The cerebral cortex is characterized by dense local and sparse long-range connectivity. It has been proposed that this connection topology provides a rapid and robust transformation of spatial stimulus information into a temporal population code (TPC). TPC is a canonical model of cortical computation whose topological requirements are independent of the properties of the input stimuli and can therefore be generalized to the processing requirements of all cortical areas. Here we propose a real-time implementation of TPC for classifying faces, a complex natural stimulus that mammals are constantly confronted with. The model consists of a primary visual cortex (V1) network of laterally connected integrate-and-fire neurons, implemented on the humanoid robot platform iCub. The experiment was performed using human faces presented to the robot under different angles and positions of light incidence. We show that the TPC-based model can recognize faces with a correct recognition ratio of 97% without any face-specific strategy. Additionally, the speed of encoding is consistent with that of the mammalian visual system, suggesting that the representation of natural static visual stimuli is generated from the combined temporal dynamics of multiple neuron populations. Our results show that, without any input-dependent wiring, TPC can be used efficiently for encoding local features in a high-complexity task such as face recognition.

I. INTRODUCTION

The mammalian brain has a remarkable ability to recognize objects under widely varying conditions. To perform this task, the visual system must solve the problem of building invariant representations of the objects that are the sources of the available sensory information.
Since the breakthrough work on V1 by Hubel and Wiesel [1], an increasing number of models of the visual cortex have tried to reproduce the performance of natural vision systems. A number of these models have been developed to reproduce characteristics of the visual system such as invariance to shifts in position, rotation, and scaling. Most of these models are based on the so-called Neocognitron [2] [3] [4] [5], a hierarchical multilayer network of detectors with varying feature and spatial tuning. In this classical framework, invariant representations emerge in the form of activity patterns at the highest level of the network by virtue of the spatial averaging across the feature detectors at the preceding layers. The recognition process is based on a large dictionary of features stored in memory, where model neurons in the different layers act as filters tuned to specific features or feature combinations. In this approach, invariances to, for instance, position, scale and orientation are achieved at the high cost of increasing the number of connections between the layers of the network.

This work was supported by EU FP7 projects GOAL-LEADERS (FP7-ICT-97732) and EFAA (FP7-ICT-270490).
Andre Luvizotto and César Rennó-Costa are with Synthetic Perceptive, Emotive and Cognitive Systems - SPECS at Universidad Pompeu Fabra - UPF, Roc Boronat, 138 - 08018 - Barcelona, Spain. (andre.luvizotto, cesar.renno)@upf.edu
Ugo Pattacini is with Robotics Brain and Cognitive Sciences Department, Istituto Italiano di Tecnologia - IIT, Via Morego, 30 - 16163 - Genova, Italy. ugo.pattacini@iit.it
Paul Verschure is with Synthetic Perceptive, Emotive and Cognitive Systems - SPECS at Universidad Pompeu Fabra - UPF and Institució Catalana de Recerca i Estudis Avançats - ICREA, Roc Boronat, 138 - 08018 - Barcelona, Spain. paul.verschure@upf.edu
In recent years, a novel model of object recognition has been proposed based on the specific connectivity template found in the visual cortex combined with the temporal dynamics of neural populations. The so-called Temporal Population Code, or TPC, emphasizes the property of densely coupled networks to rapidly encode the geometric organization of their inputs into the temporal structure of their population response [6] [7]. In the TPC architecture, the spatial information provided by an input stimulus is transformed into a temporal representation. The encoding is defined by the spatially averaged spikes of a population of integrate-and-fire neurons over a certain time window. The TPC architecture has an advantage in its use of connectivity when compared to purely hierarchical models: it is wiring-independent and thus allows information to be multiplexed. The representation generated by TPC can be used as a generic framework for object recognition tasks regardless of the input. Moreover, the TPC provides a high-capacity encoding and can generalize to realistic tasks such as handwritten character classification [6]. In addition, it has been shown that the TPC can provide a representational substrate that naturally accounts for the formation of place fields in simulated robots [8]. With TPC, the encoding process and the invariances that it can capture are controlled by the topology of the lateral connectivity and its transmission delays. In this respect, TPC emphasizes that the neo-cortex exploits specific network topologies and symmetries. This notion is the opposite of the randomly connected recurrent networks found in examples of so-called reservoir computing or liquid state machines [9].
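The encoding step described above can be illustrated with a minimal sketch. It assumes a grid of leaky integrate-and-fire units, excitatory coupling to the four nearest neighbours with a one-step transmission delay, and arbitrary illustrative parameter values; the function name `tpc_encode` and all parameters are hypothetical and are not taken from the model reported in this paper. The sketch only shows how a spatial input can be turned into a temporal vector of spatially averaged spikes.

```python
def tpc_encode(image, steps=30, leak=0.9, threshold=1.0,
               input_gain=0.3, lateral_gain=0.2):
    """Return a temporal code: the fraction of units spiking at each time step.

    `image` is a 2D list of input intensities in [0, 1]. All parameter
    values are illustrative assumptions, not values from the paper.
    """
    rows, cols = len(image), len(image[0])
    v = [[0.0] * cols for _ in range(rows)]          # membrane potentials
    prev_spikes = [[0] * cols for _ in range(rows)]  # spikes of the previous step
    code = []
    for _ in range(steps):
        spikes = [[0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                # Lateral input: neighbours that spiked one step earlier
                # (a one-step transmission delay).
                lateral = 0
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        lateral += prev_spikes[rr][cc]
                # Leaky integration of the stimulus and the lateral input.
                v[r][c] = (leak * v[r][c]
                           + input_gain * image[r][c]
                           + lateral_gain * lateral)
                if v[r][c] >= threshold:
                    spikes[r][c] = 1
                    v[r][c] = 0.0  # reset after a spike
        # Spatial average of the population activity at this time step.
        code.append(sum(map(sum, spikes)) / (rows * cols))
        prev_spikes = spikes
    return code


# Example stimulus: a vertical three-pixel bar on a 9x9 grid.
bar = [[0.0] * 9 for _ in range(9)]
for r in (3, 4, 5):
    bar[r][4] = 1.0

print(tpc_encode(bar))
```

In this sketch the stimulus geometry enters the code only through the lateral coupling: units with more active neighbours receive more delayed input and therefore spike at different times than units at the ends of the bar, so the shape of the stimulus is reflected in the time course of the averaged activity.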
Proceedings of the 2011 IEEE International Conference on Robotics and Biomimetics, December 7-11, 2011, Phuket, Thailand

The TPC is based on the notion that the neo-cortex is characterized by dense local connectivity, where it is estimated that only a few percent of the synapses that make