Learning efference in CNNs for perception-based navigation control

P. Arena†, L. Fortuna†, M. Frasca†, D. Lombardo† and L. Patanè†
†Dipartimento di Ingegneria Elettrica Elettronica e dei Sistemi, Università degli Studi di Catania
Viale Andrea Doria 6, 95125, Catania, Italy
Email: parena, lfortuna, mfrasca, dlombardo, lpatane@diees.unict.it

Abstract—Action-oriented perception involves complex tasks to be fulfilled in real time. Living beings, even the simplest, suitably integrate afferent stimuli, create an abstract, concise representation of the environmental stimuli, and exploit it for action-selection purposes. We propose a novel infrastructure, based on CNNs, in which the spatio-temporal solutions are linked to the results of the perception stage. In this perspective, a primary role is devoted to the introduction of plasticity, which enhances the association between stimuli, CNN dynamics, and action selection, with application to the task of autonomous navigation control.

1. Introduction

It is known that an organism is able to perceive a set of simple sensory events (US, unconditioned stimuli), each of which automatically triggers a response (UR, unconditioned response) by the nervous system. According to classical conditioning [1], the repeated simultaneous presentation of an initially neutral stimulus (CS, conditioned stimulus) and a US builds an association between the two stimuli. After a number of trials, this allows the CS to command a response (CR, conditioned response) similar to the UR. Operant conditioning [2], [3] provides the animal with a further improvement in behavior by means of a task-dependent combination of rewarding successful actions and punishing unsuccessful ones.

Recently, on the basis of the behavior-based robotics paradigm [4] and of neurobiological cues, machine perception research has developed several bio-inspired frameworks for the perception process, such as models of classical and operant conditioning [5] or a closed-loop anticipatory network that avoids reflexes [6].

In this paper we present a new framework for the sensing-perception-action cycle, which emulates low-level reflex reactions while progressively structuring a higher-level behavior. This strategy is applied to the simulation of a robot in a random foraging task.

The perceptive structure can be divided into functional blocks. First, we define a sensing block (afferent layer), which receives sensory stimuli from the environment and sets the initial conditions of a two-layer RD-CNN, which is the core of the perception process. The CNN parameters are chosen to generate Turing patterns, regarded as a kind of internal state of the whole system reflecting the state of the environment. Each pattern is associated with an action (efferent layer) by means of a traditional Motor Map (MM) [7].

[Figure 1: Functional block diagram of the implemented framework.]

The Reward Function (RF) plays a key role in the success of the whole strategy. Until now it has been selected a priori, on the basis of design considerations. In this paper the RF is not defined a priori, but is progressively learned by means of the association between simple and complex sensory events. Unlike classical conditioning, every US drives the learning of all the CSs. Once a basic reward function is formed, the reinforcement learning provided by the MM allows the robot to optimize its behavior in relation to the given task, according to the experiments in [2] and [3].
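To make this mechanism concrete, the following Python sketch shows one way such an association could be formed, assuming a simple Hebbian-style co-occurrence rule; the stimulus dimensions, learning rate, and update rule are illustrative assumptions, not the rule actually used in the paper.

import numpy as np

N_US, N_CS = 4, 16             # illustrative stimulus dimensions
W = np.zeros((N_US, N_CS))     # association weights from CSs to USs
eta = 0.05                     # learning rate (placeholder value)

def learn_association(us, cs):
    """Hebbian-style co-occurrence update: every active US reinforces
    its link with every currently active CS, so each US drives the
    learning of all the CSs presented together with it."""
    global W
    W += eta * np.outer(us, cs)

def conditioned_reward(cs, r_us):
    """After training, a CS alone can evoke a reward-like signal:
    project the CS onto the learned associations and weight the
    result by the innate reward r_us attached to each US."""
    return float(r_us @ (W @ cs))

In this reading, a learned row of W lets a complex sensory event (CS) evoke a reward-like signal that was originally attached only to a simple event (US), which is what a progressively learned RF requires.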
Moreover, a basic difference of our framework in comparison with [5] and [6] is the introduction of dynamics into the system implementing the sensing-perception-action cycle: nonlinear dynamical systems are used in place of a static neural network, for reasons of biological plausibility and much improved plasticity. In this paper the sensing-perception-action loop is modelled by nonlinear dynamical systems such as CNNs, exploiting their suitability for real-time implementation [8].

2. The implemented framework

The implemented framework is made up of four main blocks (Fig. 1): the sensing block, which receives environmental stimuli; the perception block, which forms an internal state from the sensor input; the action selection block, which triggers an action in the effectors; and the Reward Function (RF) block, which evaluates the effectiveness of the actions and contributes to the learning process. In the following we will use the term "iteration" to indicate the set of operations leading to a single robot action.
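As a sketch of how the four blocks interact within one iteration, consider the following Python skeleton; every class interface and stub body is a hypothetical placeholder, and only the order of operations (sensing, perception, action selection, reward evaluation, learning update) follows the description above.

import numpy as np

class Sensing:
    def read(self):                       # afferent layer
        return np.random.rand(16, 16)     # stub: random "stimuli"

class Perception:
    def settle(self, stimuli):            # RD-CNN: stimuli -> steady pattern
        return np.sign(stimuli - 0.5)     # stub for the Turing pattern

class MotorMap:
    def select(self, state):              # efferent layer: pattern -> action
        return int(state.sum()) % 4       # stub: one of four actions
    def update(self, state, action, reward):
        pass                              # reinforcement-learning step (stub)

class RewardFunction:
    def evaluate(self, action):           # rates the outcome of the action
        return np.random.rand()           # stub reward

def iteration(sense, perceive, mm, rf):
    """One iteration: the set of operations leading to a single robot action."""
    stimuli = sense.read()                # sensing block
    state = perceive.settle(stimuli)      # perception block (internal state)
    action = mm.select(state)             # action selection block
    reward = rf.evaluate(action)          # RF block
    mm.update(state, action, reward)      # contribution to the learning process
    return action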