A TABU SEARCH ALGORITHM FOR OPTIMAL SIZING OF LOCALLY RECURRENT NEURAL NETWORKS

B. Cannas, A. Fanni, M. Marchesi, F. Pilo
Dipartimento di Ingegneria Elettrica ed Elettronica – Università di Cagliari
Piazza d'Armi 09123 – ITALY
cannas@elettro1.unica.it

Abstract

A general-purpose implementation of the Tabu Search metaheuristic, called Universal Tabu Search, is used to optimally design a Locally Recurrent Neural Network architecture. The design of a neural network is generally a tedious and time-consuming trial-and-error operation that leads to structures whose optimality is not guaranteed. In this paper, the problem of choosing the number of hidden neurons and the number of taps and delays in the FIR and IIR network synapses is formalised as an optimisation problem whose cost function, to be minimised, is the network error calculated on a validation data set. The performance of the algorithm has been tested on the difficult task of learning the chaotic behaviour of a non-linear circuit proposed by Chua as a paradigm for studying chaos.

1. Introduction

Artificial Neural Networks (ANNs) are a very powerful tool and a valid aid in a large number of practical applications such as pattern recognition, prediction, optimisation, associative memory and control. Traditional ANNs have neither feedback nor delays and, consequently, no memory of past inputs: the output is strictly a function of the instantaneous input to the network. On the other hand, many practical problems, such as time-series forecasting, digital signal processing, spatio-temporal pattern recognition, and industrial system control and diagnosis, require that the solution take into account the link between the current input and the previous inputs and outputs, because the output also depends on the previous history of the system.
Locally Recurrent Neural Networks (LRNNs) have synapses with internal memory (finite or infinite) [1,2], which provides better modelling accuracy than static neural networks and makes them particularly well suited for dynamic applications. LRNNs are made up of units that receive as input, at a generic time instant t, the outputs of the previous-layer units at times t, t-1, t-2, ..., t-n, and their own output at times t-1, t-2, ..., t-m, all suitably weighted. These delayed inputs give each unit knowledge of the signal's history, allowing the creation of richer and more complex decision surfaces. These networks are sometimes called FIR or IIR networks because each synapse has the structure of a Finite Impulse Response (FIR) or an Infinite Impulse Response (IIR) digital filter.

One of the major difficulties in designing a neural network is determining the actual network structure: the number of layers, the number of nodes and the appropriate interconnections. Network performance can change radically when such parameters are modified. This problem deserves particular attention for LRNNs, owing to the large number of variables (i.e., the number of hidden nodes and the number of taps in the IIR and FIR synapses) that influence network performance. It can be formulated as an optimisation problem with integer variables and efficiently solved with metaheuristic techniques, which are particularly well suited for combinatorial problems.

Recently, a new metaheuristic called Tabu Search (TS) [3-5] has provided advances in solving difficult optimisation problems in many domains [6]. TS is a metaheuristic method that guides the search for the optimal solution by making use of flexible memory systems that exploit the history of the search. TS systematically prohibits some solutions in order to prevent cycling and to avoid the risk of becoming trapped in local minima.
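As an illustration of the synapse structure described above, an IIR synapse can be sketched as a digital filter whose output combines delayed inputs and delayed outputs. This is a minimal sketch under stated assumptions; the class and parameter names are hypothetical, since the paper gives no implementation:

```python
from collections import deque

class IIRSynapse:
    """Sketch of an IIR synapse: y(t) = sum_k b[k]*x(t-k) + sum_j a[j]*y(t-j).
    With an empty feedback list `a`, it reduces to a FIR synapse."""

    def __init__(self, b, a=()):
        self.b = list(b)        # feed-forward tap weights (FIR part)
        self.a = list(a)        # feedback weights (IIR part)
        # delay lines initialised to zero; appendleft evicts the oldest sample
        self.x_hist = deque([0.0] * len(self.b), maxlen=len(self.b))
        self.y_hist = deque([0.0] * len(self.a), maxlen=len(self.a))

    def step(self, x_t):
        self.x_hist.appendleft(x_t)   # x_hist[k] now holds x(t-k)
        y_t = sum(bk * xk for bk, xk in zip(self.b, self.x_hist))
        y_t += sum(aj * yj for aj, yj in zip(self.a, self.y_hist))
        self.y_hist.appendleft(y_t)   # y_hist[j] will hold y(t-j) next step
        return y_t
```

The number of taps (len(b)) and delays (len(a)) per synapse are exactly the integer design variables that the sizing problem must choose.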
New solutions are searched for in the neighbourhood of the current one. The neighbourhood is defined as the set of points reachable from the current solution through a suitable sequence of local perturbations. The TS algorithm can be used to optimally design LRNNs. The LRNN can be identified by means of M integers, each of which represents one of the most important parameters of the network (i.e., the number of hidden neurons or the number of delays in the synapses). In this way the problem becomes discrete and the TS algorithm can be successfully employed. The objective function to be minimised is the minimum error on a validation data set, reached after a complete training epoch. It should be noted that the optimal neural network found with the TS algorithm has great generalisation capabilities or, in other words, it is able to make accurate predictions for data not used in the training phase. The optimal dimensioning allows a more rapid and stable convergence of the learning algorithm, even for very long time series. Moreover, the time-consuming manual design phase is eliminated, and the software package developed can be used as a general-purpose tool well suited for a wide range of different neural network applications. In this paper the performance of the proposed approach has been tested on the problem of learning the autonomous behaviour of a chaotic non-linear circuit.
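The search procedure described above can be sketched schematically as follows. This is a hypothetical illustration, not the authors' implementation: `cost` stands in for "train the candidate LRNN and return its minimum validation error", and the neighbourhood is assumed to be generated by ±1 perturbations of the M integer parameters, with a fixed-length tabu list of recently visited solutions preventing cycling:

```python
def tabu_search(x0, cost, n_iter=50, tabu_len=10, bounds=(1, 20)):
    """Schematic TS over a vector of M integers (hidden units, tap/delay counts)."""
    lo, hi = bounds
    current = list(x0)
    best, best_cost = list(current), cost(current)
    tabu = []                                 # recently visited solutions
    for _ in range(n_iter):
        # neighbourhood: all +/-1 perturbations of a single component
        neighbours = []
        for i in range(len(current)):
            for d in (-1, 1):
                cand = list(current)
                cand[i] = min(hi, max(lo, cand[i] + d))
                if cand != current and cand not in tabu:
                    neighbours.append(cand)
        if not neighbours:
            break
        # always move to the best admissible neighbour, even if uphill,
        # which is what lets TS escape local minima
        current = min(neighbours, key=cost)
        tabu.append(list(current))
        if len(tabu) > tabu_len:
            tabu.pop(0)                       # fixed-length tabu list
        c = cost(current)
        if c < best_cost:
            best, best_cost = list(current), c
    return best, best_cost
```

In practice each call to `cost` is expensive (a full training run of the candidate network), so the number of TS iterations and the neighbourhood size dominate the overall design time.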