On Learning of Sigmoid Neural Networks
KAYVAN NAJARIAN
Computer Science Department, University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC 28223
Received November 8, 2000; revised May 4, 2001; accepted May 4, 2001
The Probably Approximately Correct (PAC) learning theory creates a framework to assess the learning properties of static models for which the data are assumed to be independently and identically distributed (i.i.d.). One important family of dynamic models to which conventional PAC learning cannot be applied is nonlinear Finite Impulse Response (FIR) models. The present article, using an extension of PAC learning that covers learning with m-dependent data, evaluates the learning properties of FIR modeling with sigmoid neural networks. These results include upper bounds on the size of the data set required to train FIR sigmoid neural networks, provided that the input data are uniformly distributed. © 2001 John Wiley & Sons, Inc.
I. INTRODUCTION
In a modeling procedure, an unknown function f is to be estimated to prespecified values of accuracy ε and statistical confidence (1 − δ). In order to perform the estimation, based on a set of input–output training data, an approximator function h is used to model f. The modeling of an unknown system f with a feedforward neural network h can be considered a typical example of this procedure.
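This requirement can be stated formally as follows (a standard PAC-style criterion, written here in generic notation that the article has not yet introduced; d_P denotes a distance between functions weighted by the probability measure P from which the data are drawn):

\[
\Pr\bigl[\, d_P(f, h) \le \varepsilon \,\bigr] \ge 1 - \delta,
\qquad
d_P(f, h) = \int \bigl| f(x) - h(x) \bigr| \, dP(x),
\]

where the probability is taken over the random training sample used to produce h.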
The Probably Approximately Correct (PAC) learning theory, proposed by Valiant [1], deals with the accuracy and confidence of the above-mentioned modeling task. PAC learning and other similar learning schemes allow quantitative evaluation of the learning properties of modeling procedures in which the data are independently and identically distributed (i.i.d.) in accordance with a probability measure P. The available results in PAC learning theory are only for i.i.d. cases because they make use of Hoeffding's inequality [2], which is applicable only to i.i.d. data.
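For reference, Hoeffding's inequality in its standard form (stated here for context; this exact statement is not reproduced from the article) says that for independent random variables X_1, …, X_n with a_i ≤ X_i ≤ b_i,

\[
\Pr\!\left[\, \left| \frac{1}{n}\sum_{i=1}^{n} \bigl( X_i - \mathbb{E}[X_i] \bigr) \right| \ge t \right]
\le 2 \exp\!\left( \frac{-2 n^2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \right),
\]

and its proof relies essentially on the independence of the X_i, which is why it does not carry over to dependent data.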
However, in many real modeling procedures, the assumption that the data are i.i.d. is clearly violated. As indicated by Campi and Kumar [3], one important group of applications to which the results of learning theory with independent data are not directly applicable is Nonlinear Finite Impulse Response (NFIR) modeling, where the output depends on the present as well as the past inputs. As a result, in an NFIR model, the model inputs at times t and t + 1 (which share past values of the input signal) are correlated and consequently dependent.
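A small numerical sketch of this dependence follows (illustrative only; the variable names and the memory length are hypothetical, not taken from the article):

```python
# Illustrative sketch: why NFIR regressors are dependent even when the
# raw input sequence is i.i.d.
import numpy as np

rng = np.random.default_rng(0)

n_taps = 3                               # hypothetical NFIR memory length
u = rng.uniform(-1.0, 1.0, size=1000)    # i.i.d. input sequence

# The regressor at time t collects the present and past inputs:
# x_t = (u_t, u_{t-1}, ..., u_{t-n_taps+1}).
X = np.column_stack([u[n_taps - 1 - k : len(u) - k] for k in range(n_taps)])

# Consecutive regressors x_t and x_{t+1} share n_taps - 1 entries, so they
# are dependent; regressors more than n_taps - 1 steps apart share no
# entries and are independent.
print(np.corrcoef(X[:-1, 0], X[1:, 1])[0, 1])         # exactly 1.0: same entries
print(np.corrcoef(X[:-n_taps, 0], X[n_taps:, 0])[0, 1])  # ~0.0: no shared entries
```

Because any two regressors more than n_taps − 1 steps apart share no input values, the regressor sequence is precisely the kind of m-dependent process treated by the extended learning theory discussed below.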
The importance of FIR models comes from the fact that in many practical cases, dynamic systems can be efficiently approximated by appropriate FIR models. The problem of distribution-free learning of linear FIR models trained with the least-squares algorithm has been addressed by Weyer et al. [4], who use the notion of Vapnik–Chervonenkis (VC) dimension to bound the sample complexity of a linear FIR model.
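To give the flavor of such results, a standard VC-dimension-based sample-complexity bound for i.i.d. data (a generic textbook form, not the specific bound derived in [4]) guarantees accuracy ε with confidence 1 − δ once the number of training samples n satisfies

\[
n = O\!\left( \frac{1}{\varepsilon} \left( d \,\ln\frac{1}{\varepsilon} + \ln\frac{1}{\delta} \right) \right),
\]

where d is the VC dimension of the model class.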
Recently, an extension of the PAC learning theory (called PAC learning with m-dependent data) was presented that includes modeling of an FIR system as a learning task [5,6,7].
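For concreteness, the standard definition of m-dependence (stated here for context; the precise conventions in [5,6,7] may differ in minor details) is as follows: a sequence of random variables X_1, X_2, … is m-dependent if the blocks (X_1, …, X_i) and (X_j, X_{j+1}, …) are independent whenever j − i > m. In particular, if the raw input sequence is i.i.d., the regressors of an NFIR model with memory length n form an (n − 1)-dependent sequence.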
Among the different families of FIR models, neural models are known to be some of the most efficient. During the last couple of decades, neural models have gained a great amount of popularity in different fields of science and engineering (see, for example, Grossberg [8]). The main ob-