On Learning of Sigmoid Neural Networks
KAYVAN NAJARIAN
Computer Science Department, University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC 28223
Received November 8, 2000; revised May 4, 2001; accepted May 4, 2001
The Probably Approximately Correct (PAC) learning theory creates a framework to assess the learning properties of static models for which the data are assumed to be independently and identically distributed (i.i.d.). One important family of dynamic models to which conventional PAC learning cannot be applied is nonlinear Finite Impulse Response (FIR) models. The present article, using an extension of PAC learning that covers learning with m-dependent data, evaluates the learning properties of FIR modeling with sigmoid neural networks. These results include upper bounds on the size of the data set required to train FIR sigmoid neural networks, provided that the input data are uniformly distributed. © 2001 John Wiley & Sons, Inc.
I. INTRODUCTION
In a modeling procedure, an unknown function f is to be estimated to prespecified values of accuracy ε and statistical confidence (1 − δ). In order to perform the estimation, based on a set of input–output training data, an approximator function h is used to model f. The modeling of an unknown system f with a feedforward neural network h can be considered a typical example of this procedure.
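This requirement can be stated formally as follows (a standard PAC-style criterion, written here in generic notation that the article has not yet introduced; d_P denotes a distance between functions weighted by the probability measure P from which the data are drawn):

\[
\Pr\bigl[\, d_P(f, h) \le \varepsilon \,\bigr] \ge 1 - \delta,
\qquad
d_P(f, h) = \int \bigl| f(x) - h(x) \bigr| \, dP(x),
\]

where the probability is taken over the random training sample used to produce h.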
The Probably Approximately Correct (PAC) learning theory, proposed by Valiant [1], deals with the accuracy and confidence of the above-mentioned modeling task. PAC learning and other similar learning schemes allow quantitative evaluation of the learning properties of modeling procedures in which the data are independently and identically distributed (i.i.d.) in accordance with a probability measure P. The available results in PAC learning theory are only for i.i.d. cases because they make use of Hoeffding's inequality [2], which is applicable only to i.i.d. data.
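For reference, Hoeffding's inequality in its standard form (stated here for context; this exact statement is not reproduced from the article) says that for independent random variables X_1, …, X_n with a_i ≤ X_i ≤ b_i,

\[
\Pr\!\left[\, \left| \frac{1}{n}\sum_{i=1}^{n} \bigl( X_i - \mathbb{E}[X_i] \bigr) \right| \ge t \right]
\le 2 \exp\!\left( \frac{-2 n^2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \right),
\]

and its proof relies essentially on the independence of the X_i, which is why it does not carry over to dependent data.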
However, in many real modeling procedures, the assumption that the data are i.i.d. is clearly violated. As indicated by Campi and Kumar [3], one important group of applications to which the results of learning theory with independent data are not directly applicable is Nonlinear Finite Impulse Response (NFIR) modeling, where the output depends on the present as well as the past inputs. As a result, in an NFIR model, the model inputs at times t and t + 1 (which share past values of the input signal) are correlated and consequently dependent.
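A small numerical sketch of this dependence follows (illustrative only; the variable names and the memory length are hypothetical, not taken from the article):

```python
# Illustrative sketch: why NFIR regressors are dependent even when the
# raw input sequence is i.i.d.
import numpy as np

rng = np.random.default_rng(0)

n_taps = 3                               # hypothetical NFIR memory length
u = rng.uniform(-1.0, 1.0, size=1000)    # i.i.d. input sequence

# The regressor at time t collects the present and past inputs:
# x_t = (u_t, u_{t-1}, ..., u_{t-n_taps+1}).
X = np.column_stack([u[n_taps - 1 - k : len(u) - k] for k in range(n_taps)])

# Consecutive regressors x_t and x_{t+1} share n_taps - 1 entries, so they
# are dependent; regressors more than n_taps - 1 steps apart share no
# entries and are independent.
print(np.corrcoef(X[:-1, 0], X[1:, 1])[0, 1])         # exactly 1.0: same entries
print(np.corrcoef(X[:-n_taps, 0], X[n_taps:, 0])[0, 1])  # ~0.0: no shared entries
```

Because any two regressors more than n_taps − 1 steps apart share no input values, the regressor sequence is precisely the kind of m-dependent process treated by the extended learning theory discussed below.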
The importance of FIR models comes from the fact that in many practical cases, dynamic systems can be efficiently approximated by appropriate FIR models. The problem of distribution-free learning of linear FIR models trained with the least-squares algorithm has been addressed by Weyer et al. [4], who use the notion of Vapnik–Chervonenkis (VC) dimension to bound the sample complexity of a linear FIR model.
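To give the flavor of such results, a standard VC-dimension-based sample-complexity bound for i.i.d. data (a generic textbook form, not the specific bound derived in [4]) guarantees accuracy ε with confidence 1 − δ once the number of training samples n satisfies

\[
n = O\!\left( \frac{1}{\varepsilon} \left( d \,\ln\frac{1}{\varepsilon} + \ln\frac{1}{\delta} \right) \right),
\]

where d is the VC dimension of the model class.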
Recently, an extension of the PAC learning theory (called PAC learning with m-dependent data) was presented that includes modeling of an FIR system as a learning task [5,6,7].
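For concreteness, the standard definition of m-dependence (stated here for context; the precise conventions in [5,6,7] may differ in minor details) is as follows: a sequence of random variables X_1, X_2, … is m-dependent if the blocks (X_1, …, X_i) and (X_j, X_{j+1}, …) are independent whenever j − i > m. In particular, if the raw input sequence is i.i.d., the regressors of an NFIR model with memory length n form an (n − 1)-dependent sequence.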
Among the different families of FIR models, neural models are known to be some of the most efficient. During the last couple of decades, neural models have gained a great amount of popularity in different fields of science and engineering (see, for example, Grossberg [8]). The main ob-