Upper Bound on Pattern Storage in Feedforward Networks
Pramod L. Narasimha, Michael T. Manry and Francisco Maldonado
Abstract—Starting from the strict interpolation equations for
multivariate polynomials, an upper bound is developed for the
number of patterns that can be memorized by a nonlinear
feedforward network. A straightforward proof by contradiction
is presented for the upper bound. It is shown that the hidden
activations do not have to be analytic. Networks, trained by
conjugate gradient, are used to demonstrate the tightness of the
bound for random patterns. Based upon the upper bound, small
multilayer perceptron models are successfully demonstrated for
large support vector machines.
I. INTRODUCTION
Pattern memorization in nonlinear networks has been
studied for many decades. The number of patterns that
can be memorized has been referred to as the information
capacity [2] and storage capacity [18]. Equating network
outputs to desired outputs has been referred to as strict
interpolation [5], [20], [7]. It is important to understand
the pattern memorization capability of feedforward networks
for several reasons. First, the capability to memorize is
related to the ability to form arbitrary shapes in weight
space. Second, if a network can successfully memorize many
random patterns, we know that the training algorithm is
powerful [16]. Third, some useful feedforward networks
such as Support Vector Machines (SVMs), memorize large
numbers of training patterns [10], [11].
Upper bounds on the number of distinct patterns P that can be memorized by nonlinear feedforward networks are functions of the number of weights in the network, N_w, and the number of outputs, M. For example, Davis [5] has shown that for any P distinct complex points there exists a unique polynomial of degree P − 1, with complex coefficients, that strictly interpolates (memorizes) all the points. In other words, breaking the complex quantities into separate real and imaginary parts, he has derived a bound for the M = 2 case. An upper bound on the number of hidden units in the Multilayer Perceptron (MLP) for the M = 1 case, derived by Elisseeff and Moisy [6], agrees with the bound of Davis. Suyari and Matsuba [21] have derived the
storage capacity of neural networks with binary weights,
using minimum distance between the patterns. Cosnard et
al. [4] have derived upper and lower bounds on the size
of nets capable of computing arbitrary dichotomies. Ji and
Psaltis [13] have derived upper and lower bounds for the in-
formation capacity of two-layer feedforward neural networks
with binary interconnections, using an approach similar to
that of Baum [3]. Moussaoui [1] and Ma and Ji [17] have pointed out that the information capacity is reflected in the number of weights of the network.

Pramod L. Narasimha and Michael T. Manry are with the Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX 76013, USA (email: pramod.narasimha@uta.edu; manry@uta.edu). Francisco Maldonado is with Williams Pyro, Inc., 200 Greenleaf Street, Fort Worth, Texas 76107 (email: javier.maldonado@williams-pyro.com).
Unfortunately, most recent research on pattern memoriza-
tion in feedforward networks focuses on the one output case.
In this paper, partially building upon the work of Davis [5],
we investigate an upper bound for M ≥ 1 and arbitrary
hidden unit activation functions. In section II, we introduce
our notation. A straightforward proof of the upper bound is
given in section III. An example which indicates the validity
of the bound is presented in section IV. In section V, we use
the upper bound to predict the size of MLPs that can mimic
the training behavior of SVMs.
II. NOTATION AND PRELIMINARIES
A. Notation
Let $\{\mathbf{x}_p, \mathbf{t}_p\}_{p=1}^{P}$ be the data set, where $\mathbf{x}_p \in \mathbb{R}^N$ is the input vector, $\mathbf{t}_p \in \mathbb{R}^M$ is the desired output vector, and $P$ is the number of patterns. Let us consider a feedforward MLP having $N$ inputs, one hidden layer with $h$ nonlinear units, and an output layer with $M$ linear units. For the $p$-th pattern, the $j$-th hidden unit's net function and activation are, respectively,
\[
\mathrm{net}_{pj} = \sum_{i=1}^{N+1} w_h(j,i)\, x_{pi}, \qquad 1 \le p \le P,\; 1 \le j \le h \tag{1}
\]
\[
O_{pj} = f(\mathrm{net}_{pj}) \tag{2}
\]
Here, the activation $f(\mathrm{net})$ is a nonlinear function of the net function. The weight $w_h(j,i)$ connects the $i$-th input to the $j$-th hidden unit. The threshold of the $j$-th hidden unit is represented by $w_h(j, N+1)$ and is handled by fixing $x_{p,N+1}$ to one. The $k$-th output for the $p$-th pattern is given by
\[
y_{pk} = \sum_{i=1}^{N+1} w_{oi}(k,i)\, x_{pi} + \sum_{j=1}^{h} w_{oh}(k,j)\, O_{pj} \tag{3}
\]
where $1 \le k \le M$. For the $p$-th pattern, the $N$ input values are $x_{pi}$ $(1 \le i \le N)$ and the $M$ desired output values are $t_{pk}$ $(1 \le k \le M)$. The weights $w_{oi}$ connect the inputs directly to the outputs, and the weights $w_{oh}$ connect the hidden units to the outputs.
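As a concrete illustration, the forward computation in equations (1)–(3) can be sketched in NumPy. The array names, shapes, and the choice of tanh for the activation $f$ are assumptions for illustration only; the paper places no such restriction on $f$.

```python
import numpy as np

def forward(X, Wh, Woi, Woh):
    """Single-hidden-layer MLP forward pass.

    X   : (P, N)   input patterns
    Wh  : (h, N+1) hidden weights; column N holds the thresholds
    Woi : (M, N+1) direct input-to-output weights
    Woh : (M, h)   hidden-to-output weights
    Returns Y : (P, M) network outputs.
    """
    P = X.shape[0]
    # Augment inputs with x_{p,N+1} = 1 to absorb the thresholds.
    Xa = np.hstack([X, np.ones((P, 1))])
    net = Xa @ Wh.T             # net_{pj}, equation (1)
    O = np.tanh(net)            # O_{pj} = f(net_{pj}), equation (2)
    Y = Xa @ Woi.T + O @ Woh.T  # y_{pk}, equation (3)
    return Y

# Example with N = 3 inputs, h = 4 hidden units, M = 2 outputs
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
Y = forward(X,
            rng.standard_normal((4, 4)),
            rng.standard_normal((2, 4)),
            rng.standard_normal((2, 4)))
print(Y.shape)  # (5, 2)
```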
B. Review
A feedforward network is said to have memorized a dataset
if for every pattern, the network outputs are exactly equal
to the desired outputs. The storage capacity of a feedforward network is the number P of distinct input vectors that can be mapped exactly to the corresponding desired output vectors, yielding zero error.
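A minimal numerical check of this definition is sketched below; the tolerance is an assumption, since trained networks reach zero error only up to floating-point precision.

```python
import numpy as np

def has_memorized(Y, T, tol=1e-10):
    """Return True if every network output y_pk matches the desired
    output t_pk for all P patterns, i.e. the error is (numerically) zero."""
    return bool(np.max(np.abs(Y - T)) < tol)

T = np.array([[1.0, 0.0], [0.0, 1.0]])
print(has_memorized(T.copy(), T))  # True: outputs equal the targets
print(has_memorized(T + 0.01, T))  # False: nonzero error
```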
1-4244-1380-X/07/$25.00 ©2007 IEEE
Proceedings of International Joint Conference on Neural Networks, Orlando, Florida, USA, August 12-17, 2007