Parallel Computing 14 (1990) 261-275
North-Holland
The geometry of multi-layer
perceptron solutions
F.J. SMIEJA and H. MÜHLENBEIN
Gesellschaft für Mathematik und Datenverarbeitung (GMD), Postfach 1240, 5205 Sankt Augustin 1, FRG
Abstract. We geometrically classify multi-layer perceptron (MLP) solutions in two ways: the hyperplane
partitioning interpretation and the hidden-unit representation of the pattern set. We show these classifications to
be invariant under orthogonal transformations and translations in the space of the hidden units. These solitots
can be enumerated for any given Boolean mapping problem. Using a geometrical argument we derive the
total number of solitots available to a minimal network for the parity problem. A lower bound is computed for
the scaling of the number of solitots with input vector dimension, when a fixed fraction of patterns is removed
from the full training set. The generalization probability is shown to decrease exponentially with the
problem size for the parity problem. We suggest that this, together with hidden-layer scaling problems, is a
serious drawback to scaling up MLPs to larger tasks.
Keywords. Multi-layer perceptron, scaling, hyperplanes, complexity, generalization probability, minimal solutions.
1. Introduction
At first glance it may seem reasonable to define a solution to a mapping problem, produced
by an MLP, as a particular configuration of weights which produces the correct behaviour. A
slight change in any of the weights might, however, still produce the desired mappings from the
network. Thus we might relax the former requirement and define a solution by some "basin of
attraction" in multidimensional weight space. But there are various network symmetries which
allow one to exchange labels on weights, and push some weight magnitudes to infinity, and still
get the "same effective" solution. We therefore see a need to define more clearly what is meant
by a "network solution".
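These weight-space symmetries are easy to exhibit numerically. The following sketch (our illustration, not a construction from the paper) checks two of them for a one-hidden-layer network with tanh units: relabelling the hidden units, and flipping the sign of all weights attached to a single hidden unit, both leave the computed function unchanged.

```python
# Illustrative sketch (not from the paper): two symmetries that leave an
# MLP's input-output function unchanged, so the weight configurations are
# the "same effective" solution.
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2, b2):
    # one hidden layer with tanh activation, linear output
    return np.tanh(x @ W1 + b1) @ W2 + b2

n_in, n_hid = 3, 4
W1 = rng.standard_normal((n_in, n_hid))
b1 = rng.standard_normal(n_hid)
W2 = rng.standard_normal((n_hid, 1))
b2 = rng.standard_normal(1)
x = rng.standard_normal((5, n_in))
y = mlp(x, W1, b1, W2, b2)

# 1) exchange labels on the hidden units (permute them with their weights)
perm = rng.permutation(n_hid)
y_perm = mlp(x, W1[:, perm], b1[perm], W2[perm], b2)

# 2) negate every weight of one hidden unit (tanh is an odd function)
W1f, b1f, W2f = W1.copy(), b1.copy(), W2.copy()
W1f[:, 0] *= -1; b1f[0] *= -1; W2f[0] *= -1
y_flip = mlp(x, W1f, b1f, W2f, b2)

assert np.allclose(y, y_perm) and np.allclose(y, y_flip)
```

For sigmoid-like but non-odd activations the sign-flip symmetry takes a slightly different form, but the relabelling symmetry holds for any hidden-layer network; counting configurations modulo these symmetries is precisely what motivates a sharper definition of "solution".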
This is achieved using two distinct methods: the hyperplane partitioning technique, and the
hidden-unit representation technique. These two methods are shown to describe the same types
of solutions, and so support each other, and we use both methods to probe the solution space.
We name these more specific definitions of an MLP solution "solitots".
We explain the definition of solitot in Section 2. In Section 3 we define a practical measure,
based on the hidden-unit representation technique, which may be used to distinguish solitots.
In Section 4 we count the number of solitots available to an MLP of size n-n-1 in the
parity problem. We then proceed, in Section 5, to demonstrate the variety of solitots it is
possible to obtain when all but one of the possible input patterns are required to be mapped.
Not all such solutions will be correct mappings for the complete input set. We would like to
observe how the number of possible solitots, and the number which map the full input set
correctly, scale with n, given that a fixed fraction of patterns f is not included in the training
set. To this end we define a generalization probability GP(f, n) in Section 6. We show in
Section 7 how this number scales with n, using the geometrical information about parity
problem solutions and the definition in Section 3 to count explicitly the number of solitots.
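To make the n-n-1 parity setting concrete, the sketch below implements one standard minimal threshold-unit construction for n-bit parity (an assumed textbook construction, not claimed to be the specific network analysed in the paper): hidden unit k fires when at least k inputs are on, and alternating output weights then extract the parity.

```python
# Hedged sketch: a standard n-n-1 threshold-network construction for
# n-bit parity (illustrative; not necessarily the paper's network).
from itertools import product

def step(z):
    # hard threshold activation
    return 1 if z > 0 else 0

def parity_net(bits):
    n = len(bits)
    s = sum(bits)  # all input weights are 1, so the net input is s
    # hidden unit k (k = 1..n): threshold k - 0.5, i.e. fires iff s >= k
    hidden = [step(s - (k - 0.5)) for k in range(1, n + 1)]
    # output unit: alternating weights +1, -1, +1, ..., threshold 0.5
    out = sum(((-1) ** k) * h for k, h in enumerate(hidden))
    return step(out - 0.5)

# the network maps the full input set correctly for n = 4
for bits in product([0, 1], repeat=4):
    assert parity_net(bits) == sum(bits) % 2
```

With s inputs active, exactly the first s hidden units fire and the alternating sum equals s mod 2; this is one concrete member of the solution family whose count, modulo the symmetries above, Section 4 enumerates.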
0167-8191/90/$03.50 © 1990 - Elsevier Science Publishers B.V. (North-Holland)