Parallel Computing 14 (1990) 261-275

The geometry of multi-layer perceptron solutions

F.J. SMIEJA and H. MÜHLENBEIN
Gesellschaft für Mathematik und Datenverarbeitung (GMD), Postfach 1240, 5205 Sankt Augustin 1, FRG

Abstract. We classify multi-layer perceptron (MLP) solutions geometrically in two ways: by the hyperplane partitioning interpretation and by the hidden-unit representation of the pattern set. We show these classifications to be invariant under orthogonal transformations and translations in the space of the hidden units. These solitots can be enumerated for any given Boolean mapping problem. Using a geometrical argument we derive the total number of solitots available to a minimal network for the parity problem. A lower bound is computed for the scaling of the number of solitots with input vector dimension when a fixed fraction of patterns is removed from the full training set. The generalization probability is shown to decrease exponentially with the problem size for the parity problem. We suggest that this, together with hidden-layer scaling problems, is a serious drawback to the scaling-up of MLPs to larger tasks.

Keywords. Multi-layer perceptron, scaling, hyperplanes, complexity, generalization probability, minimal solutions.

1. Introduction

At first glance it may seem reasonable to define a solution to a mapping problem, produced by an MLP, as a particular configuration of weights which produces the correct behaviour. A slight change in any of the weights might, however, still produce the desired mappings from the network. Thus we might relax the former requirement and define a solution by some "basin of attraction" in multidimensional weight space. But there are various network symmetries which allow one to exchange labels on weights, and push some weight magnitudes to infinity, and still obtain the "same effective" solution. We therefore see a need to define more clearly what is meant by a "network solution". This is achieved using two distinct methods: the hyperplane partitioning technique and the hidden-unit representation technique. These two methods are shown to describe the same types of solutions, and so support each other, and we use both methods to probe the solution space. These more specific definitions of MLP solution we name "solitots". We explain the definition of a solitot in Section 2. In Section 3 we define a practical measure, based on the hidden-unit representation technique, which may be used to distinguish solitots. In Section 4 we count the number of solitots available to an MLP of size n-n-1 in the parity problem. We then proceed, in Section 5, to demonstrate the variety of solitots it is possible to obtain when all but one of the possible input patterns are required to be mapped. Not all such solutions will be correct mappings for the complete input set. We would like to observe how the number of possible solitots, and the number which map the full input set correctly, scale with n, given that a fixed fraction f of patterns is not included in the training set. To this end we define a generalization probability GP(f, n) in Section 6. We show in Section 7 how this number scales with n, using the geometrical information about parity problem solutions and the definition in Section 3 to count the solitots explicitly.
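As a concrete illustration of the minimal n-n-1 architecture for the parity problem, the following NumPy sketch (not taken from the paper; the construction, function names and unit type are assumptions for illustration) hand-crafts one particular solitot using hard-threshold units: hidden unit j fires when at least j inputs are active, and the output unit sums the hidden activations with alternating signs. Other solitots of the same problem can be obtained from this one via the hidden-space symmetries discussed above.

```python
import numpy as np
from itertools import product

def build_parity_mlp(n):
    """One hand-crafted n-n-1 threshold network computing n-bit parity
    (an illustrative solitot; not the paper's enumeration procedure)."""
    W_hidden = np.ones((n, n))               # every hidden unit sees all inputs with weight 1
    b_hidden = 0.5 - np.arange(1, n + 1)     # hidden unit j has threshold j - 0.5
    w_out = np.array([(-1.0) ** j for j in range(n)])  # alternating output weights +1, -1, ...
    b_out = -0.5                             # output threshold 0.5
    return W_hidden, b_hidden, w_out, b_out

def hidden_representation(x, W_hidden, b_hidden):
    # Binary hidden-unit representation of pattern x: unit j is on iff >= j inputs are on.
    return (W_hidden @ x + b_hidden > 0).astype(float)

def forward(x, params):
    W_h, b_h, w_o, b_o = params
    h = hidden_representation(x, W_h, b_h)
    return int(w_o @ h + b_o > 0)

n = 4
params = build_parity_mlp(n)
for bits in product([0, 1], repeat=n):
    x = np.array(bits, dtype=float)
    assert forward(x, params) == int(x.sum()) % 2  # correct parity on all 2^n patterns
```

If k inputs are on, exactly hidden units 1..k fire, and the alternating output sum is 1 for odd k and 0 for even k, so the network realises parity; the hidden-unit representation computed above is the kind of object used in Section 3 to distinguish solitots.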