M5.26 A NEURAL NETWORK WITH BOOLEAN OUTPUT LAYER

PETER STROBACH
SIEMENS AG, Zentralabteilung Forschung und Entwicklung, ZFE IS INF 1, Otto-Hahn-Ring 6, D-8000 Munchen 83, FRG

ABSTRACT - The design of feed-forward ADALINE neural networks can be split into two independent optimization problems: (1) the design of the first hidden layer, which uses linear hyperplanes to partition the continuous-amplitude input space into a number of cells, and (2) the design of the second and succeeding hidden layers, which "group" the cells into larger decision regions. The weights of a linear combiner in the first hidden layer are best adjusted in the sense that the hyperplane determined by these weights is placed exactly in the middle of the "gap" between two training sets. This leads to a minimax optimization problem. The hyperplanes intersect in the input space and form a "lattice" of decision cells. The basic function of the first hidden layer is therefore a vector quantization of the input space. Each decision cell in the lattice is uniquely determined by its "codeword", namely, the binary output of the first hidden layer. The basic function of the second and succeeding hidden layers is then to perform a "grouping" of decision cells. This grouping of decision cells can alternatively be described by a Boolean function of the output word of the first hidden layer. In this way it is shown that the second and succeeding hidden layers in a feed-forward network may be replaced by a simple logical network. An algorithm for the design of this logical network is devised.

I. INTRODUCTION

There has been a flurry of interest recently in feed-forward neural networks for pattern recognition, classification, and other purposes. This paper deals with networks of the ADALINE type [1, 2].
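The central idea summarized in the abstract, that the first hidden layer emits a binary "codeword" identifying a decision cell, and that a Boolean function of this codeword can replace all succeeding layers, can be sketched as follows. This is an illustration only: the two hyperplanes, the thresholds, and the XNOR-style grouping are assumed for the example and are not taken from the paper.

```python
# Illustrative sketch (not the paper's design algorithm): a first layer of
# hyperplanes maps an input vector to a binary codeword, and a Boolean
# function of the codeword performs the "grouping" of decision cells.
import numpy as np

def first_layer_codeword(x, W, thresholds):
    """Binary output of the first hidden layer: one sign bit per hyperplane."""
    y = W @ x - thresholds          # signed "distances" to the hyperplanes
    return tuple(int(v >= 0) for v in y)

# Two hyperplanes in a 2-D input space partition it into up to 4 cells.
W = np.array([[1.0, 0.0],           # hyperplane x1 = 0.5
              [0.0, 1.0]])          # hyperplane x2 = 0.5
thresholds = np.array([0.5, 0.5])

# Grouping of cells expressed as a Boolean function of the codeword:
# here, class 1 is the union of the two cells where the bits agree.
boolean_output = {(0, 0): 1, (1, 1): 1, (0, 1): 0, (1, 0): 0}

x = np.array([0.8, 0.9])            # falls in cell with codeword (1, 1)
print(boolean_output[first_layer_codeword(x, W, thresholds)])  # -> 1
```

The dictionary lookup stands in for the logical network: once the lattice of cells is fixed, any union of cells is just a truth table over codewords.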
The basic function of a feed-forward multilayer network of ADALINE neurons is that the first hidden layer uses linear hyperplanes to partition the observation space into a number of decision cells. The sole function of the additional layers in the network is then to group these decision cells together in order to form larger decision regions. Many approaches have attempted to combine more than three layers in a multilayer structure in the belief that additional layers may continuously improve the capability of the network to approximate arbitrary decision regions. This paper shows that the task of forming arbitrary decision regions in an observation space can be split into two independent subtasks: (1) the optimal design of the weights of the first hidden layer, which can be formulated as a minimax optimization problem, and (2) the design of the additional layers as the realization of a Boolean function. These considerations lead to the insight that a feed-forward network with only two layers following the first hidden layer can completely determine any desired decision region constructable from a combination of decision cells. These two layers following the first hidden layer have an alternative realization as a logical network termed the Boolean output layer. This paper discusses (1) the design of the first hidden layer and (2) the design of the Boolean output layer. It is shown that the logical network of the Boolean output layer can be a computationally attractive alternative to the conventional ADALINE realization of the second and third hidden layers.

A. The ADALINE Neuron

The basic element in the networks studied in this paper is the adaptive linear neuron (ADALINE), which computes the inner product y of a pattern vector x = [x_1, x_2, ..., x_q]^T and a pre-determined weight vector w = [w_1, w_2, ..., w_q]^T plus a fixed threshold, which can be set to -1 without loss of generality.
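The ADALINE computation just described, an inner product plus a fixed threshold of -1 followed by a sign decision, can be written as a minimal sketch (the weight values are assumed for illustration):

```python
# Minimal sketch of the ADALINE neuron: y is the thresholded inner product
# and the binary decision d is the sign of y.
import numpy as np

def adaline(x, w):
    y = -1.0 + np.dot(x, w)        # thresholded inner product
    return 1 if y >= 0 else -1     # binary decision d = SGN(y)

w = np.array([2.0, 2.0])           # hyperplane: 2*x1 + 2*x2 = 1
print(adaline(np.array([1.0, 1.0]), w))   # y = 3  -> +1 ("right" half-space)
print(adaline(np.array([0.0, 0.0]), w))   # y = -1 -> -1 ("left" half-space)
```

The value of y before thresholding is the signed "distance" (up to scaling by the norm of w) between the pattern and the hyperplane y = 0.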
The output d of the ADALINE "neuron" is then simply the sign of y (binary decision):

    y = -1 + x^T w = -1 + sum_{i=1}^{q} x_i w_i ,   (1a)
    d = SGN(y) .                                    (1b)

The elements of the weight vector may be interpreted geometrically as the parameters of the hyperplane y = 0 in the q-dimensional observation space of pattern vectors. This hyperplane divides the observation space into two open half-spaces. The thresholded inner product (1a) of a pattern vector and the weight vector can be interpreted as a "distance" between the pattern vector and the hyperplane. Moreover, the sign of y determines whether a given pattern is an element of the "left" (-) or the "right" (+) half-space.

Fig. 1. The ADALINE neuron and the interpretation of its coefficients as the parameters of a hyperplane in the q-dimensional observation space.

Given two sets of training patterns X^(k) and X^(l), the weights w of an ADALINE neuron may be adjusted (trained) so that the decision hyperplane is placed in the gap between the two training sets. Given an arbitrary pattern vector x, the output of the ADALINE is a binary decision whether x is an element of class 1, determined by the half-space of the training set X^(k), or an element of class 2, determined by the half-space of X^(l).

CH2847-2/90/0000-2081 $1.00 © 1990 IEEE
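One simple way to visualize placing a hyperplane "in the gap" between two training sets is to take the perpendicular bisector of the closest pair of points across the two sets. This is only an illustrative heuristic under assumed toy data, not the paper's minimax optimization procedure:

```python
# Hedged sketch: place a hyperplane between two point sets via the
# perpendicular bisector of their closest cross-set pair (an illustration,
# not the minimax design procedure discussed in the paper).
import numpy as np

def gap_hyperplane(X1, X2):
    # find the closest pair (a in X1, b in X2)
    pairs = [(a, b) for a in X1 for b in X2]
    a, b = min(pairs, key=lambda p: np.linalg.norm(p[0] - p[1]))
    w = b - a                       # normal vector pointing from X1 toward X2
    midpoint = (a + b) / 2.0
    c = np.dot(w, midpoint)         # hyperplane: w^T x = c
    return w, c

X1 = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]   # assumed training set 1
X2 = [np.array([3.0, 0.0]), np.array([4.0, 1.0])]   # assumed training set 2
w, c = gap_hyperplane(X1, X2)
print(np.sign(np.dot(w, np.array([1.5, 0.0])) - c))   # -1: class-1 side
print(np.sign(np.dot(w, np.array([2.5, 0.0])) - c))   # +1: class-2 side
```

The sign of w^T x - c then plays the same role as the ADALINE decision d in Eq. (1b): it reports on which side of the gap a pattern falls.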