Multiple optimal learning factors for feed-forward networks

Sanjeev S. Malalur and Michael T. Manry
Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX 76013

ABSTRACT

A batch training algorithm for feed-forward networks is proposed which uses Newton's method to estimate a vector of optimal learning factors, one for each hidden unit. Backpropagation, using this learning factor vector, is used to modify the hidden units' input weights. Linear equations are then solved for the network's output weights. Elements of the new method's Gauss-Newton Hessian matrix are shown to be weighted sums of elements from the total network's Hessian. In several examples, the new method performs better than backpropagation and conjugate gradient, with a similar number of required multiplies. The method performs as well as or better than Levenberg-Marquardt, with several orders of magnitude fewer multiplies due to the small size of its Hessian.

Keywords: Neural networks, multilayer perceptron, output weight optimization, backpropagation, orthogonal least squares, multiple optimal learning factors, linear dependence, Newton's method, Gauss-Newton Hessian

1. INTRODUCTION

Feed-forward neural networks such as the multi-layer perceptron (MLP) are statistical tools widely used for regression and classification applications in areas such as parameter estimation [1,2], document analysis and recognition [3], finance and manufacturing [4], and data mining [5]. The MLP draws its computing power from a layered, parallel architecture and has several favorable properties, such as universal approximation [6] and the ability to mimic the Bayes discriminant [7] and maximum a-posteriori (MAP) estimates [8]. Existing learning algorithms include first-order methods, such as backpropagation [9] (BP) and conjugate gradient [10], and second-order methods related to Newton's method.
Since Newton's method for the MLP often has non-positive definite [11,12] or even singular Hessians, Levenberg-Marquardt [13,14] (LM) and other methods are used instead. In this paper, Newton's method is used to obtain a vector of optimal learning factors, one for each MLP hidden unit. Section 2 reviews MLP notation, a simple first-order training method, and an expression for the optimal learning factor [15] (OLF). The multiple optimal learning factor (MOLF) method is introduced in Section 3. Section 4 discusses the effect of linearly dependent inputs and hidden units on learning with the proposed MOLF algorithm. Results and conclusions are presented in Sections 5 and 6.

2. REVIEW OF MULTI-LAYER PERCEPTRON

In this section, MLP notation is introduced and a convergent first-order training method is described.

2.1. MLP notation

In the fully connected MLP of Figure 1, input weights w(k,n) connect the n-th input to the k-th hidden unit. Output weights w_oh(m,k) connect the k-th hidden unit's activation o_p(k) to the m-th output y_p(m), which has a linear activation. The bypass weight w_oi(m,n) connects the n-th input to the m-th output. The training data, described by the set {x_p, t_p}, consist of N-dimensional input vectors x_p and M-dimensional desired output vectors t_p. The pattern number p varies from 1 to N_v, where N_v denotes the number of training vectors present in the data set. In order to handle thresholds in the hidden and output layers, the input vectors are augmented by an extra element x_p(N+1), where x_p(N+1) = 1, so that x_p = [x_p(1), x_p(2), ..., x_p(N+1)]^T. Let N_h denote the number of hidden units.
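The OLF idea reviewed above can be illustrated with a small sketch. Given a gradient direction g, the optimal learning factor is the step size z minimizing E(w - z·g); Newton's update for z is z = -E'(0)/E''(0), with derivatives taken along the search direction. The quadratic toy error and the finite-difference derivatives below are illustrative assumptions, not the paper's analytic Gauss-Newton formulation:

```python
import numpy as np

def error(w, A, b):
    """Toy quadratic error E(w) = 0.5 * ||A w - b||^2 (an assumed example)."""
    r = A @ w - b
    return 0.5 * r @ r

def optimal_learning_factor(w, g, E, eps=1e-4):
    """Estimate Newton's step z = -E'(0)/E''(0) along the direction -g.

    Derivatives of phi(z) = E(w - z*g) are approximated by central
    finite differences; the paper instead derives them analytically.
    """
    phi = lambda z: E(w - z * g)
    d1 = (phi(eps) - phi(-eps)) / (2 * eps)               # phi'(0)
    d2 = (phi(eps) - 2 * phi(0.0) + phi(-eps)) / eps**2   # phi''(0)
    return -d1 / d2

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)
w = rng.standard_normal(3)

g = A.T @ (A @ w - b)          # gradient of the quadratic error at w
z = optimal_learning_factor(w, g, lambda v: error(v, A, b))
w_new = w - z * g              # one steepest-descent step with the OLF
```

For a quadratic error, this single Newton-optimal step along the gradient reaches the line minimum exactly, which is why OLF-style steps converge much faster than a fixed learning rate. The MOLF method of Section 3 generalizes this scalar z to a vector with one learning factor per hidden unit.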
The vector of hidden layer net functions, n_p, and the actual output of the network, y_p, can be written as

    n_p(k) = Σ_{n=1}^{N+1} w(k,n) · x_p(n),        o_p(k) = f(n_p(k))                    (1)

    y_p(m) = Σ_{n=1}^{N+1} w_oi(m,n) · x_p(n) + Σ_{k=1}^{N_h} w_oh(m,k) · o_p(k)         (2)

where f(·) denotes the hidden unit activation function, such as the sigmoid f(n) = 1/(1 + e^{-n}).

Author contact: sanjeev.malalur@gmail.com; manry@uta.edu

7703 - 15 V. 2 (p.1 of 12) / Color: No / Format: Letter / Date: 2010-01-25 03:01:33 PM
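The forward pass in this notation can be sketched as follows. The shapes follow the definitions above: W is N_h × (N+1) for the input weights, W_oi is M × (N+1) for the bypass weights, and W_oh is M × N_h for the output weights; the sigmoid hidden activation is an assumed (common) choice:

```python
import numpy as np

def mlp_forward(x, W, W_oi, W_oh):
    """Return (n_p, o_p, y_p) for one augmented input vector x.

    n_p : hidden layer net functions, n_p(k) = sum_n w(k,n) * x_p(n)
    o_p : hidden unit activations, o_p(k) = f(n_p(k)) (sigmoid assumed)
    y_p : linear outputs with bypass weights from the inputs
    """
    n_p = W @ x
    o_p = 1.0 / (1.0 + np.exp(-n_p))
    y_p = W_oi @ x + W_oh @ o_p
    return n_p, o_p, y_p

# Small illustrative sizes (assumptions, not from the paper).
N, N_h, M = 4, 3, 2
rng = np.random.default_rng(1)
W = rng.standard_normal((N_h, N + 1))     # input weights, incl. threshold
W_oi = rng.standard_normal((M, N + 1))    # bypass weights
W_oh = rng.standard_normal((M, N_h))      # output weights

# Augment the input with x_p(N+1) = 1 to handle thresholds.
x_p = np.append(rng.standard_normal(N), 1.0)
n_p, o_p, y_p = mlp_forward(x_p, W, W_oi, W_oh)
```

Because the outputs are linear in W_oi and W_oh for fixed hidden activations, the output weights can be found by solving linear equations, as the abstract notes; only the input weights W require the iterative MOLF update.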