Learning by Conjugate Gradients

Martin F. Møller
Computer Science Department, Mathematical Institute
University of Aarhus, Denmark

Abstract

A learning algorithm (CG) with a superlinear convergence rate is introduced. The algorithm is based on a class of optimization techniques well known in numerical analysis as the conjugate gradient methods. CG uses second-order information from the neural network but requires only O(N) memory, where N is the number of minimization variables (in our case, all the weights in the network). The performance of CG is benchmarked against that of the ordinary backpropagation algorithm (BP). We find that CG is considerably faster than BP and that CG is able to perform the learning task with fewer hidden units.

1 Introduction

1.1 Motivation

In recent years neural networks have shown themselves to be good alternatives to conventional methods in classification tasks. Adaptive learning algorithms have been the subject of intense investigation, and many different algorithms have been suggested. These algorithms are often developed in an ad hoc fashion, with the local properties of a neural network as the basis for development. They usually have a very poor convergence rate, or they depend on parameters that must be specified by the user because no theoretical basis for choosing them exists. The values of these parameters are often crucial to the success of the algorithm.

The aim of this paper is to develop a learning algorithm that eliminates some of these disadvantages. In developing the CG algorithm we abstract from the local properties of the neural network and look at the problem of learning in a more general way. From an optimization point of view, learning in a neural network can be seen as equivalent to minimizing a global error function, a multivariate function that depends on the weights in the network.
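As an illustration, one common choice of global error function is the sum-of-squares error over the training patterns (an illustrative assumption here; the specific error function has not yet been fixed at this point in the text):

```latex
E(\mathbf{w}) = \frac{1}{2} \sum_{p} \sum_{k}
  \bigl( t_{pk} - o_{pk}(\mathbf{w}) \bigr)^{2}
```

where \(\mathbf{w}\) is the vector of all network weights, \(t_{pk}\) is the target for output unit \(k\) on pattern \(p\), and \(o_{pk}(\mathbf{w})\) is the corresponding network output. Learning then amounts to finding a weight vector that minimizes \(E\).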
This perspective is advantageous when developing effective learning algorithms, because the problem of minimizing a function is well studied in other fields of science, such as conventional numerical analysis.
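To make the optimization viewpoint concrete, the following is a minimal sketch of a conjugate gradient minimizer (the classical Fletcher-Reeves variant, not the specific CG algorithm developed in this paper) applied to a toy quadratic error function. The quadratic form and the matrices `A` and `b` are illustrative stand-ins for a network's global error function:

```python
# Minimal Fletcher-Reeves conjugate gradient sketch on a toy quadratic
# error E(w) = 0.5 * w.Aw - b.w, whose gradient is g(w) = Aw - b.
# The quadratic stand-in and the exact line search are illustrative
# assumptions; a network's error function would require a numerical
# line search instead.

def cg_minimize(A, b, w, iters=50):
    """Fletcher-Reeves CG for E(w) = 0.5 w.Aw - b.w (A symmetric pos. def.)."""
    n = len(w)

    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    g = [gi - bi for gi, bi in zip(matvec(A, w), b)]  # gradient at w
    d = [-gi for gi in g]                             # initial direction: steepest descent
    for _ in range(iters):
        Ad = matvec(A, d)
        denom = dot(d, Ad)
        if denom == 0.0:
            break
        alpha = -dot(g, d) / denom                    # exact line search (quadratic case)
        w = [wi + alpha * di for wi, di in zip(w, d)]
        g_new = [gi - bi for gi, bi in zip(matvec(A, w), b)]
        if dot(g_new, g_new) < 1e-20:                 # gradient vanished: done
            return w
        beta = dot(g_new, g_new) / dot(g, g)          # Fletcher-Reeves coefficient
        d = [-gi + beta * di for gi, di in zip(g_new, d)]
        g = g_new
    return w

A = [[3.0, 1.0], [1.0, 2.0]]   # symmetric positive definite (illustrative)
b = [1.0, 1.0]
w_min = cg_minimize(A, b, [0.0, 0.0])   # converges to the solution of Aw = b
```

Note that, as in the abstract's O(N) memory claim, the method stores only a few vectors of the same size as the weight vector: the current point, the gradient, and the search direction.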