Parallel Neural Network Learning Through Repetitive Bounded Depth Trajectory Branching

Iuri Mehr, Zoran Obradović*
School of Electrical Engineering and Computer Science
Washington State University
Pullman, WA 99164-2752

Abstract

The neural network learning process is a sequence of network updates and can be represented by a sequence of points in the weight space that we call a learning trajectory. In this paper a new learning approach based on repetitive bounded depth trajectory branching is proposed. This approach has the objectives of improving generalization and speeding up convergence by avoiding local minima when selecting an alternative trajectory. The experimental results show improved generalization compared to the standard back-propagation learning algorithm. The proposed parallel implementation dramatically improves the algorithm's efficiency, to the level that computing time is not a critical factor in achieving improved generalization.

1 Introduction

With most currently known algorithms, neural network learning has to be done sequentially, resulting in a large amount of computational time and non-optimal generalization. For example, the well-known back-propagation algorithm requires a huge amount of nonlinear computations and scales up poorly as tasks become larger and more complex [1]. By increasing the number of hidden units and layers we also increase the probability of experiencing the so-called local minima problem, where the error function is not minimized by further learning (although there exists a lower minimum) [6]. We believe that more efficient and more accurate learning is possible through parallelization.

The state of a neural network can be represented as a single point in the weight space. Each update of the neural network generates a new point in this

*Z. Obradović's research is sponsored in part by the NSF research grant NSF-IRI-9308523.
He is also affiliated with the Mathematical Institute, Belgrade, Yugoslavia.

space. Consequently, the learning process can be represented by a sequence of points in the weight space that we call a learning trajectory. We will refer to the standard trajectory as the trajectory of the weight vector during the learning phase of a standard algorithm, which can be any gradient-descent learning technique (back-propagation in our experiments). The objective of this paper is both more efficient and more accurate neural network learning, achieved by exploring a number of learning trajectories in parallel in order to find one that avoids local minima. Trajectories have the same starting point, and the best one is selected after a bounded depth branching in weight space during the learning phase. At the first branching point, which could be after a single pattern presentation or a whole epoch, a number of new neural network structures are constructed with the same architecture as the original one but with the weight vector branching in various directions. At each following branching point the number of neural networks is increased. When the number of trajectories reaches the maximum supported on the hardware, a cross-validation test is performed on all generated networks (trajectories) and a small number of trajectories are kept for further evolution, again using bounded depth branching. The proposed algorithm allows a faster minimization of the error function (a smaller number of epochs for convergence) as a result of frequent comparison among several learning trajectories.

2 Branching Algorithms

In real-life problems a neural network's error function usually generates a complex surface. Consequently, a better trajectory could be found in the vicinity of the standard one using small variations in the branching parameters.
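The branch-and-select cycle described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the quadratic toy error surface, the Gaussian perturbation used to branch the weight vector "in various directions", and all function names (`branch`, `cv_error`, `bounded_depth_branching`) and parameter values (`fanout`, `max_nets`, `keep`) are our assumptions, since the paper leaves the branching parameters open.

```python
import numpy as np

rng = np.random.default_rng(0)

def cv_error(w):
    # Stand-in for the cross-validation test; a real implementation
    # would evaluate each candidate network on held-out patterns.
    return float(np.sum(w ** 2))

def branch(w, grad, lr=0.1, fanout=3, noise=0.05):
    # One branching point: the standard gradient-descent update plus
    # perturbed copies branching in various directions (the Gaussian
    # perturbation scheme is an illustrative assumption).
    base = w - lr * grad(w)
    return [base] + [base + noise * rng.standard_normal(w.shape)
                     for _ in range(fanout - 1)]

def bounded_depth_branching(w0, grad, max_nets=27, keep=2, rounds=4):
    survivors = [np.asarray(w0, dtype=float)]
    for _ in range(rounds):
        frontier = survivors
        # Branch repeatedly until the number of trajectories reaches
        # the maximum supported by the hardware.
        while len(frontier) * 3 <= max_nets:
            frontier = [c for w in frontier for c in branch(w, grad)]
        # Cross-validation test over all generated trajectories:
        # keep only the best few for further evolution.
        frontier.sort(key=cv_error)
        survivors = frontier[:keep]
    return survivors[0]

# Toy quadratic error surface E(w) = ||w||^2, gradient 2w.
best = bounded_depth_branching([2.0, -1.0], grad=lambda w: 2.0 * w)
print(cv_error(best))  # far below the starting error of 5.0
```

With a neural network in place of the toy surface, each element of `frontier` would be a full weight vector of the network, and `cv_error` would run the forward pass on a validation set, exactly as the text's cross-validation selection step describes.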
In the proposed algorithm, new branching points are generated systematically during the weight-updating phase, and one new trajectory is

0-8186-5602-6/94 © 1994 IEEE