14th INTERNATIONAL SCIENTIFIC CONFERENCE ON PRODUCTION ENGINEERING CIM2013
Croatian Association of Production Engineering, Zagreb 2013

PARALLEL LEVENBERG-MARQUARDT-BASED NEURAL NETWORK WITH VARIABLE DECAY RATE

Tomislav Bacek, Dubravko Majetic, Danko Brezak

Mag. ing. mech. T. Bacek, University of Zagreb, FSB, I. Lucica 5, 10000 Zagreb
Prof. dr. sc. D. Majetic, University of Zagreb, FSB, I. Lucica 5, 10000 Zagreb
Doc. dr. sc. D. Brezak, University of Zagreb, FSB, I. Lucica 5, 10000 Zagreb

Keywords: neural networks, regression, parallel Levenberg-Marquardt algorithm

Abstract
In this paper, a parallel Levenberg-Marquardt-based feed-forward neural network with variable weight decay, implemented on the Graphics Processing Unit, is proposed. Two levels of parallelism are implemented in the algorithm. The first level is achieved across the data set, owing to the inherently parallel structure of feed-forward neural networks. The second level is achieved in the Jacobian computation. To avoid a third level of parallelism, i.e. parallelization of the optimization search steps, and to keep the algorithm simple, a variable decay rate is used. The parameters of the variable decay rate rule allow for a compromise between oscillations and higher accuracy on one side, and stable but slower convergence on the other. To improve training speed and efficiency, a modification of random weight initialization is included. The parallel algorithm is tested on two real-domain benchmark problems. The results, given in the form of a table of obtained speedups, show the effectiveness of the proposed algorithm implementation.

1. INTRODUCTION

Artificial neural networks (NN) are used in a wide variety of applications due to their capability to learn and to generalize. Owing to their simple structure and their capability of nonlinearly mapping any input to any output, the most widely used NNs are feed-forward NNs.

Many different learning algorithms for feed-forward NNs have been reported in the literature so far. The most widely used learning algorithm has long been gradient descent, whose poor convergence rate was significantly improved, as shown in [1], by introducing different modifications, including momentum and an adaptive learning coefficient, [2], [3], [4]. Nonetheless, second-order methods, such as the Gauss-Newton method, achieve much faster convergence since they also take into account information about the error surface, [5]. The best convergence rate is achieved by the Levenberg-Marquardt (LM) method, since it is a combination of simple gradient descent and the Gauss-Newton method, thus taking the best of both: the stable convergence of the former and the fast convergence of the latter.

Although this pseudo-second-order method converges quickly on small-scale problems, it proves to be very inefficient on large-scale problems due to its computational complexity, memory requirements and error oscillations, [5], [6]. In order to tackle these problems, different approaches have been suggested in the literature. Work on the reduction of memory demands and computational complexity can be found in [7], [8] and [9]. Another approach is suggested in [6], where a variable decay rate was introduced in order to decrease the error oscillations of the standard LM algorithm. Owing to advances in computer architecture and the inherently parallel nature of feed-forward NNs, parallelization of NNs is yet another approach suggested in the literature.
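For reference, the LM method updates the weights by blending the Gauss-Newton step with a gradient-descent-like damping term. The following minimal NumPy sketch illustrates one such update together with a variable damping (decay) schedule; the specific rule shown (shrinking or growing mu by a fixed factor beta depending on whether the error decreased) is a common heuristic and is only an assumption here, not the exact rule proposed in this paper.

```python
import numpy as np

def lm_step(w, jacobian, residuals, mu):
    """One Levenberg-Marquardt update: dw = -(J^T J + mu*I)^(-1) J^T e."""
    J = jacobian(w)        # (n_samples, n_weights) Jacobian of the residuals
    e = residuals(w)       # (n_samples,) residual vector
    H = J.T @ J            # Gauss-Newton approximation of the Hessian
    g = J.T @ e            # gradient of 0.5 * ||e||^2
    dw = np.linalg.solve(H + mu * np.eye(len(w)), -g)
    return w + dw

def update_mu(mu, old_sse, new_sse, beta=10.0):
    """Illustrative (assumed) decay rule: after a successful step, shrink mu
    (behave more like Gauss-Newton); otherwise grow it (behave more like
    gradient descent). The factor beta trades off oscillations and accuracy
    against stable but slower convergence."""
    return mu / beta if new_sse < old_sse else mu * beta
```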
If the NN learns in batch mode, then it is possible to parallelize the evaluation of the objective function and its partial derivatives using a simple data-parallel decomposition, [10]; a minimal sketch of this decomposition is given at the end of this section. A similar approach, but using MPI on the .NET platform, was suggested in [11]. Suri et al. [12] suggested a parallel LM-based NN using MPI, but in addition to simple data parallelism, they also computed row blocks of the Jacobian in parallel. Three levels of parallelization using clusters are suggested in [13], where the authors implemented parallelization across data sets, parallelization of the Jacobian computation and parallelization of the search steps.

Apart from using clusters to implement parallel NNs, another way is to use the GPU (Graphics Processing Unit). In recent years, GPUs have rapidly evolved from configurable graphics processors into programmable, massively parallel many-core multiprocessors used for many general applications (hence the name GPGPU, General-Purpose GPU). To our knowledge, not many implementations of parallel NNs on the GPU have been proposed in the literature so far. Moreover, the proposed implementations are based on one-level, data parallelism only, [14], [15].

In order to overcome the drawbacks of the LM algorithm and to exploit the huge potential of GPUs, which are nowadays easily accessible, a parallel GPU imple-
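The data-parallel decomposition referenced above rests on the fact that, in batch mode, each training sample contributes an independent row block of the Jacobian, so the partial products of J^T J and J^T e can be computed per chunk and then reduced. The sequential NumPy sketch below only illustrates this structure; the chunking scheme and the helper callables (jacobian_rows, residuals) are assumptions for illustration, not the paper's GPU kernels.

```python
import numpy as np

def accumulate_normal_equations(w, jacobian_rows, residuals, data_chunks):
    """Data-parallel accumulation of J^T J and J^T e across data chunks.

    Each chunk's row block of the Jacobian is computed independently of
    the others (the part that maps naturally onto parallel hardware);
    the partial products are then summed in a reduction step.
    """
    n = len(w)
    JtJ = np.zeros((n, n))
    Jte = np.zeros(n)
    for chunk in data_chunks:            # iterations are independent -> parallelizable
        Jc = jacobian_rows(w, chunk)     # (chunk_size, n) Jacobian row block
        ec = residuals(w, chunk)         # (chunk_size,) residual block
        JtJ += Jc.T @ Jc                 # partial reduction of J^T J
        Jte += Jc.T @ ec                 # partial reduction of J^T e
    return JtJ, Jte
```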