Accelerating the Multilayer Perceptron Learning
With The Davidon Fletcher Powell Algorithm *
Sabeur Abid, Aymen Mouelhi, Farhat Fnaiech, Senior Member, IEEE
ESSTT (CEREP), 5 Av. Taha Hussein, 1008 Tunis, Tunisia
Email: Farhat.Fnaiech@esstt.rnu.tn; abid2f@yahoo.fr; aymen_mouelhi@yahoo.fr
Abstract:
In this paper, the Davidon-Fletcher-Powell (DFP) algorithm
for nonlinear least squares is proposed for training the
multilayer perceptron (MLP). Applied to both a single-output-
layer perceptron and an MLP, this algorithm is found to be
faster than the Marquardt-Levenberg (ML) algorithm, until
now regarded as the fastest algorithm for training MLPs. The
number of iterations the DFP algorithm requires to converge
is less than about half of that required by the ML algorithm.
Interpretations of these results are provided in the paper.
Keywords:
Multilayer Perceptron (MLP), Marquardt-Levenberg (ML)
algorithm, Davidon-Fletcher-Powell (DFP) algorithm.
1. INTRODUCTION
The traditional method for training a multilayer perceptron is
the standard backpropagation (SBP) algorithm. It suffers from
slow convergence: many iterations are required to train even a
small network on a simple problem. Much research has been
devoted to accelerating the convergence of the algorithm, and
this work falls roughly into two approaches. The first is based
on first-order optimization techniques, such as varying the
learning rate of the gradient method used in SBP, using
momentum, and rescaling variables [6]-[7]. The second
approach applies second-order optimization methods to MLP
training. The most popular techniques in this category use
conjugate gradient or quasi-Newton (secant) methods. The
quasi-Newton methods are modified versions of the Newton
method, which is known for its high convergence speed. These
optimization techniques, such as the ML and DFP methods,
rely on approximations of the Hessian matrix used in the
Newton method and are considered more efficient, but their
storage and computational requirements grow with the square
of the size of the network.
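To illustrate this quadratic cost concretely: a quasi-Newton method such as DFP maintains a dense n-by-n approximation of the inverse Hessian, where n is the number of weights. A minimal sketch of the standard DFP rank-two update (variable names are ours, not the paper's):

```python
import numpy as np

def dfp_update(H, s, y):
    # DFP rank-two update of the inverse-Hessian approximation H:
    #   H+ = H + s s^T / (s^T y) - (H y)(H y)^T / (y^T H y)
    # where s is the step taken and y is the change in the gradient.
    # H is n x n, so storage grows with the square of the weight count.
    Hy = H @ y
    return H + np.outer(s, s) / (s @ y) - np.outer(Hy, Hy) / (y @ Hy)
```

By construction the updated matrix satisfies the secant condition H+ y = s and stays symmetric, which is what makes it a usable Newton-direction surrogate.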
In [4], a quasi-Newton method, the Broyden-Fletcher-
Goldfarb-Shanno (BFGS) method, was applied to train the
MLP, and it was found to converge much faster than the
standard backpropagation algorithm; the potential drawback
of the BFGS method is that it requires a lot of memory to
store the Hessian matrix. In [3], the Marquardt-Levenberg
algorithm was applied to train feedforward neural networks,
and simulation results on several problems showed that it is
much faster than the conjugate gradient and variable learning
rate algorithms. The main drawbacks of this algorithm are its
high computational complexity and its sensitivity to the
initial choice of the parameter µ in the update of the Hessian
matrix.

* Part of this work was published at the 3rd International
Symposium on Image and Signal Processing and Analysis,
September 18-20, 2003, Rome, Italy.
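The role of µ in the ML step can be sketched as follows: it damps the Gauss-Newton approximation JᵀJ of the Hessian, interpolating between gradient descent (large µ) and Gauss-Newton (small µ). A minimal sketch under these standard definitions (names are illustrative, not from the paper):

```python
import numpy as np

def lm_step(J, e, mu):
    # Marquardt-Levenberg weight update for residual vector e with
    # Jacobian J (residuals w.r.t. weights):
    #   dw = -(J^T J + mu * I)^{-1} J^T e
    # Large mu -> small, gradient-descent-like steps;
    # small mu -> a full Gauss-Newton step.
    n = J.shape[1]
    return -np.linalg.solve(J.T @ J + mu * np.eye(n), J.T @ e)
```

A poor initial µ either slows convergence (too large) or makes the step ill-conditioned (too small), which is the sensitivity noted above.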
This paper presents the application of a new algorithm based
on the DFP method to train an MLP. The DFP algorithm
approximates the Hessian matrix (as quasi-Newton methods
do); it is first used to train a single-output-layer perceptron
and then to train an MLP. In previous work [2], S. Abid et al.
applied the DFP algorithm to train only the output layer of
the MLP, while the hidden layers were trained with the
standard backpropagation algorithm. In this work we apply
the DFP algorithm to update the synaptic weights of all
layers. The new algorithm is compared with the ML one.
Note that there is no need to compare the proposed algorithm
with the SBP one, because the latter has a very slow rate of
convergence, being based on a first-order optimization
method, and all second-order optimization methods are
obviously much faster. Nevertheless, a brief review of the
SBP technique is useful for deriving the equations of the
other algorithms.
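For reference, the first-order SBP update amounts to plain gradient descent on the output error; a minimal sketch (the learning-rate name eta and the function name are ours, not the paper's):

```python
def sbp_update(w, grad, eta=0.1):
    # Standard backpropagation update: move each weight against its
    # error gradient, scaled by the fixed learning rate eta.
    return [wi - eta * gi for wi, gi in zip(w, grad)]
```

Because the step uses only first-order information, a small fixed eta is needed for stability, which is the root of SBP's slow convergence relative to the second-order methods discussed above.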
This paper is organized as follows. In Section II, we briefly
present the basic standard backpropagation (SBP) algorithm.
The Davidon-Fletcher-Powell algorithm is presented in
Section III. In Section IV, we present the DFP algorithm for
training the MLP. In Section V, simulation results comparing
the new algorithm with the ML one are given; finally,
Section VI contains a summary and conclusions.
Fig.1: Fully connected feedforward multilayer perceptron
0-7803-9490-9/06/$20.00 ©2006 IEEE
2006 International Joint Conference on Neural Networks,
Sheraton Vancouver Wall Centre Hotel, Vancouver, BC,
Canada, July 16-21, 2006