Neural Processing Letters 9: 53–61, 1999.
© 1999 Kluwer Academic Publishers. Printed in the Netherlands.
Initialization of Supervised Training for Parametric
Estimation
P. COSTA and P. LARZABAL
LESIR-ENS de Cachan, 61 av. Du Président Wilson, 94235 Cachan Cedex, France.
E-mail: pascale.costa@lesir.ens-cachan.fr
Abstract. This paper concerns the initialization problem of the training algorithm in Neural Net-
works. We focus herein on backpropagation networks with one hidden layer. The initialization of
the weights is crucial: if the network is poorly initialized, training converges to local
minima, so the classical random initialization is a poor solution. By taking into account a
Taylor expansion of the mapping to be learned and the nonlinearity of the sigmoids, significant
improvements can be obtained. We propose a new initialization scheme based on the search for an
explicit approximate solution to the problem of mapping between pattern and target. Simulation
results are presented which show that these initializations avoid local minima, reduce training
time, yield better generalization, and help estimate the network's size.
Key words: estimation, global convergence, initialization, multilayer perceptron
1. Introduction
The problem of learning in Neural Networks is easily formulated in terms of the
minimization of an error function. This error is a function of the adaptive parame-
ters (weights and biases) in the network. The problem of minimizing continuous,
differentiable functions of many variables is one which has been widely studied.
Many of the conventional approaches to this problem are directly applicable to
the training of Neural Networks. In order to apply an optimization algorithm to
real problems, we need to address a variety of practical issues, as reported in the
literature on backpropagation networks. The principal orientations of research in
Multilayer Networks have focused on improving the optimization procedure (adaptation
of the learning rate, second-order algorithms, modification of the network size during
training, stopping criteria, data normalization, etc.).
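To make the training-as-minimization viewpoint concrete, the following sketch trains a one-hidden-layer sigmoid network by plain gradient descent on a sum-of-squares error. The target mapping (a sine), the network size, and the learning rate are illustrative choices, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 1))   # input patterns
T = np.sin(np.pi * X)                      # targets (example mapping)

n_hidden = 5
W1 = rng.normal(0, 0.5, size=(1, n_hidden))   # input -> hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.5, size=(n_hidden, 1))   # hidden -> output weights
b2 = np.zeros(1)

mse_before = float(np.mean((X @ W1 @ W2 * 0 - T) ** 2))  # error of zero output

lr = 0.1
for epoch in range(2000):
    H = sigmoid(X @ W1 + b1)               # hidden activations
    Y = H @ W2 + b2                        # linear output unit
    E = Y - T
    # Backpropagated gradients of the mean squared-error criterion
    gW2 = H.T @ E / len(X)
    gb2 = E.mean(axis=0)
    dH = (E @ W2.T) * H * (1.0 - H)        # sigmoid derivative s*(1-s)
    gW1 = X.T @ dH / len(X)
    gb1 = dH.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

Y = sigmoid(X @ W1 + b1) @ W2 + b2
mse_after = float(np.mean((Y - T) ** 2))
```

The weights and biases are exactly the "adaptive parameters" over which the continuous, differentiable error function is minimized.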
Less emphasis has been placed on the initialization of the network (for example:
[1], [2], [3], [4]...). Training algorithms usually begin by initializing the weights
in the network to some randomly chosen values. An appropriate choice of initial
weights is therefore potentially important in allowing the training algorithm to pro-
duce a good set of weights as well as in leading to improvements in training speed.
Even stochastic algorithms, such as stochastic gradient descent, which can escape some local
minima, remain strongly sensitive to the initial conditions.
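One way this sensitivity shows up, as a hedged illustration: if random initial weights are drawn with too large a scale, the sigmoids saturate and the derivative s(1-s) that backpropagation multiplies by becomes tiny, so little gradient flows through the hidden layer. The scales and network size below are arbitrary choices for the demonstration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(100, 1))  # illustrative input patterns

def mean_sigmoid_derivative(scale):
    """Mean of s*(1-s) at the hidden layer for a random initialization
    of the given scale -- a rough measure of how much gradient can
    flow back through the hidden units at the start of training."""
    W1 = rng.normal(0, scale, size=(1, 10))
    b1 = rng.normal(0, scale, size=10)
    H = sigmoid(X @ W1 + b1)
    return float(np.mean(H * (1.0 - H)))

small = mean_sigmoid_derivative(0.5)   # moderate initial weights
large = mean_sigmoid_derivative(20.0)  # oversized initial weights
```

With the large scale most hidden units start saturated, so the initial gradients are much weaker than with the moderate scale; this is one mechanism by which a careless random initialization slows or stalls training.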