This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 1
Neural Networks Multiobjective Learning With
Spherical Representation of Weights
Honovan P. Rocha , Marcelo A. Costa, and Antônio P. Braga, Member, IEEE
Abstract— This article presents a novel representation of artificial neural networks (ANNs) that is based on a projection of weights into a new spherical space defined by a radius r and a vector of angles θ. This spherical representation of ANNs further
simplifies the multiobjective learning problem, which is usually
treated as a constrained optimization problem that requires
great computational effort to maintain the constraints. With the
proposed spherical representation, the constrained optimization
problem becomes unconstrained, which simplifies the formulation
and computational effort required. In addition, it also allows the
use of any nonlinear optimization method for the multiobjective
learning of ANNs. Results presented in this article show that the
proposed spherical representation of weights yields more accurate
estimates of the Pareto set than the classical multiobjective
approach. Regarding the final solution selected from the Pareto
set, our approach was effective and outperformed some state-of-
the-art methods on several data sets.
Index Terms— Levenberg–Marquardt (LM) algorithm,
multilayer perceptron (MLP), multiobjective learning, spherical
representation.
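The abstract describes mapping a weight vector into a radius r and a vector of angles θ, so that a norm constraint on the weights reduces to a bound on the single scalar r. The sketch below illustrates one plausible form of such a projection — the standard n-dimensional hyperspherical-to-Cartesian transform; the paper's exact parameterization may differ, and the function name is ours.

```python
import numpy as np

def spherical_to_weights(r, angles):
    """Map a radius r and angles theta_1..theta_{n-1} to an
    n-dimensional weight vector via the standard hyperspherical
    coordinate transform (an illustrative assumption, not
    necessarily the paper's exact mapping):
        w_1 = r*cos(t_1)
        w_k = r*sin(t_1)*...*sin(t_{k-1})*cos(t_k)
        w_n = r*sin(t_1)*...*sin(t_{n-1})
    """
    angles = np.asarray(angles, dtype=float)
    n = angles.size + 1
    w = np.empty(n)
    sin_prod = 1.0  # running product of sines
    for k in range(n - 1):
        w[k] = r * sin_prod * np.cos(angles[k])
        sin_prod *= np.sin(angles[k])
    w[n - 1] = r * sin_prod
    return w

# By construction ||w|| = r, so a constraint ||w|| <= R becomes a
# bound on the scalar r alone, while the angles remain unconstrained.
w = spherical_to_weights(2.0, [0.3, 1.1, 0.7])
```

This is why such a representation turns the constrained norm-penalized problem into an unconstrained one: any setting of the angles is feasible, and only r controls the weight norm.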
I. INTRODUCTION
IN THE literature, learning from data has often been treated
as a problem involving a tradeoff between empirical and
structural risks [1]. Joint minimization of these risks is not
usually possible, since they have conflicting behaviors and
noncoincident minima. From the optimization perspective,
the problem is multiobjective in nature and the two objective
functions chosen to represent the empirical and structural
risks should be optimized in the tradeoff region [2]. The
The example in Fig. 1 illustrates this principle. The graphs show two hypothetical convex objective functions, φ_e(·) and φ_c(·), which represent the empirical and structural risks, respectively, as a function of the free parameter w. Functions φ_e(·) and φ_c(·) have their minima at w_0 and w_1, respectively. Since w_0 ≠ w_1, their joint behavior can be observed in regions A–C in Fig. 1.
Manuscript received October 19, 2018; revised July 2, 2019 and Novem-
ber 20, 2019; accepted November 28, 2019. This work was supported in part
by National Research Council (CNPq) under Grant 158265/2014-9, in part by
Coordination for Higher Education Staff Development (CAPES), and in part
by the Research Support Foundation of the State of Minas Gerais (FAPEMIG).
(Corresponding author: Honovan P. Rocha.)
H. P. Rocha is with the Institute of Engineering, Science and Technology,
Federal University of Vales do Jequitinhonha e Mucuri (UFVJM), Janaúba
39440-000, Brazil (e-mail: honovan.rocha@ufvjm.edu.br).
M. A. Costa is with the Department of Production Engineering, Federal
University of Minas Gerais (UFMG), Belo Horizonte 31270-901, Brazil
(e-mail: macosta@ufmg.br).
A. P. Braga is with the Department of Electronic Engineering, Federal
University of Minas Gerais (UFMG), Belo Horizonte 31270-901, Brazil
(e-mail: apbraga@ufmg.br).
Color versions of one or more of the figures in this article are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNNLS.2019.2957730
Fig. 1. Example of two convex objective functions to be minimized. Region B is the target region, where the two objective functions must be traded off.
In regions A and C, both functions can be jointly minimized, but in region B, the two functions exhibit conflicting behaviors. The solutions in region B are nondominated; they form the Pareto front and are in fact optimal from the optimization point of view [3]. Any value of w chosen in this region results in a tradeoff between φ_e(·) and φ_c(·).
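The Pareto-front idea above can be checked numerically. The sketch below is an illustrative example of ours, not taken from the paper: two convex quadratics with distinct minima w0 and w1 stand in for the empirical and structural risks, and a brute-force dominance test recovers the nondominated set, which for this 1-D convex case is exactly the interval [w0, w1].

```python
import numpy as np

# Illustrative stand-ins for the two risks: convex quadratics with
# noncoincident minima w0 and w1 (hypothetical values).
w0, w1 = 1.0, 3.0
phi_e = lambda w: (w - w0) ** 2  # "empirical risk"
phi_c = lambda w: (w - w1) ** 2  # "structural risk"

ws = np.linspace(-1.0, 5.0, 601)
fe, fc = phi_e(ws), phi_c(ws)

# A candidate w is nondominated if no other candidate is at least as
# good on both objectives and strictly better on at least one.
nondominated = []
for i, w in enumerate(ws):
    dominated = np.any((fe <= fe[i]) & (fc <= fc[i]) &
                       ((fe < fe[i]) | (fc < fc[i])))
    if not dominated:
        nondominated.append(w)

# For two convex 1-D quadratics the Pareto set is the interval
# between the two minima, i.e., [w0, w1] = [1.0, 3.0].
```

Every point outside [w0, w1] is dominated because moving toward the interval improves both objectives simultaneously — the numerical analogue of regions A and C in Fig. 1.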
Using this multiobjective approach, optimal, nondominated solutions on the Pareto front are generated, one of which can be chosen based on a tradeoff criterion. In recent decades,
many approaches have been reported in the literature to
solve this problem without the formalism of multiobjective
optimization, including pruning and constructive methods,
regularization, smoothing functions, boosting, and bagging [4].
Support vector machine (SVM) learning [5] is formalized
as a biobjective problem, which involves maximizing the
margin among those solutions that yield the minimum error.
Margin maximization is achieved by minimizing the norm
of the weight vector [1], as shown schematically in Fig. 2,
so the problem has a biobjective formulation. The tradeoff
learning problem of SVMs, however, is solved in the feature
space after linearization, assuming a preestablished kernel
function and a regularization parameter C that are usually
provided by the user or estimated by cross validation. The
SVM became attractive because convexification guarantees that its learning function has a unique minimum; however, the solution still depends on the given kernel and regularization parameters. In artificial
neural network (ANN) learning, feature mapping is not usually
treated separately, and the tradeoff problem is solved in the
input space, which requires a different treatment.
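The SVM tradeoff described above — margin maximization via norm minimization, balanced against empirical error through the parameter C — can be written as a single regularized objective. The sketch below shows the standard soft-margin primal objective with hinge loss; the data, weights, and function name are illustrative assumptions of ours.

```python
import numpy as np

# Standard soft-margin SVM primal objective: the structural term
# 0.5*||w||^2 (margin maximization) plus C times the empirical
# hinge-loss term, as in the biobjective view described in the text.
def svm_primal_objective(w, b, X, y, C):
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)      # empirical risk term
    return 0.5 * np.dot(w, w) + C * np.sum(hinge)  # structural + C*empirical

# Toy data and a deliberately small-margin w (hypothetical values);
# larger C weights the empirical-error term more heavily.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = np.array([0.2, 0.2]), 0.0
obj_small_C = svm_primal_objective(w, b, X, y, C=0.1)
obj_large_C = svm_primal_objective(w, b, X, y, C=10.0)
```

Varying C traces different tradeoffs between the two risks — which is exactly the user-supplied regularization choice the text says is usually fixed beforehand or estimated by cross validation.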
The most typical representation for φ_e(·) in ANN learning is the squared-error loss function e² = Σ_{i=1}^{N} (y_i − g(x_i, w))², whereas model capacity φ_c(·) is usually represented by the
2162-237X © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.