This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

Neural Networks Multiobjective Learning With Spherical Representation of Weights

Honovan P. Rocha, Marcelo A. Costa, and Antônio P. Braga, Member, IEEE

Abstract— This article presents a novel representation of artificial neural networks (ANNs) based on a projection of weights into a new spherical space defined by a radius r and a vector of angles. This spherical representation of ANNs further simplifies the multiobjective learning problem, which is usually treated as a constrained optimization problem that requires great computational effort to maintain the constraints. With the proposed spherical representation, the constrained optimization problem becomes unconstrained, which simplifies both the formulation and the computational effort required. In addition, it allows the use of any nonlinear optimization method for the multiobjective learning of ANNs. Results presented in this article show that the proposed spherical representation of weights yields more accurate estimates of the Pareto set than the classical multiobjective approach. Regarding the final solution selected from the Pareto set, our approach was effective and outperformed some state-of-the-art methods on several data sets.

Index Terms— Levenberg–Marquardt (LM) algorithm, multilayer perceptron (MLP), multiobjective learning, spherical representation.

I. INTRODUCTION

IN THE literature, learning from data has often been treated as a problem involving a tradeoff between empirical and structural risks [1]. Joint minimization of these risks is not usually possible, since they have conflicting behaviors and noncoincident minima.
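The core idea of the abstract — reparameterizing a weight vector by a radius and a vector of angles so that a norm constraint becomes a free, unconstrained parameter — can be sketched with standard hyperspherical coordinates. This is an illustration only: the function names `to_spherical` and `to_cartesian` are hypothetical, and the paper's exact parameterization may differ from the textbook transform used here.

```python
import numpy as np

def to_spherical(w):
    """Map w in R^n to (r, angles) with n-1 angles.

    Standard hyperspherical coordinates: r = ||w||, each angle is the
    arccos of a coordinate over the norm of the remaining tail.
    """
    w = np.asarray(w, dtype=float)
    n = w.size
    r = np.linalg.norm(w)
    angles = np.zeros(n - 1)
    for i in range(n - 1):
        tail = np.linalg.norm(w[i:])          # norm of w[i], ..., w[n-1]
        angles[i] = 0.0 if tail == 0 else np.arccos(w[i] / tail)
    # The last angle spans [0, 2*pi); recover its sign from w[-1].
    if n > 1 and w[-1] < 0:
        angles[-1] = 2 * np.pi - angles[-1]
    return r, angles

def to_cartesian(r, angles):
    """Inverse map: rebuild w from radius and angles."""
    angles = np.asarray(angles, dtype=float)
    n = angles.size + 1
    w = np.empty(n)
    s = 1.0                                    # running product of sines
    for i in range(n - 1):
        w[i] = r * s * np.cos(angles[i])
        s *= np.sin(angles[i])
    w[-1] = r * s
    return w
```

Under this transform, fixing r and optimizing only over the angles keeps the weight norm constant by construction, which is why a norm-constrained problem becomes unconstrained in the angle variables.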
From the optimization perspective, the problem is multiobjective in nature, and the two objective functions chosen to represent the empirical and structural risks should be optimized in the tradeoff region [2]. The example in Fig. 1 illustrates this principle. The graphs show two hypothetical convex objective functions, φe(·) and φc(·), which represent the empirical and structural risks, respectively, as a function of the free parameter w. Functions φe(·) and φc(·) have their minima at w0 and w1. Since w0 ≠ w1, their joint behavior can be observed in regions A–C in Fig. 1.

Manuscript received October 19, 2018; revised July 2, 2019 and November 20, 2019; accepted November 28, 2019. This work was supported in part by the National Research Council (CNPq) under Grant 158265/2014-9, in part by the Coordination for Higher Education Staff Development (CAPES), and in part by the Research Support Foundation of the State of Minas Gerais (FAPEMIG). (Corresponding author: Honovan P. Rocha.)

H. P. Rocha is with the Institute of Engineering, Science and Technology, Federal University of Vales do Jequitinhonha e Mucuri (UFVJM), Janaúba 39440-000, Brazil (e-mail: honovan.rocha@ufvjm.edu.br).
M. A. Costa is with the Department of Production Engineering, Federal University of Minas Gerais (UFMG), Belo Horizonte 31270-901, Brazil (e-mail: macosta@ufmg.br).
A. P. Braga is with the Department of Electronic Engineering, Federal University of Minas Gerais (UFMG), Belo Horizonte 31270-901, Brazil (e-mail: apbraga@ufmg.br).

Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNNLS.2019.2957730

Fig. 1. Example of two convex objective functions to be minimized. Region B is the target region, where the two objective functions must be traded off.

In regions A and C, both functions can be jointly minimized, but in region B, the two functions exhibit conflicting behaviors.
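The behavior of regions A–C in Fig. 1 can be reproduced numerically: where the derivatives of the two objectives share a sign, both can still be jointly decreased; where the signs conflict, any improvement in one objective worsens the other. The quadratic objectives below, with minima at w0 = 0 and w1 = 2, are hypothetical stand-ins for the empirical and structural risks, not functions taken from the paper.

```python
import numpy as np

# Hypothetical convex objectives with noncoincident minima at w0 = 0
# and w1 = 2, mimicking phi_e and phi_c in Fig. 1.
phi_e = lambda w: (w - 0.0) ** 2
phi_c = lambda w: (w - 2.0) ** 2

w = np.linspace(-1.0, 3.0, 401)
d_e = np.gradient(phi_e(w), w)   # numerical derivative of phi_e
d_c = np.gradient(phi_c(w), w)   # numerical derivative of phi_c

# Region B (the tradeoff region): derivatives with opposite signs,
# so decreasing one objective necessarily increases the other.
conflict = d_e * d_c < 0
print(w[conflict].min(), w[conflict].max())  # endpoints near 0 and 2
```

Outside the interval found by this scan (regions A and C), the derivatives agree in sign, matching the joint-minimization behavior described in the text.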
The solutions in region B are nondominated; they constitute a Pareto front and are in fact optimal from the optimization point of view [3]. Any value of w chosen in this region results in a tradeoff between φe(·) and φc(·). Using this multiobjective approach, optimal nondominated solutions on the Pareto front are generated, one of which can be chosen based on a tradeoff criterion.

In recent decades, many approaches have been reported in the literature to solve this problem without the formalism of multiobjective optimization, including pruning and constructive methods, regularization, smoothing functions, boosting, and bagging [4]. Support vector machine (SVM) learning [5] is formalized as a biobjective problem, which involves maximizing the margin among those solutions that yield the minimum error. Margin maximization is achieved by minimizing the norm of the weight vector [1], as shown schematically in Fig. 2, so the problem has a biobjective formulation. The tradeoff learning problem of SVMs, however, is solved in the feature space after linearization, assuming a preestablished kernel function and a regularization parameter C that are usually provided by the user or estimated by cross validation. The SVM became attractive because convexification gives its learning function a unique minimum; however, the solution still depends on the given kernel and regularization parameters.

In artificial neural network (ANN) learning, feature mapping is not usually treated separately, and the tradeoff problem is solved in the input space, which requires a different treatment. The most typical representation for φe(·) in ANN learning is the squared-error loss function $e^2 = \sum_{i=1}^{N} (y_i - g(x_i, w))^2$, whereas model capacity φc(·) is usually represented by the

2162-237X © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
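The biobjective pair discussed above — squared error e² versus the weight-vector norm — and the notion of nondominance can be sketched together: sample candidate weight vectors, evaluate both objectives, and keep only the solutions no other candidate improves in both objectives at once. A minimal sketch, assuming a toy linear model g(x, w) = xᵀw in place of the paper's MLPs; the data and sampling scheme are fabricated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic regression task; g(x, w) = x @ w is a linear stand-in
# for the network output (the paper itself uses MLPs).
X = rng.normal(size=(50, 2))
y = X @ np.array([1.5, -2.0]) + 0.1 * rng.normal(size=50)

def objectives(w):
    e2 = np.sum((y - X @ w) ** 2)   # empirical risk: squared-error loss
    norm2 = np.sum(w ** 2)          # structural-risk proxy: ||w||^2
    return e2, norm2

# Sample candidate solutions and evaluate both objectives.
W = rng.normal(size=(500, 2)) * 2.0
F = np.array([objectives(w) for w in W])

# A point is dominated if some other point is <= in both objectives
# and strictly < in at least one; the rest form the Pareto front.
dominated = np.array([
    np.any(np.all(F <= f, axis=1) & np.any(F < f, axis=1)) for f in F
])
pareto = F[~dominated]
```

The surviving points in `pareto` are exactly the nondominated solutions of region B: along them, reducing the error is only possible by increasing the norm, and vice versa.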