Piece-wise quadratic lego set for constructing arbitrary error potentials and their fast optimization

A.N. Gorban a,*, E.M. Mirkes a, A. Zinovyev b

a Department of Mathematics, University of Leicester, Leicester, LE1 7RH, UK
b Institut Curie, PSL Research University, Mines ParisTech, Inserm U900, F-75005, Paris, France

* Corresponding author. Email addresses: ag153@le.ac.uk (A.N. Gorban), em322@le.ac.uk (E.M. Mirkes), Andrei.Zinovyev@curie.fr (A. Zinovyev).

Abstract

Most machine learning approaches are rooted in the principle of minimizing the mean squared distance, which relies on computationally efficient quadratic optimization methods. However, when faced with high-dimensional and noisy data, quadratic error functionals exhibit serious weaknesses, including high sensitivity to contaminating factors and the curse of dimensionality. Therefore, many recent applications in machine learning exploit the properties of non-quadratic error functionals based on the L1 norm, or even sub-linear potentials corresponding to fractional norms. The downside of these approaches is a tremendous increase in the computational cost of optimization. So far, no approach has been suggested for dealing with arbitrary error functionals in a flexible and computationally efficient framework. In this paper, we develop the theory and basic universal data approximation algorithms (k-means, principal components, principal manifolds and graphs) based on piece-wise quadratic error potentials of subquadratic growth (PQSQ potentials). We develop a new and universal framework for minimizing arbitrary sub-quadratic error potentials, using an algorithm with guaranteed fast convergence to a local or global minimum of the error. The approach can be applied to most existing machine learning methods, including methods of data approximation and regularized regression, improving the computational cost/accuracy trade-off.

Keywords: data approximation, nonquadratic potential, principal components, clustering

1. Introduction

Modern machine learning and artificial intelligence methods are revolutionizing many fields of science today, such as medicine, biology, engineering, high-energy physics and sociology, where large amounts of data have been collected due to the emergence of new high-throughput computerized technologies. Historically and methodologically speaking, many machine learning algorithms have been based on minimizing the mean squared error potential, which can be explained by the tractable properties of the normal distribution and the existence of computationally efficient methods for quadratic optimization. However, most real-life datasets are characterized by strong noise, long-tailed distributions, the presence of contaminating factors and large dimension. All these circumstances can drastically compromise the use of quadratic potentials; therefore, considerable practical and theoretical effort has been devoted to exploiting the properties of non-quadratic error potentials, which can be more appropriate in certain contexts. For example, methods of regularized regression such as the lasso and the elastic net, based on the properties of the L1 metric [1, 2], have found numerous applications in bioinformatics [3], and L1 norm-based methods of dimension reduction are of great use in automated image analysis [4].
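To give some intuition for the PQSQ construction announced in the abstract, the sketch below approximates one such non-quadratic potential, the L1 error f(x) = |x|, by a piece-wise quadratic function: on each of a few intervals the potential is a quadratic piece that matches f at the interval boundaries, and beyond the last threshold it is constant (trimmed), so that distant outliers contribute only a bounded error. This is a minimal illustrative sketch; the function name, the default thresholds, the choice f = |x|, and the boundary-matching rule are our assumptions, not necessarily the paper's exact parametrization.

```python
import numpy as np

def pqsq_potential(x, f=np.abs, thresholds=(0.1, 0.4, 1.0)):
    """Sketch of a piece-wise quadratic potential of subquadratic growth.

    On each interval [r_{k-1}, r_k) the value is a_k * x**2 + b_k, with
    coefficients chosen so that the quadratic piece agrees with the target
    potential f at both interval boundaries; for |x| >= r_p the potential
    is constant ("trimmed"), which caps the influence of large residuals.
    f must accept numpy arrays (e.g. np.abs, np.sqrt for a fractional norm).
    """
    r = np.concatenate(([0.0], np.asarray(thresholds, dtype=float)))
    ax = np.abs(np.atleast_1d(np.asarray(x, dtype=float)))
    # Coefficients of the quadratic pieces from the boundary-matching condition:
    # a_k = (f(r_k) - f(r_{k-1})) / (r_k^2 - r_{k-1}^2), b_k = f(r_{k-1}) - a_k r_{k-1}^2.
    a = (f(r[1:]) - f(r[:-1])) / (r[1:] ** 2 - r[:-1] ** 2)
    b = f(r[:-1]) - a * r[:-1] ** 2
    # Index of the interval containing each |x|; points beyond r_p are trimmed.
    k = np.searchsorted(r[1:], ax, side='right')
    out = np.full_like(ax, f(r[-1]))  # constant value in the trimmed region
    inside = k < len(a)
    out[inside] = a[k[inside]] * ax[inside] ** 2 + b[k[inside]]
    return out
```

For instance, pqsq_potential(np.linspace(-2, 2, 9)) is quadratic near zero, tracks |x| piece-wise up to the last threshold, and is flat beyond it; each piece being quadratic is what later allows the optimization to be reduced to a sequence of fast quadratic subproblems.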
Not surprisingly, these L1-based approaches come with a drastically increased computational cost, connected, for example, with the use of linear programming optimization techniques, which are substantially more expensive than mean squared error-based methods. In practical applications of machine learning, it