Deep Support Vector Classification and Regression

David Díaz-Vico 1,2(B), Jesús Prada 1, Adil Omari 3, and José R. Dorronsoro 1,2

1 Dpto. Ing. Informática, Universidad Autónoma de Madrid, Madrid, Spain
david.diazv@estudiante.uam.es
2 Instituto de Ingeniería del Conocimiento, Universidad Autónoma de Madrid, Madrid, Spain
3 Signal Theory and Communications Department, Universidad Carlos III, Madrid, Spain

Abstract. Support Vector Machines (SVMs) are one of the most popular machine learning models for supervised problems and have proved to achieve great performance in a broad range of prediction tasks. However, they can suffer from scalability issues when working with large sample sizes, a common situation in the big data era. Deep Neural Networks (DNNs), on the other hand, can handle large datasets with greater ease, and in this paper we propose Deep SVM models that combine the highly non-linear feature processing of DNNs with SVM loss functions. As we will show, these models can achieve performance similar to that of standard SVMs while scaling better with sample size.

1 Introduction

Support Vector Machines (SVM; [17]) are among the state-of-the-art methods for supervised classification and regression and, as such, are widely used. One key reason for this is their ability to work implicitly with kernels, such as the Gaussian kernel, which map the initial features into a possibly infinite-dimensional reproducing kernel Hilbert space. On the other hand, this capability also hinders their ability to cope with large datasets: handling the kernel matrix becomes too costly or even unfeasible and, even if a model is finally built, the so-called "kernelization curse", i.e., the fact that the number of support vectors grows linearly with sample size, implies that the model may be too costly in memory or time to exploit.
Many proposals have appeared in the literature to overcome these problems, usually for Support Vector Classification (SVC). Among them we can mention the incremental learning of SVMs [1], ensemble learning of SVMs [6] and cutting planes [15]; nevertheless, it can be said that, unless substantial hardware resources are committed, current kernel SVM training methods are not competitive for datasets with more than about 100,000 patterns.

A. Omari is currently at Telefónica.

© Springer Nature Switzerland AG 2019
J. M. Ferrández Vicente et al. (Eds.): IWINAC 2019, LNCS 11487, pp. 33–43, 2019.
https://doi.org/10.1007/978-3-030-19651-6_4
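To make the central idea concrete before the formal development, the following is a minimal sketch (not the authors' implementation) of a Deep SVM-style classifier: a small network whose hidden layer plays the role of a learned, non-linear feature map, trained end-to-end with an SVM squared hinge loss instead of the usual cross-entropy. All names, sizes, and hyperparameters here are illustrative assumptions; only NumPy is used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data with labels in {-1, +1}.
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

# One hidden layer: z = tanh(X W1 + b1) acts as the learned feature map,
# and a linear output on z is fit with the SVM squared hinge loss
# L = mean(max(0, 1 - y f)^2) plus a small L2 penalty.
h = 16
W1 = rng.normal(scale=0.5, size=(2, h)); b1 = np.zeros(h)
w2 = rng.normal(scale=0.5, size=h);      b2 = 0.0

lr, lam = 0.1, 1e-3
for _ in range(300):
    Z = np.tanh(X @ W1 + b1)                 # hidden features
    f = Z @ w2 + b2                          # decision values
    m = 1.0 - y * f                          # hinge margins
    # dL/df for the mean squared hinge loss: -2 y max(0, 1 - y f) / n
    g = np.where(m > 0, -2.0 * y * m, 0.0) / len(y)
    gZ = np.outer(g, w2) * (1.0 - Z ** 2)    # backprop through tanh
    w2 -= lr * (Z.T @ g + lam * w2)
    b2 -= lr * g.sum()
    W1 -= lr * (X.T @ gZ + lam * W1)
    b1 -= lr * gZ.sum(axis=0)

acc = np.mean(np.sign(np.tanh(X @ W1 + b1) @ w2 + b2) == y)
```

Because training is plain mini-batch-style gradient descent on the network weights, cost per step grows linearly with the sample size, avoiding the quadratic kernel matrix of standard SVM solvers.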