E. Suzuki and S. Arikawa (Eds.): DS 2004, LNAI 3245, pp. 183–194, 2004.
© Springer-Verlag Berlin Heidelberg 2004

Enhancing SVM with Visualization

Thanh-Nghi Do and François Poulet

ESIEA Recherche
38, rue des Docteurs Calmette et Guérin
Parc Universitaire de Laval-Changé
53000 Laval - France
{dothanh,poulet}@esiea-ouest.fr

Abstract. Understanding the result produced by a data-mining algorithm is as important as its accuracy. Unfortunately, support vector machine (SVM) algorithms provide only the support vectors, which act as a black box that classifies the data efficiently and accurately. This paper presents a cooperative approach that combines SVM algorithms with visualization methods to gain insight into the construction of SVM models. We show how the user can interactively use cooperative tools to support the construction of SVM models and to interpret them. A pre-processing step is also used to deal with large datasets. The experimental results on Delve, Statlog, UCI and bio-medical datasets show that our cooperative tool is comparable to the automatic LibSVM algorithm, while giving the user a better understanding of the obtained model.

1 Introduction

The SVM algorithms proposed by Vapnik [22] are a well-known class of data-mining algorithms based on the idea of kernel substitution. SVM and other kernel-related methods have been shown to build accurate models, but the support vectors found by the algorithms provide limited information. Most of the time, the user only obtains information about the support vectors and the accuracy. It is impossible to explain, or even to understand, why a model constructed by SVM makes better predictions than many other algorithms. Understanding the model obtained by the algorithm is as important as its accuracy: a good comprehension of the discovered knowledge can help the user reduce the risk of wrong decisions. Very few papers have been published on methods that try to explain SVM results ([3], [20]).
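To make concrete why a trained SVM is hard to interpret, the sketch below shows everything a standard kernel SVM keeps after training: the support vectors, one coefficient per support vector (the product of its dual weight and its label), and a bias. A new point is classified by the sign of a kernel-weighted sum over the support vectors. This is a minimal, generic illustration assuming a Gaussian (RBF) kernel; it is not the authors' Mangasarian-style algorithm, and the support vectors, coefficients, bias and `gamma` below are hypothetical values chosen only for illustration.

```python
import math
from typing import Sequence


def rbf_kernel(u: Sequence[float], v: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, dual_coefs, bias, gamma=1.0):
    """f(x) = sum_i (alpha_i * y_i) * K(x_i, x) + b.

    dual_coefs[i] holds the product alpha_i * y_i for support vector i.
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


def predict(x, support_vectors, dual_coefs, bias, gamma=1.0):
    """Class label in {-1, +1}: the sign of the decision value."""
    return 1 if decision_value(x, support_vectors, dual_coefs, bias, gamma) >= 0 else -1


# Hypothetical "trained" model: two support vectors, one per class.
SVS = [(0.0, 0.0), (2.0, 2.0)]
COEFS = [-1.0, 1.0]  # alpha_i * y_i
BIAS = 0.0

print(predict((0.2, 0.1), SVS, COEFS, BIAS))  # -1
print(predict((1.9, 2.2), SVS, COEFS, BIAS))  # 1
```

The model itself is just `SVS`, `COEFS` and `BIAS`. Nothing in these numbers tells the user which regions of the input space belong to which class or why, which is the gap that the visualization methods discussed here aim to fill.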
Our investigation aims to use visualization methods to involve the user more closely in the construction of the SVM model and to explain its results. A new cooperative method, based on a set of different visualization techniques and large-scale Mangasarian SVM algorithms [10], [16], gives insight into the classification task with SVM. We will illustrate how to combine the strengths of different visualization methods with automatic SVM algorithms to help the user and to improve the comprehensibility of SVM models. The experimental performance of this approach is evaluated on the Delve [8], Statlog [18], UCI [2] and bio-medical [13] data sets. The results show that our cooperative method is comparable with LibSVM (a high-performance automatic SVM algorithm [4]). We also use a pre-processing step to deal with very large data-