E. Suzuki and S. Arikawa (Eds.): DS 2004, LNAI 3245, pp. 183–194, 2004.
© Springer-Verlag Berlin Heidelberg 2004
Enhancing SVM with Visualization
Thanh-Nghi Do and François Poulet
ESIEA Recherche
38, rue des Docteurs Calmette et Guérin
Parc Universitaire de Laval-Changé
53000 Laval - France
{dothanh,poulet}@esiea-ouest.fr
Abstract. Understanding the result produced by a data-mining algorithm is as
important as its accuracy. Unfortunately, support vector machine (SVM)
algorithms provide only the support vectors, which are used as a black box to
classify the data efficiently and with good accuracy. This paper presents a
cooperative approach that combines SVM algorithms with visualization methods to
give insight into the construction of SVM models. We show how the user can
interactively use cooperative tools to support the construction of SVM models
and to interpret them. A pre-processing step is also used for dealing with
large datasets.
The experimental results on Delve, Statlog, UCI and bio-medical datasets show
that our cooperative tool is comparable to the automatic LibSVM algorithm,
but the user has a better understanding of the obtained model.
1 Introduction
The SVM algorithms proposed by Vapnik [22] are a well-known class of data mining
algorithms using the idea of kernel substitution. SVM and kernel-related methods
have been shown to build accurate models, but the support vectors found by the
algorithms provide limited information. Most of the time, the user only obtains
information regarding the support vectors and the accuracy. It is impossible to
explain or even understand why a model constructed by SVM makes better
predictions than many other algorithms. Understanding the model obtained by the
algorithm is as important as the accuracy. A good comprehension of the
knowledge discovered can help the user to reduce the risk of wrong decisions.
Very few papers have been published
about methods trying to explain SVM results ([3], [20]). Our investigation aims
to use visualization methods to involve the user more intensively in the
construction of the SVM model and to help explain its results. A new cooperative
method based on a set of different visualization techniques and large-scale
Mangasarian SVM algorithms [10], [16] gives insight into the classification task
with SVM. We illustrate how to combine the strengths of different visualization
methods with automatic SVM algorithms to help the user and improve the
comprehensibility of SVM models. The experimental performance of this approach
is evaluated on Delve [8], Statlog [18], UCI [2] and bio-medical [13] data sets.
The results show that our cooperative method is comparable with LibSVM (a
high-performance automatic
SVM algorithm [4]). We also use a pre-processing step to deal with very large data-