Investigation of Diversity Strategies in SVM
Ensemble Learning
Lean Yu, Shouyang Wang
Institute of Systems Science, Academy of Mathematics
and Systems Sciences, Chinese Academy of Sciences
Beijing, 100190, China
{yulean, sywang}@amss.ac.cn
Kin Keung Lai
Department of Management Sciences
City University of Hong Kong
Hong Kong, China
mskklai@cityu.edu.hk
Abstract—In SVM ensemble learning, diversity strategy is one
of the most important determinants of good performance.
To examine the impact of diversity strategies on SVM ensemble
learning, this study conducts an in-depth investigation using
credit scoring as an illustrative example. Experimental results
show that the accuracy of ensemble models increases when
ensemble members are carefully selected to maximize diversity.
Keywords—SVM, ensemble learning, diversity strategy, group
decision making, credit scoring
I. INTRODUCTION
Ensemble learning is an active research field in machine
learning. Within it, SVM ensemble learning has proved to be an
efficient learning paradigm for performance improvement [1].
Generally, ensemble learning can be divided into two categories:
competitive ensemble learning, where ensemble members work
asynchronously on the same problem and the decision of the best
member is the final ensemble decision, and cooperative ensemble
learning, where the final decision is a fusion or aggregation of
the individual decisions of all ensemble members. According to
Olmeda and Fernandez [2], from a decision support system
perspective an effective learning system may be not an
individual model but a combination of several. An ensemble
learning model usually outperforms the individual learning
models, whose performance is limited by imperfect feature
extraction and learning algorithms and by inadequate training
data. A further reason is that individual learning models each
have inherent drawbacks, so aggregating them may yield a
learning system with high generalization capability [3-4].
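The two categories differ only in how the final decision is formed, which a minimal sketch can make concrete. The function names, the use of validation accuracy to pick the best member, and the choice of majority voting as the fusion rule are all assumptions made for illustration; the text does not fix any of them.

```python
from collections import Counter

def competitive_decision(member_preds, member_val_acc):
    """Competitive ensemble: return the prediction of the single best member.

    "Best" is judged here by a hypothetical validation accuracy per member.
    """
    best = max(range(len(member_preds)), key=lambda i: member_val_acc[i])
    return member_preds[best]

def cooperative_decision(member_preds):
    """Cooperative ensemble: fuse all members' predictions by majority vote.

    With an odd number of members and binary labels no tie can occur.
    """
    return Counter(member_preds).most_common(1)[0][0]

# Hypothetical outputs of three members for one applicant (1 = good, 0 = bad).
preds = [0, 1, 1]
val_acc = [0.80, 0.75, 0.70]  # hypothetical validation accuracies

print(competitive_decision(preds, val_acc))  # member 0 is best, so its label 0
print(cooperative_decision(preds))           # two of three members vote 1
```

Majority voting is only one possible fusion rule; weighted or confidence-based aggregation would fit the same cooperative scheme.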
It follows that different, or diverse, ensemble members (i.e.,
different diversity strategies) are necessary if an ensemble
learning system is to generalize well. The literature likewise
identifies the diversity strategy for ensemble members as a key
determinant of performance improvement in the ensemble
learning process. If all members of the ensemble think alike,
they will reach the same conclusion, and the ensemble may offer
no improvement. To investigate the impact of diversity on the
generalization of an ensemble learning system, this study
examines three typical diversity strategies for SVM-based
ensemble learning: data diversity, parameter diversity, and
kernel diversity.
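The claim that like-minded members bring no improvement can be checked on a toy case. The sketch below is an idealized illustration, not data from this study: the labels, the majority-vote fusion rule, and the pairwise disagreement measure are all assumptions chosen to keep the example small.

```python
def majority_vote(rows):
    """Fuse per-member binary label lists (one list per member), sample by sample."""
    return [1 if 2 * sum(col) > len(col) else 0 for col in zip(*rows)]

def accuracy(pred, truth):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def disagreement(a, b):
    """Fraction of samples on which two members give different labels."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

truth = [1, 0, 1, 1, 0, 1]  # hypothetical true labels

# Three identical members, each wrong on samples 0 and 1.
same = [[0, 1, 1, 1, 0, 1]] * 3

# Three diverse members with the same individual accuracy,
# but each wrong on a different pair of samples.
diverse = [
    [0, 1, 1, 1, 0, 1],  # wrong on samples 0 and 1
    [1, 0, 0, 0, 0, 1],  # wrong on samples 2 and 3
    [1, 0, 1, 1, 1, 0],  # wrong on samples 4 and 5
]

print(accuracy(majority_vote(same), truth))     # same as one member alone
print(accuracy(majority_vote(diverse), truth))  # every error is outvoted
print(disagreement(same[0], same[1]), disagreement(diverse[0], diverse[1]))
```

The diverse case is deliberately constructed so that no two members err on the same sample; real ensemble members overlap in their errors, so the gain is usually smaller.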
The main contribution of this study is an investigation of the
effect of diversity strategy on the generalization error of an
SVM-based ensemble learning system. For illustration, we focus
on a credit scoring problem because this decision environment
is characterized by large volume (there are hundreds and
thousands of credit loan applicants to evaluate), high
significance (there are millions of dollars at stake), and
repetitive activity (it requires constant updating and
monitoring). An improvement in decision accuracy of even a
fraction of a percent translates into significant savings for the
industry.
The rest of the study is organized as follows. Section II
presents the sources of diversity investigated in this paper.
For illustration and verification, a practical credit scoring
example is conducted and the corresponding computational
results are reported in Section III. Section IV concludes the
study.
II. SOURCES OF DIVERSITY IN SVM ENSEMBLE LEARNING
As previously mentioned, diverse SVM models [5] should be
used to capture the implicit patterns hidden in the dataset
from different perspectives. Generally, by the principle of the
bias-variance trade-off [6], an ensemble learning system
consisting of diverse models with much disagreement is more
likely to generalize well. Therefore, how to generate diverse
models is a crucial factor in constructing an effective ensemble
learning system.
For SVM models, several diversity strategies have been
investigated for generating ensemble members that make
different errors. Such strategies basically rely on varying the
training samples and the parameters related to the SVM design.
The three main diversity strategies are data diversity, kernel
diversity, and parameter diversity.
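As a rough sketch of how the three strategies could produce candidate ensemble members, the code below builds one specification per member. Everything specific in it is an assumption for illustration: bootstrap resampling as the data perturbation, the three kernel functions, the grids for the regularization constant C and the RBF width sigma, and the helper name member_specs. Training is left out; each specification would be passed to an SVM trainer of the reader's choice.

```python
import math
import random

def bootstrap_indices(n, rng):
    """Data diversity: draw n training-set indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2):
    return (linear_kernel(x, y) + 1.0) ** degree

def rbf_kernel(x, y, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

KERNELS = {"linear": linear_kernel, "poly": polynomial_kernel, "rbf": rbf_kernel}
C_VALUES = [0.1, 1.0, 10.0]     # hypothetical regularization constants
SIGMA_VALUES = [0.5, 1.0, 2.0]  # hypothetical RBF widths

def member_specs(n_samples, n_bootstrap=3, seed=0):
    """List one specification dict per candidate ensemble member."""
    rng = random.Random(seed)
    all_rows = list(range(n_samples))
    specs = []
    # Data diversity: same kernel and parameters, different resampled data.
    for _ in range(n_bootstrap):
        specs.append({"strategy": "data", "kernel": "rbf", "C": 1.0,
                      "sigma": 1.0, "rows": bootstrap_indices(n_samples, rng)})
    # Kernel diversity: same data and C, different kernel function
    # (sigma is only meaningful for the RBF kernel).
    for name in KERNELS:
        specs.append({"strategy": "kernel", "kernel": name, "C": 1.0,
                      "sigma": 1.0, "rows": all_rows})
    # Parameter diversity: same data and kernel, different (C, sigma) pairs.
    for c in C_VALUES:
        for sigma in SIGMA_VALUES:
            specs.append({"strategy": "parameter", "kernel": "rbf", "C": c,
                          "sigma": sigma, "rows": all_rows})
    return specs

specs = member_specs(n_samples=10)
print(len(specs))  # 3 + 3 + 9 = 15 candidate members
```

Keeping the strategies separate, so that each member varies in only one respect, makes it possible to attribute any change in ensemble accuracy to a single source of diversity.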
A. Data Diversity
Data diversity is diversity in learning content produced by
perturbing the training data. For unstable models, perturbing
the training data often yields models with different parameter
estimates. Because different data often
contain different information, these different data can generate
Fourth International Conference on Natural Computation
978-0-7695-3304-9/08 $25.00 © 2008 IEEE