Investigation of Diversity Strategies in SVM Ensemble Learning

Lean Yu, Shouyang Wang
Institute of Systems Science, Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences
Beijing, 100190, China
{yulean, sywang}@amss.ac.cn

Kin Keung Lai
Department of Management Sciences, City University of Hong Kong
Hong Kong, China
mskklai@cityu.edu.hk

Abstract: In SVM ensemble learning, the diversity strategy is one of the most important determinants of good performance. To examine the impact of diversity strategies on SVM ensemble learning, this study investigates them in depth, using credit scoring as an illustrative example. Experimental results show that the accuracy of ensemble models increases when ensemble members are carefully selected to maximize diversity.

Keywords: SVM, ensemble learning, diversity strategy, group decision making, credit scoring

I. INTRODUCTION

In machine learning, ensemble learning is an active research field. Within this field, SVM ensemble learning has been shown to be an efficient learning paradigm for improving performance [1]. Generally, ensemble learning can be divided into two categories: competitive ensemble learning, where ensemble members work asynchronously on the same problem and the decision of the best member is taken as the final ensemble decision, and cooperative ensemble learning, where the final decision is a fusion or aggregation of the individual decisions of all ensemble members. Olmeda and Fernandez [2] argue that, from a decision support system perspective, an effective learning system may not be a single model but a combination of several models. Ensemble learning models usually outperform individual learning models, whose performance is limited by imperfect feature extraction, imperfect learning algorithms, and inadequate training data.
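The difference between the two categories of ensemble learning described above can be illustrated with a small sketch. It is a hypothetical toy, not the authors' experimental setup: the labels, the three members' predictions, and the helper names (`accuracy`, `competitive_decision`, `cooperative_decision`) are illustrative assumptions, and majority voting is used as one simple choice of fusion rule for the cooperative case.

```python
from collections import Counter

# Hypothetical toy data, not from the paper: true credit labels
# (1 = good applicant, 0 = bad applicant) for ten applicants, and the
# predictions of three hypothetical SVM ensemble members on them.
y_true = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
member_preds = {
    "svm_a": [1, 1, 0, 1, 0, 1, 0, 1, 0, 1],  # wrong at (0-based) indices 5 and 6
    "svm_b": [1, 0, 0, 1, 0, 0, 1, 1, 1, 1],  # wrong at indices 1 and 8
    "svm_c": [0, 1, 0, 1, 1, 0, 1, 1, 0, 1],  # wrong at indices 0 and 4
}


def accuracy(preds, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)


def competitive_decision(preds_by_member, labels):
    """Competitive ensemble: the best single member's decisions become final.

    For brevity the best member is picked on the same labels it is scored on;
    a separate validation set would normally be used. Ties go to the first
    member in insertion order, because max() returns the first maximum.
    """
    best = max(preds_by_member, key=lambda name: accuracy(preds_by_member[name], labels))
    return best, preds_by_member[best]


def cooperative_decision(preds_by_member):
    """Cooperative ensemble: fuse all members' decisions by majority vote."""
    votes_per_applicant = zip(*preds_by_member.values())
    return [Counter(votes).most_common(1)[0][0] for votes in votes_per_applicant]


best_name, competitive = competitive_decision(member_preds, y_true)
cooperative = cooperative_decision(member_preds)
print(best_name, accuracy(competitive, y_true))  # svm_a 0.8
print(accuracy(cooperative, y_true))             # 1.0

# Members that "think alike" add nothing: three copies of svm_a vote like svm_a.
clones = {f"copy_{i}": member_preds["svm_a"] for i in range(3)}
print(cooperative_decision(clones) == member_preds["svm_a"])  # True
```

Because the three hypothetical members err on different applicants, majority voting corrects every individual error, whereas the competitive strategy inherits the two errors of whichever member it selects. When the members are identical, the vote simply reproduces a single member's decisions.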
Another reason that ensembles outperform individual models is that each individual learning model has its own inherent drawbacks, so aggregating several models may yield a learning system with high generalization capability [3-4]. It follows that diverse ensemble members, produced by different diversity strategies, are necessary if an ensemble is to generalize well. The literature also indicates that the diversity of ensemble members is a key determinant of the performance improvement achieved by ensemble learning: if every member of the ensemble thinks alike, all members reach the same conclusion, and aggregation may bring no improvement. To investigate the impact of diversity on the final generalization of an ensemble, this study examines three typical diversity strategies for SVM-based ensemble learning systems: data diversity, parameter diversity, and kernel diversity. The main contribution of this study is an investigation of how the diversity strategy affects the generalization error of an SVM-based ensemble learning system.

For illustration, we focus on a credit scoring problem, because this decision environment is characterized by large volume (hundreds or thousands of credit loan applicants to evaluate), high significance (millions of dollars at stake), and repetitive activity (constant updating and monitoring are required). An improvement in decision accuracy of even a fraction of a percent translates into significant savings for the industry.

The rest of the study is organized as follows. Section II presents the sources of diversity investigated in this paper. For illustration and verification purposes, Section III applies them to a practical credit scoring example and reports the corresponding computational results. Section IV concludes the study.

II. SOURCES OF DIVERSITY IN SVM ENSEMBLE LEARNING

As previously mentioned, diverse SVM models [5] should be used to capture the implicit patterns hidden in the dataset from different perspectives. According to the principle of the bias-variance trade-off [6], an ensemble learning system consisting of diverse models that disagree substantially is more likely to generalize well. How to generate diverse models is therefore a crucial factor in constructing an effective ensemble learning system. For SVM models, several diversity strategies have been investigated for generating ensemble members that make different errors. These strategies basically rely on varying the training samples and the parameters related to the SVM design. The main diversity strategies fall into three categories: data diversity, kernel diversity, and parameter diversity.

A. Data Diversity

Data diversity is diversity in learning content caused by perturbing the training data. For unstable models, perturbations of the training data often produce models with different parameter estimates. Because different data often contain different information, these different data can generate
Fourth International Conference on Natural Computation 978-0-7695-3304-9/08 $25.00 © 2008 IEEE DOI 39