Neural Network Ensemble and Support Vector Machine Classifiers for the Analysis of Remotely Sensed Data: a Comparison

G. Pasquariello, N. Ancona, P. Blonda, C. Tarantino, G. Satalino, A. D'Addabbo
CNR - I.E.S.I., Via Amendola 166/5, 70126 Bari (Italy)

Abstract - This paper presents a comparative evaluation between a classification strategy based on the combination of the outputs of a neural network (NN) ensemble and the application of Support Vector Machine (SVM) classifiers in the analysis of remotely sensed data. Two sets of experiments have been carried out on a benchmark data set. The first set concerns the application of linear and non-linear techniques to the combination of the outputs of a Multilayer Perceptron (MLP) neural network ensemble. In particular, the Bayesian and the error correlation matrix approaches are used for coefficient selection in the linear combination of the networks' outputs. An MLP module is used for the non-linear combination of the outputs. The results of the linear and non-linear combination schemes are compared and discussed versus the performance of SVM classifiers. The comparative analysis shows that the non-linear, MLP-based combination provides the best results among the different combination schemes. On the other hand, better performance can be obtained with SVM classifiers. However, the complexity of the SVM training procedure can be considered a limitation for the application of SVMs to real-world problems.

I. INTRODUCTION

To improve the generalisation performance of a classification system, recent literature suggests, on one hand, combining the decisions of several classifiers rather than using the output of the best classifier in the ensemble [1], [2]. On the other hand, it proposes a new scheme called the Support Vector Machine (SVM) [3]. The latter technique has already been used in different application domains, such as object detection and text categorization, and has outperformed traditional NN techniques in terms of generalisation capability.
In this work two sets of experiments have been carried out to compare the performance of different classification strategies. The first set is based on the application of different schemes to combine the decisions from an ensemble of MLP neural classifiers. Both linear and non-linear methods have been used. In particular, the correlation matrix of the output errors and a Bayesian assumption have been used for the coefficient selection in the linear combination, whereas an MLP module has been considered in the non-linear combination. The second set of experiments concerns the application of SVM classifiers to the same data set, searching for a more robust generalisation technique.

II. BACKGROUND

Let $x_k$ be an input pattern vector ($k \in [1, N]$), $t_k$ the corresponding target vector of an $L$-class problem, and $\Gamma_\mu$ an ensemble of $R$ classifiers ($\mu \in [1, R]$). When the outputs of the classifiers in an ensemble are supplied as posterior probabilities, both linear and non-linear combination schemes can be used.

1) Linear Combination. For each input training pattern $k$, let $y_k$ be the linear combination of the outputs $y_k^\mu$ provided by each classifier $\mu = 1, \ldots, R$, where $R$ is the number of classifiers:

$$y_k = \sum_{\mu=1}^{R} \alpha_\mu \, y_k^\mu \qquad (1)$$

where $\alpha_\mu$ is the coefficient for the $\mu$-th classifier, satisfying the condition:

$$\sum_{\mu=1}^{R} \alpha_\mu = 1 \qquad (2)$$

The problem to be solved is to find the "optimal" coefficients for the combination that minimise the error of the attributions. As evidenced in [4], the mean square error of the combination is less than or equal to the average error of the $R$ classifiers individually considered. This assures only that a combination system has a higher generalisation capability than the average of the classifier ensemble, but nothing can be affirmed with respect to the best of the single classifiers. The simplest choice for the coefficients is $\alpha_\mu = 1/R$ $\forall \mu$ (average rule). In this study, two different approaches have been used to optimize the coefficient selection.
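The linear combination of Eqs. (1)-(2) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code; the function name and array layout (classifiers along the first axis) are assumptions made here for clarity.

```python
import numpy as np

def combine_linear(outputs, alpha=None):
    """Linearly combine ensemble outputs as in Eq. (1).

    outputs: array of shape (R, N, L) -- R classifiers, N patterns,
             L classes, each entry a posterior-probability estimate.
    alpha:   coefficients of shape (R,); if None, the average rule
             alpha_mu = 1/R is used. Coefficients are normalised so
             they sum to one, as required by Eq. (2).
    """
    outputs = np.asarray(outputs, dtype=float)
    R = outputs.shape[0]
    if alpha is None:
        alpha = np.full(R, 1.0 / R)      # average rule
    alpha = np.asarray(alpha, dtype=float)
    alpha = alpha / alpha.sum()          # enforce the sum-to-one constraint
    # y_k = sum over mu of alpha_mu * y_k^mu, for every pattern k
    return np.tensordot(alpha, outputs, axes=1)
```

With the average rule, combining two classifiers that output `[0.8, 0.2]` and `[0.6, 0.4]` for the same pattern yields `[0.7, 0.3]`.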
The two approaches are briefly reviewed in the following. For notational convenience, all the considerations are made for the recognition of a single class with respect to all the others, but the same considerations can be extended to the multiple-class problem.

• Bayesian Approach. Following the Bayesian approach, described in [4] for regression problems, the linear combination estimates a posterior probability by weighting each single classifier with:

$$\alpha_\mu = \frac{\exp\left(-\frac{1}{2}\sum_{k=1}^{N}\left(y_k^\mu - t_k\right)^2\right)}{\sum_{\nu=1}^{R}\exp\left(-\frac{1}{2}\sum_{k=1}^{N}\left(y_k^\nu - t_k\right)^2\right)} \qquad (3)$$

• Error Correlation Matrix approach. It finds the coefficients of the linear combination by applying an

0-7803-7536-X/$17.00 (C) 2002 IEEE 509
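The Bayesian weighting of Eq. (3) assigns each classifier the normalised exponential of minus half its sum-of-squares error on the training patterns. A minimal NumPy sketch, assuming single-class outputs and binary targets as in the text (the function name and the numerical-stability shift are additions made here, not part of the paper):

```python
import numpy as np

def bayesian_weights(outputs, targets):
    """Combination coefficients of Eq. (3).

    outputs: shape (R, N) -- single-class outputs of R classifiers
             on N training patterns.
    targets: shape (N,)   -- targets for the class vs. all the others.
    """
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # per-classifier sum-of-squares error over the N patterns
    sse = np.sum((outputs - targets) ** 2, axis=1)
    # subtract the minimum before exponentiating for numerical
    # stability; the shift cancels in the normalised ratio
    w = np.exp(-0.5 * (sse - sse.min()))
    return w / w.sum()
```

A classifier with zero error receives the largest coefficient, and the coefficients sum to one, so they also satisfy the constraint of Eq. (2).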