Performance Comparison between Edited kNN and MQ-RBFN for Regression and Classification Tasks

J. E. B. Maia, V. R. S. Laboreiro, F. E. Chaves, F. J. A. Maia, T. G. N. Silva, T. N. Ferreira
Universidade Estadual do Ceará - UECE - Itaperi
Estatística e Ciência da Computação
Email: jose.maia@uece.br, {victor.laboreiro,edvanchaves,felipe.ja.maia,thi.nepo,thiagonascimento.uece}@gmail.com

Abstract—Supervised learning techniques can be roughly grouped into lazy learning and eager learning. The two types have very different properties and are suitable for different applications. In this paper we evaluate the properties of both types of learning using a representative distance-based algorithm for each class, namely kNN (k-nearest neighbors) and RBFN (Radial Basis Function Network). In addition, an editing algorithm (SPAM - Supervised Partitioning Around Medoids) is used to reduce the labeled dataset. Our experiments on classification and regression tasks, using 12 public datasets, show that prototype selection algorithms typically used with kNN are good alternatives for selecting the centers of an RBFN when optimizing the number of centers is not the relevant criterion. The experiments also show that the RBFN generally performs better than the Edited kNN.

I. INTRODUCTION

Supervised learning techniques (for classification or regression) can be roughly grouped into lazy learning and eager learning. In lazy learning [2], little or no effort is expended during the training phase; the effort is postponed until the generalization phase, when a new instance arrives. Thus, no predictive model, such as a neural network or a decision tree, is constructed in advance. Eager learning [2], in turn, focuses the effort on building a concise, finished predictive model from the training cases. To perform well, such a model should cover the entire input space while aiming for greater accuracy in the regions of higher probability.
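The lazy-learning behavior described above can be made concrete with a minimal kNN classifier: nothing is fitted in advance, and all computation happens at query time. The code below is an illustrative sketch only (the toy data and function names are ours, not from the paper):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points.

    Lazy learning: no model is built in advance; all computation
    happens here, at query time, over the stored exemplars.
    """
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every stored exemplar
    nearest = np.argsort(dists)[:k]               # indices of the k closest exemplars
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]              # majority vote among the k neighbors

# Hypothetical 2-D toy data, for illustration only
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.95, 1.0]), k=3))  # -> 1
```

Note that the entire training set must be stored and scanned for every query, which is exactly the storage and processing-time cost that training-set editing aims to reduce.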
Typical examples of lazy and eager learning are, respectively, the kNN [6] and RBFN [9] algorithms. When the target function is very complex but can be approximated by a combination of several local functions, lazy learners such as k-nearest neighbors (kNN) are known to achieve good results. As a downside, however, lazy learners have high storage and processing-time requirements, which may limit their use in real-life applications. A viable approach to avoid such problems, and possibly to improve results by avoiding noise and overfitting, is to edit the training set. In this paper, we call a kNN algorithm whose training set has been reduced by some method of redundancy and outlier elimination an Edited kNN.

Radial Basis Function Networks (RBF Networks), on the other hand, have lower classification-time requirements because they produce a global optimization of the target function during the training stage. Even though the RBFN is an eager learning method, it approximates the target function using a combination of several local functions. RBF Networks therefore combine advantages of both lazy and eager learners. Studying this structural property of RBF Networks is the main motivation for this work.

RBF Networks and kNN are distance-based algorithms: machine learning algorithms that store internal parameters and a number of exemplars, and compute the output for a given instance exclusively from a combination of those internal parameters and the distance between that instance and each exemplar. To accomplish this task, the RBFN uses knowledge of the global structure of the task to build an interpolation, while kNN works exclusively with local information. However, it has not been conclusively shown that any algorithm, global or local, consistently outperforms every other algorithm on any specific task [5].
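The distance-based structure of the RBFN can be sketched as follows: the eager training step solves for the output weights once, and prediction then combines only the stored centers, the learned weights, and the distances to the query point. This is a minimal illustrative sketch with the multiquadric basis used in this paper; the toy regression data and the choice of beta = 1 are our assumptions, not taken from the paper:

```python
import numpy as np

def multiquadric(r, beta=1.0):
    # Multiquadric basis: phi(r) = sqrt(r^2 + beta^2)
    return np.sqrt(r**2 + beta**2)

def rbfn_fit(X, y, centers, beta=1.0):
    """Eager step: solve for output weights once, by linear least squares."""
    r = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (n, m) distances
    Phi = multiquadric(r, beta)                                      # design matrix
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def rbfn_predict(x, centers, w, beta=1.0):
    """Predict from distances to the stored centers plus the learned weights."""
    r = np.linalg.norm(centers - x, axis=1)
    return multiquadric(r, beta) @ w

# Toy regression: approximate y = x^2 on [0, 1] (hypothetical setup, illustration only)
X = np.linspace(0, 1, 20)[:, None]
y = X.ravel()**2
centers = np.array([[0.0], [0.5], [1.0]])  # e.g. prototypes kept by an editing algorithm
w = rbfn_fit(X, y, centers)
print(rbfn_predict(np.array([0.3]), centers, w))
```

The sketch makes the hybrid character visible: like kNN, prediction is a function of distances to stored points; unlike kNN, a global fit over the whole training set is done once, in advance.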
In this study, we investigate the properties and performance of both supervised learning algorithms (Edited kNN and RBFN) on classification and regression tasks. The goal, therefore, is to gain insight into the comparative properties of lazy learning and eager learning.

The remaining sections are organized as follows. The algorithms used in this work are briefly described in Section II. Section III presents the results together with a conceptual discussion of them. Finally, Section IV concludes the paper.

II. THE ALGORITHMS

The methodology of this research is as follows. Each dataset is split into a training set and a test set. An editing algorithm (SPAM) is applied to the instances of the training set. Reduced training sets with 20%, 40%, 60% and 80% of the initial size are generated; together with the initial dataset, they form five different training sets. The performance of the kNN algorithm for k = 1, 3 and 5, based on each edited set, is compared against the performance of the RBFN. The RBFN is trained on the initial training set, with the RBF centers taken from the same edited sets used by kNN. The RBF activation function used is the multiquadric. The following subsections briefly describe the algorithms mentioned.

A. kNN - k Nearest Neighbors

The kNN algorithm is one of the most widely used lazy learning approaches [2]; it is a non-parametric method for classifying patterns. This popularity is primarily due to its simplicity and intuitiveness. It is a powerful classification algorithm capable of solving complex problems. Given a set of patterns X to be estimated, and x_i ∈ X, the k-Nearest