Supervised Learning Using Local Analysis in an Optimal-Path Forest

Willian Paraguassu Amorim
FACOM - Institute of Computing
Federal University of Mato Grosso do Sul - UFMS
Campo Grande, Brazil
Email: paraguassuec@gmail.com

Marcelo Henriques de Carvalho
FACOM - Institute of Computing
Federal University of Mato Grosso do Sul - UFMS
Campo Grande, Brazil
Email: mhc@facom.ufms.br

Abstract: In this paper, we present OPF-LA (Optimal Path Forest–Local Analysis), a proposed new learning model. OPF-LA is a heuristic that uses local information to select prototypes that, in turn, are used to classify new data. It employs the main ideas of an OPF classifier and suggests a new procedure for the training phase. Experimental results show advantages in efficiency and accuracy over classical learning algorithms such as Support Vector Machines (SVM), Artificial Neural Networks using Multilayer Perceptrons (MP), and Optimal Path Forest (OPF), in several applications.

Keywords: Supervised classifiers; Optimal-Path Forest

I. INTRODUCTION

Pattern recognition is a research area that aims to classify patterns into categories or classes. Given a set of $c$ classes, $\omega_1, \omega_2, \ldots, \omega_c$, and a pattern $x$, a pattern recognition system associates $x$ with the label $i$ of one of the classes $\omega_i$. The pattern classification problem is divided into the following categories: (i) supervised, where each input pattern is identified as a member of a predefined class; (ii) unsupervised, where each input pattern is assigned to a class that is as yet unknown; and (iii) semi-supervised, where part of the input set has a predefined class [1].

The main approaches to learning and pattern classification are based on statistical analysis. Simple statistical techniques can easily handle linearly separable classes, as shown in Figure 1(a), but piecewise linearly separable classes, as shown in Figure 1(b), require more robust techniques, such as Artificial Neural Networks.
Unfortunately, many applications involve classes that are not linearly separable, as shown in Figure 1(c). Possible solutions in these cases are Support Vector Machines (SVM) [2] and the classic k-nearest neighbours algorithm [3].

Figure 1. Examples of feature spaces: (a) Linearly separable. (b) Piecewise linearly separable. (c) Non-separable.

One of the most important features of sample spaces, which has not received much attention in supervised classification, is the distance relation between samples (especially along sequences of samples). Recent research that explores this relationship has obtained promising results for supervised and unsupervised learning using the OPF (Optimal Path Forest) classifier [4], [5].

OPF is a supervised pattern classification framework, particularly effective in image classification, which reduces the pattern classification problem to the problem of partitioning the vertices of a graph. The problem of pattern recognition can be modeled as a complete graph with positive weights on its arcs, where the nodes are the samples, represented by their respective feature vectors, and the arcs are defined by an adjacency relation between samples [6], [7], [8]. The vertices of the graph can thus be partitioned into optimal-path trees rooted at their respective prototypes (seeds), which are obtained in the training phase. The label of the most closely connected prototype gives the classification of a new input sample.

The optimal paths for new samples are computed using the Image Forest Transform (IFT) algorithm [8]. The IFT technique is essentially Dijkstra's algorithm, modified to handle multiple sources and more general cost functions. It initially assigns the minimum cost to the source nodes and propagates it to the other nodes in nondecreasing order of cost, partitioning the graph into an optimal-path forest whose roots are the prototypes.
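
The propagation described above can be sketched as a multi-source run of Dijkstra's algorithm. The Python sketch below is illustrative only and is not the authors' implementation. It assumes Euclidean arc weights and a path cost equal to the largest arc weight along the path; neither choice is specified in this section. The names `euclidean`, `optimal_path_forest`, and `classify` are hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    """Arc weight between two feature vectors (Euclidean distance is an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def optimal_path_forest(samples, prototypes):
    """Multi-source Dijkstra on the complete graph over `samples`.

    `prototypes` maps a sample index to its class label.
    Assumed path cost: the largest arc weight along the path.
    Returns (cost, root, label) lists indexed by sample.
    """
    n = len(samples)
    cost = [math.inf] * n
    root = [None] * n
    label = [None] * n
    heap = []
    # Prototypes (seeds) start with the minimum cost and are their own roots.
    for p, lab in prototypes.items():
        cost[p], root[p], label[p] = 0.0, p, lab
        heapq.heappush(heap, (0.0, p))

    done = [False] * n
    while heap:
        c, s = heapq.heappop(heap)  # nodes leave the heap in nondecreasing cost
        if done[s]:
            continue  # stale heap entry for an already finalized node
        done[s] = True
        for t in range(n):
            if done[t]:
                continue
            new_cost = max(c, euclidean(samples[s], samples[t]))
            if new_cost < cost[t]:
                # t is conquered by the tree that s belongs to.
                cost[t], root[t], label[t] = new_cost, root[s], label[s]
                heapq.heappush(heap, (new_cost, t))
    return cost, root, label


def classify(x, samples, cost, label):
    """Label a new sample via the training node that offers the cheapest path to it."""
    best = min(range(len(samples)),
               key=lambda s: max(cost[s], euclidean(samples[s], x)))
    return label[best]
```

In use, one would call `optimal_path_forest` once on the training samples with the chosen prototypes, then call `classify` for each new sample. Under the maximum-arc cost assumed here, a sample joins the tree it can reach through the chain of shortest hops, which is not necessarily the tree of its nearest prototype.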
In this paper, we present OPF-LA (Optimal Path Forest–Local Analysis), a new supervised classifier model. It employs the main ideas of OPF classifiers and suggests a new procedure for the training phase. We also present a generalization of the OPF-LA technique, which expands the number of prototypes representing each class, increasing the range of options for controlling data classification.

The results show that OPF-LA outperforms Support Vector Machines (SVM), Artificial Neural Networks using Multilayer Perceptrons (MP), and Optimal Path Forest (OPF) in most applications, in accuracy, precision, and