Supervised Learning Using Local Analysis in an Optimal-Path Forest

Willian Paraguassu Amorim
FACOM - Institute of Computing
Federal University of Mato Grosso do Sul - UFMS
Campo Grande, Brazil
Email: paraguassuec@gmail.com

Marcelo Henriques de Carvalho
FACOM - Institute of Computing
Federal University of Mato Grosso do Sul - UFMS
Campo Grande, Brazil
Email: mhc@facom.ufms.br

Abstract: In this paper, we present OPF-LA (Optimal Path Forest–Local Analysis), a new learning model. OPF-LA is a heuristic that uses local information to select prototypes that, in turn, are used to classify new data. It employs the main ideas of the OPF classifier and proposes a new procedure for the training phase. Experimental results show advantages in efficiency and accuracy over classical learning algorithms such as Support Vector Machines (SVM), Artificial Neural Networks using Multilayer Perceptrons (MP), and Optimal Path Forest (OPF), in several applications.

Keywords: Supervised classifiers; Optimal-Path Forest

I. INTRODUCTION

Pattern recognition is a research area that aims to classify patterns into categories or classes. Given a set of $c$ classes, $\omega_1, \omega_2, \ldots, \omega_c$, and a pattern $x$, a pattern recognition system associates the pattern $x$ with the label $i$ of one of the classes $\omega_i$. Pattern classification problems are divided into the following categories: (i) supervised, where each input pattern is identified as a member of a predefined class, (ii) unsupervised, where each input pattern is assigned to a class as yet unknown, and (iii) semi-supervised, where part of the input set has a predefined class [1].

The main approaches to learning and pattern classification are based on statistical analysis. Simple statistical techniques can easily handle linearly separable classes, as shown in Figure 1(a), but piecewise linearly separable classes, as shown in Figure 1(b), require more robust techniques, such as Artificial Neural Networks.
Unfortunately, many applications involve classes that are not linearly separable, as shown in Figure 1(c). Possible solutions in these cases are Support Vector Machines (SVM) [2] and the classic k-nearest neighbours algorithm [3].

One of the most important features of sample spaces, which has not received much attention in supervised classification, is the distance relation between samples (especially along sequences of samples). Recent research exploring this relationship has obtained promising results for supervised and unsupervised learning using the OPF (Optimal Path Forest) classifier [4][5].

Figure 1. Examples of feature spaces: (a) Linearly separable. (b) Piecewise linearly separable. (c) Non-separable.

OPF is a supervised pattern classification framework, particularly effective in image classification, which reduces the pattern classification problem to the problem of partitioning the vertices of a graph. The problem of pattern recognition can be modeled as a complete graph with positive weights on its arcs, where the nodes are the samples, represented by their respective feature vectors, and the arcs are defined by an adjacency relation between samples [6], [7], [8]. The vertices of the graph can thus be partitioned into optimal-path trees rooted at their respective prototypes (seeds) obtained in the training phase. The label of the most closely connected prototype gives the classification of a new input sample.

The optimal paths for new samples are computed using the Image Forest Transform (IFT) algorithm [8]. The IFT technique is essentially Dijkstra's algorithm, modified to accept multiple sources and more general path-cost functions. It initially assigns the minimum cost to the source nodes and propagates costs to the other nodes in nondecreasing order, partitioning the graph into an optimal-path forest whose roots are the prototypes.
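The propagation described above can be illustrated with a minimal sketch, not the authors' implementation. It assumes a complete graph with Euclidean arc weights and a path cost equal to the largest arc weight along the path (an f_max-style cost, chosen here only for illustration, since the text above does not fix the cost function). The function names `optimal_path_forest` and `classify` are hypothetical, and the prototypes are taken as given rather than selected by any training procedure.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def optimal_path_forest(samples, prototypes, dist=euclidean):
    """Multi-source Dijkstra on the complete graph over `samples`.

    `prototypes` maps a sample index to its class label (assumed given).
    Path cost is the maximum arc weight along the path (an assumption).
    Returns per-sample cost, root prototype index, and propagated label.
    """
    n = len(samples)
    cost = [math.inf] * n
    root = [None] * n
    label = [None] * n
    done = [False] * n
    heap = []
    # Prototypes are the sources: they start with the minimum cost.
    for p, lab in prototypes.items():
        cost[p], root[p], label[p] = 0.0, p, lab
        heapq.heappush(heap, (0.0, p))
    # Nodes leave the queue in nondecreasing order of cost.
    while heap:
        c, s = heapq.heappop(heap)
        if done[s]:
            continue  # stale queue entry
        done[s] = True
        for t in range(n):
            if done[t]:
                continue
            new_cost = max(c, dist(samples[s], samples[t]))
            if new_cost < cost[t]:
                # t is conquered by the tree that s belongs to.
                cost[t], root[t], label[t] = new_cost, root[s], label[s]
                heapq.heappush(heap, (new_cost, t))
    return cost, root, label


def classify(x, samples, cost, label, dist=euclidean):
    """Label a new sample by the training node offering the cheapest path."""
    best = min(range(len(samples)),
               key=lambda s: max(cost[s], dist(samples[s], x)))
    return label[best]


if __name__ == "__main__":
    samples = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
    prototypes = {0: "A", 5: "B"}  # hypothetical seeds
    cost, root, label = optimal_path_forest(samples, prototypes)
    print(label)
    print(classify((3, 0), samples, cost, label))
```

In this toy example the six samples split into two trees, one rooted at each prototype, and a new sample takes the label of the tree that reaches it with the lowest path cost.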
In this paper, we present OPF-LA (Optimal Path Forest–Local Analysis), a new model of supervised classifier. It employs the main ideas of OPF classifiers, suggesting a new procedure in the training phase.

We also present a generalization of the OPF-LA technique, which expands the number of prototypes representing each class, increasing the space of possibilities to control data sorting.

The results show that OPF-LA outperforms Support Vector Machines (SVM), Artificial Neural Networks using Multilayer Perceptrons (MP), and Optimal Path Forest (OPF), in the majority of applications, in accuracy, precision, and