Dissimilarity-based Learning from Imbalanced Data with Small Disjuncts and Noise

V. García¹, J.S. Sánchez², H.J. Ochoa Domínguez³, and L. Cleofas-Sánchez²

¹ Multidisciplinary University Division, Universidad Autónoma de Ciudad Juárez, Ciudad Juárez, Chihuahua, Mexico
² Institute of New Imaging Technologies, Department of Computer Languages and Systems, Universitat Jaume I, Castelló de la Plana, Spain
³ Department of Electrical and Computer Engineering, Universidad Autónoma de Ciudad Juárez, Ciudad Juárez, Chihuahua, Mexico

Abstract. This paper compares the behavior of three linear classifiers modeled on both the feature space and the dissimilarity space when the class imbalance of data sets interweaves with small disjuncts and noise. To this end, experiments are carried out over three synthetic databases with different imbalance ratios, levels of noise, and complexities of the small disjuncts. The results suggest that small disjuncts can be much better overcome in the dissimilarity space than in the feature space, which means that the learning models are affected only by imbalance and noise once the samples have first been mapped into the dissimilarity space.

Keywords: dissimilarity space, imbalance, small disjuncts, noise

1 Introduction

A complex problem in many supervised learning applications is associated with significant disparities between the prior probabilities of the different classes, usually known as the class imbalance problem [16]. A data set is said to be imbalanced when the samples of one class largely outnumber those of the other classes. In a binary problem, the minority class is also referred to as positive because it is often the most interesting one from the point of view of the learning task, whereas the majority class is generally called negative.
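To make the contrast above concrete: in the dissimilarity space, each sample is represented not by its raw features but by its distances to a representation set of prototypes, and a linear classifier is then trained on that distance matrix. A minimal sketch follows, assuming Euclidean distance and an arbitrary hypothetical representation set (the paper's specific metric and prototype selection may differ):

```python
import numpy as np

def dissimilarity_space(X, R):
    """Map feature-space samples X (n x d) into the dissimilarity space
    defined by a representation set R (k x d): each sample becomes the
    vector of its Euclidean distances to the k prototypes.
    Illustrative sketch only; metric and prototypes are assumptions."""
    # Broadcast to pairwise differences of shape (n, k, d), then reduce
    diffs = X[:, None, :] - R[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))

# Toy usage: 4 samples in 2-D mapped against 3 prototypes
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
R = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
D = dissimilarity_space(X, R)
print(D.shape)  # (4, 3): a linear classifier can now be trained on D
```

The dimensionality of the new space equals the size of the representation set, so the geometry seen by the linear classifier depends on the prototypes chosen rather than on the original features.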
The main dilemma with imbalanced data is that the majority class distorts the decision boundaries to the detriment of the minority class, which leads to very low accuracy in classifying positive samples [6]. However, several studies have pointed out that the class distribution does not hinder the learning task by itself; rather, other difficulties usually coexist with this problem and contribute to the loss of performance [5]. Among others, small disjuncts and noisy data are two practical examples of data complexities that should be addressed in detail so that classification models can achieve better performance [7]. The reasons for focusing this study on small disjuncts and noisy data are twofold: the existence of class imbalance is closely linked to the problems of small disjuncts and noise, and learning from imbalanced data with these two