1063-6706 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TFUZZ.2018.2879465, IEEE Transactions on Fuzzy Systems

A Space Efficient Minimum Spanning Tree Approach to the Fuzzy Joint Points Clustering Algorithm

Abstract: The Fuzzy Joint Points (FJP) method is a neighborhood-based clustering method that uses a fuzzy neighborhood relation and eliminates the need for a parameter. Even though fuzzy neighborhood-based clustering methods have proven fast enough to handle tens of thousands of data points in under a second, space complexity is still a limiting factor. In this study, a minimum spanning tree based reduced space FJP (RSFJP) algorithm is proposed. The computational experiments show that the reduced space algorithm enables the method to be used for much larger data sets.

Index Terms: Clustering, Fuzzy neighborhood, Fuzzy joint points, Space efficiency.

I. INTRODUCTION

The need for better data management and analysis techniques has risen greatly due to the rapid increase in the amount of digital information available. One of the most common techniques is clustering. Clustering helps identify groups of similar data, referred to as clusters, without supervision. There have been many different approaches to clustering, which have resulted in various methods [1, 2].

Earlier studies focus on hierarchical clustering methods, which build a hierarchy of clusters where each level of the hierarchy corresponds to a unique partitioning [3]. Generally, hierarchical methods either merge smaller clusters to construct larger ones or split larger clusters to construct smaller ones.
One major drawback of simple hierarchical clustering is its high computational load, since many different clustering results are produced even if only a single result is desired. Choosing a meaningful subset of the clustering results also requires extra work.

A significant portion of the clustering methods proposed in the literature employ the classical k-means algorithm [4]. In k-means, the similarity of data is measured directly by a distance function, and the objective is to minimize the average distance within clusters. The value of the objective function is improved iteratively until convergence is achieved. While k-means based algorithms are typically fast, and relatively recent studies improve k-means clustering to some extent, they have inherent drawbacks: they require a separate method to decide the optimal number of clusters, they are sensitive to the initial clustering, and they have difficulty discovering irregularly shaped clusters.

One of the successful clustering approaches is density-based clustering, which investigates the neighborhood of data points rather than merely the distance between them and shapes the clusters around dense neighborhoods. The density-based approach was introduced with DBSCAN (Density Based Spatial Clustering of Applications with Noise) [5]. While DBSCAN eliminates the inherent disadvantages of k-means, somewhat similar difficulties arise because its neighborhood parameters must be set delicately. Some later studies introduced methods to overcome the parameter setting difficulty of density-based clustering by maintaining an ordering or hierarchy of clusters [6, 7]. Replacing the crisp neighborhood relation in DBSCAN with a fuzzy neighborhood relation, as in FN-DBSCAN (Fuzzy Neighborhood-DBSCAN), has been shown to be more robust to datasets with various distributions and densities [8].
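
The difference between a crisp and a fuzzy neighborhood relation can be illustrated with a minimal Python sketch. It is an illustration only, not the formulation used in DBSCAN or FN-DBSCAN: the linear membership function max(0, 1 - d/eps), the function names, and the sample points are all assumptions made for this example.

```python
import math

def crisp_neighbors(points, i, eps):
    """Count the points within distance eps of points[i] (crisp relation)."""
    return sum(1 for p in points if math.dist(points[i], p) <= eps)

def fuzzy_cardinality(points, i, eps):
    """Sum the membership degrees of all points in the neighborhood of points[i].

    Assumed membership: max(0, 1 - d/eps), so closer points weigh more.
    """
    return sum(max(0.0, 1.0 - math.dist(points[i], p) / eps) for p in points)

# Hypothetical sample: one close neighbor, one borderline neighbor, one far point.
points = [(0.0, 0.0), (0.1, 0.0), (0.9, 0.0), (2.0, 0.0)]
crisp = crisp_neighbors(points, 0, eps=1.0)
fuzzy = fuzzy_cardinality(points, 0, eps=1.0)
print(crisp, fuzzy)
```

In this sample, the point at distance 0.9 counts as a full neighbor under the crisp relation but contributes only about 0.1 to the fuzzy cardinality, so borderline points influence the density estimate far less than nearby ones.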
The Fuzzy Joint Points (FJP) method also makes use of a fuzzy neighborhood relation, while eliminating the need for any parameter input by extracting the necessary information from the transitive closure of the neighborhood relation [9].

While FJP provides neighborhood-based clustering in an automatic fashion, a straightforward implementation of the method could be unacceptably slow for a clustering application. In [10], an optimal time FJP algorithm running in $O(n^2)$ time for $n$ data points was proposed by adopting different techniques to reach the lower bound on time complexity. FJP-based heuristics have also been proposed that sacrifice some of the autonomy to achieve constant-factor speedups without losing clustering efficiency [10, 11]. On the other hand, maintaining a fuzzy matrix throughout the whole process brings along a high memory demand. In fact, the past computational experiments were hindered by the memory limitation and not by the computational load [9, 10].

In this paper, we examine how the graph correspondence of the fuzzy neighborhood can be exploited to achieve a memory efficient FJP algorithm, and we propose one with linear space complexity that maintains a reasonable time complexity. The preliminaries and the FJP method are given in the next section. In the third section, a memory efficient FJP algorithm is explained and analyzed. The experimental results are presented in the fourth section, and the last section discusses the outcomes to conclude the paper.

Can Atilgan and Efendi Nasibov

C. Atilgan and E. Nasibov are with the Department of Computer Science, Dokuz Eylul University, 35390 Izmir, Turkey (e-mail: can.atilgan@deu.edu.tr; efendi.nasibov@deu.edu.tr). Also, E. Nasibov is with the Institute of Control Systems, Azerbaijan National Academy of Sciences, Az1141 Baku, Azerbaijan.

II. THE FUZZY JOINT POINTS METHOD

Some definitions and theorems are given in the following for the sake of completeness of this paper, even though the reader could