Received: November 20, 2017
International Journal of Intelligent Engineering and Systems, Vol.11, No.3, 2018
DOI: 10.22266/ijies2018.0630.13

A Novel Density Based Clustering Algorithm by Incorporating Mahalanobis Distance

Margaret Sangeetha 1*, Velumani Padikkaramu 2, Rajakumar Thankappan Chellan 3

1 Department of Computer Science, Manonmaniam Sundaranar University, Tirunelveli, India
2 Department of Computer Science, The M.D.T Hindu College, Tirunelveli, India
3 Department of Computer Science, St. Xavier’s College, Tirunelveli, India
* Corresponding author’s Email: margaret.msu@gmail.com

Abstract: Data clustering is an active research area that aims to group related data together. Clustering improves data organization and enhances the user experience. Several clustering algorithms have therefore been proposed in the literature, yet the demand for better clustering algorithms persists. To address this need, this paper proposes a density based clustering algorithm built on the Density Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. The main drawback of DBSCAN is that it requires two important parameters as initial input. Fixing the values of these parameters is difficult, as it requires prior knowledge about the dataset. The proposed clustering algorithm removes this requirement by selecting the parameters automatically. The parameters are selected by analysing the dataset, so their values vary from dataset to dataset. This way of parameter selection improves the quality of service and produces effective clusters. The experimental results show that the proposed approach outperforms the DBSCAN algorithm in terms of purity, F-measure and entropy.

Keywords: Density based clustering, Data clustering, Clustering algorithm.

1. Introduction

Data is the lifeblood of today’s world, and the collected data are stored in voluminous databases. The data must be stored in an organized fashion so that the required data can be located easily. Data analysis is essential in all domains, as it enhances the worth of applications, and it can be performed better when related data are stored together. This is where data clustering comes in. The major goal of data clustering is to group similar data together. The data can be audio, video, text, numeric and so on. Related data are grouped together to form different clusters, such that entities within a cluster show a maximum degree of similarity and entities of different clusters show a minimal degree of similarity. This makes data processing easier and helps to enhance the performance of the application.

Owing to these advantages, data clustering is utilized in almost all domains, such as healthcare, finance, business, data retrieval and image processing. For instance, healthcare applications utilize clustering to group patients with similar symptoms or degrees of severity [1]. Business oriented applications cluster customers who share the same buying habits [2].

Though clustering brings numerous merits to an application, it is extremely difficult to achieve good clusters. A clustering algorithm has to handle several tough challenges, such as the selection of suitable features and distance measures [3] and dealing with noise [4]. Apart from this, a good clustering algorithm must be scalable, capable of handling noise and able to find clusters regardless of their shape [5]. Clustering algorithms can be broadly divided into partitional, hierarchical, density based and grid based clustering [6].