International Journal of Data Mining and Emerging Technologies
Volume 4, Number 2, November 2014, p. 107
DOI: 10.5958/2249-3220.2014.00008.1
Research Article

An Optimum Cluster Size Identification for k-Means using Validity Index for Stock Market Data

Preeti Baser 1,2,* and Jatinderkumar R. Saini 3,4

1 Research Scholar, 3 Research Guide, Faculty of Science, R.K. University, Rajkot 360020, Gujarat, India
2 Assistant Professor, Shri Jairambhai Patel Institute of Business Management and Computer Applications, Gandhinagar 382007, Gujarat, India
4 Director (I/C) and Associate Professor, Narmada College of Computer Application, Bharuch 392011, Gujarat, India
*Corresponding author. Email: *preeti.dalal@gmail.com; 3 saini_expert@yahoo.com

ABSTRACT

Clustering is one of the data mining techniques widely used in various application areas. It is the process of assigning data objects to groups so that objects in the same group are similar to each other and different from the objects in other groups. It is an unsupervised technique, meaning that class labels are not available. Clustering is also one of the most popular data mining techniques used in various financial domains. In today's competitive financial market, investors want to earn a profit from their investments. This paper presents a detailed analysis of the k-means clustering method, using the Davies–Bouldin index to find the optimum number of clusters, which is otherwise difficult to determine for this method. The resulting clusters can be used in further investment analysis.

Keywords: Clustering, Data Mining, Davies–Bouldin Index (DBI), Financial Ratio, k-means, Portfolio Management, Validity Index

1. INTRODUCTION

Clustering is one of the data mining techniques widely used in various application areas. It is the process of assigning data objects to groups so that objects in the same group are similar to each other and different from the objects in other groups. It is an unsupervised technique, meaning that class labels are not available. Clustering is also one of the most popular data mining techniques used in various financial domains [2]. In today's competitive financial market, investors want to earn a profit from their investments. In this research paper, the k-means clustering technique is applied to stock market data, and the Davies–Bouldin (DB) index is analysed in detail to find the optimum number of clusters, which is otherwise difficult to determine for k-means.

Validity indices are used to measure the 'goodness' of a clustering result relative to other results, whether those results are produced by other clustering algorithms or by the same algorithm with different parameter values. The DB index [1] measures the average similarity between each cluster and the cluster most similar to it. Because good clusters should be compact and well separated, a lower DB index value indicates a better cluster configuration. Consequently, the number of clusters that minimises the DB index is taken as the optimum number of clusters. If the DB index takes a negative value, its absolute value is considered, and lower values still indicate a better clustering. The DB index is then defined as