Abstract—The Fast Balanced K-means (FBK-means) clustering approach is one of the most important options to consider when solving clustering problems on balanced data. In most numerical experiments, FBK-means is faster and more accurate than the K-means algorithm, the Genetic Algorithm, and the Bee algorithm. The FBK-means algorithm needs few distance calculations and less computational time while keeping the same clustering results. However, the FBK-means algorithm does not give good results with imbalanced data. To address this shortcoming, a more efficient clustering algorithm, namely Fast K-means (FK-means), is developed in this paper. This algorithm not only gives the best results, as in the FBK-means approach, but also needs less computational time in the case of imbalanced data.

Keywords—Clustering, K-means Algorithm, Bee Algorithm, Genetic Algorithm, FBK-means Algorithm, FK-means Algorithm.

I. INTRODUCTION

Recently, the rapid growth of information has created the need to process large amounts of data. In this direction, data mining occupies a large place. It includes methods other than classical analysis, based on simulation, and addresses problems of generalization, association, and pattern finding. The problem of cluster analysis, also known as the problem of automatic grouping of objects, is perhaps one of the most widely studied in the data mining and machine learning communities [1-8]. Clustering is one of the main tasks of data mining and belongs to unsupervised learning (learning without a teacher). Its essence consists in separating objects into clusters based on the similarity of their properties. Each cluster should consist of similar objects, and objects of different clusters should differ [9-11]. The most common clustering algorithm is the iterative k-means algorithm. However, the clustering results it obtains are guaranteed to be only locally optimal solutions [11].
To solve this problem, several approaches have been presented, such as the Genetic Algorithm [12], Simulated Annealing [13], and Particle Swarm Optimization [14]. Among these methods, the fast balanced k-means (FBK-means) clustering algorithm was proposed [15]. It is an effective search approach that minimizes an objective function to discover new cluster centers. Numerical experiments have shown that the FBK-means algorithm can find a global or close-to-global minimizer of the k-means objective function. At each iteration, the FBK-means algorithm improves the validity of all the clusters by moving the center of the cluster that has a smaller negative validity to the cluster that has a large positive validity. The FBK-means algorithm needs few distance calculations and less computing time while keeping the same clustering results. However, the FBK-means algorithm does not give good results with imbalanced data [15]. To resolve this problem, a more efficient clustering algorithm, namely Fast K-means (FK-means), is developed in this paper. This algorithm not only gives the best results, as in the FBK-means approach, but also requires less computational time in the case of imbalanced data.

The rest of this paper is organized as follows. Section 2 surveys the k-means and FBK-means algorithms. Section 3 presents the more efficient FK-means algorithm. Section 4 shows the effectiveness of the proposed algorithm on some synthetic and real datasets. Finally, Section 5 summarizes the results of the paper with some observations.

II. K-MEANS AND FBK-MEANS ALGORITHMS

A. K-means algorithm

The K-means algorithm [16] is one of the most popular clustering algorithms because of its simplicity and effectiveness. The basic idea is that, at each iteration, the center of each cluster obtained in the previous step is recalculated, and the data vectors are then divided again into clusters according to which of the new centers is nearest under the selected metric.
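As an illustration only, the iteration described above can be sketched in Python as follows. This is a minimal sketch of standard k-means under stated assumptions, not the authors' implementation: the function names `kmeans` and `sse`, the `seed` and `max_iter` parameters, the random choice of initial centers from the data, and the rule of keeping the old center when a cluster becomes empty are all introduced here for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Standard k-means on a list of equal-length numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Assumption: the initial centers are k distinct data points chosen at random.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assign every object to its nearest center (Euclidean distance).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Recompute each center as the mean of the objects assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        # Stop as soon as no center has moved.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def sse(points, centers, labels):
    """Sum of squared distances from each object to its assigned center."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


# Usage on a toy data set with two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, k=2)
print(centers, labels, sse(points, centers, labels))
```

Because the result depends on the initial centers, calling `kmeans` with several `seed` values and keeping the run with the lowest `sse` is one simple way to reduce the risk of ending in a poor local solution.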
The steps of the algorithm can be summarized as follows:
1. Set the number of clusters (required as an input parameter of the algorithm).
2. Form an initial approximation of the cluster centers.
3. Assign each object to the nearest center (the distance to a center is calculated as the Euclidean distance).
4. Calculate the new positions of the centers.
5. If the positions of the centers have changed, go to step 3.

Despite its simplicity, the algorithm has a drawback: it is strongly affected by the choice of the initial approximation of the centers. One common way of solving this problem is to run the k-means algorithm more than once. However, as the number of clusters and the size of the data grow, more and more initial points are needed to get a close-to-global solution to the clustering problem. Accordingly, running the k-means algorithm more than once takes a very long time and is inefficient for solving clustering problems, even in moderately large data sets [17].

Speedy Algorithm for Clustering Imbalanced Data
M. H. Marghny1, Ahmed I. Taloba2, Rasha M. Abd El-Aziz3
Computer Science Department, Faculty of Computer and Information, Assiut University, Egypt.1, 2
Computer Science Department, Faculty of Science, Assiut University, Egypt.3
marghny@aun.edu.eg, ahmedtaloba@fci.au.edu.eg, rashatop@gmail.com

To solve this problem, the fast balanced k-means clustering