International Journal of Electrical and Computer Engineering (IJECE)
Vol. 12, No. 5, October 2022, pp. 5435~5443
ISSN: 2088-8708, DOI: 10.11591/ijece.v12i5.pp5435-5443
Journal homepage: http://ijece.iaescore.com

Optimal k-means clustering using artificial bee colony algorithm with variable food sources length

Sabreen Fawzi Raheem, Maytham Alabbas
Department of Computer Science, College of Computer Science and Information Technology, University of Basrah, Basrah, Iraq

Article Info

Article history:
Received Aug 17, 2021
Revised May 25, 2022
Accepted Jun 18, 2022

Keywords:
Artificial bee colony algorithm
K-means algorithm
Optimize k-means clustering
Variable-length representation

ABSTRACT

Clustering is a robust machine learning task that involves dividing data points into a set of groups with similar traits. One of the most widely used methods in this regard is the k-means clustering algorithm, owing to its simplicity and effectiveness. However, this algorithm suffers from the problem of predicting the number and coordinates of the initial cluster centers. In this paper, a method based on the first artificial bee colony algorithm with variable-length individuals is proposed to overcome the limitations of the k-means algorithm. The proposed technique automatically predicts the number of clusters (the value of k) and determines the most suitable coordinates for the initial cluster centers instead of requiring them to be preset manually. The results were encouraging compared with the traditional k-means algorithm on three real-life clustering datasets. The proposed algorithm outperforms the traditional k-means algorithm for all tested real-life datasets.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Maytham Alabbas
Department of Computer Science, College of Computer Science and Information Technology, University of Basrah
Basrah, Iraq
Email: ma@uobasrah.edu.iq

1.
INTRODUCTION

It is important to utilize various data mining techniques, including cluster analysis, to identify, analyze, and categorize data attributes. Research in data clustering remains active. Clustering is used extensively in various fields, such as medical sciences, image analysis, machine learning, web cluster engines, classification, knowledge discovery, and software engineering. Clustering splits a population of data points into groups so that points in the same group are more similar to one another than to points in other groups. It is one of the most often used strategies for unsupervised classification. A variety of unsupervised clustering algorithms are available, including k-means [1], cobweb [2], farthest-first [3], expectation-maximization (EM) [4], density-based [5], and hierarchical clustering [6]. Of these, the most widely used is the k-means algorithm.

K-means clustering is a well-known partitioning algorithm. It has been widely used in scientific research and industrial applications due to its simplicity, rapid convergence, and suitability for processing massive data sets, among other advantages. The traditional k-means clustering method assigns random starting points when initializing the cluster centers and typically finds only a locally optimal clustering result. The resulting lack of stability affects clustering accuracy, so a global optimization method is required to overcome the limitations of this algorithm. Numerous studies have been conducted to address this issue. For instance, Maulik and Bandyopadhyay [7] suggest using a genetic algorithm (GA) to search in the feature space for cluster centers