Short Paper

Determining the Optimal Number of Clusters using Silhouette Score as a Data Mining Technique

https://doi.org/10.3991/ijoe.v19i04.37059

Ylber Januzaj 1, Edmond Beqiri 1, Artan Luma 2

1 Faculty of Business, University “Haxhi Zeka”, Peja, Kosovo
2 CST Faculty, South East European University, Tetovo, North Macedonia
edmond.beqiri@unhz.eu

Abstract: The identification of similar objects is very important in determining the similarity between different objects. Several techniques now allow us to divide objects into groups that differ from one another. To obtain the best separation between clusters, the optimal number of clusters for a corpus must be determined in advance. In our research, the Silhouette score technique was used to determine this optimal number of clusters. The technique was applied using the Python language on a corpus of unstructured job vacancy data. After determining the optimal number, we present these clusters and the similarity between them in the form of a graph in a suitable format.

Keywords: Silhouette score, clusters, data mining, corpus, job vacancy

1 Introduction

The dynamic, second-by-second growth of data has made its processing even more challenging in terms of extracting different analyses [1]. Many fields have progressed thanks to the analyses that data mining enables. One of the biggest challenges of machine learning is the processing of data that has no class label, which is known as unsupervised learning [2]. Unsupervised learning enables distributed modeling, which gives us more information about the data being processed.
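To make the abstract's idea of choosing the number of clusters by Silhouette score concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes one-dimensional values with absolute difference as the distance. The helper `split_at_largest_gaps` is a hypothetical stand-in for a real clustering algorithm, and the toy values are invented for illustration. For each point, `a` is the mean distance to the other members of its own cluster, `b` is the smallest mean distance to the members of any other cluster, and the coefficient is `(b - a) / max(a, b)`. The candidate number of clusters with the highest mean coefficient is selected.

```python
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient for 1-D points (absolute-difference distance)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so dividing by len - 1 is exact)
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(abs(p - q) for q in members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


def split_at_largest_gaps(points, k):
    """Hypothetical stand-in clusterer: cut the sorted values at the k-1 largest gaps."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    cuts = set(
        sorted(
            range(1, len(order)),
            key=lambda j: points[order[j]] - points[order[j - 1]],
            reverse=True,
        )[: k - 1]
    )
    labels = [0] * len(points)
    current = 0
    for j, i in enumerate(order):
        if j in cuts:
            current += 1
        labels[i] = current
    return labels


# Toy data (invented for illustration): three visibly separated groups.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.4, 9.1]

# Score every candidate number of clusters and keep the best one.
scores = {k: silhouette_score(values, split_at_largest_gaps(values, k))
          for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

In practice the stand-in clusterer would be replaced by an actual clustering algorithm run once per candidate number of clusters. The selection step stays the same.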
One of the most preferred forms of unsupervised learning is clustering, which allows us to detect different groups within a corpus of data [3]. Objects within the same group are very similar to each other, while they differ greatly from the objects of other groups. Clustering has found application in many different fields, and it has made possible the analysis of data that could not be analyzed with other techniques [4]. Some of the fields where clustering has found application are bioinformatics, medicine, social sciences, and computer science, among others.

In our research, we analyze our corpus in order to determine the optimal or most appropriate number of clusters that our corpus will contain. Later technique that will