Improving Clustering Results through Active Learning

Marjan Qazvini
Independent Researcher, Iran

Abstract

Data labelling is a task that arises in various fields, including image processing, voice recognition, and text classification. Active Learning (AL) is a method that can be used to simplify this task. This study focuses on tabular data and the classification of disabilities. We use the English Longitudinal Study of Ageing (ELSA) and a range of socio-demographic, disease, and disability factors to group participants into disability levels. Since the ground truth is unknown, we employ several clustering methods. The results show that, by combining AL strategies, accuracy comparable to that achieved with the entire dataset can be obtained even with small amounts of data.

Keywords: Active Learning, Coclus, K-modes
JEL classification: C1, C8

1 Introduction

One area of Machine Learning (ML) focuses on classifying data based on known features. When the ground truth is known, the problem is one of supervised learning. When the ground truth is unknown or difficult to obtain, the problem falls under unsupervised learning, and clustering methods must be used instead. The common issue with homogeneous data, such as images, voice, and text, is the shortage of labels. However, this problem has not been fully explored for heterogeneous data such as tabular data. Active Learning (AL) encompasses a set of strategies designed to address this label shortage by selecting the most informative data points for labelling. The idea behind AL is that a small amount of informative data can be sufficient to learn a pattern. AL is frequently used in image classification, and consequently most AL strategies have been developed in this area by combining different neural networks with AL methods. For example, the cost-effective AL (CEAL) method combines Convolutional Neural Networks (CNNs) with AL [26].
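To make the CEAL-style selection rule concrete, the sketch below splits a pool of unlabelled points into a batch for manual labelling and a set of pseudo-labels. It is a minimal, hedged illustration, not the implementation from [26]: the function name `ceal_split`, the use of the maximum class probability as the confidence measure, and the fixed `threshold` are all assumptions made here for clarity (the original method may use a different uncertainty measure and an adaptive threshold).

```python
from typing import Dict, List, Sequence, Tuple


def ceal_split(
    probs: Sequence[Sequence[float]],
    k: int,
    threshold: float,
) -> Tuple[List[int], Dict[int, int]]:
    """Hypothetical CEAL-style split of an unlabelled pool.

    probs[i] is the model's predicted class-probability vector for point i.
    Returns (indices to send for manual labelling, {index: pseudo-label}).
    """
    # Assumed confidence measure: probability of the most likely class.
    confidence = [max(p) for p in probs]

    # Least-confidence sampling: the k points with the lowest confidence
    # are queried from a human annotator.
    order = sorted(range(len(probs)), key=lambda i: confidence[i])
    query = order[:k]

    # Points that were not queried and whose confidence reaches the
    # (assumed, fixed) threshold receive the predicted class as a pseudo-label.
    queried = set(query)
    pseudo = {
        i: max(range(len(p)), key=lambda c: p[c])
        for i, p in enumerate(probs)
        if i not in queried and confidence[i] >= threshold
    }
    return query, pseudo


if __name__ == "__main__":
    probs = [
        [0.50, 0.50],  # maximally uncertain
        [0.95, 0.05],  # confident: class 0
        [0.60, 0.40],
        [0.02, 0.98],  # confident: class 1
    ]
    query, pseudo = ceal_split(probs, k=2, threshold=0.9)
    print(query)   # [0, 2]
    print(pseudo)  # {1: 0, 3: 1}
```

In an iterative loop, the queried points would be labelled by hand, the pseudo-labelled points added temporarily to the training set, and the model retrained before the next split.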
CEAL selects the least confident data points for manual labelling, while the most confident ones are automatically pseudo-labelled. Challenges arise when deep learning models are combined with AL. Deep learning typically requires large datasets, and labelling just one sample in each iteration can be time-consuming. Furthermore, deep models tend to achieve high training accuracy, which reduces uncertainty. To introduce

Corresponding author: marjan.qazvini@gmail.com