DATA SELECTION BASED ON FUZZY CLUSTERING

DONGHAI GUAN, WEIWEI YUAN, YOUNG-KOO LEE*, ANDREY GAVRILOV AND SUNGYOUNG LEE

Department of Computer Engineering, Kyung Hee University
Suwon, 446-701, Korea

When the amount of training data is limited, the performance of supervised learning can be improved by selecting valuable samples for training. In this work, we propose a novel data selection method based on fuzzy clustering. Our method first partitions all the data to be classified into clusters. Training data are then selected from each cluster based on their membership degrees. Experimental results show that the proposed fuzzy clustering-based data selection method improves learning performance compared with randomly selecting training samples.

1. Introduction

Designing a supervised learning system usually requires a sufficient amount of training data. In many cases, however, the amount of training data has to be limited. The reason is that only labeled data can be used for training, and labeled data are often difficult, expensive, or time-consuming to obtain, as they require the efforts of experienced human annotators [1]. In these cases, how to obtain the best possible classifier with a reasonable number of labeled training data is an important issue.

Many approaches have been proposed to address this issue. They can be divided into two main categories: semi-supervised learning [2][3][4] and training data selection [5][6][7][8]. Assuming that the labeling cost is uniform for all samples in a dataset, data selection aims to choose the most valuable samples to label. Given a set of labeled data, semi-supervised learning aims to exploit unlabeled data to improve learning performance. Our work focuses on data selection.

In this paper, we propose a novel data selection method based on fuzzy clustering. First, fuzzy c-means is used to cluster the data. Then two parts of the samples are selected.
* Corresponding author.

The first part includes the samples with high degrees of