ORIGINAL ARTICLE

A Novel Fuzzy Rough Clustering Parameter-based missing value imputation

P. S. Raja 1 • K. Sasirekha 1 • K. Thangavel 1

Received: 31 January 2019 / Accepted: 3 October 2019
© Springer-Verlag London Ltd., part of Springer Nature 2019

Abstract
For a long time, missing values have been one of the most challenging problems in data mining, machine learning and statistical analysis. Various methods currently exist for handling missing values, since doing so is essential for discovering meaningful information. However, the most frequently used way to handle missing values in a large dataset is to discard the instances that contain them. Deleting these instances causes a loss of crucial information, which degrades the performance of algorithms. Hence, an intelligent method is needed to handle missing values. In the recent past, fuzzy and rough set theories have been widely employed in many applications. In this research work, a Novel Fuzzy C-Means Rough Parameter-based missing value imputation method is proposed, which hybridizes fuzzy and rough sets to handle missing values. The proposed algorithm handles uncertainty and vagueness in datasets through rough and fuzzy sets while preserving vital information. Experiments have been carried out on three benchmark datasets, namely the Dukes’ B colon cancer, Mice Protein Expression and Yeast datasets, to assess the efficacy of the proposed method. It is observed that the proposed method produces better results than the Fuzzy C-Means Centroid-based and Fuzzy C-Means Parameter-based missing value imputation methods.

Keywords Preprocessing · Missing value · Machine learning · Fuzzy C-Means · Rough K-Means

1 Introduction
In today’s world, missing data are pervasive in many large datasets.
Some of the significant causes of missing values are measurement error, ignorance, data corruption and equipment failure [1–4]. Furthermore, handling missing values is one of the most challenging tasks in many data mining applications. In general, missingness is characterized by three types. In Missing Completely at Random (MCAR), the probability that a value is missing does not depend on any variable in the dataset, observed or missing. In Missing at Random (MAR), it depends only on other, observed variables in the dataset. In Missing Not at Random (MNAR), it depends on the unobserved (missing) values themselves [5, 6]. More specifically, MNAR is the most difficult situation to model.

P. S. Raja, psraja5@periyaruniversity.ac.in
K. Sasirekha, ksasirekha@periyaruniversity.ac.in
K. Thangavel, drkthangavel@gmail.com
1 Department of Computer Science, Periyar University, Salem, Tamil Nadu, India
Neural Computing and Applications, https://doi.org/10.1007/s00521-019-04535-9

In the domain of data mining, common preprocessing methods for handling missing values include zero or mean substitution, replacement with a global constant, and ignoring or deleting incomplete instances. However, these methods are not sufficient to deal with missing values efficiently. In the past few years, clustering-based missing value imputation has become popular among researchers [7–11]. Although clustering algorithms have been exploited to impute missing values, they fail to handle the uncertain information that exists in datasets. In general, fuzzy and rough set theories are active research areas in artificial intelligence and are widely employed for handling uncertainty in many applications. As a result, they are powerful tools for handling missing values in datasets with uncertain information. The proposed Fuzzy C-Means Rough Parameter-