APPLICATION

Effective kernel-based possibilistic fuzzy clustering techniques: analyzing cancer database

S. R. Kannan 1 & M. Siva 1 & R. Devi 2 & S. Ramathilagam 3 & Mark Last 4

Received: 7 September 2018 / Revised: 22 November 2018 / Accepted: 19 December 2018
© Springer Nature Switzerland AG 2019

Abstract
This paper aims to present optimal clustering techniques for analyzing high-dimensional cancer databases with missing attributes and overlapping objects. Analyzing a high-dimensional database with missing values is considered a most difficult task, and so far no optimal clustering technique is available for clustering cancer databases. Therefore, this paper develops effective fuzzy clustering techniques that incorporate a Cauchy kernel-induced distance, rudimentary centroids, possibilistic memberships, fuzzy memberships, and a prototype equation. To reduce the computing time of the algorithms, this paper introduces a method for finding reasonable initial cluster centers. Experimental results indicate that the proposed methods are suitable for breast cancer databases with missing attributes and that they achieve superior performance in clustering the databases into the available subclasses.

Keywords
Clustering · Fuzzy C-means · Kernel distance · High-dimensional databases · Gene expression database

Introduction
The main aim of this paper is to cluster the high-dimensional gene expression breast cancer database, which has missing attributes and overlapping objects, into the available disease subtypes. Breast cancer has been one of the leading causes of death among women in recent decades [20]. Early recognition of whether a case is cancerous or noncancerous can help in the diagnosis of the disease and can substantially improve the expectancy of survival. Analyzing high-dimensional gene expression breast cancer databases is considered one of the best ways to identify the types of cancers [5, 28].
Due to missing attributes and overlapping objects, identifying the types in a high-dimensional gene expression cancer database is considered a difficult task. Handling the missing attributes in gene expression databases with improper techniques can easily lead to biased outcomes [8]. Therefore, designing an effective diagnosis model is an important issue in breast cancer data analysis for finding the available types of cancers. Researchers have introduced clustering-based algorithms to analyze the available subtypes of cancers in breast cancer databases [2, 4, 10, 25, 34]. Clustering is an important tool for analyzing large-dimensional databases in various data analysis processes [29]. In recent years, fuzzy set based clustering techniques have played a very important role in analyzing the available subtypes of diseases in high-dimensional medical databases [3, 15, 32–34]. The fuzzy C-means (FCM) algorithm [11, 12] assigns objects to multiple clusters with varying fuzzy membership grades, so that all objects influence the update of the cluster prototypes. Non-fuzzy clustering techniques place each data object in exactly one cluster, so they are considered unsuccessful at clustering overlapped datasets with missing values [7, 17]. Fuzzy clustering techniques are of considerable benefit because they can retain more information from high-dimensional databases than other clustering techniques [9, 11, 16, 27]. However, because of the missing attributes and overlapping objects in the high-dimensional gene expression breast cancer database, the existing fuzzy clustering algorithms have been shown to be difficult to

* S. R. Kannan
srkannan.mat@pondiuni.edu.in

1 Pondicherry University (A Central University of India), Pondicherry, India
2 Pachaiyappa's College for Men, Chennai, India
3 Periyar Govt. Arts College, Cuddalore, Tamil Nadu, India
4 Ben-Gurion University of the Negev, Beersheba, Israel

Data-Enabled Discovery and Applications
https://doi.org/10.1007/s41688-018-0026-1
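As a point of reference for the fuzzy C-means description in the Introduction, the following is a minimal sketch of the standard FCM iteration with plain Euclidean distance. It is not the kernel-based possibilistic method proposed in this paper. The function names (`fcm_memberships`, `fcm_centers`, `fcm`), the fuzzifier value `m = 2`, the fixed iteration count, and the toy two-dimensional points are all assumptions made for illustration. The sketch shows how every object receives a graded membership in every cluster and how all objects then contribute to each updated prototype.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update; u[i][k] is the grade of point k in cluster i."""
    exponent = 2.0 / (m - 1.0)
    u = [[0.0] * len(points) for _ in centers]
    for k, x in enumerate(points):
        dists = [math.dist(x, v) for v in centers]
        # A point lying exactly on one or more centers shares full membership among them.
        zero = [i for i, d in enumerate(dists) if d == 0.0]
        if zero:
            for i in zero:
                u[i][k] = 1.0 / len(zero)
            continue
        for i, d_i in enumerate(dists):
            u[i][k] = 1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
    return u


def fcm_centers(points, u, m=2.0):
    """Prototype update: each center is the mean of all points weighted by u^m."""
    dim = len(points[0])
    centers = []
    for u_i in u:
        weights = [grade ** m for grade in u_i]
        total = sum(weights)
        centers.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers


def fcm(points, centers, m=2.0, iterations=50):
    """Alternate membership and prototype updates for a fixed number of iterations."""
    for _ in range(iterations):
        u = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, u, m)
    return centers, fcm_memberships(points, centers, m)


# Toy data (illustrative only): two loose groups plus one point lying between them.
points = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (0.5, 0.5)]
centers, u = fcm(points, [(0.0, 0.0), (1.0, 1.0)])
```

In this sketch the point placed between the two groups ends up with a nonzero grade in both clusters, which is the behavior that lets fuzzy methods represent overlapping objects, whereas a non-fuzzy method would have to assign it to a single cluster.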