A New Possibilistic Classifier for Heart Disease Detection From Heterogeneous Medical Data

Karim Baati #*1, Tarek M. Hamdani #+2, Adel M. Alimi #3, Ajith Abraham &4

# REGIM-Lab.: REsearch Groups on Intelligent Machines, University of Sfax, National Engineering School of Sfax (ENIS), BP 1173, Sfax, 3038, Tunisia
1 karim.baati@ieee.org  2 tarek.hamdani@ieee.org  3 adel.alimi@ieee.org
* Esprit School of Engineering, Tunis, Tunisia
+ Taibah University, College of Science and Arts at Al-Ula, Al-Madinah al-Munawwarah, KSA
& Machine Intelligence Research Labs (MIR Labs), Scientific Network for Innovation and Research Excellence, P.O. Box 2259, Auburn, WA 98071, USA
4 ajith.abraham@ieee.org

Abstract—In this paper, we propose a new Hybrid Naïve Possibilistic Classifier (HNPC) for heart disease detection from the heterogeneous data (numerical and categorical) of the Cleveland dataset. The proposed classifier is an extension of two HNPC versions previously proposed for the same problem. Like its two predecessors, it separates the data into two subsets (numerical and categorical) and then estimates possibility beliefs using the two variants of the probability-possibility transformation of Dubois et al. for numerical and categorical data, respectively. Moreover, like the most recent HNPC version, the new classifier performs a common fusion step to combine the obtained beliefs. However, instead of using the product and the minimum as combination operators in this fusion step, it calls a Generalized Minimum-based algorithm (G-Min), an improvement of the minimum operator for decision making from possibilistic beliefs. Experimental evaluations on the Cleveland dataset show that the proposed G-Min-based HNPC outperforms the two former HNPC versions as well as the main classification techniques used in related work.
Index Terms—Naïve possibilistic classifier, G-Min algorithm, heterogeneous data, subjective data, Cleveland dataset, heart disease.

I. INTRODUCTION

Heart diseases are serious conditions that cause persistently high mortality [1]. Early detection of these illnesses is therefore essential, given their serious impact on human health. To this end, a panoply of computational intelligence techniques has been evaluated on many heart disease datasets. Among the datasets widely used in this context is the University of California Irvine (UCI) heart disease dataset, also known as the Cleveland dataset 1.

1 http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/cleve.mod

Choosing the technique that yields the best classification results for a given decision making problem is usually challenging [2][3], and all the more so when the final decision is critical, as in the medical field in general [3] and in heart disease detection in particular [4]. Logically, two strategies may lead to the best choice. The first is to find the classification technique that usually outperforms all others regardless of the decision making problem. The second is to find the classifier that outperforms the other methods with regard to the specifications of the problem at hand [5]. Since no classifier in machine learning performs better than all others in every context [6], the second strategy seems to be the only realistic one. This strategy was used, for instance, to establish the two versions of the Hybrid Naïve Possibilistic Classifier (HNPC) proposed in [7] and [8] to address heart disease detection from the Cleveland dataset. To build these two former HNPC versions, a careful study of the specifications of the Cleveland data was first performed [7][8].
Indeed, these data are heterogeneous both in their information sources (some of the medical data stem from medical devices and others from physicians) and in their type (some of the data are numerical and others are categorical). Furthermore, the part of the data coming from physicians' judgments is subjective. HNPC accounts for these specifications in two ways. First, it is based on possibility theory, which has shown good performance when dealing with imperfect data [9], especially poor, heterogeneous [10] and subjective [11] data. Second, unlike other computational intelligence techniques, which have treated the available Cleveland data as entirely numerical when performing the classification task, HNPC separates these data into two subsets according to their type (numerical or categorical) and calls an adequate

International Journal of Computer Science and Information Security (IJCSIS), Vol. 14, No. 7, July 2016, p. 443, https://sites.google.com/site/ijcsis/, ISSN 1947-5500
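As a minimal illustrative sketch (not the authors' implementation), the possibilistic pipeline described above can be outlined as follows: a discrete probability distribution over class values is mapped to a possibility distribution using the standard probability-possibility transformation of Dubois et al., and the per-attribute possibility degrees are then fused with the minimum operator, the baseline that the proposed G-Min algorithm is designed to improve. All function names and numeric values below are hypothetical, chosen only for illustration.

```python
# Sketch of a naive possibilistic classification step (illustrative only;
# the attribute-level possibility estimation and the G-Min refinement of
# the actual HNPC are not reproduced here).

def prob_to_poss(probs):
    """Standard Dubois et al. probability-possibility transformation for a
    discrete distribution: pi(x_i) = sum of all p(x_j) with p(x_j) <= p(x_i).
    The most probable value receives possibility 1."""
    return [sum(q for q in probs if q <= p) for p in probs]

def fuse_min(attr_possibilities):
    """Combine per-attribute possibility degrees with the minimum operator
    (the baseline operator that G-Min generalizes)."""
    fused = 1.0
    for p in attr_possibilities:
        fused = min(fused, p)
    return fused

def classify(beliefs):
    """Pick the class whose fused possibility is highest.
    beliefs: dict mapping class label -> per-attribute possibility degrees."""
    scores = {c: fuse_min(ps) for c, ps in beliefs.items()}
    return max(scores, key=scores.get)

# Toy example: transformation of a three-valued distribution ...
print(prob_to_poss([0.5, 0.3, 0.2]))   # most probable value gets possibility 1.0

# ... and min-based fusion over three attributes for two classes.
beliefs = {"disease": [0.9, 0.7, 1.0], "healthy": [0.6, 1.0, 0.8]}
print(classify(beliefs))               # "disease": min 0.7 beats min 0.6
```

The minimum operator makes the fused belief hinge on the single least-possible attribute, which is exactly the brittleness a generalized minimum scheme such as G-Min aims to soften.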