Artificial Intelligence in Medicine 59 (2013) 197–204

Journal homepage: www.elsevier.com/locate/aiim

Data structure-guided development of electrocardiographic signal characterization and classification

Adam Gacek

Institute of Medical Technology and Equipment ITAM, 118 Roosevelt Street, 41-800 Zabrze, Poland

Article info

Article history: Received 15 January 2013; Received in revised form 25 September 2013; Accepted 27 September 2013

Keywords: Clustering algorithms; Fuzzy clustering; Cluster classification; Electrocardiographic signal classification

Abstract

Objective: The study introduces and elaborates on a certain perspective of biomedical data analysis where data structure is revealed through fuzzy clustering. The key objective of the study is to develop a characterization of the content of the clusters by offering a number of their descriptors established on the basis of membership grades of patterns included there, as well as on the basis of their class membership. Next, a design of a cluster-based classifier is presented in which the structure of the classifier is based on a collection of clusters. The structure also exploits the descriptors of the clusters as well as aggregates their characteristics with the activation levels of the associated clusters formed in the feature space in which QRS complexes are represented.

Methods and materials: The underlying methods involve the use of fuzzy clustering and two essential ways of representing QRS complexes with the use of the Hermite expansion of signals and piecewise aggregate approximation (PAA). The material involves QRS segments coming from the MIT-BIH Arrhythmia Database.

Results: The key results demonstrate and quantify the effectiveness of QRS characterization with the use of clustering realized in the space of coefficients of the Hermite series expansion and the PAA expansion.
In general, the accuracy of the discussed classification schemes increases with the number of clusters; the difference is in the range of 30% (when moving from 10 to 60 clusters). The fuzzification coefficient of the fuzzy C-Means clustering algorithm has a visible impact on the quality of the results, with differences of up to 40% in classification accuracy (when the coefficient varies between 1.1 and 2.5). The PAA representation space leads to slightly better results than those obtained with the Hermite representation of the signals; the difference is around 5%.

Conclusions: It was shown that granular representation of electrocardiographic signals is essential to data analysis and classification by providing a means to reveal and characterize the data structure and by providing prerequisites to construct pattern classifiers. The study also shows that fuzzy clusters deliver important structural information about the data that could be further quantified by looking into the content of clusters.

© 2013 Elsevier B.V. All rights reserved.

Tel.: +48 32 271 60 13. E-mail address: adam.gacek@itam.zabrze.pl

1. Introduction

In pattern classification problems, we encounter a large number of algorithms of unsupervised and supervised learning [1]. Classifiers demonstrate significant geometric diversity, ranging from linear mappings between a feature space and class assignment (linear classifiers) to highly nonlinear transformations such as those realized by means of neural networks or support vector machines. Predominantly, classifiers are constructed in a supervised mode, which means that there are sets of labeled patterns guiding the construction of classification mappings. Several interesting developments can be seen in electrocardiographic (ECG) signal description, analysis and classification where recent technologies of pattern recognition and machine learning are involved, see [2–4].
An interesting alternative is to develop the structure of a classifier by taking into consideration the geometry of the data in the feature space, as identified during clustering of patterns (viz. revealing a structure in the feature space in the form of a collection of clusters). The advantage of this approach is that clustering (being unsupervised by nature) considers all patterns in a global manner and looks for some general structure. As a result, we obtain an overall geometry of the data and associated classes that is less affected by individual patterns, especially those that are misclassified. In this way, we reduce the problems typical of supervised learning, where the design of classifiers with highly nonlinear characteristics becomes quite sensitive to the existence of possible outliers.

0933-3657/$ see front matter. http://dx.doi.org/10.1016/j.artmed.2013.09.004
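The idea of letting clusters guide the classifier can be illustrated with a small sketch. It is not the design developed in this paper: it assumes a plain fuzzy C-Means with a deterministic maximin initialization (an illustrative choice), describes each cluster by the membership-weighted share of each class among its patterns, and classifies a new pattern by weighting those shares with the activation levels of the clusters. The function names, the toy two-dimensional patterns (standing in for Hermite or PAA coefficient vectors) and the labels "N" and "V" are all hypothetical.

```python
import math


def memberships(x, prototypes, m=2.0):
    """Fuzzy C-Means membership (activation) of pattern x in each cluster."""
    d = [math.dist(x, v) for v in prototypes]
    hits = [i for i, di in enumerate(d) if di == 0.0]
    if hits:  # x coincides with a prototype: split full membership among hits
        return [1.0 / len(hits) if i in hits else 0.0 for i in range(len(d))]
    p = 2.0 / (m - 1.0)  # m is the fuzzification coefficient, m > 1
    inv = [di ** -p for di in d]
    total = sum(inv)
    return [w / total for w in inv]


def fuzzy_c_means(data, c, m=2.0, iters=50):
    """Return (prototypes, U); U[k][i] is the membership of pattern k in cluster i."""
    # Maximin initialization (assumption made for this sketch): start from the
    # first pattern, then repeatedly add the pattern farthest from those chosen.
    prototypes = [list(data[0])]
    while len(prototypes) < c:
        far = max(data, key=lambda x: min(math.dist(x, v) for v in prototypes))
        prototypes.append(list(far))
    dim = len(data[0])
    for _ in range(iters):
        U = [memberships(x, prototypes, m) for x in data]
        for i in range(c):
            w = [u[i] ** m for u in U]  # weights u_ik^m of the prototype update
            total = sum(w)
            prototypes[i] = [
                sum(wk * x[j] for wk, x in zip(w, data)) / total for j in range(dim)
            ]
    U = [memberships(x, prototypes, m) for x in data]
    return prototypes, U


def cluster_class_profiles(U, labels):
    """Per cluster: membership-weighted share of every class among its patterns."""
    classes = sorted(set(labels))
    profiles = []
    for i in range(len(U[0])):
        total = sum(u[i] for u in U)
        profiles.append({
            cls: sum(u[i] for u, y in zip(U, labels) if y == cls) / total
            for cls in classes
        })
    return profiles


def classify(x, prototypes, profiles, m=2.0):
    """Weight each cluster profile by the cluster's activation and pick the top class."""
    scores = {}
    for a, profile in zip(memberships(x, prototypes, m), profiles):
        for cls, share in profile.items():
            scores[cls] = scores.get(cls, 0.0) + a * share
    return max(scores, key=scores.get)


# Toy data: two well-separated groups with placeholder class labels.
patterns = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["N", "N", "N", "N", "V", "V", "V", "V"]
prototypes, U = fuzzy_c_means(patterns, c=2)
profiles = cluster_class_profiles(U, labels)
print(classify((0.5, 0.2), prototypes, profiles))
```

Because the class labels enter only through the cluster profiles, a single mislabeled pattern shifts a profile slightly rather than bending a decision boundary, which is the robustness argument made above. Very small values of `m` make the exponent `2/(m-1)` large, so a production version would need to guard against numerical overflow in `memberships`.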