Understanding the Importance of Heart Sound Segmentation for Heart Anomaly Detection

Theekshana Dissanayake, Tharindu Fernando, Member, IEEE, Simon Denman, Member, IEEE, Sridha Sridharan, Life Senior Member, IEEE, Houman Ghaemmaghami, Clinton Fookes, Senior Member, IEEE

Abstract—Traditionally, abnormal heart sound classification is framed as a three-stage process. The first stage involves segmenting the phonocardiogram to detect the fundamental heart sounds, after which features are extracted and classification is performed. Some researchers in the field argue the segmentation step is an unwanted computational burden, whereas others embrace it as a prior step to feature extraction. When comparing the accuracies achieved by studies that have segmented heart sounds before analysis with those of studies that have overlooked that step, the question of whether to segment heart sounds before feature extraction remains open. In this study, we explicitly examine the importance of heart sound segmentation as a prior step for heart sound classification, and then seek to apply the obtained insights to propose a robust classifier for abnormal heart sound detection. Furthermore, recognizing the pressing need for explainable Artificial Intelligence (AI) models in the medical domain, we also unveil the hidden representations learned by the classifier using model interpretation techniques. Experimental results demonstrate that segmentation plays an essential role in abnormal heart sound classification. Our new classifier is also shown to be robust, stable and, most importantly, explainable, with an accuracy of almost 100% on the widely used PhysioNet dataset.

Index Terms—Heart sound segmentation, Biomedical Signal Processing, Phonocardiogram, Neural Networks.

I. INTRODUCTION

Cardiovascular diseases have become one of the leading causes of death, and often lead to other medical conditions such as strokes, hypertension, heart failure and arrhythmia [1]–[3].
In the field of biomedical engineering, automatic abnormal heart sound detection can be considered a major prior step to cardiovascular disease diagnosis. The process of identifying whether a given heart sound is normal or abnormal can be divided into three major steps: segmentation, feature extraction, and classification [4]. Firstly, the segmentation technique locates the fundamental heart sounds of the Phonocardiogram (PCG) signal: S1 (first heart sound) and S2 (second heart sound); see Figure 1. However, detecting the fundamental heart sounds is itself a complex task, and can be affected by other internal sounds such as murmurs, the presence of third (S3) and fourth (S4) heart sounds, and noise [5]. After segmenting the heart sound, various feature extraction techniques are used to extract features from the signal for training a classifier. These extracted features can generally be categorised as time domain, frequency domain, or time-frequency domain features of the PCG wave. As the final step, a classifier is developed using the extracted features to identify abnormal heart sounds. Considering these fundamental steps for normal-abnormal heart sound classification, segmentation can be seen as an essential step to localize the signal before extracting various kinds of features. However, several sophisticated studies in the literature, which have achieved superior performance for abnormal heart sound classification, have eschewed this step prior to feature extraction [5]–[8].

Fig. 1: Phonocardiogram signal with S1 and S2 locations.

T. Dissanayake, T. Fernando, S. Denman, S. Sridharan and C. Fookes are with the Speech Audio Image and Video Technologies (SAIVT) Research Lab, Queensland University of Technology, Australia. H. Ghaemmaghami is with M3DICINE Pty Ltd.
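To make the first two stages of this pipeline concrete, the sketch below segments a synthetic PCG-like signal with a rectified, smoothed envelope and simple threshold-based peak picking, then extracts one time-domain and one frequency-domain feature. This is purely illustrative: the envelope method, thresholds, and features here are generic stand-ins, not the segmentation or feature-extraction techniques evaluated in this paper.

```python
import numpy as np

def segment(pcg, fs):
    """Stage 1 (illustrative): locate candidate heart-sound bursts via a
    rectified, smoothed envelope and threshold-based peak picking."""
    win = max(1, int(0.05 * fs))                       # 50 ms smoothing window
    envelope = np.convolve(np.abs(pcg), np.ones(win) / win, mode="same")
    above = np.where(envelope > 0.3 * envelope.max())[0]
    # One peak per contiguous above-threshold run.
    runs = np.split(above, np.where(np.diff(above) > 1)[0] + 1)
    return np.array([run[np.argmax(envelope[run])] for run in runs])

def extract_features(pcg, fs):
    """Stage 2 (illustrative): one time-domain feature (zero-crossing
    rate) and one frequency-domain feature (spectral centroid)."""
    zcr = np.mean(np.abs(np.diff(np.sign(pcg))) > 0)
    spectrum = np.abs(np.fft.rfft(pcg))
    freqs = np.fft.rfftfreq(len(pcg), d=1 / fs)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    return np.array([zcr, centroid])

# Synthetic PCG stand-in: two S1/S2-like pairs of 60 Hz bursts over 2 s.
fs = 2000
t = np.arange(2 * fs) / fs
pcg = np.zeros_like(t)
for onset in (0.1, 0.4, 1.1, 1.4):
    mask = (t >= onset) & (t < onset + 0.08)
    pcg[mask] = np.sin(2 * np.pi * 60 * (t[mask] - onset))

peaks = segment(pcg, fs)
print(len(peaks), "bursts located")   # 4 bursts located
print(extract_features(pcg, fs))      # [zero-crossing rate, centroid in Hz]
```

Stage 3 would then train any standard classifier on such feature vectors; the open question studied in this paper is whether stage 1 is worth its computational cost.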
Therefore, whether segmentation is required prior to developing a classifier is still an open question that should be answered to determine whether the additional computational burden of this process is beneficial.

In summary, the principal objective of our research is to understand the importance of heart sound segmentation for the normal-abnormal heart sound classification task via a series of experiments involving empirical evaluations and model interpretation. Ultimately, the goal of this study is to develop a robust machine learning model for normal-abnormal heart sound classification while being able to explain the hidden representations learned by the model, and to provide insight into the importance of segmentation for the overall task. The following list highlights the main contributions of our research:

1) We empirically evaluate the importance of heart sound segmentation as a prior step to heart sound classification, and then propose a novel deep learning model which achieves an accuracy of 98.71% for heart sound classification on the PhysioNet [9] dataset.

2) Going beyond the quantitative results, we interpret the developed deep learning model using the SHAP (SHapley Additive exPlanations [16]) algorithm to reveal the hidden representations learned by the model.

3) Based on insights obtained from the first experiment regarding segmentation, a second architecture (a slight

arXiv:2005.10480v1 [cs.SD] 21 May 2020
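The second contribution rests on SHAP, which attributes a model's output to its input features via Shapley values. To make the underlying idea concrete, the sketch below computes exact Shapley values by subset enumeration for a toy two-feature model, with absent features replaced by a baseline. This is an illustrative from-scratch calculation, not the SHAP library [16] the paper applies to its deep model, where exact enumeration would be infeasible and the library's approximations are used instead.

```python
import math
from itertools import combinations

def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at input x; features outside the
    coalition are replaced by the baseline (a SHAP-style convention)."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):                      # coalition sizes 0 .. n-1
            for S in combinations(others, k):
                # Shapley weight |S|! (n - |S| - 1)! / n!
                w = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
                phi += w * (value(set(S) | {i}) - value(set(S)))
        phis.append(phi)
    return phis

# Toy "classifier score", linear in two features.
f = lambda z: 2.0 * z[0] + 1.0 * z[1]
x, baseline = [3.0, 5.0], [0.0, 0.0]

phi = shapley_values(f, x, baseline)
print(phi)                             # [6.0, 5.0] for this linear model
# Efficiency property: attributions sum to f(x) - f(baseline).
print(sum(phi), f(x) - f(baseline))    # 11.0 11.0
```

The efficiency property shown at the end is what makes such attributions interpretable: every unit of the model's output deviation from the baseline is assigned to some input feature, which is how the learned representations are probed in this work.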