Understanding the Importance of Heart Sound Segmentation for Heart Anomaly Detection

Theekshana Dissanayake, Tharindu Fernando, Member, IEEE, Simon Denman, Member, IEEE, Sridha Sridharan, Life Senior Member, IEEE, Houman Ghaemmaghami, Clinton Fookes, Senior Member, IEEE

Abstract—Traditionally, abnormal heart sound classification is framed as a three-stage process. The first stage involves segmenting the phonocardiogram to detect the fundamental heart sounds, after which features are extracted and classification is performed. Some researchers in the field argue the segmentation step is an unwanted computational burden, whereas others embrace it as a prior step to feature extraction. When comparing the accuracies achieved by studies that have segmented heart sounds before analysis with those of studies that have overlooked that step, the question of whether to segment heart sounds before feature extraction remains open. In this study, we explicitly examine the importance of heart sound segmentation as a prior step for heart sound classification, and then seek to apply the obtained insights to propose a robust classifier for abnormal heart sound detection. Furthermore, recognizing the pressing need for explainable Artificial Intelligence (AI) models in the medical domain, we also unveil the hidden representations learned by the classifier using model interpretation techniques. Experimental results demonstrate that segmentation plays an essential role in abnormal heart sound classification. Our new classifier is also shown to be robust, stable and, most importantly, explainable, with an accuracy of almost 100% on the widely used PhysioNet dataset.

Index Terms—Heart sound segmentation, Biomedical Signal Processing, Phonocardiogram, Neural Networks.

I. INTRODUCTION

Cardiovascular diseases have become one of the leading causes of death, and often lead to other medical conditions such as strokes, hypertension, heart failure and arrhythmia [1]–[3].
In the field of biomedical engineering, automatic abnormal heart sound detection can be considered a major prior step to cardiovascular disease diagnosis. The process of identifying whether a given heart sound is normal or abnormal can be divided into three major steps: segmentation, feature extraction, and classification [4]. Firstly, the segmentation technique locates the fundamental heart sounds of the Phonocardiogram (PCG) signal: S1 (first heart sound) and S2 (second heart sound); see Figure 1. However, detecting the fundamental heart sounds is itself a complex task, and can be affected by other internal sounds such as murmurs, the presence of third (S3) and fourth (S4) heart sounds, and noise [5]. After segmenting the heart sound, various feature extraction techniques are used to extract features from the signal for training a classifier. These extracted features can generally be categorised as time domain, frequency domain, or time-frequency domain features of the PCG wave. As the final step, a classifier is developed using the extracted features to identify abnormal heart sounds. Considering these fundamental steps for normal-abnormal heart sound classification, segmentation can be seen as an essential step to localize the signal before extracting various kinds of features. However, several sophisticated studies in the literature, which have achieved superior performance for abnormal heart sound classification, have eschewed this step prior to feature extraction [5]–[8].

Fig. 1: Phonocardiogram signal with S1 and S2 locations.

T. Dissanayake, T. Fernando, S. Denman, S. Sridharan and C. Fookes are with the Speech Audio Image and Video Technologies (SAIVT) Research Lab, Queensland University of Technology, Australia. H. Ghaemmaghami is with M3DICINE Pty Ltd.
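To make the first two stages of this pipeline concrete, the sketch below segments a synthetic PCG-like signal with a rectified, smoothed envelope and simple threshold-based peak picking, then extracts one time-domain and one frequency-domain feature. This is purely illustrative: the envelope method, thresholds, and features here are generic stand-ins, not the segmentation or feature-extraction techniques evaluated in this paper.

```python
import numpy as np

def segment(pcg, fs):
    """Stage 1 (illustrative): locate candidate heart-sound bursts via a
    rectified, smoothed envelope and threshold-based peak picking."""
    win = max(1, int(0.05 * fs))                       # 50 ms smoothing window
    envelope = np.convolve(np.abs(pcg), np.ones(win) / win, mode="same")
    above = np.where(envelope > 0.3 * envelope.max())[0]
    # One peak per contiguous above-threshold run.
    runs = np.split(above, np.where(np.diff(above) > 1)[0] + 1)
    return np.array([run[np.argmax(envelope[run])] for run in runs])

def extract_features(pcg, fs):
    """Stage 2 (illustrative): one time-domain feature (zero-crossing
    rate) and one frequency-domain feature (spectral centroid)."""
    zcr = np.mean(np.abs(np.diff(np.sign(pcg))) > 0)
    spectrum = np.abs(np.fft.rfft(pcg))
    freqs = np.fft.rfftfreq(len(pcg), d=1 / fs)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    return np.array([zcr, centroid])

# Synthetic PCG stand-in: two S1/S2-like pairs of 60 Hz bursts over 2 s.
fs = 2000
t = np.arange(2 * fs) / fs
pcg = np.zeros_like(t)
for onset in (0.1, 0.4, 1.1, 1.4):
    mask = (t >= onset) & (t < onset + 0.08)
    pcg[mask] = np.sin(2 * np.pi * 60 * (t[mask] - onset))

peaks = segment(pcg, fs)
print(len(peaks), "bursts located")   # 4 bursts located
print(extract_features(pcg, fs))      # [zero-crossing rate, centroid in Hz]
```

Stage 3 would then train any standard classifier on such feature vectors; the open question studied in this paper is whether stage 1 is worth its computational cost.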
Therefore, whether segmentation is required prior to developing a classifier is still an open question that should be answered to determine whether the additional computational burden of this process is beneficial.

In summary, the principal objective of our research is to understand the importance of heart sound segmentation for the normal-abnormal heart sound classification task via a series of experiments involving empirical evaluations and model interpretation. Ultimately, the goal of this study is to develop a robust machine learning model for normal-abnormal heart sound classification while being able to explain the hidden representations learned by the model, and to provide insight into the importance of segmentation for the overall task. The following list highlights the main contributions of our research:

1) We empirically evaluate the importance of heart sound segmentation as a prior step to heart sound classification, and then propose a novel deep learning model which achieves an accuracy of 98.71% for heart sound classification on the PhysioNet [9] dataset.

2) Going beyond the quantitative results, we interpret the developed deep learning model using the SHAP (SHapley Additive exPlanations [16]) algorithm to reveal the hidden representations learned by the model.

3) Based on insights obtained from the first experiment regarding segmentation, a second architecture (a slight

arXiv:2005.10480v1 [cs.SD] 21 May 2020
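The second contribution rests on SHAP, which attributes a model's output to its input features via Shapley values. To make the underlying idea concrete, the sketch below computes exact Shapley values by subset enumeration for a toy two-feature model, with absent features replaced by a baseline. This is an illustrative from-scratch calculation, not the SHAP library [16] the paper applies to its deep model, where exact enumeration would be infeasible and the library's approximations are used instead.

```python
import math
from itertools import combinations

def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at input x; features outside the
    coalition are replaced by the baseline (a SHAP-style convention)."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):                      # coalition sizes 0 .. n-1
            for S in combinations(others, k):
                # Shapley weight |S|! (n - |S| - 1)! / n!
                w = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
                phi += w * (value(set(S) | {i}) - value(set(S)))
        phis.append(phi)
    return phis

# Toy "classifier score", linear in two features.
f = lambda z: 2.0 * z[0] + 1.0 * z[1]
x, baseline = [3.0, 5.0], [0.0, 0.0]

phi = shapley_values(f, x, baseline)
print(phi)                             # [6.0, 5.0] for this linear model
# Efficiency property: attributions sum to f(x) - f(baseline).
print(sum(phi), f(x) - f(baseline))    # 11.0 11.0
```

The efficiency property shown at the end is what makes such attributions interpretable: every unit of the model's output deviation from the baseline is assigned to some input feature, which is how the learned representations are probed in this work.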