Su, J., Kane, B., and Luz, S. Automatic content segmentation of audio recordings at multidisciplinary medical team meetings. In Proceedings of the International Conference on Information Technology (2008), A. Stepnowski et al., Eds., IEEE Computer Society, pp. 309–312.

Automatic Content Segmentation of audio recordings at multidisciplinary medical team meetings*

Jing Su, Bridget Kane, Saturnino Luz
Department of Computer Science
Trinity College Dublin, Ireland
{sujing, kanebt, luzs}@cs.tcd.ie

Abstract

A single recording of a multidisciplinary medical team meeting (MDTM) can be expected to contain several separate discussions on different patients. Automatic speaker segmentation alone does not allow for the separation of individual patient case discussions (PCDs). A novel method, based on Hidden Markov Models (HMMs), is presented here to segment audio recordings of MDTMs and to facilitate non-linear retrieval of individual PCDs. The method combines professional role interaction with speaker vocalization patterns: the sequence and duration of vocalizations, together with speakers' roles, are used as training states. Results demonstrate that HMM segmentation has good potential for the development of an MDTM browser. The approach outlined here can be applied to a wide range of meetings.

1. Introduction

Multiparty meeting browsing and retrieval is a novel research topic whose development in recent years has focused on audio segmentation algorithms. We propose a new, text-independent method of topic segmentation for multidisciplinary medical team meetings (MDTMs), a specific type of multi-party meeting. Contrary to the usual practice in dialogue segmentation of identifying topics through text and/or keyword recognition, we decided, in the first instance, to explore the possibility of segmenting an MDTM by patient case discussion in a 'content-free' manner (i.e. without appealing to transcription).
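The trained states and probabilities of the method are presented later in the paper; as a generic illustration of how an HMM decodes a hidden state sequence from observed vocalizations, the following minimal Viterbi sketch uses hypothetical meeting phases, observation labels and probabilities (illustrative values only, not the paper's trained parameters):

```python
# Minimal Viterbi decoding sketch for an HMM whose hidden states are
# hypothetical meeting phases. All probabilities below are illustrative,
# not trained parameters from the paper.

states = ["presentation", "discussion"]
# P(next_state | state): a structured meeting tends to stay in a phase.
trans = {
    "presentation": {"presentation": 0.8, "discussion": 0.2},
    "discussion":   {"presentation": 0.3, "discussion": 0.7},
}
start = {"presentation": 0.6, "discussion": 0.4}
# Observations are coarse vocalization labels (which role is speaking).
emit = {
    "presentation": {"consultant": 0.7, "other": 0.3},
    "discussion":   {"consultant": 0.3, "other": 0.7},
}

def viterbi(obs):
    """Return the most likely hidden state sequence for obs."""
    # v[s] = probability of the best path ending in state s
    v = {s: start[s] * emit[s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        prev = v
        v, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: prev[p] * trans[p][s])
            ptr[s] = best_prev
            v[s] = prev[best_prev] * trans[best_prev][s] * emit[s][o]
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: v[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["consultant", "consultant", "other", "other"]))
# -> ['presentation', 'presentation', 'discussion', 'discussion']
```

A real model would work in log-space to avoid underflow on long recordings, and would use richer observations such as vocalization durations.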
A patient case discussion (PCD) at an MDTM is a highly structured event, and its vocalization patterns should therefore be amenable to automatic generalization.

A text-independent topic segmentation method for MDTMs is proposed for many reasons, not least the sensitive nature of MDTM content and the need to respect patient privacy, but mainly because automatic meeting transcription is a challenging task in open meeting environments (owing to background noise and cross-talk) and because transcription loses many features, such as speaker identity, speed, tone and volume of speech, in conversion.

* This research was supported by a Science Foundation Ireland Research Frontiers grant under the National Development Plan.

MDTMs are decision-making fora and are routine in hospital work, especially for cancer patient management. Audio recordings of MDTMs can be valuable sources of knowledge and tools to direct patient management. Subtleties of the discussion and of the decision-making process can be captured in electronic meeting records (EMRs) that cannot be truly captured in text-based records. Yet EMRs have not yet been incorporated into electronic patient records (EPRs). The lack of efficient methods to retrieve data is part of the difficulty in developing such records: simple linear access methods are time-consuming and are not realistic solutions for the incorporation of EMRs into EPRs.

While the approach in this paper, namely to use speech feature information to overcome the challenge of topic segmentation, is confined to MDTM recordings, it can potentially be applied in a wide range of meeting settings.

1.1. Background

Maganti [6] compared the performance of four methods of speech/non-speech segmentation: an energy-based method, energy combined with zero-crossing rate, modulation spectrum based segmentation, and a multi-layer perceptron.
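As a rough illustration of the first of these methods, the following sketch labels fixed-length frames as speech or non-speech by thresholding their short-term energy (the frame length and threshold are illustrative choices, not values from [6]):

```python
# Sketch of energy-based speech/non-speech segmentation: frames whose
# short-term energy exceeds a threshold are labelled speech. Frame length
# and threshold here are illustrative, not the settings from [6].

def short_term_energy(samples, frame_len=160):
    """Mean squared amplitude per non-overlapping frame."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [sum(x * x for x in f) / len(f) for f in frames]

def segment_speech(samples, frame_len=160, threshold=0.01):
    """Return one boolean per frame: True = speech, False = non-speech."""
    return [e > threshold for e in short_term_energy(samples, frame_len)]

# Toy signal: one quiet "silence" frame followed by one louder frame.
signal = [0.001] * 160 + [0.5] * 160
print(segment_speech(signal))  # -> [False, True]
```

Energy thresholding alone is fragile under background noise, which is one reason the comparison in [6] includes spectral and learned classifiers.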
Modulation spectrum based segmentation (MSS) was found to be the most helpful for speech recognition [6]. The Bayesian Information Criterion (BIC) has also been used for speaker segmentation, with an algorithm that detects speaker changes in continuous speech [3]. The speaker clustering technique (or speaker identification) is a development of speaker segmentation. By labelling all vocalizations from the same speaker in a lengthy recording, and applying Gaus-