Histogram Based Method for Unsupervised Meeting Speech Summarization

Nouha Dammak 1,2(✉) and Yassine BenAyed 1

1 Multimedia InfoRmation Systems and Advanced Computing Laboratory (MIRACL), 3021 Sfax, Tunisia
nouha.damak@gmail.com, yassine.benayed@isims.usf.tn
2 Higher Institute of Computer Sciences and Communication Techniques, University of Sousse, Sousse, Tunisia

Abstract. The emergence of platforms such as YouTube, Dailymotion and Google Video has played a major role in increasing the number of videos available on the Internet. For example, more than 15000 video sequences are seen every day on Dailymotion. Consequently, the huge amount of gathered data makes managing the underlying knowledge a major scientific challenge. In particular, data summarization aims to extract concise abstracts from different types of documents. In this paper, we are interested in summarizing meeting data. Because the quality of video analysis output depends heavily on the type of data, we propose our own framework for this purpose. The main goal of our study is to use textual data extracted from Automatic Speech Recognition (ASR) transcriptions of the AMI corpus to produce a fully unsupervised summarized version of meeting sequences. Our contribution, called Weighted Histogram for ASR Transcriptions (WHASRT), adopts an extractive, annotation-free and dictionary-based approach. An exhaustive comparative study demonstrates that our method achieves results competitive with the ranking-based methods. The experimental results also show enhanced performance over the existing clustering-based methods.

Keywords: Summarization · Unsupervised · Transcription · Automatic Speech Recognition · Meetings · Natural Language Processing

1 Introduction

Most people spend a lot of their time in meetings.
Once a meeting ends, it becomes important to produce reports that cite the main issues discussed, such as the problems encountered and the decisions made. It is now possible to record and store a meeting in either audio or video format. Several existing tools, grouped under the term "speech-to-text", generate text transcriptions listing what was said during the meeting. An important issue is then to automatically extract, from these often very noisy textual transcriptions, the topics and summaries needed to create the meeting reports. In this paper, we present a fully unsupervised extractive text summarization system and we check its effects on

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 396–405, 2021.
https://doi.org/10.1007/978-3-030-49342-4_38