Multimed Tools Appl
DOI 10.1007/s11042-016-4315-0
Efficient audio-driven multimedia indexing through similarity-based speech/music discrimination
Nikolaos Tsipas¹ · Lazaros Vrysis¹ · Charalampos Dimoulas¹ · George Papanikolaou¹
Received: 31 July 2016 / Revised: 13 December 2016 / Accepted: 26 December 2016
© Springer Science+Business Media New York 2017
Abstract In this paper, an audio-driven algorithm for detecting speech and music events in multimedia content is introduced. The proposed approach is based on the hypothesis that short-time, frame-level discrimination performance can be enhanced by identifying transition points between longer, semantically homogeneous segments of audio. In this context, a two-step segmentation approach is employed: transition points between the homogeneous regions are first identified, and the derived segments are then classified using a supervised binary classifier. The transition point detection mechanism is based on the analysis and composition of multiple self-similarity matrices, each generated from a different audio feature set. The technique discriminates between speech and music events with an emphasis on detecting transition points at high temporal resolution, a goal that is also reflected in the adopted assessment methodology. On this basis, multimedia indexing can be deployed efficiently for both audio and video sequences, combining high-resolution temporal segmentation with semantic annotation extraction. The system is evaluated on three publicly available datasets, and the experimental results are compared with those of existing implementations. The proposed algorithm is provided as an open-source software package in order to support reproducible research and encourage collaboration in the field.
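The two-step idea described in the abstract (compose several self-similarity matrices, detect transition points, then split the stream into segments for classification) can be illustrated with a minimal, pure-Python sketch. It is not the authors' released implementation: cosine similarity as the frame-to-frame measure, element-wise averaging as the composition rule, Foote's checkerboard-kernel novelty curve as the boundary detector and the simple threshold-plus-local-maximum peak picking are all assumptions chosen for illustration, and every function name is hypothetical. The supervised speech/music classifier applied to each segment in the second step is not shown.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def cosine_ssm(frames: Sequence[Sequence[float]]) -> Matrix:
    """Self-similarity matrix for one feature set: S[i][j] = cos(frame_i, frame_j)."""
    # Guard against division by zero for all-zero (e.g. silent) frames.
    norms = [math.sqrt(sum(x * x for x in f)) or 1.0 for f in frames]
    n = len(frames)
    return [
        [sum(a * b for a, b in zip(frames[i], frames[j])) / (norms[i] * norms[j])
         for j in range(n)]
        for i in range(n)
    ]


def compose_ssms(ssms: Sequence[Matrix]) -> Matrix:
    """Compose SSMs built from different feature sets (assumed rule: element-wise mean)."""
    n, k = len(ssms[0]), len(ssms)
    return [[sum(s[i][j] for s in ssms) / k for j in range(n)] for i in range(n)]


def checkerboard_novelty(ssm: Matrix, half: int) -> List[float]:
    """Foote-style novelty: slide a (2*half x 2*half) checkerboard kernel along the diagonal.

    Same-side quadrants (past/past, future/future) are weighted +1 and cross
    quadrants -1, so the score peaks where two homogeneous regions meet.
    Frames closer than `half` to either edge are left at 0.
    """
    n = len(ssm)
    novelty = [0.0] * n
    for t in range(half, n - half + 1):
        score = 0.0
        for u in range(-half, half):
            for v in range(-half, half):
                sign = 1.0 if (u < 0) == (v < 0) else -1.0
                score += sign * ssm[t + u][t + v]
        novelty[t] = score
    return novelty


def pick_peaks(novelty: Sequence[float], threshold: float) -> List[int]:
    """Transition points: local maxima of the novelty curve above a threshold."""
    return [
        t for t in range(1, len(novelty) - 1)
        if novelty[t] > threshold
        and novelty[t] >= novelty[t - 1]
        and novelty[t] > novelty[t + 1]
    ]


def segments_from_boundaries(boundaries: Sequence[int], n: int) -> List[Tuple[int, int]]:
    """Split frames [0, n) into half-open segments at the detected transition points."""
    edges = [0] + sorted(boundaries) + [n]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b > a]


# Toy input: 20 frames described by two feature sets, with an abrupt change at frame 10.
set_a = [[1.0, 0.0]] * 10 + [[0.0, 1.0]] * 10
set_b = [[1.0, 1.0]] * 10 + [[1.0, 0.0]] * 10
ssm = compose_ssms([cosine_ssm(set_a), cosine_ssm(set_b)])
novelty = checkerboard_novelty(ssm, half=4)
bounds = pick_peaks(novelty, threshold=0.5 * max(novelty))
print(bounds)                                # [10]
print(segments_from_boundaries(bounds, 20))  # [(0, 10), (10, 20)]
```

In this sketch each segment, rather than each short frame, would be passed to the binary classifier, so one decision covers a whole homogeneous region. The kernel half-width `half` trades the temporal resolution of the detected boundaries against robustness to short-lived fluctuations in the features.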
Nikolaos Tsipas (nitsipas@auth.gr)
Lazaros Vrysis (lvrysis@auth.gr)
Charalampos Dimoulas (babis@eng.auth.gr)
George Papanikolaou (pap@eng.auth.gr)
¹ Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece