Multimed Tools Appl
https://doi.org/10.1007/s11042-018-6276-y
Spoken keyword search system using improved ASR
engine and novel template-based keyword scoring
Ilyes Rebai 1 · Yassine Ben Ayed 1 · Walid Mahdi 2
Received: 17 November 2017 / Revised: 24 April 2018 / Accepted: 15 June 2018
© Springer Science+Business Media, LLC, part of Springer Nature 2018
Abstract Keyword search in spoken documents has become increasingly important
as the amount of spoken data grows. A typical system combines
an Automatic Speech Recognition (ASR) system with information retrieval methods. While
many studies have sought to optimize system performance, KeyWord
Search (KWS) systems still suffer from two main drawbacks. First, system performance
depends strongly on the ASR transcripts, which are inherently inexact: because of the
variability of the speech signal, ASR systems remain far from perfect. Second, KWS systems make
detection decisions based on lattice-based posterior probabilities, which are not comparable
across keywords. Moreover, the posterior probabilities of true detections usually fall into
different ranges, which degrades spotting performance. This paper addresses both problems:
the accuracy of ASR transcriptions and keyword detection decisions based on posterior probabilities. More
specifically, we propose to improve the accuracy of ASR transcripts by introducing a new ASR
architecture that integrates data augmentation and ensemble learning techniques into
a single framework. In addition, we propose a novel keyword rescoring method that
provides scores from a new perspective. In particular, inspired by the template-based KWS approach,
similarity scores between the detected keywords are computed from the distance
between their acoustic features and are used as new scores for the detection decision. Experiments on
French and English datasets show that the proposed KWS system can lead to more
accurate keyword results than conventional systems.
Ilyes Rebai
rebai ilyes@hotmail.fr
Yassine Ben Ayed
yassine.benayed@isims.usf.tn
Walid Mahdi
walid.mahdi@isimsf.rnu.tn
1 MIRACL: Multimedia InfoRmation System and Advanced Computing Laboratory,
University of Sfax, Sfax, Tunisia
2 College of Computers and Information Technology, Taif University, Taif, Saudi Arabia