Multimed Tools Appl
https://doi.org/10.1007/s11042-018-6276-y

Spoken keyword search system using improved ASR engine and novel template-based keyword scoring

Ilyes Rebai 1 · Yassine Ben Ayed 1 · Walid Mahdi 2

Received: 17 November 2017 / Revised: 24 April 2018 / Accepted: 15 June 2018
© Springer Science+Business Media, LLC, part of Springer Nature 2018

Abstract

Keyword search for spoken documents has become increasingly important due to the growing amount of spoken data. The typical system combines an Automatic Speech Recognition (ASR) system with information retrieval methods. Although a number of studies have sought to optimize system performance, KeyWord Search (KWS) systems still suffer from two main drawbacks. First, system performance depends strongly on the ASR transcripts, which are inherently inexact: because of the variability of the speech signal, ASR systems remain far from perfect. Second, KWS systems make detection decisions based on the lattice-based posterior probability, which is not comparable across keywords. In addition, the posterior probabilities of true detections usually fall into different ranges, which degrades spotting performance. This paper addresses both problems: the quality of ASR transcriptions and keyword detection decisions based on posterior probabilities. More specifically, we propose to improve the accuracy of the ASR transcripts by introducing a new ASR architecture that integrates data augmentation and ensemble learning techniques into a single framework. In addition, we propose a novel keyword rescoring method that provides scores from a new perspective. Precisely, inspired by the template-based KWS approach, similarity scores between the detected keywords are computed from the distance between their acoustic features and are used as new scores for the detection decision.
Experiments on French and English datasets show that the proposed KWS system can yield more accurate keyword search results than conventional systems.

Ilyes Rebai
rebai ilyes@hotmail.fr

Yassine Ben Ayed
yassine.benayed@isims.usf.tn

Walid Mahdi
walid.mahdi@isimsf.rnu.tn

1 MIRACL: Multimedia InfoRmation System and Advanced Computing Laboratory, University of Sfax, Sfax, Tunisia
2 College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
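
The template-based rescoring idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify the distance measure or the mapping from distance to score, so this example assumes dynamic time warping (DTW) with a Euclidean frame distance, and a hypothetical mapping 1 / (1 + mean distance). The function names `frame_distance`, `dtw_distance` and `rescore` are illustrative and are not taken from the paper.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two acoustic feature frames of equal size."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Length-normalized DTW alignment cost between two sequences of frames.

    Assumption: DTW is used here only as one plausible distance between
    acoustic feature sequences; the paper may use a different measure.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m] / (n + m)


def rescore(detections):
    """Score each detection of one keyword by similarity to the others.

    Hypothetical scoring: 1 / (1 + mean DTW distance to the other
    detections), so a detection that resembles the rest scores higher.
    """
    scores = []
    for i, det in enumerate(detections):
        others = [d for j, d in enumerate(detections) if j != i]
        if not others:
            scores.append(0.0)  # no other detection to compare against
            continue
        mean_dist = sum(dtw_distance(det, o) for o in others) / len(others)
        scores.append(1.0 / (1.0 + mean_dist))
    return scores


# Toy one-dimensional "features": two similar detections and one outlier.
detections = [
    [[0.0], [1.0], [2.0]],
    [[0.0], [1.0], [1.0], [2.0]],
    [[5.0], [5.0], [5.0]],
]
scores = rescore(detections)
```

In this toy example the third detection is acoustically far from the other two and therefore receives the lowest score. Unlike a lattice posterior, such a score is derived from the acoustic similarity among detections of the same keyword.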