Research Article
A Russian Keyword Spotting System Based on Large Vocabulary
Continuous Speech Recognition and Linguistic Knowledge
Valentin Smirnov,¹ Dmitry Ignatov,¹ Michael Gusev,¹ Mais Farkhadov,² Natalia Rumyantseva,³ and Mukhabbat Farkhadova³

¹Speech Drive LLC, Saint Petersburg, Russia
²V.A. Trapeznikov Institute of Control Sciences of RAS, Moscow, Russia
³RUDN University, Moscow, Russia
Correspondence should be addressed to Mais Farkhadov; mais.farhadov@gmail.com
Received 11 July 2016; Revised 27 October 2016; Accepted 14 November 2016
Academic Editor: Alexey Karpov
Copyright © 2016 Valentin Smirnov et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper describes the key concepts of a keyword spotting system for Russian based on large vocabulary continuous speech recognition. Key algorithms and system settings are described, including the pronunciation variation algorithm, and experimental results on real-life telecom data are provided, together with a description of the system architecture and the user interface. The system is based on the CMU Sphinx open-source speech recognition platform and on linguistic models and algorithms developed by Speech Drive LLC. The effective combination of baseline statistical methods, real-world training data, and the intensive use of linguistic knowledge produced results of a quality suitable for industrial use.
1. Introduction
The need to understand business trends, ensure public security, and improve the quality of customer service has driven the sustained development of speech analytics systems, which transform speech data into a measurable and searchable index of words, phrases, and paralinguistic markers. Keyword spotting technology makes up a substantial part of such systems. Modern keyword spotting engines usually rely on one of three approaches, namely, phonetic lattice search [1, 2], word-based models [3, 4], and large vocabulary continuous speech recognition (LVCSR) [5]. While each approach has its pros and cons [6], the latter has become prominent due to the public availability of baseline algorithms, cheaper hardware for the intensive computations that LVCSR requires, and, most importantly, the high quality of its results.
Most recently, a number of innovative approaches to spoken term detection have been offered, such as recognition system combination and score normalization, reporting a 20% increase in spoken term detection quality (measured as actual term-weighted value, ATWV) [7, 8]. The application of deep neural networks in LVCSR is also achieving wide adoption [9]. Thanks to the IARPA Babel program, aimed at building systems that can be rapidly applied to any human language in order to provide effective search capability for analysts to efficiently process massive amounts of real-world recorded speech [10], extensive research has been conducted in recent years on spoken term detection for low-resource languages. For example, [11] describes an approach to keyword spotting in Cantonese based on large vocabulary speech recognition and shows positive results from applying neural networks to recognition lattice rescoring. Reference [12] provides an extensive description of modern methods used to build a keyword spotting system for 10 low-resource languages, with primary focus on Assamese, Bengali, Haitian Creole, Lao, and Zulu. Deep neural network acoustic models are used both as feature extractors for a GMM-based HMM system and to compute state posteriors, which are converted into scaled likelihoods by normalizing by the state priors. Data augmentation via multilingual bottleneck features is offered (the topic is also covered in [13]). Finally, language-independent and unsupervised acoustic models are trained
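The posterior-to-likelihood conversion mentioned above is the standard trick in hybrid DNN-HMM systems: the network estimates state posteriors p(s|x), and dividing by the state priors p(s) yields scaled likelihoods p(x|s)/p(x) that the HMM decoder can use in place of GMM likelihoods. A minimal sketch of that step (the function name and example values are ours, not from the cited systems):

```python
import numpy as np

def posteriors_to_scaled_log_likelihoods(posteriors, priors, floor=1e-8):
    """Convert per-frame DNN state posteriors p(s|x) into log scaled
    likelihoods log(p(s|x) / p(s)), flooring to avoid log(0)."""
    posteriors = np.maximum(np.asarray(posteriors, dtype=float), floor)
    priors = np.maximum(np.asarray(priors, dtype=float), floor)
    return np.log(posteriors) - np.log(priors)

# One frame over three HMM states: a rare state (prior 0.1) with posterior 0.1
# scores the same as chance, while a common state (prior 0.5) needs a higher
# posterior to earn a positive score.
frame_posteriors = [0.6, 0.3, 0.1]
state_priors = [0.5, 0.4, 0.1]
print(posteriors_to_scaled_log_likelihoods(frame_posteriors, state_priors))
```

In decoding, these per-frame scores simply replace the GMM emission log likelihoods; the common scale factor 1/p(x) is constant within a frame and does not affect the best path.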
Hindawi Publishing Corporation, Journal of Electrical and Computer Engineering, Volume 2016, Article ID 4062786, 9 pages, http://dx.doi.org/10.1155/2016/4062786