Low Complexity On-Line Adaptation Techniques in Context of Assamese Spoken Query System

S. Shahnawazuddin & K. T. Deepak & B. D. Sarma & A. Deka & S. R. M. Prasanna & Rohit Sinha

Received: 11 March 2014 / Revised: 28 April 2014 / Accepted: 8 May 2014
© Springer Science+Business Media New York 2014

Abstract In this work, we present the development of an Assamese spoken query (SQ) system for accessing the price of agricultural commodities. The system is intended to keep cultivators aware of recent market trends. It enables the user to access the latest price of a commodity by calling the system from a landline or mobile phone. The spoken query input by the user is processed, and the current price of the desired commodity in the given district is then played back by the system. Features that make the system user friendly were incorporated into the design after taking feedback from local farmers. In other words, the system is tuned to the needs of its users. Furthermore, the issues of adapting such query systems to the end user are also explored in this work. In the developed SQ system, typical user responses are extremely short (1–2 seconds, since the user responds with isolated words). Moreover, the employed adaptation approach must keep the system latency low, since these systems are meant for real-time applications. Consequently, adapting such systems to the end user becomes an extremely challenging task. In this regard, acoustic model interpolation based adaptation techniques are proposed that employ interpolation weights derived in an approximate fashion. The proposed approaches try to minimize the latency in the system response by avoiding the iterative weight estimation procedure used in earlier reported works. Even with an extremely small amount of adaptation data, the proposed approaches are found to result in a relative improvement of 12% over the baseline ASR system.

Keywords Automatic speech recognition ·
Spoken query system · Assamese phone set · On-line adaptation · Fast adaptation · Sparse representation

S. Shahnawazuddin (*) · K. T. Deepak · B. D. Sarma · A. Deka · S. R. M. Prasanna · R. Sinha
Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati 781039, India
S. Shahnawazuddin e-mail: s.syed@iitg.ernet.in
K. T. Deepak e-mail: deepakkt@iitg.ernet.in
B. D. Sarma e-mail: s.biswajit@iitg.ernet.in
A. Deka e-mail: ani.deka@iitg.ernet.in
S. R. M. Prasanna e-mail: prasanna@iitg.ernet.in
R. Sinha e-mail: rsinha@iitg.ernet.in

J Sign Process Syst
DOI 10.1007/s11265-014-0906-z

1 Introduction

Adaptation techniques intend to reduce the acoustic mismatch between the speaker-independent (SI) acoustic models and the test data. These methods have become an integral part of state-of-the-art automatic speech recognition (ASR) systems. The maximum a-posteriori (MAP) [1] and maximum likelihood linear regression (MLLR) [2] criteria form the basis for most of the conventional adaptation techniques. These techniques require a considerably large amount of adaptation data and hence become largely ineffective when the available adaptation data is small (≤10 s). A number of fast/rapid adaptation approaches have been proposed over the past decade to address this problem. These techniques generally interpolate the parameters of a set of bases models to derive the model parameters for the test speaker/utterance [3–5]. The interpolation weights are either estimated as a global parameter, or a number of Gaussians are tied depending on some criterion (like regression classes) and one set of weights is estimated for each class. Since only the interpolation weights are estimated in bases interpolation based approaches, even a small amount of adaptation data is sufficient for the estimation of these interpolation