Entropy coding of compressed feature parameters for distributed speech recognition

Young Han Lee, Hong Kook Kim *

Department of Information and Communications, Gwangju Institute of Science and Technology (GIST), 1 Oryong-dong, Buk-gu, Gwangju 500-712, Republic of Korea

Received 11 March 2009; accepted 7 January 2010

Speech Communication 52 (2010) 405–412. doi:10.1016/j.specom.2010.01.002

* Corresponding author. Tel.: +82 62 970 2228; fax: +82 62 970 2204. E-mail addresses: cpumaker@gist.ac.kr (Y.H. Lee), hongkook@gist.ac.kr, hkkim@ieee.org (H.K. Kim).

Abstract

In this paper, we propose several entropy coding methods to further compress quantized mel-frequency cepstral coefficients (MFCCs) used for distributed speech recognition (DSR). As a standard DSR front-end, the European Telecommunications Standards Institute (ETSI) published an extended front-end that includes the split-vector quantization of MFCCs and voicing class information. By exploring how the entropy of the compressed MFCCs varies with the voicing class of the analysis frame, and how much entropy each MFCC subvector index contributes, we propose voicing class-dependent and subvector-wise Huffman coding methods. Differential Huffman coding is then applied to further enhance the coding gain over the class-dependent and subvector-wise Huffman coding methods. Experiments on the TIMIT database show that the average bit-rate of the subvector-wise differential Huffman coding is 33.93 bits/frame, the smallest among the proposed Huffman coding methods, whereas that of a traditional Huffman coding that does not consider voicing class and encodes all the subvectors with a single Huffman coding tree is 42.22 bits/frame. In addition, we evaluate the performance of the proposed Huffman coding methods on speech in noise using the Aurora 4 database, a standard speech database for DSR. The results show that the subvector-wise differential Huffman coding method again provides the smallest average bit-rate.

© 2010 Elsevier B.V. All rights reserved.

Keywords: Entropy coding; Distributed speech recognition; Huffman coding; Mel-frequency cepstral coefficient; Voicing class

1. Introduction

As technologies associated with wireless network systems have advanced, the demand for an intelligent user interface (IUI) operating on wireless and mobile devices such as smart-phones or personal digital assistants (PDAs) has also dramatically increased. These portable devices are typically small in size and difficult to manipulate, so the IUIs currently available for them are limited. As a promising user interface to make such devices easier to use, speech recognition can take the place of the keyboard or touch pad, since voice input requires only a microphone (Srinivasamurthy et al., 2006). A major problem, however, is that the computational complexity of speech recognition is too high for most portable devices. Thus, approaches such as network speech recognition and distributed speech recognition (DSR) have been developed in attempts to implement speech recognition functions on such devices more efficiently (Kiss, 2000; Gallardo-Antolin et al., 2005; Huerta and Stern, 1998; Kim and Cox, 2001; Raj et al., 2001).

Basically, a DSR system splits the functions of speech recognition into a front-end and a back-end, where the former is performed on the portable device and the latter on a designated speech recognition server having high computational power. The primary purpose of the DSR front-end is to extract speech recognition features such as the mel-frequency cepstral coefficients (MFCCs) that are commonly