Tandeming Analysis of Perceptual Pre-weighting and Post-weighting Multimode Tree Coder

Ying-Yi Li, Pravin Ramadas, and Jerry D. Gibson
Department of Electrical and Computer Engineering, University of California, Santa Barbara, USA
Email: {yingyi li, pravin ramadas}@umail.ucsb.edu, gibson@ece.ucsb.edu

Abstract—The perceptual pre-weighting and post-weighting Multimode Tree Coder has low delay and low complexity. Since the tandem connection of different codecs in voice calls is common today, it is also important to assess any loss in end-to-end speech quality caused by asynchronous tandem coding. We evaluate the tandeming performance of our Multimode Tree Coder when tandemed with itself, with G.727, and with the AMR-NB codec. The results show that the tandem performance of the Multimode Tree Coder is comparable to that of the AMR-NB coder at 12.2 kbps.

I. INTRODUCTION

A low delay, low complexity, and low bit-rate speech coder would be attractive for Voice over IP (VoIP) and Voice over Wireless LAN (VoWLAN) applications. To address these applications, we have proposed a phonetically switched Multimode Tree Coder (MMT) with the G.727 backward adaptive code generator that exhibits these characteristics [1].

Although it is not well known, the tandem connection of different codecs in voice calls is common today. For example, a mobile-to-mobile digital cellular call connected through a wireline VoIP connection often involves three different speech codecs: one for each mobile and another in the VoIP backbone. The coded speech thus has to be transcoded (decoded and re-encoded) at each network interface. These transcoding operations between codecs, called asynchronous tandeming of codecs, result in increased latency as well as performance degradation. A low delay codec helps to reduce the delay, but it is still important to assess any loss in end-to-end speech quality caused by asynchronous tandem coding.
The Multimode Tree Coder is based on Multimode classification and Tree coding [1], [2]. Multimode coding is based on phonetic classification of speech. The speech is classified into five modes and each mode is coded with a suitable bit-rate. Tree coding is an encoding procedure where speech samples are coded effectively based on the best long term tree-structured fit to the input waveform [3], [4]. In order to reduce the computational complexity of the perceptual distortion calculation in the Tree Search, we introduced pre-weighting and post-weighting filters in our Multimode Tree Coder in [1].

Since the Multimode Tree Coder [1] is low delay and low complexity, it helps to reduce the delay of transcoding operations. However, the speech quality of asynchronous tandem coding is also an important issue. Therefore, we compare the tandeming performance of the Multimode Tree Coder with G.727 and AMR-NB. The tandeming performance is evaluated by PESQ [5], an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs. The results show that the tandeming performance of the Multimode Tree Coder is comparable to that of the AMR-NB codec at 12.2 kbps for both clean and noisy sequences.

Fig. 1. Tree Coder without pre- and post-weighting (block labels: s(k), s’(k), W(z), Distortion Calculation, G.727 Code Generator, M-L Tree Search, Symbol Release Rule, Path maps).

This research has been supported by NSF Grant Nos. CCF-0728646 and CCF-0917230.

The paper is organized as follows. Section II describes tree coding basics. Section III discusses the details of the Multimode Tree Coder with perceptual pre-weighting and post-weighting. The tandeming performance of the speech codecs is compared in Section IV. Finally, conclusions are presented in Section V.

II. TREE CODING

A Tree coder has a Code Generator, a Tree Search algorithm, a distortion measure, and a path map symbol release rule, as shown in Fig. 1.
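
As a rough illustration of how these four components fit together, the Python sketch below implements a toy tree coder. It is not the coder evaluated in this paper: the code generator is a hypothetical 1-bit leaky first-order predictor standing in for G.727, the distortion measure is plain squared error with no perceptual weighting, and the names `generate`, `ml_tree_encode`, and `decode` as well as the values of `step` and `leak` are assumptions made for the example. The search keeps the M lowest-distortion paths at each sample and, once the paths are L symbols deep, releases the oldest symbol of the best path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    symbols: tuple  # unreleased path-map symbols, oldest first
    state: float    # code generator state (last reconstructed sample)
    dist: float     # cumulative squared error along this path


def generate(state, symbol, step=0.25, leak=0.9):
    """Toy code generator (assumption, not G.727): leaky prediction +/- step."""
    return leak * state + (step if symbol else -step)


def ml_tree_encode(samples, M=4, L=8):
    """Encode samples with an (M, L) tree search; returns one bit per sample."""
    paths = [Path((), 0.0, 0.0)]
    released = []
    for s in samples:
        # Extend every surviving path along both of its children.
        children = []
        for p in paths:
            for sym in (0, 1):
                recon = generate(p.state, sym)
                children.append(
                    Path(p.symbols + (sym,), recon, p.dist + (s - recon) ** 2)
                )
        # Keep the M paths with minimum cumulative distortion.
        children.sort(key=lambda p: p.dist)
        paths = children[:M]
        # Symbol release rule: at depth L, release the oldest symbol of the
        # best path and discard paths that do not share that symbol.
        if len(paths[0].symbols) == L:
            out = paths[0].symbols[0]
            released.append(out)
            paths = [
                Path(p.symbols[1:], p.state, p.dist)
                for p in paths
                if p.symbols[0] == out
            ]
    # End of input: flush the remaining symbols of the best path.
    released.extend(paths[0].symbols)
    return released


def decode(symbols):
    """Rebuild the waveform by running the same code generator on the bits."""
    state, out = 0.0, []
    for sym in symbols:
        state = generate(state, sym)
        out.append(state)
    return out
```

With `M=1` and `L=1` the search reduces to sample-by-sample (greedy) encoding. Larger values of `M` and `L` let the encoder defer each decision until it has seen later samples, at the cost of more computation per sample.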
The Tree Search algorithm, in combination with the Code Generator and an appropriate distortion measure, chooses the best candidate path to encode the current input sample. The symbol release rule decides which symbols on the best path are encoded. For simplicity, we used G.727 as our Code Generator since it is a low delay and low complexity ADPCM coder. The coding bit-rate is controlled by the result of the mode decision. In order to reduce the computational complexity, we used the M-L Tree Search as the Tree Search algorithm. The M paths with minimum cumulative distortion are chosen and extended along their children. The distortion of each path is calculated with a perceptual weighting filter, which helps to choose a path where the noise is masked by