Steganographic Wideband Telephony Using Narrowband Speech Codecs
Peter Vary and Bernd Geiser
Institute of Communication Systems and Data Processing
RWTH Aachen University, Germany
{vary|geiser}@ind.rwth-aachen.de

Abstract— We consider the transmission of wideband speech with a cut-off frequency of f_c = 7 kHz over a standardized digital narrowband communication link (f_c = 3.4 kHz). At the receiver, wideband speech is produced by artificial bandwidth extension (BWE). The BWE algorithms can be realized with or without a small amount of low-bit-rate side information. In this paper, we propose to communicate the side information to the receiver via a steganographic channel within the bitstream of the narrowband codec. Hence, the bitstream format is not altered and the bit rate is not increased. The following codecs are considered: μ-law PCM, ADPCM, CS-ACELP, GSM FR, and GSM EFR.

I. INTRODUCTION

The transmission of wideband speech with a cut-off frequency f_c of at least 7 kHz is a highly desirable feature for future speech/audio communication networks. Compared with conventional narrowband telephony (f_c = 3.4 kHz), wideband speech offers significantly higher subjective speech quality and intelligibility as well as a clearly reduced listening effort. Dedicated wideband speech codecs, such as ITU-T G.722 and 3GPP AMR-WB, have been developed in the past. However, the required modifications of networks and protocols have turned out to be a major obstacle to the introduction of wideband speech coding in today's communication networks.

A promising approach to resolve this dilemma is the deployment of speech bandwidth extension (BWE), a method that (artificially) extends the limited frequency range of narrowband speech at the receiving end. As anticipated in [1], such techniques might speed up the transition of communication networks from narrowband to wideband.

In the first part of this paper (Sec.
II), the state of the art in speech bandwidth extension is briefly reviewed. We give examples of BWE algorithms that work without as well as with a certain amount of side information. BWE with side information is closely related to parametric speech coding; it is in fact an integral component of several codec standards, beginning with the very first narrowband GSM Full Rate Codec [2], [3] and continuing with more recent wideband codecs such as the 3GPP Adaptive Multi-Rate Wideband Codec [4], [5] or, more explicitly, the ITU-T Embedded Variable Bit Rate Codec G.729.1 [6], [7]. A much more challenging task in speech BWE is to achieve convincing results without transmitting any side information (see, e.g., [8]). This approach requires modifications only at the receiving end. The respective algorithms are based on estimating the parameters of a source model of speech production from the narrowband signal. Unfortunately, their performance is bounded because the mutual information between the low and the high frequency subbands is insufficient (cf. [9]). Nevertheless, a consistent quality improvement is achievable.

(This invited paper was presented at the 41st Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, Nov. 2007.)

In this paper, we propose an attractive compromise between wideband speech coding with integrated BWE and purely receiver-based BWE without side information. We show how to improve BWE with a small amount of side information that is embedded into the bitstream of a narrowband codec by steganographic techniques. Accordingly, the second part of the paper (Sec. III) focuses on steganographic methods for digital speech transmission. The third part (Sec. IV) combines speech steganography with a suitable BWE algorithm to form a transmission system that is backward compatible with legacy narrowband terminals and with the network itself. The codec's bitstream format is not altered.
In particular, the bit rate is not increased. The modified bitstream can be decoded by a standard narrowband decoder, possibly with a slight loss in quality.

978-1-4244-2110-7/08/$25.00 ©2007 IEEE

II. SPEECH BANDWIDTH EXTENSION

Methods for extending the acoustic bandwidth of speech signals can be roughly categorized as “Bandwidth Extension with Side Information” and “Bandwidth Extension without Side Information”. Exemplary algorithms for both cases are briefly reviewed below.

A. Bandwidth Extension without Side Information

Figure 1 depicts a signal flow chart of an exemplary bandwidth extension algorithm [10], [8]. This purely receiver-based solution is a “mixture” of pattern recognition, statistical estimation, and speech synthesis. The algorithm exploits the implicit redundancy of the source-filter model of speech. It can be subdivided into two sub-tasks:
• extension of the spectral envelope by pattern recognition and conditional MMSE estimation,
• extension of the narrowband excitation signal, e.g., by spectral replication of the baseband excitation.
The narrowband speech is interpolated to a sampling rate of 16 kHz, and an estimated wideband linear prediction (LP) analysis filter is applied to produce the narrowband excitation. After excitation extension, the LP synthesis filter, which is the exact inverse of the analysis filter, is applied. Therefore, the output signal contains the original narrowband