Steganographic Wideband Telephony
Using Narrowband Speech Codecs
Peter Vary and Bernd Geiser
Institute of Communication Systems and Data Processing
RWTH Aachen University, Germany
{vary|geiser}@ind.rwth-aachen.de
Abstract— We consider the transmission of wideband speech
with a cut-off frequency of f_c = 7 kHz over a standardized
digital narrowband communication link (f_c = 3.4 kHz). At the
receiver, wideband speech is produced by artificial bandwidth
extension (BWE). The BWE algorithms can be realized with
or without some low bit rate side information. In this paper, we
propose to communicate the side information to the receiver via a
steganographic channel within the bitstream of the narrowband
codec. Hence, the bitstream format is not altered and the bit
rate is not increased. The following codecs are considered: μ-law
PCM, ADPCM, CS-ACELP, GSM FR, and GSM EFR.
I. INTRODUCTION
The transmission of wideband speech with a cut-off frequency
f_c of at least 7 kHz is a highly desirable feature
for future speech/audio communication networks. Compared
with conventional narrowband telephony (f_c = 3.4 kHz),
wideband speech offers a significantly increased subjective
speech quality and intelligibility as well as a clearly reduced
“listening effort”. For wideband transmission, suitable dedicated
speech codecs, such as the ITU-T G.722 or the 3GPP
AMR-WB, have been developed in the past. However, the
required modifications of networks and protocols turned out to
be a major obstacle for the introduction of wideband speech
coding in today’s communication networks.
A promising approach to resolve this dilemma is the deployment
of speech bandwidth extension (BWE), a method
that (artificially) extends the limited frequency range of
narrowband speech at the receiving end. The related techniques
might, as anticipated in [1], be able to speed up the narrow-
to wideband change-over of communication networks. In the
first part of this paper (Sec. II), the state of the art in speech
bandwidth extension is reviewed briefly. We give examples of
BWE algorithms that work without as well as with a certain
amount of side information. BWE with side information is
closely related to parametric speech coding and is actually
an integral component of several codec standards beginning
with the very first narrowband GSM Full Rate Codec [2] [3]
and continuing with more recent wideband codecs such as
the 3GPP Adaptive Multi-Rate Wideband Codec [4] [5] or,
more explicitly, the ITU-T Embedded Variable Bit Rate Codec
G.729.1 [6] [7].
A much more challenging task in speech BWE is to achieve
concise results without transmitting any side information
(see, e.g., [8]). This approach requires only modifications at
the receiving end. The respective algorithms are based on
the estimation of parameters of a source model for speech
production given the knowledge of the narrowband signal.
Unfortunately, their performance is bounded because of an
insufficient amount of mutual information between the low and
the high frequency subbands (cf. [9]). Yet, a certain, consistent
quality improvement is achievable.
This invited paper was presented at the 41st Asilomar Conference on
Signals, Systems, and Computers in Pacific Grove, CA, USA, Nov. 2007.
In this paper, we propose an attractive compromise between
wideband speech coding with integrated BWE and purely
receiver-based BWE without side information. We show how
to improve BWE with a small amount of side information
that is embedded into the bitstream of a narrowband codec
by steganographic techniques. Hence, the second part of the
paper (Sec. III) focuses on steganographic methods for digital
speech transmission.
The third part (Sec. IV) combines speech steganography
with a suitable BWE algorithm to form a transmission system
that is backwards compatible with respect to legacy narrowband
terminals and the network itself. The codec’s bitstream format is
not altered. In particular, the bit rate is not increased. The
modified bitstream can be decoded by a standard narrowband
decoder, possibly with a slight quality loss.
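The backwards-compatibility properties listed above can be illustrated with a minimal sketch. It assumes the simplest conceivable hidden channel, namely least-significant-bit (LSB) substitution in 8-bit codewords such as those of μ-law PCM; this is an illustrative stand-in and not necessarily the embedding scheme developed in Sec. III. The function names, the `stride` parameter, and the test frame are hypothetical.

```python
# Minimal sketch (assumption): LSB substitution in 8-bit codewords.
# Not the paper's method; names and parameters are illustrative only.

def embed_bits(codewords: bytes, payload_bits, stride: int = 1) -> bytes:
    """Overwrite the LSB of every `stride`-th codeword with one payload bit."""
    out = bytearray(codewords)
    positions = range(0, len(out), stride)
    if len(payload_bits) > len(positions):
        raise ValueError("payload does not fit into this frame")
    for pos, bit in zip(positions, payload_bits):
        out[pos] = (out[pos] & 0xFE) | (bit & 1)
    return bytes(out)


def extract_bits(codewords: bytes, n_bits: int, stride: int = 1):
    """Read the hidden bits back from the same codeword positions."""
    return [codewords[i * stride] & 1 for i in range(n_bits)]


# Hypothetical frame: 160 codewords (20 ms at an assumed 8 kHz sampling rate).
frame = bytes(range(40, 200))
side_info = [1, 0, 1, 1, 0, 0, 1, 0]

stego_frame = embed_bits(frame, side_info, stride=20)
recovered = extract_bits(stego_frame, len(side_info), stride=20)

# Largest change of any codeword value seen by a legacy decoder.
max_codeword_change = max(abs(a - b) for a, b in zip(frame, stego_frame))
```

In this sketch the frame length is unchanged, so the bit rate of the carrier is not increased, and a legacy decoder simply reads codewords that differ from the originals by at most one code value. With the assumed stride of 20, the hidden channel carries 8 bits per 20 ms frame, i.e., 400 bit/s.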
II. SPEECH BANDWIDTH EXTENSION
Methods for extending the acoustic bandwidth of speech
signals can be roughly categorized as “Bandwidth Extension
with Side Information” and “Bandwidth Extension without
Side Information”. Exemplary algorithms for both cases are
briefly reviewed below.
A. Bandwidth Extension without Side Information
Figure 1 depicts a signal flow chart of an exemplary
bandwidth extension algorithm [10], [8]. This purely
receiver-based solution is a “mixture” of pattern recognition, statistical
estimation, and speech synthesis. The algorithm exploits the
implicit redundancy of the source-filter model of speech. It
can be subdivided into two sub-tasks:
• extension of the spectral envelope by pattern recognition
and conditional MMSE estimation
• extension of the narrowband excitation signal, e.g., by
spectral replication of the base band excitation.
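The second sub-task can be sketched in a few lines. The example below assumes one simple variant of excitation extension, spectral folding by zero insertion, which mirrors the 0 to 4 kHz base band into the 4 to 8 kHz range while doubling the sampling rate from 8 kHz to 16 kHz. Whether the algorithm of [10], [8] uses exactly this variant is not stated here; the function names and the test tone are hypothetical.

```python
# Minimal sketch (assumption): excitation extension by spectral folding.
# Zero insertion doubles the sampling rate and mirrors the base band
# spectrum about 4 kHz. Not necessarily the variant used in [10], [8].
import cmath
import math


def spectral_fold(excitation_nb):
    """Upsample by 2 via zero insertion (8 kHz -> 16 kHz)."""
    out = []
    for sample in excitation_nb:
        out.extend((sample, 0.0))
    return out


def dft_magnitude(x, k):
    """Magnitude of DFT bin k of the sequence x (direct evaluation)."""
    n = len(x)
    return abs(sum(v * cmath.exp(-2j * cmath.pi * k * i / n)
                   for i, v in enumerate(x)))


# Hypothetical narrowband "excitation": a 1 kHz tone, 64 samples at 8 kHz.
n_nb = 64
tone_nb = [math.cos(2 * math.pi * 8 * i / n_nb) for i in range(n_nb)]

tone_wb = spectral_fold(tone_nb)  # 128 samples at 16 kHz, 125 Hz per bin

mag_1khz = dft_magnitude(tone_wb, 8)    # original component at 1 kHz
mag_7khz = dft_magnitude(tone_wb, 56)   # mirrored component at 7 kHz
mag_3khz = dft_magnitude(tone_wb, 24)   # no component expected at 3 kHz
```

The folded signal keeps the original 1 kHz component and gains a mirror image of equal magnitude at 7 kHz, which is the kind of high band excitation that the extended spectral envelope then shapes.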
The narrowband speech is interpolated to 16 kHz and an
estimated wideband linear prediction (LP) analysis filter is
applied to produce the narrowband excitation. After excitation
extension, the exactly inverse LP synthesis filter is applied.
Therefore, the output signal contains the original narrowband
978-1-4244-2110-7/08/$25.00 ©2007 IEEE