Using the Speech Transmission Index for predicting non-native
speech intelligibility
Sander J. van Wijngaarden,a)
Adelbert W. Bronkhorst, Tammo Houtgast, and
Herman J. M. Steeneken
TNO Human Factors, PO Box 23, 3769 ZG Soesterberg, The Netherlands
Received 5 March 2003; revised 10 February 2003; accepted 15 December 2003
While the Speech Transmission Index (STI) is widely applied for the prediction of speech intelligibility
in room acoustics and telecommunication engineering, it is unclear how to interpret STI values
when non-native talkers or listeners are involved. Based on subjectively measured psychometric
functions for sentence intelligibility in noise, for populations of native and non-native
communicators, a correction function for the interpretation of the STI is derived. This function is
applied to determine the appropriate STI ranges, with qualification labels from “bad” to “excellent,” for
specific populations of non-natives. The correction function is derived by relating the non-native
psychometric function to the native psychometric function by a single parameter . For listeners,
the parameter is found to be highly correlated with linguistic entropy. It is shown that the proposed
correction function is also valid for conditions featuring bandwidth limiting and reverberation.
© 2004 Acoustical Society of America. DOI: 10.1121/1.1647145
PACS numbers: 43.70.Kv, 43.71.Hw, 43.71.Gv KWG Pages: 1281–1291
I. INTRODUCTION
The intelligibility of speech is generally considered to
depend on the characteristics of the talker and the listener,
the complexity of the spoken messages, and the characteristics
of the communication channel. Objective speech intelligibility
prediction models have been shown to predict accurately
the influence of the communication channel characteristics
on speech intelligibility. Examples of such models are the
Articulation Index (AI) model (French and Steinberg, 1947;
Kryter, 1962) and more advanced models based on the AI,
such as the Speech Intelligibility Index (SII; ANSI, 1997) and
the Speech Transmission Index (STI; IEC, 1998; Steeneken
and Houtgast, 1980; Steeneken and Houtgast, 1999).
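To make concrete how the STI quantifies transmission quality, the following minimal sketch outlines its core arithmetic, assuming the general scheme of the standardized method (IEC 60268-16): each modulation transfer value m, obtained per octave band (125 Hz to 8 kHz) and per modulation frequency, is converted into an apparent signal-to-noise ratio, clipped to ±15 dB, mapped onto a transmission index between 0 and 1, and averaged across bands. The equal band weights are a placeholder assumption; the standardized band weights and correction terms are not reproduced. The helper names transmission_index, sti, and qualification are hypothetical. The labels follow the conventional native qualification scale (bad below 0.30, poor up to 0.45, fair up to 0.60, good up to 0.75, excellent above), i.e., the native scale for which non-native ranges are derived in this article.

```python
import math

# Octave-band centre frequencies (Hz) conventionally used by the STI.
OCTAVE_BANDS = [125, 250, 500, 1000, 2000, 4000, 8000]


def transmission_index(m):
    """Map one modulation transfer value m (0 < m < 1) to a transmission index."""
    m = min(max(m, 1e-6), 1 - 1e-6)  # guard against log(0) and division by zero
    snr_app = 10 * math.log10(m / (1 - m))  # apparent signal-to-noise ratio in dB
    snr_app = min(max(snr_app, -15.0), 15.0)  # clip to the +/-15 dB range
    return (snr_app + 15.0) / 30.0  # linear map of -15..+15 dB onto 0..1


def sti(mtf, weights=None):
    """Simplified STI; mtf holds one list of m values (one per modulation frequency) per band."""
    if weights is None:
        # Placeholder assumption: equal weights, not the standardized band weights.
        weights = [1.0] * len(mtf)
    band_ti = [sum(transmission_index(m) for m in band) / len(band) for band in mtf]
    return sum(w * ti for w, ti in zip(weights, band_ti)) / sum(weights)


def qualification(value):
    """Conventional qualification label for an STI value (native communication)."""
    for upper, label in [(0.30, "bad"), (0.45, "poor"), (0.60, "fair"), (0.75, "good")]:
        if value < upper:
            return label
    return "excellent"
```

For example, stationary noise at 0 dB SNR in every band gives m = 1/(1 + 10^0) = 0.5 at all modulation frequencies, an apparent SNR of 0 dB, an STI of 0.5, and the label "fair".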
In some cases, the overall speech intelligibility that is
experienced is clearly affected by factors other than the
physical characteristics of the channel. Individual talker
differences (Bradlow et al., 1996; Hood and Poole, 1980) and
message complexity (Pollack, 1964) were already mentioned.
Other examples are individual differences in speaking
style (Picheny et al., 1985) and hearing loss (Plomp, 1978).
An important factor determining speech intelligibility
is language proficiency, of talkers (van Wijngaarden
et al., 2002a) as well as of listeners (van Wijngaarden et al.,
2002b). Learning a language at a later age generally limits
the proficiency that can be attained (Flege, 1995).
So-called non-native speech communication is practically
always less effective than native communication. The
intelligibility effects of non-native speech production and
non-native perception interact with speech transmission
quality (the quality of the channel): speech-degrading
influences such as noise (Buus et al., 1986; Florentine
et al., 1984; Florentine, 1985) and reverberation (Nábělek
and Donahue, 1984) aggravate the intelligibility effects
of non-native speech communication.
For various applications, it would be very useful to have
an objective, quantitative intelligibility prediction method
that is capable of dealing with non-native speech. Section II
of this article discusses the suitability of existing objective
speech intelligibility prediction models for non-native
applications.
Section III proposes a way in which the Speech
Transmission Index (STI) can be used in various non-native
scenarios. Section IV validates this approach for speech in
noise, bandwidth limiting, and reverberation.
II. SUITABILITY OF OBJECTIVE INTELLIGIBILITY
PREDICTION MODELS FOR NON-NATIVE SPEECH
A. Speech transmission quality versus speech
intelligibility
Speech intelligibility can be thought of as the success
that a source and a receiver (talker and listener) have in
transmitting information over a channel. Each unique talker–
listener pair has a certain potential for transmitting messages
of a given complexity. The quality of the transmission
channel determines how much of this potential is realized.
A typical transmission channel could be a telephone line, a
public address system, or the acoustic environment of a specific room.
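For the last of these channels, a room, the effect on the modulation transfer can be estimated analytically. The sketch below assumes an exponentially decaying (diffuse) reverberant field with reverberation time T (in seconds) and stationary noise at a given SNR (in dB), using the commonly cited model m(F) = [1 + (2πFT/13.8)²]^(-1/2) · [1 + 10^(-SNR/10)]^(-1). The function name room_mtf is hypothetical, and the modulation frequencies are the 14 one-third-octave values (0.63 to 12.5 Hz) conventionally used for the STI.

```python
import math

# One-third-octave modulation frequencies (Hz) conventionally used for the STI.
MOD_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5, 3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def room_mtf(t60, snr_db=math.inf):
    """Modulation transfer values for exponential reverberation plus stationary noise.

    t60: reverberation time in seconds (assumed diffuse, exponential decay).
    snr_db: signal-to-noise ratio in dB; math.inf means no noise.
    """
    # Noise reduces modulation depth equally at every modulation frequency.
    noise_factor = 1.0 / (1.0 + 10 ** (-snr_db / 10))
    values = []
    for f in MOD_FREQS:
        # Reverberation acts as a low-pass filter on the intensity envelope.
        reverb_factor = 1.0 / math.sqrt(1.0 + (2 * math.pi * f * t60 / 13.8) ** 2)
        values.append(reverb_factor * noise_factor)
    return values
```

For example, room_mtf(1.0) gives m ≈ 0.96 at 0.63 Hz but only m ≈ 0.17 at 12.5 Hz: reverberation mainly smears the fast envelope fluctuations, whereas noise lowers all modulation frequencies equally.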
Objective prediction models are especially good at
quantifying speech transmission quality. Factors that
determine speech intelligibility and are related to talkers
and listeners, rather than to the channel, have been
incorporated to a lesser degree. A proficiency factor has been
proposed (Pavlovic and Studebaker, 1984) for incorporating
talker- and listener-specific factors into the framework of the
articulation index, but this has not been developed to a level
where practically useful predictions can be obtained.
a)Electronic mail: vanwijngaarden@tm.tno.nl
1281 J. Acoust. Soc. Am. 115 (3), March 2004 0001-4966/2004/115(3)/1281/11/$20.00 © 2004 Acoustical Society of America