Impossibility of Unambiguous Communication as a Source of Failure in AI Systems

William J. Howe¹, Roman V. Yampolskiy²
¹Johns Hopkins University
²University of Louisville
whowe1@jhu.edu, roman.yampolskiy@louisville.edu
Abstract
Ambiguity is pervasive at multiple levels of linguistic analysis, effectively making unambiguous communication impossible. As a consequence, natural language processing systems without true natural language understanding can easily be "fooled" by ambiguity; crucially, AI may also use ambiguity to fool its users. Ambiguity impedes communication among humans, and thus also has the potential to be a source of failure in AI systems.
1 Introduction
The human language faculty allows any given speaker to "make infinite use of finite means" [Chomsky, 2006]. That is, the set of all possible sentences is infinite while the set of words which make them up is finite. However, ambiguity – the existence of more than one interpretation of an expression – is rampant in natural language [Wasow et al., 2005]. It is not clear why ambiguity exists at all in natural language. Given that it impedes communication, one might assume languages would evolve to avoid it, yet this is not observed [Wasow et al., 2005]. One explanation is that mapping a word to multiple meanings saves memory. Another account asserts that ambiguity is a consequence of a human bias toward shorter morphemes [Wasow et al., 2005]. Yet another account construes ambiguity as a product of optimization toward efficiency (the principle of least effort) over the course of language evolution. On this view, ambiguity is the price paid for a least-effort language [Solé and Seoane, 2015]. In this paper, we won't seek to explain the root cause of ambiguity, but rather to show how it can pose a problem for AI systems. First, we'll identify types of ambiguity which occur at the levels of phonology, syntax, and semantics, noting how modern natural language processing (NLP) systems disambiguate ambiguous input. Finally, we'll consider how more advanced AI could exploit ambiguity and how bad actors might utilize such systems to their own ends.
¹ Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
2 Phonology
Computational phonology is a core component of speech-based NLP systems. The ultimate goal of automatic speech recognition (ASR) is to take an acoustic waveform as input and decode it into a string of words as text [Jurafsky, 2000]. The field, which for several years was dominated by the Gaussian Mixture Model–Hidden Markov Model (GMM-HMM) framework, has now made significant advancements using deep neural network (DNN) architectures to enable technologies like Siri, Alexa, and Google Assistant [Yu and Deng, 2016]. In particular, recurrent neural networks, which capture the "dynamic temporal behavior" of sequence data that DNN-HMM architectures do not, have proven very effective [Yu and Deng, 2016]. Despite these advances, ASR still performs poorly with far-field microphones, noisy conditions, accented speech, and multitalker speech [Yu and Deng, 2016]. To see why ambiguity poses such a problem for these models, consider an architecture which uses a statistical technique to recognize speech units, along with a language model over a dictionary, to find the highest-probability sequence of speech units [Jurafsky, 2000]. Because such a model is probabilistic, it lacks true natural language understanding – this means the model can fail when faced with a low-probability speech waveform. It may favor a more likely but incorrect output over the less likely yet correct target output. Because humans possess linguistic creativity – the ability to produce never-before-seen utterances which a model might consider highly improbable – current ASR systems have an inherent deficit. One way to remedy this is to filter out hypotheses that don't make sense with "[a] speech recognition system augmented with Commonsense Knowledge [that] can spot its own nonsensical errors, and proactively correct them" [Lieberman et al., 2005; Liu et al., 2016]. Nevertheless, brittle ASR systems "may misinterpret commands due to coarticulation, segmentation, homophones, or double meanings in the human language" [Yampolskiy, 2016].
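This failure mode can be sketched in a few lines. The following toy decoder resolves a homophone lattice by maximizing bigram language-model probability; the corpus, lattice, and probabilities are invented purely for illustration and do not come from any of the systems cited above:

```python
import itertools
import math
from collections import defaultdict

# Invented training corpus in which "ice cream" is frequent and
# "i scream" is rare.
CORPUS = (
    "we like ice cream . "
    "she bought ice cream . "
    "they sell ice cream . "
    "when startled i scream ."
).split()

# Estimate bigram probabilities with add-one smoothing.
unigram = defaultdict(int)
bigram = defaultdict(int)
for w1, w2 in zip(CORPUS, CORPUS[1:]):
    unigram[w1] += 1
    bigram[(w1, w2)] += 1
vocab = set(CORPUS)

def logp(w1, w2):
    """Smoothed bigram log-probability of w2 following w1."""
    return math.log((bigram[(w1, w2)] + 1) / (unigram[w1] + len(vocab)))

# Acoustically, "ice cream" and "i scream" are near-identical, so the
# recognizer's lattice offers both readings for the same audio.
lattice = [["like"], ["ice", "i"], ["cream", "scream"]]

def best_path(lattice):
    """Return the word sequence the language model scores highest."""
    candidates = itertools.product(*lattice)
    return max(candidates,
               key=lambda ws: sum(logp(a, b) for a, b in zip(ws, ws[1:])))

# Even if the speaker actually said "i scream", the decoder emits the
# corpus-frequent reading.
print(best_path(lattice))  # → ('like', 'ice', 'cream')
```

Real ASR decoders combine acoustic and language-model scores over far larger search spaces, but the principle is the same: a low-probability yet correct hypothesis can lose the argmax to a high-probability incorrect one.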
2.1 Homophones
Homophones – sets of words which sound the same but have different meanings – are a classic case of phonological ambiguity. The following data present utterances which could be
misinterpreted by an ASR system but which are easily disam-