Impossibility of Unambiguous Communication as a Source of Failure in AI Systems

William J. Howe 1, Roman V. Yampolskiy 2
1 Johns Hopkins University
2 University of Louisville
whowe1@jhu.edu, roman.yampolskiy@louisville.edu

Abstract

Ambiguity is pervasive at multiple levels of linguistic analysis, effectively making unambiguous communication impossible. As a consequence, natural language processing systems without true natural language understanding can be easily "fooled" by ambiguity, but crucially, AI may also use ambiguity to fool its users. Ambiguity impedes communication among humans, and thus also has the potential to be a source of failure in AI systems. 1

1 Introduction

The human language faculty allows any given speaker to "make infinite use of finite means" [Chomsky, 2006]. This is to say that the set of all possible sentences is infinite while the set of words which make them up is finite. However, ambiguity, the existence of more than one interpretation of an expression, is rampant in natural language [Wasow et al., 2005]. It is not clear why ambiguity exists at all in natural language. Given that it impedes communication, one might assume languages would evolve to avoid it, yet this is not observed [Wasow et al., 2005]. One explanation is that mapping a word to multiple meanings saves memory. Another account asserts that ambiguity is a consequence of a human bias toward shorter morphemes [Wasow et al., 2005]. Yet another account construes ambiguity as a product of optimization toward efficiency (the principle of least effort) over the course of language evolution. On this view, ambiguity is the price paid for a least-effort language [Solé and Seoane, 2015]. In this paper, we won't seek to explain the root cause of ambiguity, but rather to show how it can pose a problem for AI systems.
First, we'll identify types of ambiguity which occur at the levels of phonology, syntax, and semantics, noting how modern natural language processing (NLP) systems disambiguate ambiguous input. Finally, we'll consider how more advanced AI could exploit ambiguity and how bad actors might utilize such systems to their ends.

1 Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 Phonology

Computational phonology is a core component of speech-based NLP systems. The ultimate goal of automatic speech recognition (ASR) is to take an acoustic waveform as input and decode it into a string of words as text [Jurafsky, 2000]. The field, which for several years was dominated by the Gaussian Mixture Model - Hidden Markov Model (GMM-HMM) framework, has now made significant advancements using deep neural network (DNN) architectures, enabling technologies like Siri, Alexa, and Google Assistant [Yu and Deng, 2016]. In particular, recurrent neural networks, which capture the "dynamic temporal behavior" of sequence data that DNN-HMM architectures do not, have proven very effective [Yu and Deng, 2016]. Despite these advances, ASR still performs poorly with far-field microphones, noisy conditions, accented speech, and multi-talker speech [Yu and Deng, 2016]. To see why ambiguity poses such a problem for these models, we'll consider an architecture which uses a statistical technique to recognize speech units, along with a language model over a dictionary, to find the highest-probability sequence of speech units [Jurafsky, 2000]. Because such a model is probabilistic, it lacks true natural language understanding; this means the model can fail when faced with a speech waveform that is unlikely or low probability. It may favor the more likely incorrect output over the less likely yet correct target output.
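This failure mode can be made concrete with a toy noisy-channel decoder that picks the word sequence W maximizing P(O|W) * P(W), the standard decoding objective described in [Jurafsky, 2000]. The candidate transcriptions and all probabilities below are invented purely for illustration; a real system derives them from trained acoustic and language models:

```python
import math

# Acoustic likelihoods P(O | W): how well each candidate word string
# matches the input waveform. The speaker actually said "recognize
# speech", but in noisy conditions the wrong hypothesis fits the
# audio better. (All numbers here are invented for illustration.)
acoustic = {
    "recognize speech": 0.30,
    "wreck a nice beach": 0.60,
}

# Language-model priors P(W): the plausible string is rightly
# preferred, but not by enough to overcome the acoustic scores.
language_model = {
    "recognize speech": 0.40,
    "wreck a nice beach": 0.25,
}

def decode(hypotheses):
    """Return the hypothesis maximizing P(O|W) * P(W), in log space."""
    return max(
        hypotheses,
        key=lambda w: math.log(acoustic[w]) + math.log(language_model[w]),
    )

# The more probable but incorrect transcription beats the intended one.
print(decode(["recognize speech", "wreck a nice beach"]))
# -> wreck a nice beach
```

Since the decoder can only rank hypotheses by probability, an unlikely but correct utterance has no way to assert itself; the model has no access to what the speaker meant, only to what speakers statistically tend to say.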
Because humans possess linguistic creativity, the ability to produce never-before-seen utterances which a model might consider highly improbable, current ASR systems have an inherent deficit. One way to remedy this is to filter out hypotheses that don't make sense: "[a] speech recognition system augmented with Commonsense Knowledge [that] can spot its own nonsensical errors, and proactively correct them" [Lieberman et al., 2005; Liu et al., 2016]. Nevertheless, brittle ASR systems "may misinterpret commands due to coarticulation, segmentation, homophones, or double meanings in the human language" [Yampolskiy, 2016].

2.1 Homophones

Homophones, sets of words which sound the same but have different meanings, are a classic case of phonological ambiguity. The following data present utterances which could be misinterpreted by an ASR system but which are easily disam-