A Reason to Optimize Information Processing with a Core Property of Natural Language

Anna Maria Di Sciullo
Université du Québec à Montréal/Department of Linguistics, Montreal, Canada
Di_sciullo.anne-marie@uqam.ca

Abstract—We focus on a property of natural language that enables the processing of the information conveyed by linguistic expressions: structural asymmetry. We provide evidence that structural asymmetry is a property of argument structure. We then turn to Information Retrieval and Question Answering systems and provide evidence that these systems fail to recover the asymmetric relations of natural language argument structure. As a result, they may fail to retrieve relevant documents from large databases and to provide relevant answers to questions. Processing the underlying asymmetric relations will contribute to the optimization of Information Retrieval and Question Answering systems.

I. INTRODUCTION

The goal of Natural Language Processing (NLP) techniques is to build software that can efficiently process linguistic expressions. Some NLP techniques process linguistic expressions as strings of words, without taking into consideration the syntactic and semantic properties of the structure of which the words are part. Such techniques fail to process a core feature of linguistic expressions, namely that they are formed of related constituents in asymmetrical relations.

Linguistic expressions cannot be analysed solely in terms of strings of words. We illustrate this with the following two examples. First, a string-based analysis does not account for the relation between a name and a definite description, which may refer to the same individual. For example, the name Wittgenstein and the definite description the author of the Tractatus Logico-Philosophicus refer to the same individual.
Second, the relation of a pronoun to its antecedent cannot be based on linear precedence, because the relations between the parts of a linguistic expression play a role in anaphora resolution. For example, in the expression this student of Ludwig’s thinks that he is intelligent, the pronoun he cannot take Ludwig as its antecedent, even though Ludwig is the closest possible antecedent in the linear string. The antecedent of the pronoun he may only be the whole nominal constituent: this student of Ludwig’s. These examples illustrate that the form and interpretation of linguistic expressions are based on properties of relations rather than on properties of strings.

Properties of relations are central in several language-related areas, including theoretical and computational linguistics. Strong hypotheses on the asymmetric (irreversible) properties of linguistic relations are central in grammar (Chomsky [1], [2], Di Sciullo and Williams [3], Kayne [4], [5], Moro [6], [7], Hale and Keyser [8], van der Hulst and Ritter [9], [10], Raimy [11], [12], Roeper [28]). The recognition of the core role of asymmetric relations in grammar has led to the elaboration of a model where the primitives are minimal asymmetric relations (Di Sciullo [13], [14], [15]). NLP oriented by the recovery of asymmetric relations may lead to the development of efficient software attuned to the properties of human cognitive processing.

In this paper, we focus on the recovery of argument structure, i.e., the relations between the arguments of a predicate, and we show that natural language technologies must access this information in order to be efficient. Given that the relations between the arguments of a predicate are asymmetric, an efficient NLP system should be oriented by the recovery of argument structure asymmetries.

The paper is organized as follows. First, we define the notion of asymmetry. Second, we illustrate argument structure asymmetry.
Third, we show that the recovery of argument structure asymmetry is crucial for efficient information processing in Information Retrieval and Question Answering. Finally, we draw a broader consequence for information processing.

II. ASYMMETRY

A relation on a set is asymmetric if it never contains both an ordered pair and its inverse, see (1a-b). Linguistic expressions can be represented in terms of oriented graphs, where asymmetric relations are defined in terms of ‘precede’, ‘dominate’, and ‘asymmetric c-command’, see (1c-d). These are core properties of linguistic relations, at play across the board in grammar, as well as in argument structure identification, binding, and agreement relations. For example, in the tree in (2), X asymmetrically c-commands Y.

(1) a. If $R \subseteq A \times A$, then $R$ is symmetric iff $(\forall x)(\forall y)(\langle x, y \rangle \in R \rightarrow \langle y, x \rangle \in R)$.
    b. If $R \subseteq A \times A$, then $R$ is asymmetric iff $(\forall x)(\forall y)(\langle x, y \rangle \in R \rightarrow \langle y, x \rangle \notin R)$. (Wall [16])
    c. C-command: X c-commands Y iff X and Y are categories and X excludes Y, and every category that dominates X dominates Y. (Kayne [4])
    d. Asymmetric c-command: X asymmetrically c-commands Y, if X c-commands Y and Y does not c-command X. (Kayne [4])
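
The definitions in (1) can be illustrated with a minimal Python sketch. It is not part of the model discussed in this paper. The toy tree, the node labels (ZP, X, YP, Y, W), and all function names are hypothetical and chosen only for illustration. The sketch also simplifies (1c): it assumes a tree without adjunction, so that "X excludes Y" reduces to "X does not dominate Y", and it adds the assumption that a node does not c-command itself.

```python
from itertools import product

def is_symmetric(relation):
    """(1a): every pair <x, y> in R has its inverse <y, x> in R."""
    return all((y, x) in relation for (x, y) in relation)

def is_asymmetric(relation):
    """(1b): no pair <x, y> in R has its inverse <y, x> in R."""
    return all((y, x) not in relation for (x, y) in relation)

# Hypothetical toy tree [ZP X [YP Y W]], encoded as child -> parent.
PARENT = {"X": "ZP", "YP": "ZP", "Y": "YP", "W": "YP"}
NODES = {"ZP", "X", "YP", "Y", "W"}

def dominators(node):
    """Return the set of nodes that properly dominate `node`."""
    result = set()
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result

def dominates(a, b):
    """True if `a` properly dominates `b`."""
    return a in dominators(b)

def c_commands(x, y):
    """Simplified (1c): x and y are distinct, x does not dominate y
    (assumed stand-in for 'x excludes y'), and every node that
    dominates x also dominates y."""
    return x != y and not dominates(x, y) and dominators(x) <= dominators(y)

def asymmetrically_c_commands(x, y):
    """(1d): x c-commands y and y does not c-command x."""
    return c_commands(x, y) and not c_commands(y, x)

# The full relations over the toy tree, as sets of ordered pairs.
C_COMMAND = {(x, y) for x, y in product(NODES, repeat=2) if c_commands(x, y)}
ASYM_C_COMMAND = {
    (x, y) for x, y in product(NODES, repeat=2)
    if asymmetrically_c_commands(x, y)
}
```

In this toy tree, X and YP are sisters and c-command each other, so the plain c-command relation is not asymmetric, whereas the relation built from (1d) is asymmetric by construction.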