Reading Between the Lines

Loizos Michael
Department of Computer Science
University of Cyprus
loizosm@cs.ucy.ac.cy

Abstract

Reading involves, among other things, identifying what is implied but not expressed in text. This task, known as textual entailment, offers a natural abstraction for many NLP tasks, and has been recognized as a central tool for the new area of Machine Reading. Important in the study of textual entailment is making precise the sense in which something is implied by text. The operational definition often employed is a subjective one: something is implied if humans are more likely to believe it given the truth of the text than otherwise. In this work we propose a natural objective definition for textual entailment. Our approach is to view text as a partial depiction of some underlying hidden reality. Reality is mapped into text through a possibly stochastic process, the author of the text. Textual entailment is then formalized as the task of accurately, in a defined sense, recovering information about this hidden reality. We show how existing machine learning work can be applied to this information recovery setting, and discuss the implications for the construction of machines that autonomously engage in textual entailment. We then investigate the role of using multiple inference rules for this task. We establish that such rules cannot be learned and applied in parallel, but that layered learning and reasoning are necessary.

1 Introduction

Text understanding has long been considered one of the central aspects of intelligent behavior, and one that has received much attention within the Artificial Intelligence community. Many aspects of this problem have been considered and extensively studied, and frameworks have been developed for tasks such as summarization, question answering, and syntactic and semantic tagging.
The importance of text understanding has greatly increased over the past few years, following the recognition that the web offers an abundant source of human knowledge encoded in text, on which machines can capitalize (see, e.g., Reading the Web [Mitchell, 2005]). It has also been suggested that a robust and viable way for machines to acquire commonsense knowledge, similar to that employed by humans, is through learning from natural language text (see, e.g., Knowledge Infusion [Valiant, 2006]). A new area has, in fact, emerged with the goal of extracting knowledge from text, dubbed Machine Reading [Etzioni et al., 2006].

Traditional Natural Language Processing tasks and techniques are useful components of this ambitious goal. Yet, the emphasis shifts from extracting knowledge encoded within a piece of text to understanding what the text implies, even if not explicitly stated. As an example, consider the following sentence: “Alice held a barbecue party last weekend.” Traditional NLP tasks include recognizing the entities, tagging words with their part of speech, creating the syntactic tree, identifying the verbs and their arguments, and so on. Beyond these tasks, however, one may also ask what can be inferred from this sentence. Although this question might not admit a unique answer, a possible inference might be that the weather was good last weekend. In fact, the author of the sentence may be aware, or even take for granted, that readers will make such an inference, and she may choose not to explicitly include this information. If machines are to understand the intended meaning of text, they should be able to draw inferences similar to those (expected to be) drawn by human readers. The inference task can be seen as one of deciding whether the truth of a statement follows from the truth of some piece of text and some background knowledge.

* This work was supported in part by grant NSF-CCF-04-27129.
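To make this inference task concrete, the following minimal sketch (not from the paper) decides whether a statement follows from the propositions extracted from a text together with rule-based background knowledge, via naive forward chaining. The proposition and rule names are illustrative assumptions chosen to mirror the barbecue example.

```python
def entails(facts, rules, statement):
    """Return True if `statement` is derivable from `facts` using `rules`.

    facts: set of propositions extracted from the text
    rules: list of (premises, conclusion) pairs, the background knowledge
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule when all its premises are already known.
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return statement in known


# Text: "Alice held a barbecue party last weekend."
facts = {"held_barbecue(alice)"}

# Background knowledge a human reader might bring to the text
# (hypothetical rules, for illustration only).
rules = [
    ({"held_barbecue(alice)"}, "outdoor_event"),
    ({"outdoor_event"}, "weather_was_good"),
]

print(entails(facts, rules, "weather_was_good"))  # True
```

Of course, this deterministic sketch sidesteps the central difficulty the paper addresses: such background rules are defeasible and must be acquired, which is why a precise definition of when an inference counts as entailed is needed in the first place.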
This task, known as textual entailment, has recently received considerable attention, since it naturally generalizes and abstracts many of the traditional NLP tasks [Dagan et al., 2005]. Amongst the most successful approaches for this task is one that employs knowledge induced from a large corpus [Hickl et al., 2006]. The ultimate goal, of course, is to have machines that completely autonomously acquire relevant background knowledge and subsequently use it to recognize textual entailment. Designing and implementing such machines would arguably be a concrete step forward in endowing machines with the ability to understand text by drawing those commonsense inferences that humans do when reading text. For this to happen, a crisp definition of textual entailment is first needed. The classical definition follows the semantics of logical implication (i.e., a possible worlds interpretation of entailment). This definition is, though, too rigid to be useful in practice. Instead, a more applied (operational) definition of textual entailment is used: “[a piece of text] entails [a statement] if the meaning of [the statement] can be inferred from the mean-