Generating Anaphora for Simplifying Text

In Proceedings of the 4th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC 2002), pages 199-204

Advaith Siddharthan & Ann Copestake
Natural Language and Information Processing Group
Computer Laboratory, University of Cambridge
{as372,aac10}@cl.cam.ac.uk

Abstract

We present an algorithm for generating referring expressions in open domains. Existing algorithms assume a classification of adjectives which is possible only for restricted domains. Our alternative relies on WordNet synonym and antonym sets and gives equivalent results on the examples cited in the literature, and improved results in other cases that prior approaches cannot handle. We believe that it is also the first algorithm that allows for the incremental incorporation of relations. We perform an evaluation on a text-simplification task on Wall Street Journal data.

1. Introduction

The automatic dis-embedding of relative clauses is an important aspect of text simplification, an NLP task that aims to rewrite sentences, reducing their complexity while preserving their meaning and information content (Chandrasekar et al., 1996; Carroll et al., 1998). Text simplification is a useful NLP task for varied reasons. Chandrasekar et al. (1996) and Chandrasekar and Srinivas (1997) viewed text simplification as a preprocessing tool to improve the performance of their parser. The PSET project (Carroll et al., 1998; Carroll et al., 1999), on the other hand, focused its research on simplifying newspaper text for aphasics, who have trouble with long sentences, infrequent words and complicated grammatical constructs, including embedded clauses (Devlin, 1999). Consider:

    A former ceremonial officer from Derby, who was at the heart of Whitehall’s patronage machinery, says there is a general review of the state of the honours list every five years or so.
This simplifies to (see Devlin (1999) for motivation):

    A former ceremonial officer from Derby was at the heart of Whitehall’s patronage machinery. This former officer says there is a general review of the state of the honours list every five years or so.

When we dis-embed the clause, we require a referring expression for the noun phrase to which the clause attaches, to use as the subject of the new sentence. In the above example, we need to generate This former officer from A former ceremonial officer from Derby. Reproducing the entire NP can make the text look stilted. Moreover, including too much information in the referring expression can convey unwanted and possibly wrong conversational implicatures. This problem arises when simplifying other grammatical constructs as well; for example, when separating out conjoined verb phrases or making new sentences out of appositives.

In this paper, we present an algorithm for generating referring expressions in open domains. We present it as a general-purpose algorithm, though we evaluate it on the text simplification task.

2. Generating Referring Expressions

We present our attribute selection algorithm in section 2.1 and extend it to handle relational descriptions in section 2.3 and nominal attributes in section 2.5. We discuss the issue of forming the contrast set in section 2.2.

2.1. Attributes

The incremental algorithm (Reiter and Dale, 1992) is the most widely discussed attribute selection algorithm. It takes as input the intended referent and a contrast set of distractors (other entities that could be confused with the intended referent). Entities are represented as attribute value matrices (AVMs). The algorithm also takes as input a *preferred-attributes* list that contains, in order of preference, the attributes that human writers use to reference objects. For the example in their paper (which deals with entities like the small black dog, the white cat, ...), the preference might be [colour, size, shape, ...].
The algorithm then keeps adding to the referring expression attributes from *preferred-attributes* that rule out at least one entity in the contrast set, until all the entities in the contrast set have been ruled out.

It is instructive to look at how the incremental algorithm works. Consider an example where a large brown dog needs to be referred to, and the contrast set contains a large black dog. These can be represented by AVMs along the following lines:

    referent:   [ type: dog, size: large, colour: brown ]
    distractor: [ type: dog, size: large, colour: black ]

Assuming that the *preferred-attributes* list is [size, colour, ...], the algorithm would first compare the values of the size attribute (both large), disregard that attribute as not being discriminating, then compare the values of the colour attribute and return the brown dog.

Unfortunately, the incremental algorithm is unsuitable for open domains because it assumes the following:

1. A classification scheme for attributes exists
2. The values that attributes take are mutually exclusive
3. Linguistic realisations of attributes are unambiguous
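The selection loop described above can be sketched in Python as follows. This is a minimal illustration, not Reiter and Dale's implementation: the dictionary-based entity representation and attribute names are our own assumptions, and the sketch omits details of the full algorithm such as always including the head noun (type) in the final description.

```python
# A minimal sketch of the incremental attribute-selection loop
# (after Reiter and Dale, 1992). Entities are represented here as
# plain dicts; the original uses attribute value matrices (AVMs).

def incremental_algorithm(referent, contrast_set, preferred_attributes):
    """Select attributes that distinguish `referent` from its distractors."""
    distractors = list(contrast_set)
    selected = {}
    for attr in preferred_attributes:
        value = referent.get(attr)
        if value is None:
            continue
        # Keep the attribute only if it rules out at least one distractor.
        if any(d.get(attr) != value for d in distractors):
            selected[attr] = value
            distractors = [d for d in distractors if d.get(attr) == value]
        if not distractors:
            break  # all distractors ruled out
    return selected

# The paper's example: refer to a large brown dog when the contrast
# set contains a large black dog.
referent = {"type": "dog", "size": "large", "colour": "brown"}
contrast = [{"type": "dog", "size": "large", "colour": "black"}]
print(incremental_algorithm(referent, contrast, ["size", "colour"]))
# size (both large) is not discriminating and is skipped;
# colour rules out the black dog -> {'colour': 'brown'}
```

On this input the sketch reproduces the behaviour described above: size is disregarded and colour alone is selected, yielding the brown dog.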