Learning to Disambiguate Relative Pronouns

Claire Cardie
Department of Computer Science
University of Massachusetts
Amherst, MA 01003
(cardie@cs.umass.edu)

Abstract

In this paper we show how a natural language system can learn to find the antecedents of relative pronouns. We use a well-known conceptual clustering system to create a case-based memory that predicts the antecedent of a wh-word given a description of the clause that precedes it. Our automated approach duplicates the performance of hand-coded rules. In addition, it requires only minimal syntactic parsing capabilities and a very general semantic feature set for describing nouns. Human intervention is needed only during the training phase. Thus, it is possible to compile relative pronoun disambiguation heuristics tuned to the syntactic and semantic preferences of a new domain with relative ease. Moreover, we believe that the technique provides a general approach for the automated acquisition of additional disambiguation heuristics for natural language systems, especially for problems that require the assimilation of syntactic and semantic knowledge.

Introduction

Relative clauses consistently create problems for language processing systems. Consider, for example, the sentence in Figure 1. A correct semantic interpretation should include the fact that “the boy” is the actor of “won” even though the phrase does not appear in the embedded clause.

Tony saw the boy who won the award.

Figure 1: Understanding Relative Clauses

The interpretation of a relative clause, however, depends on the accurate resolution of two ambiguities, each of which must be performed over a potentially unbounded distance. The system has to 1) find the antecedent of the relative pronoun and 2) determine the antecedent’s implicit position in the embedded clause.
The work we describe here focuses on (1): locating the antecedent of the relative pronoun. Indeed, although relative pronoun disambiguation seems a simple enough task, there are many factors that make it difficult¹:

The head of the antecedent of a relative pronoun does not appear in a consistent position or syntactic constituent. In both S1 and S2 of Figure 2, for example, the antecedent is “the boy.” In S1, however, “the boy” is the direct object of the preceding clause, while in S2 it appears as the subject of the preceding clause. On the other hand, the head of the antecedent is the phrase that immediately precedes “who” in both cases. S3, however, shows that this is not always the case. In fact, the antecedent head may be very distant from its coreferent wh-word² (e.g., S4).

S1. Tony saw the boy who won the award.
S2. The boy who gave me the book had red hair.
S3. Tony ate dinner with the men from Detroit who sold computers.
S4. I spoke to the woman with the black shirt and green hat over in the far corner of the room who wanted a second interview.
S5. I'd like to thank Jim, Terry, and Shawn, who provided the desserts.
S6. I'd like to thank our sponsors, GE and NSF, who provide financial support.
S7. We wondered who stole the watch.
S8. We talked with the woman and the man who were/was dancing.
S9. We talked with the woman and the man who danced.
S10. The woman from Philadelphia who played soccer was my sister.
S11. The awards for the children who pass the test are in the drawer.

Figure 2: Relative Pronoun Antecedents

¹ Locating the gap is a separate, but equally difficult problem because the gap may appear in a variety of positions in the embedded clause: the subject, direct object, indirect object, or object of a preposition. For a simple solution to the gap-finding problem that is consistent with the work presented here, see (Cardie & Lehnert, 1991).

² Relative pronouns like who, whom, which, that, where, etc. are often referred to as wh-words.
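The contrast between S1/S2 and S3 can be made concrete with a small sketch of the "nearest phrase" baseline, which guesses that the phrase immediately preceding the wh-word is the antecedent head. This is an illustration only, not the method developed in this paper: the hand-made phrase segmentation, the names EXAMPLES, nearest_phrase, and evaluate, and the reading of "the men" as the antecedent head in S3 are all assumptions introduced here.

```python
# Phrases preceding the wh-word for S1-S3 of Figure 2, paired with the
# antecedent head. The segmentation is a hand-made assumption for
# illustration; it is not the clause representation used by the system.
EXAMPLES = {
    "S1": (["Tony", "saw", "the boy"], "the boy"),
    "S2": (["The boy"], "the boy"),
    # Assumed reading: "who" refers to "the men", not to "Detroit".
    "S3": (["Tony", "ate", "dinner", "with", "the men", "from Detroit"], "the men"),
}


def nearest_phrase(phrases):
    """Naive baseline: guess the phrase immediately before the wh-word."""
    return phrases[-1]


def evaluate(examples):
    """Return, per sentence, whether the baseline guess equals the antecedent."""
    results = {}
    for sentence_id, (phrases, antecedent) in examples.items():
        guess = nearest_phrase(phrases)
        results[sentence_id] = guess.lower() == antecedent.lower()
    return results


if __name__ == "__main__":
    for sentence_id, correct in evaluate(EXAMPLES).items():
        print(sentence_id, "correct" if correct else "wrong")
```

Under these assumptions the baseline succeeds on S1 and S2 and fails on S3, where the prepositional phrase "from Detroit" intervenes between the antecedent head and "who".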