IEICE TRANS. INF. & SYST., VOL.E0–D, NO.0 1917

PAPER Special Issue on Text Processing for Information Access

Effects of Structural Matching and Paraphrasing in Question Answering

Tetsuro TAKAHASHI, Kozo NAWATA, Kentaro INUI, and Yuji MATSUMOTO, Nonmembers

SUMMARY In this paper, we propose an answer seeking algorithm for question answering that integrates structural matching and paraphrasing, and report the results of an empirical evaluation conducted to examine the effects of incorporating these two components. According to the results, the contribution of structural matching and paraphrasing was not as large as expected. Based on error analysis, we conclude that structural matching-based approaches to answer seeking require technologies for (a) coreference resolution, (b) processing of parse forests instead of parse trees, and (c) large-scale acquisition of paraphrase patterns.

key words: question answering, structural matching, paraphrasing, paraphrase space

1. Introduction

Question answering is a specific language understanding task that may serve as a good benchmark for approaching deep processing toward language understanding. A tempting but probably hasty approach would be to attempt fully conceptual matching between questions and documents. Such a system would derive conceptually represented information from the question and the target documents and analyze it to infer the answer. Such an approach would, however, entail obvious problems: above all, (a) the overhead of developing and maintaining a conceptual representation for open-domain natural language documents, and (b) the lack of robustness of state-of-the-art language understanding technologies.

Given this background, it is worthwhile to seek a compromise between fully conceptual matching and shallow bag-of-words matching. A feasible option is structural matching at the level of syntactic structures (or dependency structures).
As for previously proposed methods, most rely on a scoring function based on bag-of-words similarity or string matching, whereas only a very limited number of attempts use structural information intensively for matching [7], [8], [11]. Furthermore, even in those exceptional attempts, the effects of applying structural matching to answer seeking have never been empirically evaluated. In this context, this paper discusses the potential of structural matching for question answering, focusing on the following two issues.

For question answering, strict structural matching is not adequate because a given question is unlikely to be structurally identical to a passage that includes an answer candidate (simply passage, hereafter). We thus need to seek a method of soft matching, or more specifically, a method to evaluate the degree of structural similarity that suits the purpose of answer seeking. At the same time, we also need to consider computational overheads because we may need to carry out structural matching hundreds of times to search a single passage for an answer.

Language contains redundancies. The same piece of information can often be expressed linguistically by more than one expression. For example, the information that ‘the name of John F. Kennedy’s father is Joseph’ can also be realized by ‘John F. Kennedy, ..., his father, Joseph P. Kennedy’, ‘John F. Kennedy — son of Joseph Patrick Kennedy’, or ‘Joseph named his second son John Fitzgerald Kennedy’. Structural matching may fail to detect that such paraphrases convey the same information. The second issue is therefore how to identify diverse paraphrases for answering questions.

Manuscript received November 30, 2002.
Manuscript revised March 7, 2003.
Final manuscript received May 2, 2003.
Nara Institute of Science and Technology
For the first issue, we extend the Tree Kernel of Collins and Duffy [1] to formulate a new algorithm for soft structural matching. We present it in Sect. 2. For the second issue, we explore possibilities of incorporating paraphrase generation into question answering. We briefly explain it in Sect. 3. While these two components are both expected to contribute to the approximation of conceptual matching, combining them is itself problematic. To address this issue, we present an answer seeking algorithm in Sect. 4. Based on the setting described in those sections, we then report our empirical evaluation and discuss the issues we encountered in Sects. 5 and 6, focusing on the effects of structural matching and paraphrasing in question answering.

2. Soft structural matching

As the basis of our soft structural matching algorithm, we adopted the Tree Kernel method proposed by Collins and Duffy [1] for the following reasons: (a) it is designed to quantify the degree of similarity between a given pair of trees, which already partly fits our purpose; (b) it detects partial matches of subtrees; and (c) it is computationally efficient.

To adapt Tree Kernel to question answering, however, further extensions are necessary.
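
For reference, the basic, unextended kernel of Collins and Duffy [1] can be sketched as follows. This is only an illustration of the base method, not the soft matching algorithm proposed in this paper. The nested-tuple tree encoding, all function names, and the parameter name decay are assumptions made for the sketch. The kernel counts the tree fragments shared by two trees by summing, over all node pairs, the number of common fragments rooted at that pair; memoizing the per-pair count keeps the cost proportional to the product of the two tree sizes.

```python
from functools import lru_cache

# Assumed encoding: an internal node is a tuple (label, child, ...);
# a word (leaf) is a plain string. Example: ("N", "dog").

def production(node):
    """Label of a node plus the labels of its children (words are tagged)."""
    children = tuple(("word", c) if isinstance(c, str) else ("node", c[0])
                     for c in node[1:])
    return (node[0], children)

def internal_nodes(tree):
    """All non-leaf nodes of a tree, in depth-first order."""
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes

def tree_kernel(t1, t2, decay=1.0):
    """Weighted count of tree fragments common to t1 and t2.

    decay (0 < decay <= 1) down-weights larger fragments;
    decay=1.0 gives the plain fragment count.
    """
    @lru_cache(maxsize=None)
    def common(n1, n2):
        # Fragments rooted at n1 and n2 can match only if the
        # productions at the two nodes are identical.
        if production(n1) != production(n2):
            return 0.0
        score = decay
        for c1, c2 in zip(n1[1:], n2[1:]):
            # Matching productions guarantee c1 and c2 have the same kind.
            if not isinstance(c1, str):
                # Either stop at this child or continue with a common
                # fragment rooted below it.
                score *= 1.0 + common(c1, c2)
        return score

    return sum(common(a, b)
               for a in internal_nodes(t1)
               for b in internal_nodes(t2))

the_dog = ("NP", ("D", "the"), ("N", "dog"))
the_cat = ("NP", ("D", "the"), ("N", "cat"))

print(tree_kernel(the_dog, the_dog))  # identical trees
print(tree_kernel(the_dog, the_cat))  # trees differing in one word
```

With decay=1.0, the two identical toy trees share six fragments and the pair differing in one word shares three, which shows how the kernel rewards partial matches of subtrees rather than requiring identical trees.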