IASL RITE System at NTCIR-9

Cheng-Wei Shih, Cheng-Wei Lee, Ting-Hao Yang, Wen-Lian Hsu
Institute of Information Science, Academia Sinica, Taiwan, R.O.C.
{dapi, aska, tinghaoyang, hsu}@iis.sinica.edu.tw

Abstract

We developed a knowledge-based textual inference recognition system for both the BC and MC subtasks at NTCIR-9 RITE. Five modules, which use named entities, subject-modifier word pairs, negative expressions, exclusive tokens, and sentence length respectively, were implemented to determine the entailment relation of each sentence pair. Three decision-making approaches were applied to integrate the results from the recognition modules into one entailment result. The evaluation showed that our system achieved accuracies of 0.661 and 0.501 on the traditional Chinese BC and MC subtasks respectively. For simplified Chinese, the accuracy reached 0.715 and 0.565 for BC and MC respectively.

1. Introduction

Text understanding and inference, which is widely regarded as a necessary step in natural language applications such as question answering, text summarization, and information retrieval, is one of the most challenging tasks in natural language processing. Therefore, determining the inference relation between two texts has become an important research topic since the First Recognizing Textual Entailment Challenge (RTE-1) was held in 2005 [1]. This year, NTCIR-9 [2] provided a standard evaluation platform for Asian languages, aiming to help researchers focus on the text inference problem. In RITE, all systems were asked to classify the relation of each sentence pair (t1, t2) into both binary classes (Yes/No) and multiple classes (Forward, Reverse, Bidirectional, Contradiction, and Independent). Participants could freely use any language tools and knowledge resources to achieve this goal. We, team IASLD, aimed to recognize Chinese textual entailment relations in this task.

The description of our work is organized as follows. Section 2 describes the system architecture.
Sections 3, 4, and 5 introduce the preprocessing steps, the relation-determining modules, and the decision-making process for the entailment relation, respectively. Finally, we present the system performance in Section 6 and conclude our work in Section 7.

2. System Architecture

Our system focuses on knowledge-based approaches to classify five kinds of relations between two sentences. Several NLP tools and semantic resources are integrated into five different modules for relation recognition. We aim only at multi-class classification; our BC results are derived from the MC results. Figure 1 shows the structure of our system.

3. Preprocessing

To improve the accuracy of the output, several preprocessing steps are performed after the system receives each sentence pair. These steps include numerical character transformation and literal difference classification.

3.1. Numerical Character Transformation

All numerical characters in numerical and temporal expressions are replaced by normalized digit forms. For example, an expression such as “一九九九年十二月十日” (December 10, 1999) is converted to “1999年12月10日” by replacing the Chinese numerical characters with digits. Because not every sentence containing Chinese numerical characters should be transformed, we need to distinguish ordinary Chinese terms that contain numerical characters from numerical and temporal expressions. Our system uses hand-crafted rules to identify numerical and temporal expressions before the transformation.

On the other hand, in some numerical and temporal expressions, such as ranges and durations, redundant parts are usually omitted. For example, a duration expression like “ ԫ԰԰԰ ڣԼԼԲ ”

Proceedings of NTCIR-9 Workshop Meeting, December 6-9, 2011, Tokyo, Japan
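The numerical character transformation of Section 3.1 can be illustrated with a minimal Python sketch. The hand-crafted rules of our system are not listed here, so the targeting rule below (a numeral counts as temporal only when it is directly followed by 年, 月, or 日) and the names `DATE_PART`, `to_int`, and `normalize_dates` are assumptions made purely for illustration.

```python
import re

# Digit-by-digit mapping, used for years such as 一九九九.
DIGITS = {"〇": 0, "零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}

NUM = "[〇零一二三四五六七八九十]"

# Assumed targeting rule (not the system's actual rule set): only numerals
# directly followed by a date unit are treated as temporal expressions.
DATE_PART = re.compile(f"({NUM}+)([年月日])")

# Numerals from 10 to 99 written with 十, e.g. 十, 十二, 二十, 三十一.
TENS_FORM = re.compile("([一二三四五六七八九]?)十([一二三四五六七八九]?)")


def to_int(text):
    """Return the integer value of a Chinese numeral, or None if unsupported."""
    match = TENS_FORM.fullmatch(text)
    if match:
        tens = DIGITS[match.group(1)] if match.group(1) else 1
        ones = DIGITS[match.group(2)] if match.group(2) else 0
        return tens * 10 + ones
    if "十" in text:
        # Irregular use of 十 that this sketch does not handle.
        return None
    # Digit-by-digit form, e.g. 一九九九 -> 1999.
    return int("".join(str(DIGITS[ch]) for ch in text))


def normalize_dates(sentence):
    """Replace Chinese numerals in date expressions with Arabic digits."""
    def replace(match):
        value = to_int(match.group(1))
        if value is None:
            return match.group(0)  # leave unsupported numerals untouched
        return f"{value}{match.group(2)}"
    return DATE_PART.sub(replace, sentence)


print(normalize_dates("一九九九年十二月十日"))
print(normalize_dates("九份"))
```

Under this assumed rule, ordinary terms that contain numerical characters but no date unit, such as the place name 九份, are left unchanged. Range and duration expressions with omitted parts are outside the scope of the sketch.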