The Use of Lexical Context in Question Answering for Spanish

M. Pérez-Coutiño, T. Solorio, M. Montes-y-Gómez†, A. López-López and L. Villaseñor-Pineda
Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE)
Luis Enrique Erro No. 1, Sta Ma Tonantzintla, 72840, Puebla, Pue, México.
{mapco,thamy,mmontesg,allopez,villasen}@inaoep.mx

Abstract. This paper describes the prototype developed by the Language Technologies Laboratory at INAOE for the Spanish monolingual QA evaluation task at CLEF 2004. Our approach centers on the use of context at the lexical level in order to identify possible answers to factoid questions. This method is complemented by an alternative one, based on pattern recognition, that identifies candidate answers to definition questions. We describe the methods applied at the different stages of the system and the prototype architecture for question answering, and we present and discuss the results achieved with this approach.

Keywords: Question Answering for Spanish, Lexical Context, Natural Language Processing.

1 Introduction

Question Answering (QA) systems have become an alternative to traditional information retrieval systems because of their capability to provide concise answers to questions stated by the user in natural language. This fact, along with the inclusion of QA evaluation as part of the Text Retrieval Conference (TREC) 1 in 1999, and more recently of Multilingual Question Answering as part of the Cross Language Evaluation Forum (CLEF) 2 [6], has given rise to a promising and growing research field.

The Multilingual Question Answering evaluation track at CLEF 2004 is similar to last year's edition. For each subtask, participants are provided with 200 questions requiring short answers. Some questions may have no known answer, and systems should be able to recognize them. There are, however, some important differences: this year the answers include both fact-based instances and definitions, and systems must return exactly one response per question, with up to two runs.

Our laboratory has developed a prototype system for the Spanish monolingual QA task. Two important things to consider about it are: a) this is our first QA prototype and it was developed from scratch, and b) this is the first time that our laboratory participates in an evaluation forum.

The prototype described in this document relies on the fact that several QA systems, such as [8, 13, 4, 10], use named entities at different stages in order to find a candidate answer. Generally speaking, named entities are used at the final stages of the system, i.e., either during passage selection or as a discriminator to select a candidate answer at the final stage. Another interesting approach is Predictive Annotation, first presented at TREC-8 by Prager et al. [8]. A meaningful characteristic of this approach is the indexing of anticipated semantic types, the identification of the semantic type of the answer sought by the question, and the extraction of the best matching entity from candidate answer passages. In their approach, the authors used no more than simple pattern matching to obtain the entities.

Our prototype was developed to process both questions and source documents in Spanish. Our system is based on the approach just described but differs in the following: i) The identification of semantic classes relies on the preprocessing of the whole document collection with a POS tagger that simultaneously works as a named entity recognizer and classifier. ii) The indexing stage takes as indexing item the lexical context associated with each named entity contained in every document of the collection. iii) The searching stage selects as candidate answers those named entities whose lexical contexts best match the context of the question (a sketch of this idea is given below). iv) At the final stage, candidate answers are compared against a second set of candidates gathered from the Internet. v) Final answers are selected based on a set of relevance measures that encompass all the information collected during the searching process.
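To make the indexing-and-matching idea of steps ii) and iii) concrete, the following is a minimal sketch. It is not the authors' implementation: the window size, the data layout of the documents, and the names build_context_index and rank_candidates are illustrative assumptions, and the later sections describe the actual processing.

    # Minimal sketch (assumptions, not the actual system): index each named
    # entity with the tokens surrounding it, then rank entities by lexical
    # overlap with the question.
    from collections import defaultdict

    WINDOW = 8  # assumed size of the lexical context window, in tokens

    def build_context_index(documents):
        """documents: iterable of (tokens, entity_spans) pairs, where
        entity_spans is a list of (start, end, entity_text) produced by
        a POS tagger / named entity recognizer."""
        index = defaultdict(list)
        for tokens, entity_spans in documents:
            for start, end, entity in entity_spans:
                left = tokens[max(0, start - WINDOW):start]
                right = tokens[end:end + WINDOW]
                index[entity].append(set(left + right))
        return index

    def rank_candidates(question_tokens, index, top_n=5):
        """Score each entity by the best lexical overlap between the
        question and any of the contexts stored for that entity."""
        question = set(question_tokens)
        scored = []
        for entity, contexts in index.items():
            best = max(len(question & ctx) for ctx in contexts)
            scored.append((best, entity))
        scored.sort(reverse=True)
        return [entity for score, entity in scored[:top_n] if score > 0]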
The rest of this paper is organized as follows: section two describes the architecture and functionality of the system; section three details question processing; section four details indexing; section five presents searching; section six describes answer selection; section seven discusses the results achieved by the system; and finally section eight presents our conclusions and discusses further work.

† This work was done while visiting the Department of Information Systems and Computation, Polytechnic University of Valencia, Spain.
1 http://trec.nist.gov/
2 http://clef-qa.itc.it/