NLEL-MAAT at ResPubliQA

Santiago Correa, Davide Buscaldi and Paolo Rosso
NLE Lab, ELiRF Research Group, DSIC, Universidad Politécnica de Valencia, Spain
{scorrea, dbuscaldi, prosso}@dsic.upv.es
http://users.dsic.upv.es/grupos/nle

Abstract. This report presents the work carried out at the NLE Lab for the QA@CLEF-2009 competition. We used the JIRS passage retrieval system, which is based on redundancy, under the assumption that the answer to a question can be found in a sufficiently large document collection. The retrieved passages are ranked according to the number, length and position of the question n-gram structures found in the passages. The best results were obtained in monolingual English, and the worst for French; we attribute the difference to the question style, which varies considerably from one language to another.

1 Introduction

An open-domain Question Answering (QA) system can be viewed as a specific Information Retrieval (IR) system in which the amount of information retrieved is the minimum required to satisfy a user information need expressed as a concrete question, e.g.: "Where is the Europol Drugs Unit?". Many QA systems are based on Passage Retrieval (PR) [6, 4]. A PR system is an IR system that returns parts of documents (passages) instead of complete documents. Their utility in the QA task stems from the fact that the information needed to answer a question is usually contained in a small portion of the text [3].

In the 2009 edition of CLEF, the ResPubliQA 1 competition was organized: a narrow-domain QA task centered on the legal domain, given that the data is constituted by the body of European Union (EU) law. Our participation in this competition was based on the JIRS 2 open-source PR system, which has proved able to obtain better results than classical IR search engines in the previous open-domain CLEF QA tasks [1].
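The n-gram ranking idea mentioned above can be illustrated with a minimal sketch. This is not the actual JIRS weighting model (which also accounts for n-gram position and uses its own term weights); it is a simplified toy score in which longer question n-grams found in a passage contribute proportionally more weight, and all names below are hypothetical:

```python
from typing import List, Set, Tuple

def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """All contiguous n-grams of a token list, as tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def score_passage(question: str, passage: str) -> float:
    """Toy n-gram overlap score: each question n-gram that also
    appears in the passage contributes its length n; the score is
    the matched weight divided by the total possible weight."""
    q = question.lower().split()
    p = passage.lower().split()
    total, matched = 0.0, 0.0
    for n in range(1, len(q) + 1):
        p_grams = ngrams(p, n)
        for gram in ngrams(q, n):
            total += n                      # weight grows with n-gram length
            if gram in p_grams:
                matched += n
    return matched / total if total else 0.0

# A passage sharing long word sequences with the question outranks
# one that shares only isolated words.
question = "Where is the Europol Drugs Unit"
passages = [
    "The Europol Drugs Unit is based in The Hague.",
    "Drug trafficking is a major concern for the EU.",
]
best = max(passages, key=lambda p: score_passage(question, p))
```

Under this score the first passage wins because it contains the whole 4-gram "the Europol Drugs Unit", whereas the second shares only scattered unigrams with the question; this mirrors the redundancy assumption that answers reuse the question's word sequences.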
In this way we aimed to evaluate the effectiveness of this PR system in this specific domain and to test our hypothesis that answers are usually formulated similarly to questions, in the sense that they contain mostly the same sequences of words. In the next section, we describe the characteristics of the task; furthermore, Sect. 3 and 4

1 For more information about the ResPubliQA@CLEF-2009 competition, refer to: http://celct.isti.cnr.it/ResPubliQA/
2 http://sourceforge.net/projects/jirs/