Priberam’s question answering system in QA@CLEF 2008

Carlos Amaral, Adán Cassan, Helena Figueira, André Martins, Afonso Mendes, Pedro Mendes, José Pina, Cláudia Pinto

Priberam
Alameda D. Afonso Henriques, 41 - 2.º Esq.
1000-123 Lisboa, Portugal
Tel.: +351 21 781 72 60
Fax: +351 21 781 72 79
{cma, ach, hgf, atm, amm, prm, jfp, cp}@priberam.pt

Abstract

This paper describes the changes made to Priberam’s question answering (QA) system since our last QA@CLEF participation. It then discusses the results obtained in the Portuguese and Spanish monolingual runs of the main task of QA@CLEF 2008. Following the results of last year’s evaluation, the main goal of Priberam’s participation this time was to stabilize the system so that it could reach its potential performance. To that end, we enhanced the syntactic analysis of the question and improved the indexing process. Question categories are now used at the sentence retrieval level, and ontology domains of the expected answer are used at the document retrieval level. We fine-tuned the syntactic analysis by defining core nodes of phrases and using them as objects. This allowed the system to match the pivots of the question more precisely with their counterparts in the answer, taking their syntactic functions into account. As a result, Priberam’s system achieved a considerable increase in overall accuracy in the Portuguese run of QA@CLEF 2008.

ACM Categories and Subject Descriptors
H.2 [Database Management]: H.2.3 Languages - Query Languages
H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software; H.3.7 Digital Libraries

General Terms
Measurement, Performance, Experimentation.

Keywords
Question answering, Questions beyond factoids, Query Expansion, Portuguese, Spanish.

1 Introduction

Priberam’s participation in last year’s QA@CLEF was shaped by both internal and external changes.
Internally, the system underwent several modifications in both the Portuguese and the Spanish modules. The most relevant was the introduction of syntactic question processing [1]. Externally, the CLEF organisation introduced topic-related questions, which are questions clustered around a common topic that may contain anaphoric links between them. It also added Wikipedia as a target document collection alongside the existing newspaper corpora [2]. As a result, overall accuracy increased slightly in the Spanish (ES) run and decreased significantly in the Portuguese (PT) run. Nevertheless, the introduction of syntactic parsing during question processing gave Priberam’s system more accurate question categorisation, which reduced the number of wrong candidate answers.

The main goal of Priberam’s participation in QA@CLEF 2008 was to stabilize the system in order to surpass the results it had obtained in previous QA@CLEF participations [3, 4]. To enhance its performance, we improved the indexing/retrieval process by using question categories (QC) at the sentence retrieval level and