UNED Submission to AVE 2006

Jesús Herrera, Álvaro Rodrigo, Anselmo Peñas, Felisa Verdejo
Departamento de Lenguajes y Sistemas Informáticos
Universidad Nacional de Educación a Distancia
Madrid, Spain
{jesus.herrera, alvarory, anselmo, felisa}@lsi.uned.es

Abstract

This paper reports the participation of the Spanish Distance Learning University (UNED) in the First Answer Validation Exercise (AVE), held within the Cross Language Evaluation Forum (CLEF) 2006 edition. The system works for the Spanish language. It is based on a Support Vector Machine (SVM) classification of the <text, hypothesis> pairs given by the organization. This classification is accomplished by means of a set of features obtained from lexical analysis. Yet Another Learning Environment (Yale 3.0) was used for the SVM classification, and Freeling was the toolkit chosen for lemmatization and named entity recognition. Two runs were submitted, and the results obtained, as defined by the organizers, were the following: precision_run1 = 0.467, recall_run1 = 0.7168, F_run1 = 0.5655; precision_run2 = 0.4652, recall_run2 = 0.7079, F_run2 = 0.5615.

Categories and Subject Descriptors

I.2 [Artificial Intelligence]: I.2.7 Natural Language Processing: Text analysis – Language parsing and understanding

Keywords

Question Answering, Answer Validation, Textual Entailment, Entity Recognition

1. Introduction

The system presented to the First AVE is based on the ones developed for the First [4] and the Second [5] Recognizing Textual Entailment (RTE) Challenges, because of the parallelism between both exercises. The AVE was defined so that a given answer to a question must be validated by means of the information contained in the question, the answer and the text supporting the answer. This information was prepared by the organization and presented in the form of a pair of texts, namely text and hypothesis.
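As a rough illustration of this setup (not the organizers' actual construction procedure, which is defined in the AVE guidelines), a QA triple can be turned into a <text, hypothesis> pair by rewriting the question and answer as a declarative statement; the function names and the toy rewriting rule below are hypothetical:

```python
# Illustrative sketch only: the real AVE pairs were produced by the
# organization, not by the participant systems.

def question_to_pattern(question: str) -> str:
    # Toy rule: drop the "?" and replace a leading "Who" with a placeholder.
    return question.rstrip("?").replace("Who", "X", 1)

def build_pair(question: str, answer: str, supporting_text: str):
    """Turn a QA triple into a <text, hypothesis> pair.

    The hypothesis is naively formed by substituting the answer into a
    declarative pattern derived from the question.
    """
    hypothesis = question_to_pattern(question).replace("X", answer)
    return supporting_text, hypothesis

text, hyp = build_pair("Who wrote Don Quixote?",
                       "Cervantes",
                       "Don Quixote was written by Miguel de Cervantes.")
# hyp == "Cervantes wrote Don Quixote"
```

The entailment question is then whether `text` entails `hyp`; a positive decision validates the answer.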
The objective of the exercise is to automatically determine whether one of the snippets (the text) entails the other one (the hypothesis); if so, the answer given to the question is said to be validated considering the information given by the supporting text. In the RTE Challenge, the participant systems must determine the existence of entailment between pairs of texts [1][2]; thus, the systems participating in the RTE Challenge should be able to participate in the AVE. In the system described here, the basic ideas from the systems presented to the RTE Challenges were kept, but the new system was designed and developed according to the resources available for the Spanish language, and it lacks some subsystems implemented in the systems cited above, such as, for example, dependency analysis. In short, the techniques involved in this new system are the following:

• Ratio of coincidence between words, unigrams, bigrams and trigrams, respectively, of the texts and their corresponding hypotheses.
• Detection of entailment between numeric expressions of the texts and the hypotheses.
• Detection of entailment between named entities of the texts and the hypotheses.
• Support Vector Machine classification to determine the final decision about textual entailment between each pair of text and hypothesis.
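The first of these techniques can be sketched as follows. This is a minimal illustration under an assumed formulation (the fraction of hypothesis n-grams that also occur in the text); tokenization here is naive whitespace splitting, whereas the actual system works over Freeling lemmas:

```python
# Sketch of lexical n-gram coincidence features; the resulting vector would
# be passed, together with the other features, to the SVM classifier.

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def coincidence_ratio(text, hypothesis, n):
    """Fraction of hypothesis n-grams that also appear in the text."""
    t = ngrams(text.lower().split(), n)
    h = ngrams(hypothesis.lower().split(), n)
    if not h:
        return 0.0
    return sum(1 for g in h if g in t) / len(h)

def overlap_features(text, hypothesis):
    # Unigram, bigram and trigram coincidence ratios.
    return [coincidence_ratio(text, hypothesis, n) for n in (1, 2, 3)]

feats = overlap_features("Miguel de Cervantes wrote Don Quixote",
                         "Cervantes wrote the Quixote")
# feats == [0.75, 1/3, 0.0]
```

Higher ratios suggest that the hypothesis is lexically covered by the text, which correlates with entailment but does not guarantee it; that is why the final decision is delegated to the SVM over the full feature set.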