ORIGINAL PAPER

Evaluating question answering validation as a classification problem

Álvaro Rodrigo • Anselmo Peñas • Felisa Verdejo

Published online: 19 March 2011
© Springer Science+Business Media B.V. 2011

Lang Resources & Evaluation (2012) 46:493–501
DOI 10.1007/s10579-011-9143-2

Á. Rodrigo (✉) • A. Peñas • F. Verdejo
NLP & IR Group at UNED, Madrid, Spain
Á. Rodrigo e-mail: alvarory@lsi.uned.es
A. Peñas e-mail: anselmo@lsi.uned.es
F. Verdejo e-mail: felisa@lsi.uned.es

Abstract Formulating Question Answering Validation as a classification problem facilitates the introduction of Machine Learning techniques to improve the overall performance of Question Answering systems. The unequal proportion of positive and negative examples in the evaluation collections has led to the use of measures based on precision and recall. However, an evaluation based on the analysis of Receiver Operating Characteristic (ROC) space is sometimes preferred in classification with unbalanced collections. In this article we compare both evaluation approaches according to their rationale, their stability, their discrimination power and their suitability for the particularities of the Answer Validation task.

Keywords Question Answering · Answer Validation · Evaluation

1 Introduction

Question Answering (QA) systems receive a question in natural language and return small snippets of text that contain an answer to the question (Voorhees and Tice 1999). Traditional QA systems typically employ a pipeline approach (Moldovan et al. 2000), which produces a dependency among modules that is highly sensitive to error propagation. Introducing more reasoning about the correctness of the returned answers could help to overcome the pipeline limitations of QA systems and improve QA