Discovering high quality answers in community question answering archives using a hierarchy of classifiers

Hapnes Toba a,1, Zhao-Yan Ming b,⇑, Mirna Adriani a, Tat-Seng Chua b

a Information Retrieval Laboratory, Faculty of Computer Science, Universitas Indonesia, Depok 16424, West Java, Indonesia
b Lab of Media Search, School of Computing, National University of Singapore, 117417, Singapore

Article info

Article history:
Received 21 February 2013
Received in revised form 10 September 2013
Accepted 26 October 2013
Available online xxxx

Keywords:
Question answering system
User generated content
Answer quality
Question type analysis
Hierarchical supervised learning

Abstract

In community-based question answering (CQA) services, where answers are generated by humans, users may expect better answers than those of an automatic question answering system. However, the user-generated answers in CQA archives are not always of high quality. Most existing work on answer quality prediction uses the same model for all answers, even though each answer is intrinsically different. At the same time, modeling each individual QA pair differently is not feasible in practice. To balance efficiency and accuracy, we propose a hybrid hierarchy-of-classifiers framework to model the QA pairs. First, we analyze the question type to guide the selection of the right answer quality model. Second, we use the information from question analysis to predict the expected answer features and train the type-based quality classifiers to hierarchically aggregate an overall answer quality score. We also propose a number of novel features that are effective in distinguishing the quality of answers. We tested the framework on a dataset of about 50 thousand QA pairs from Yahoo! Answers. The results show that our proposed framework is effective in identifying high quality answers.
Moreover, further analysis reveals that our framework classifies low quality answers more accurately than a single-classifier approach.

© 2013 Elsevier Inc. All rights reserved.

1. Introduction

Since it was introduced in the 1960s [17,28,55,56], the question answering system (QAS) task has been at the forefront of technological advances. In its early stage of development, QAS was restricted to structured domains with limited natural language processing [20,35,25]. Along with advances in information retrieval, computational linguistics, and Internet technology, research on QAS broadened to unstructured textual documents in open domains and to collaborative users [15,23]. Evaluation forums for QAS, such as TREC [15] and CLEF [41], have steered the development of QAS toward established, large-scale research methodologies and evaluations. Despite the advances that QAS technologies have achieved in the TREC and CLEF evaluation forums, existing systems are mostly used on document-based (reputable) resources, such as newspapers and web pages. In a document-based QAS, the answer candidates provided in the retrieval results are typically of good quality.

Recent advances in Internet technologies offer users more freedom in the ways they express their opinions and interact with one another. Web applications such as Yahoo! Answers² (Y!A), Twitter, Facebook, and Flickr are good examples of how people connect with each other and share their interests, opinions, knowledge, and social activities. In comparison to

0020-0255/$ - see front matter. http://dx.doi.org/10.1016/j.ins.2013.10.030
⇑ Corresponding author. Tel.: +65 94590298. E-mail address: mingzhaoyan@nus.edu.sg (Z.-Y. Ming).
1 This work was done when the first author was a research intern at NUS.
2 http://answers.yahoo.com/.
Information Sciences xxx (2013) xxx–xxx
Journal homepage: www.elsevier.com/locate/ins
Please cite this article in press as: H. Toba et al., Discovering high quality answers in community question answering archives using a hierarchy of classifiers, Inform. Sci. (2013), http://dx.doi.org/10.1016/j.ins.2013.10.030