Why-type Question Classification in Question Answering System

Manvi Breja
National Institute of Technology, Kurukshetra
Kurukshetra, Haryana
manvi.breja@gmail.com

Sanjay Kumar Jain
National Institute of Technology, Kurukshetra
Kurukshetra, Haryana
skj_nith@yahoo.com

ABSTRACT
The ability to acquire information on any topic has become increasingly important. The need for Question Answering Systems (QAS), which are increasingly replacing traditional search engines, stems from users’ requirement for the most accurate answer to any question or query. Interpreting the information need of the user is therefore crucial for designing and developing a question answering system. Question classification is an important component of question answering systems that helps to determine the type of a question and its corresponding type of answer. In this paper, we present a new way of classifying Why-type questions, aimed at understanding a questioner’s intent. Our taxonomy classifies Why-type questions into four separate categories. In addition, so that a parser can automatically detect the categories of these questions, we differentiate them at the lexical level.

CCS CONCEPTS
• Information systems → Question answering;

KEYWORDS
Question answering system, why-questions, question classification, answer types

1 INTRODUCTION
The rapid advancement of the Web has allowed researchers to store information on a wide variety of topics. Search engines [5] return a list of relevant web pages according to the user’s need. However, the need to find the most appropriate and precise answer to a given question has motivated the development of Question Answering Systems. QA has become an active research topic in the fields of natural language processing (NLP) and information retrieval (IR). A question answering system [8] is an information retrieval system that automatically generates an accurate answer to a natural language question. Questions elicit information in the form of answers. The answer to a question depends on the type of the question.
In the English language, there are several types of questions, starting with words such as what, when, who, where, why, and how. Questions beginning with what, when, who, and where are factoid questions [13] and can be answered in a single phrase or sentence, whereas questions starting with why and how are non-factoid questions. Such questions are complex, and their answers vary. Why-type questions require reasoning and explanations in their answers, and how-type questions involve procedures or manners that vary among individuals. Their answers range from a sentence to a paragraph or even a whole document.

Though past studies have addressed question classification for questions starting with what, when, where, etc., few of them have addressed the classification of Why-type questions. As an attempt to understand the questioner’s intent in the why-questions asked on QASs, we propose a classification of why-type questions, which plays an important role in the development of QASs. We begin with an analysis of 1000 why-questions, randomly sampled from QA sites and from datasets available on the Web. Based on this analysis, we propose a classification with four categories: (1) Informational Why-questions, (2) Historical Why-questions, (3) Contextual/Situational Why-questions, and (4) Opinionated Why-questions. To enable the automatic detection of these four types of questions by a parser [2], we discuss the features that differentiate them and help them to be recognized.

Our proposed taxonomy can serve as a crucial step in the development of a Why-type QAS: first, by automatically differentiating questions, it can help us decide which knowledge source to consult to find an answer; second, it can help determine the expected answer type of a question.

The rest of this paper is organized as follows. In Section 2, we give a brief overview of QA systems. In Section 3, we discuss the motivation for carrying out research in why-QA.
Section 4 discusses the related work on question classification. Section 5 describes the research issues faced in why-QA. Section 6 introduces the research objectives. Section 7 describes the methodology used in the research. Section 8 describes the data collection procedure. Section 9 discusses the proposed classification of why-questions and an analysis of their distinguishing features. Finally, Section 10 concludes our work with future plans.

2 QUESTION ANSWERING SYSTEM
Question answering systems answer questions asked in natural language. They use information retrieval and natural language processing techniques to find an appropriate answer. The architecture of a QAS includes four modules, namely question processing, document retrieval, answer extractor, and answer re-ranker, as illustrated in Figure 1.

The question processing module performs two activities: (1) question classification and (2) question reformulation. Question classification is an important module of a QAS because it affects the subsequent answer extraction module and hence determines the accuracy and performance of the QAS. Question classification assigns a label to a question and categorizes it into one of the predefined classes. This further helps in predicting the answer type for the given question [33]. The question reformulation module reformulates a question (Q) into a new question (Q’) by adding appropriate terms and deleting punctuation marks, thus highlighting the information needs of the user. After question processing, the document retrieval module of a QAS returns a ranked list of relevant documents in response to the reformulated question. A document is considered to be relevant if its contents are relevant to the answer