Open Domain Real-Time Question Answering Based on Semantic and Syntactic Question Similarity

Vivek Datla, Sadid A. Hasan, Joey Liu, Yassine Benajiba*, Kathy Lee†, Ashequl Qadir, Aaditya Prakash‡, Oladimeji Farri
Artificial Intelligence Laboratory, Philips Research North America, Cambridge, MA, USA
{vivek.datla,sadid.hasan,joey.liu}@philips.com, yassine@lukilabs.com
{kathy.lee_1,ashequl.qadir,aaditya.prakash,dimeji.farri}@philips.com

Abstract

In this paper, we describe our system and the results of our participation in the LiveQA track of the Text Retrieval Conference (TREC) 2016. The LiveQA task involves real user questions, extracted from the stream of the most recent questions submitted to the Yahoo Answers (YA) site that have not yet been answered by humans. These questions are pushed to the participants via a socket connection, and each system is required to provide an answer of fewer than 1000 characters in less than 60 seconds. The answers given by the system are evaluated by human experts in terms of accuracy, readability, and preciseness. Our strategy for answering the questions includes question decomposition, question relatedness identification, and answer generation. Evaluation results demonstrate that our system performed close to the average scores in the question answering task. In the question focus generation task, our system ranked fourth.

1 Introduction

Question Answering (QA) is a well-studied research area in natural language processing (NLP). Since the early days of artificial intelligence in the 1960s, researchers have been fascinated with answering natural language questions (Kwok et al., 2001). Initial efforts for QA systems primarily focused on domain-specific expert systems.

* This author was affiliated with Philips during this work.
† The author is also affiliated with Northwestern University (kathy.lee@eecs.northwestern.edu).
‡ The author is also affiliated with Brandeis University (aprakash@brandeis.edu).
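The response constraints described above (an answer of at most 1000 characters, delivered within 60 seconds) can be sketched as a thin wrapper around any QA pipeline. This is a hypothetical illustration, not the system described in this paper; the function names and structure are our own.

```python
import concurrent.futures

MAX_ANSWER_CHARS = 1000  # LiveQA answer-length limit
TIME_BUDGET_SECS = 60    # LiveQA response deadline

def answer_question(question: str) -> str:
    """Placeholder for an actual QA pipeline (hypothetical)."""
    return "Draft answer to: " + question

def respond_within_budget(question: str) -> str:
    """Run the QA pipeline with a hard deadline and a length cap.

    Returns an empty string if the pipeline misses the deadline,
    which the track would score as an unanswered question. (A real
    system would need harder cancellation than this sketch, since
    the executor still waits for the worker thread on shutdown.)
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(answer_question, question)
        try:
            answer = future.result(timeout=TIME_BUDGET_SECS)
        except concurrent.futures.TimeoutError:
            return ""
    # Enforce the 1000-character limit before replying.
    return answer[:MAX_ANSWER_CHARS]
```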
Domain-specific factoid questions have been answered well, and systems have achieved performance comparable to human experts, whereas answering open-domain questions in natural language is still an open challenge. Open-domain real-life questions amplify the challenge manyfold, as natural language is ambiguous, and constructing the answer requires an elaborate understanding of the question being asked, expert domain knowledge, and language generation models.

The open-domain real-time question answering task increases the complexity even further, as one has to address the issues mentioned above while also producing a human-like response in less than 60 seconds. The properties of a human-like response include structured, grammatically correct sentences that answer the question to the satisfaction of a human evaluator. Additionally, the answers need to be concise, as they are restricted to a 1000-character limit. This is our first participation in the LiveQA track, and in the following sections we describe our model, results, and experiences.

2 Task Description

The LiveQA track was first run in TREC 2015. The competition runs for 24 hours, during which questions posted by real users on the Yahoo Answers site 1 (after some preliminary cleaning) are pushed to the participating teams' servers registered for the competition. The questions are selected from 7 distinct topics shown in Table 1. As Table 1 indicates, the topics are fairly different and have several sub-categories. The category of the

1 Yahoo Answers - https://answers.yahoo.com/