UT Dialogue System at NTCIR-12 STC

Shoetsu Sato
Graduate School of Information Science and Technology, The University of Tokyo
shoetsu@tkl.iis.u-tokyo.ac.jp

Shonosuke Ishiwatari
Graduate School of Information Science and Technology, The University of Tokyo
ishiwatari@tkl.iis.u-tokyo.ac.jp

Naoki Yoshinaga *
Institute of Industrial Science, The University of Tokyo
ynaga@tkl.iis.u-tokyo.ac.jp

Masashi Toyoda
Institute of Industrial Science, The University of Tokyo
toyoda@tkl.iis.u-tokyo.ac.jp

Masaru Kitsuregawa
Institute of Industrial Science, The University of Tokyo / National Institute of Informatics
kitsure@tkl.iis.u-tokyo.ac.jp

ABSTRACT
This paper reports a dialogue system developed at the University of Tokyo for participation in the NTCIR-12 short text conversation (STC) pilot task. We participated in the Japanese STC task on Twitter and built a system that selects plausible responses for an input post (tweet) from a given pool of tweets. Our system first selects a (small) set of tweets as response candidates from the pool of tweets by exploiting a kernel-based classifier. The classifier uses bag-of-words in an utterance and a response (candidate) as features. We then re-rank the chosen candidates according to the perplexity given by a Long Short-Term Memory-based Recurrent Neural Network (lstm-rnn) to return a ranked list of plausible responses. To capture the diversity of domains (topics, wordings, writing styles, etc.) in chat dialogue, we train multiple lstm-rnns from subsets of utterance-response pairs that are obtained by clustering distributed representations of the utterances, and use the lstm-rnn trained from the utterance-response cluster whose centroid is closest to the input tweet.

Team Name
sss

Subtasks
Short Text Conversation (Japanese)

Keywords
conversation, clustering, domain adaptation, word embedding, neural network language model
1. INTRODUCTION
In the Japanese task of the NTCIR-12 short text conversation (STC) pilot task, participants need to develop a system that takes an input tweet and extracts, from a pool of tweets (utterance-response pairs), a (short) list of tweets that are ranked according to their relative suitability as responses to the input. The size of the pool is around one million tweets, which consist of 500K utterance-response pairs.

* This work was done while the author concurrently served as a senior researcher at the National Institute of Information and Communications Technology (NICT), Japan.

To solve this task, we use Long Short-Term Memory-based Recurrent Neural Networks (lstm-rnns) to evaluate the suitability of each response in the pool as a response to the input tweet. The key features of our system are twofold:

Response pre-filtering: Since lstm-rnns are too slow to evaluate every response in the pool, we utilize a classifier to select a tractable number of tweets as response candidates. The classifier, based on a polynomial kernel, is trained on a large number of utterance-response pairs that were independently crawled from Twitter.

Domain-aware LSTM-RNNs: In chat dialogue on Twitter, the diversity of domains (topics, wordings, writing styles, etc.) is evident. We therefore train multiple domain-aware lstm-rnns to evaluate the suitability of each response candidate. We obtain domain-consistent subsets of utterance-response pairs by clustering, and train one lstm-rnn from each subset. The lstm-rnn obtained from the subset whose utterances are semantically closest to the input tweet is then used to evaluate the suitability of each candidate as a response to the input tweet.

In what follows, we detail the architecture of our system and briefly summarize the experimental results.

2. SYSTEM ARCHITECTURE
Figure 1 depicts our dialogue system used for the NTCIR-12 STC pilot task.
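The two-stage selection outlined above (classifier-based pre-filtering followed by perplexity-based re-ranking) can be sketched as follows. This is a minimal illustration, not the system's actual implementation: `candidate_score` stands in for the polynomial-kernel classifier (here, simple bag-of-words overlap), `pseudo_perplexity` stands in for the lstm-rnn perplexity (here, a toy length-based proxy), and the function names and the `n_candidates` parameter are hypothetical.

```python
def candidate_score(post, response):
    # Stand-in for the paper's polynomial-kernel classifier: here we simply
    # count bag-of-words overlap between the post and a candidate response.
    return len(set(post.split()) & set(response.split()))

def pseudo_perplexity(response):
    # Stand-in for the lstm-rnn perplexity of a response (lower is better).
    # As a toy proxy, longer responses are treated as less likely.
    return len(response.split())

def rank_responses(post, pool, n_candidates=100):
    # Stage 1: pre-filter the pool down to a tractable candidate set.
    candidates = sorted(pool, key=lambda r: candidate_score(post, r),
                        reverse=True)[:n_candidates]
    # Stage 2: re-rank the surviving candidates by (proxy) perplexity, ascending.
    return sorted(candidates, key=pseudo_perplexity)
```

The point of the two stages is purely computational: the cheap first-stage score prunes the million-tweet pool so that the expensive neural scoring only runs on a small candidate set.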
The numbers in Figure 1 indicate the sections in which each component is explained.

2.1 Domain-aware dialogue modeling
Topics, wordings, and writing styles (i.e., domains) vary substantially in chat dialogue, which makes it difficult to build a universal dialogue model that can handle various domains. Our dialogue system is inspired by Yamamoto and Sumita's work on domain adaptation for statistical machine translation [6]. They showed that domain-specific models trained on smaller domain-specific corpora performed better than a general model trained on a larger general-domain corpus.

Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, June 7-10, 2016, Tokyo, Japan. 518
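Under this domain-aware scheme, the model applied to an input tweet is chosen by a nearest-centroid lookup over the utterance clusters. The sketch below assumes the utterances have already been embedded and clustered; the function name `nearest_cluster` and the choice of Euclidean distance are illustrative assumptions, not details taken from the paper.

```python
import math

def nearest_cluster(utterance_vec, centroids):
    # Return the index of the centroid closest (in Euclidean distance) to the
    # input utterance's distributed representation; the lstm-rnn trained on
    # that cluster's utterance-response pairs would then score the candidates.
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(centroids)),
               key=lambda i: dist(utterance_vec, centroids[i]))
```

In use, the returned index would select one of the per-domain lstm-rnns, so each input tweet is scored by the model whose training data is semantically closest to it.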