Global-Selector: A New Benchmark Dataset and Model Architecture for Multi-turn Response Selection

Chiyu Song* (Westlake University, songchiyu@westlake.edu.cn)
Hongliang He* (Westlake University, hehongliang@westlake.edu.cn)
Huachuan Qiu (Westlake University, qiuhuachuan@westlake.edu.cn)
Haofei Yu (Zhejiang University, haofeiyu@zju.edu.cn)
Zhenzhong Lan (Westlake University, lanzhenzhong@westlake.edu.cn)

Abstract

As an essential component of dialogue systems, multi-turn response selection aims to pick the optimal response from a set of candidates to improve dialogue fluency. In this paper, we investigate three problems with current response selection approaches, especially for generation-based conversational agents: (i) existing approaches are often formulated as a sentence-scoring problem, which ignores relationships between responses; (ii) existing models tend to select undesirable candidates that overlap heavily with the dialogue history; (iii) negative instances for training are mainly constructed by random sampling from the corpus, whereas candidates generated in practice typically follow a much closer distribution. To address these problems, we create a new dataset called ConvAI2+ and propose a new response selector called Global-Selector. Experimental results show that Global-Selector trained on ConvAI2+ achieves noticeable improvements in both accuracy and inference speed.

1 Introduction

Nowadays, dialogue response generation has gained increasing attention in the NLP community. An approach we call "generate-then-select" has proven effective at producing natural and fluent responses and is widely used in many advanced chatbots, such as Meena (Adiwardana et al., 2020), DialoGPT (Zhang et al., 2020), PLATO (Bao et al., 2020), and Blenderbot (Roller et al., 2021). This approach works as follows: (i) generate and decode multiple candidate responses from a generator; (ii) send them to a selector to find the best one; (iii) return the optimal response.
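The three-step pipeline above can be sketched as a short skeleton. This is a minimal illustration, not the paper's implementation: `toy_generate` and `toy_score` are hypothetical stand-ins for a neural generator and selector.

```python
from typing import Callable, List

def generate_then_select(
    history: List[str],
    generate: Callable[[List[str], int], List[str]],
    score: Callable[[List[str], str], float],
    num_candidates: int = 5,
) -> str:
    """Generate-then-select: (i) decode multiple candidates from the
    generator, (ii) score each one with the selector, (iii) return
    the highest-scoring response."""
    candidates = generate(history, num_candidates)
    return max(candidates, key=lambda c: score(history, c))

# Hypothetical stand-ins for a neural generator and selector:
def toy_generate(history: List[str], n: int) -> List[str]:
    return [f"candidate {i}" for i in range(n)]

def toy_score(history: List[str], candidate: str) -> float:
    # Placeholder score; a real selector would condition on the history.
    return float(candidate.split()[-1])
```

Note that each candidate is scored independently here; the sentence-scoring formulation criticized as defect (i) in the abstract.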
Thus, for generation-based dialogue systems, a good selector plays a vital role in boosting conversation quality (Adiwardana et al., 2020). The advent of the self-attention mechanism (Vaswani et al., 2017) and pre-trained models (BERT (Devlin et al., 2019), RoBERTa (Liu et al., 2019), ALBERT (Lan et al., 2020), etc.) has led to remarkable progress on various natural language understanding tasks. In this paper, we focus on employing pre-trained models for multi-turn response selection and identify three defects to be addressed:

* Equal contribution. Corresponding author.

Preprint. Under review. arXiv:2106.01263v1 [cs.CL] 2 Jun 2021
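The third problem raised in the abstract, that training negatives are randomly sampled from the corpus, corresponds to the standard data construction shown below. This is a minimal sketch under the common setup, not the paper's own procedure; `build_training_pairs` and its signature are illustrative.

```python
import random
from typing import List, Tuple

def build_training_pairs(
    dialogues: List[Tuple[List[str], str]],  # (context, gold response) pairs
    num_negatives: int = 1,
    seed: int = 0,
) -> List[Tuple[List[str], str, int]]:
    """Common negative construction: pair each context with its gold
    response (label 1) and with responses drawn uniformly from other
    dialogues (label 0). Such random negatives are usually easy to
    distinguish from the context, unlike the candidates a generator
    produces at inference time, which follow a much closer distribution."""
    rng = random.Random(seed)
    pool = [resp for _, resp in dialogues]
    examples = []
    for context, gold in dialogues:
        examples.append((context, gold, 1))
        for _ in range(num_negatives):
            neg = gold
            while neg == gold:  # resample until the distractor differs
                neg = rng.choice(pool)
            examples.append((context, neg, 0))
    return examples
```

A selector trained on such pairs sees only easy, off-topic distractors, which motivates constructing harder, generator-produced negatives as in ConvAI2+.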