Generating Diverse Numbers of Diverse Keyphrases

Xingdi Yuan ∗
Microsoft Research Montréal
Montréal, Québec, Canada
eric.yuan@microsoft.com

Tong Wang ∗
Microsoft Research Montréal
Montréal, Québec, Canada
tong.wang@microsoft.com

Rui Meng ∗
School of Computing and Information
University of Pittsburgh
Pittsburgh, PA, 15213
rui.meng@pitt.edu

Khushboo Thaker
School of Computing and Information
University of Pittsburgh
Pittsburgh, PA, 15213
k.thaker@pitt.edu

Daqing He
School of Computing and Information
University of Pittsburgh
Pittsburgh, PA, 15213
dah44@pitt.edu

Adam Trischler
Microsoft Research Montréal
Montréal, Québec, Canada
adam.trischler@microsoft.com

∗ These authors contributed equally. The order is determined by a fidget spinner.

arXiv:1810.05241v1 [cs.CL] 11 Oct 2018

Abstract

Existing keyphrase generation studies suffer from the problems of generating duplicate phrases and deficient evaluation based on a fixed number of predicted phrases. We propose a recurrent generative model that generates multiple keyphrases sequentially from a text, with specific modules that promote generation diversity. We further propose two new metrics that consider a variable number of phrases. With both existing and proposed evaluation setups, our model demonstrates superior performance to baselines on three types of keyphrase generation datasets, including two newly introduced in this work: STACKEXCHANGE and TEXTWORLD ACG. In contrast to previous keyphrase generation approaches, our model generates sets of diverse keyphrases of a variable number.

1 Introduction

Keyphrases are short pieces of text that humans use to summarize the high-level meaning of a longer text, or to highlight certain important topics or information. Keyphrase generation is the task of automatically predicting keyphrases given a source text. Models that perform this task should be capable not only of distilling high-level information from a document, but also of locating specific, important snippets within it. Complicating the problem, keyphrases may or may not appear directly and verbatim in their source text (they may be present or absent).

A given source text is usually associated with a set of keyphrases. Thus, keyphrase generation is an instance of set generation, where each element in the set is a short sequence of tokens and the size of the set varies depending on the source.

Most prior studies approach keyphrase generation similarly to summarization, relying on sequence-to-sequence (Seq2Seq) methods (Meng et al., 2017; Chen et al., 2018a; Ye and Wang, 2018; Chen et al., 2018b). Conditioned on a source text, Seq2Seq models generate phrases individually or as a longer, concatenated sequence with delimiting tokens throughout. Standard Seq2Seq models generate only one sequence at a time. To overcome this