Improving Text Generation via Neural Discourse Planning

Alexander Chernyavskiy
alschernyavskiy@gmail.com
National Research University Higher School of Economics
Moscow, Russia

ABSTRACT
Recent Transformer-based approaches to NLG such as GPT-2 can generate syntactically coherent original texts. However, these generated texts have serious flaws. One of them is global discourse incoherence. We present an approach to estimate the quality of discourse structure. Empirical results confirm that the discourse structure of currently generated texts is inaccurate. We propose research directions to plan this structure and to fill in the text at its leaves using a pipeline consisting of two GPT-based generation models. The suggested approach is universal and can be applied to different languages.

ACM Reference Format:
Alexander Chernyavskiy. 2022. Improving Text Generation via Neural Discourse Planning. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (WSDM '22), February 21–25, 2022, Tempe, AZ, USA. ACM, New York, NY, USA, 2 pages. https://doi.org/10.1145/3488560.3502214

1 INTRODUCTION
The Natural Language Generation (NLG) task is one of the most challenging and important tasks in NLP. There are various types of NLG tasks: text summarization, machine translation, knowledge aggregation, etc. We consider tasks where the main goal is to construct a text that cannot be distinguished from a human-written text, either by a human or by a recognition system. The most successful and universal models for solving NLP tasks are based on the Transformer architecture. Hence, GPT [11] and its larger modifications, e.g. GPT-2 [12], successfully perform text generation tasks. However, they still have drawbacks. First of all, fragments in some generated texts do not cohere well with each other, despite the correct syntactic structure.
Ko and Li [7] demonstrated that even the words that indicate discourse relations (such as "but" and "because") can be generated improperly, and proposed an auxiliary model to correct them. More problems arise at a higher level, associated with the consistency between sentences. In some cases, the model generates a completely incorrect discourse structure, triggered by an inability to plan it. Thus, even the order of the discourse relations should be corrected.

We conducted experiments with GPT-2 and distinguished two types of its mistakes. First, it does not generate the overall discourse structure (RST-based, [8]) well. Accordingly, contradictions can be found in it. We fine-tuned GPT-2 on lower-cased movie reviews. Here are examples of mistakes in the generated texts.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
WSDM '22, February 21–25, 2022, Tempe, AZ, USA
© 2022 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-9132-0/22/02.
https://doi.org/10.1145/3488560.3502214

Figure 1: A part of the discourse tree for the generated text: "... [named john]6 [who survives a major accident]7 [and is saved by a state of the art experimental operation]8 [that turns him into a robotic machine-like agent]9 [who has tools and contraptions of all sorts]10 [built into his body at his use]11 [when he says]12 ...". Arrows are drawn from the Nucleus to the Satellites.

Let us consider the example demonstrated in Figure 1. The sentence has too many "Elaboration" and "Joint" rhetorical relations, which are the default ones. Moreover, the structure of thought is not reflected in this discourse tree, as it looks like a chain.
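This "chain-likeness" can be made concrete with a simple structural score. The sketch below is illustrative and not from the paper: it treats the RST tree as a binary tree over EDU leaves and compares its depth to the minimum depth a balanced tree over the same number of EDUs would have (the `Node` class and the `chain_score` metric are hypothetical names).

```python
import math

class Node:
    """A node of a binarized discourse tree; a leaf (no children) is an EDU."""
    def __init__(self, relation=None, children=()):
        self.relation = relation        # e.g. "Elaboration", "Joint"; None for an EDU
        self.children = list(children)

def depth(node):
    if not node.children:
        return 0
    return 1 + max(depth(c) for c in node.children)

def n_leaves(node):
    if not node.children:
        return 1
    return sum(n_leaves(c) for c in node.children)

def chain_score(root):
    """0.0 for a perfectly balanced binary tree, 1.0 for a pure chain."""
    n = n_leaves(root)
    if n < 3:                           # with 1-2 EDUs, balance is undefined
        return 0.0
    d_min = math.ceil(math.log2(n))     # depth of a balanced tree over n EDUs
    d_max = n - 1                       # depth of a pure chain
    return (depth(root) - d_min) / (d_max - d_min)

# A chain of "Elaboration" relations over 4 EDUs, as in the Figure 1 fragment,
# versus a balanced tree over the same 4 EDUs:
chain = Node("Elaboration", [Node(), Node("Elaboration",
            [Node(), Node("Elaboration", [Node(), Node()])])])
balanced = Node("Joint", [Node("Elaboration", [Node(), Node()]),
                          Node("Elaboration", [Node(), Node()])])
print(chain_score(chain), chain_score(balanced))
```

Under such a score, degenerate chain-shaped generations stand out against the more balanced trees typical of human-written text.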
Generally, genuine discourse trees are more balanced. Apart from that, the "final" summary appears in the middle of the text in some cases. It is not followed by the "end of sequence" token and is instead continued by an Elaboration. As a result, the text is duplicated and contradictions may arise.

In this paper, we present an automatic approach to estimate the quality of discourse structure and experimentally confirm that the discourse structure can be generated improperly in some cases. Our main goal is to develop a model that can generate EDUs connected by discourse relations in the correct order and use the correct words to express them. To this end, we propose a pipeline consisting of two GPT-2-based generation models.

2 RELATED WORK
Puduppully et al. [10] considered the task of generating summaries for games. The output texts are long but obey a certain structure. The authors' model learns content plans from training data. Most data-to-text datasets do not naturally contain content plans. These plans can be derived following an information extraction approach, by mapping the text in the summaries onto entities in the structured data, their values, relations and types. Similarly, Ciampaglia et al. [4] showed that any collection of factual human knowledge can be leveraged for automatic fact checking.

At the same time, some neural methods can be used to plan content and structure without any knowledge bases. For instance, Peng et al. [9] proposed a method to generate text endings based on a pre-planned intent, which is predicted by an additional neural model. Also, some researchers have suggested planning the entire discourse structure or its approximation. Biran and McKeown [1] proposed neural text generation based on selected discourse relations, which can be chosen using n-grams. Ji et al. [6] suggested a similar