Cross-Domain Analogy in Automated Text Generation

Raquel Hervás 1 and Francisco C. Pereira 2 and Pablo Gervás 1 and Amílcar Cardoso 2

Abstract. Appropriate use of analogy in computer generated texts would constitute a great advantage in contexts where complex information needs to be communicated. This paper presents research aimed at improving the stylistic quality of the texts generated by a natural language generation system through the use of analogy. A generic architecture and a specific implementation for solving this problem are described. The implementation takes the form of a multiagent architecture in which several independent modules are connected, each one dealing with a subtask of the process: enriching domain information, finding structure alignments between domains, and inserting analogies into a text generation pipeline. Initial results are presented, and several issues arising from the observed behaviour are discussed, with special attention to possible refinements of the proposal.

1 Introduction

Analogy is used frequently by humans when communicating with each other. Provided that our listeners have adequate knowledge of the target domain, using an apt analogy can be a more economical way of communicating information than actually explaining a whole set of facts about a given concept. If computers were capable of using analogies, this would constitute a great advantage in contexts such as pedagogical applications, where complex issues may need to be explained to a user, or simply as an additional tool for communicating complex information in any kind of interactive setting. The task of identifying apt analogies is difficult even for human beings, and it is often considered to have a complex ingredient of creativity. However, the actual process of introducing an analogy into a given text is well within the possibilities of current natural language generation technology.
PRINCE (Prototipo Reutilizable Inteligente para Narración de Cuentos con Emociones) is a natural language generation application designed to build texts for simple fairy tales. The goal of PRINCE is to tell a story received as input in a way that is as close as possible to the expressive way in which human storytellers present stories. To achieve this, PRINCE operates on the conceptual representation of the story, determining what is to be told, how it is organised, how it is phrased, and which emotions correspond to each sentence in the final output. Emotional content is added in the form of tags, to be realized as synthesized emotional voice [6]. PRINCE has been used as the natural language generation front end of ProtoPropp [9], a system for automatic story generation.

1 Departamento de Sistemas Informáticos y Programación, Universidad Complutense de Madrid, Spain, email: {raquelhb@fdi,pgervas@sip}.ucm.es
2 Centro de Informática e Sistemas (CISUC), Universidade de Coimbra, Polo II, Pinhal de Marrocos, 3030 Coimbra, Portugal, email: {camara,amilcar}@dei.uc.pt

The research presented in this paper is aimed at improving the stylistic quality of the texts generated by the PRINCE system, by extending its capabilities to include the use of analogy. This is done by exploiting the potential of a lexical resource, such as WordNet, together with structure mapping algorithms, to enhance the output texts.

PRINCE is implemented using the cFROGS architecture [7], a framework-like library of architectural classes intended to facilitate the development of NLG applications. It is designed to provide the necessary infrastructure for the development, minimising the implementation effort by means of schemas and generic architectural structures commonly used in this kind of system.
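The kind of modular organisation such a framework supports, in which a sequence of NLG modules is connected so that each module's output feeds the next, can be sketched as follows. This is a minimal, purely hypothetical illustration; the class and module names echo the paper's terminology but are not the actual cFROGS API.

```python
# Sketch of a pipeline-style NLG architecture: modules run in sequence,
# and the output of each one is the input of the next. The module names
# follow the paper; the data representation here is a hypothetical dict.

class Module:
    def process(self, data):
        raise NotImplementedError

class ContentDetermination(Module):
    def process(self, data):
        # Decide what is to be told from the input plot plan.
        data["content"] = f"content({data['plot']})"
        return data

class DiscoursePlanning(Module):
    def process(self, data):
        # Organise the selected content into a discourse structure.
        data["discourse"] = f"discourse({data['content']})"
        return data

class Pipeline:
    def __init__(self, modules):
        self.modules = modules

    def run(self, data):
        for module in self.modules:
            data = module.process(data)  # each output feeds the next module
        return data

pipeline = Pipeline([ContentDetermination(), DiscoursePlanning()])
result = pipeline.run({"plot": "dragon kidnaps princess"})
```

The same scheme extends to the remaining stages (Referring Expression Generation, Lexicalization, Surface Realization) simply by appending further modules to the sequence.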
cFROGS identifies three basic design decisions when designing the architecture of any NLG system: (1) what set of modules or tasks compose the system, (2) how control should flow between them, deciding the way they are connected and how the data are transferred from one to another, and (3) what data structures are used to communicate between the modules. The flow of control among the modules of PRINCE follows a simple pipeline, with all the modules in a sequence such that the output of each one is the input of the next. From a given plot plan provided as input to PRINCE, the text generator carries out the tasks of Content Determination, Discourse Planning, Referring Expression Generation, Lexicalization and Surface Realization, each of them in an independent module.

Section 2 describes the lexical resources employed in this paper and previous work on analogy and structural alignment. Section 3 describes the basic architecture used for modelling the process of analogy generation. The current multiagent implementation is presented in section 4. The experiments that have been carried out are described in section 5 and discussed in section 6. Section 7 outlines the conclusions.

2 Increasing lexical diversity

In PRINCE, the stage of lexical realization is done in a very simple way. Each concept in the tale has a unique associated tag, and for each appearance of a concept the corresponding word is used in the final text. This produces texts that are repetitive and poor from the point of view of vocabulary. To overcome this limitation, we have included a number of independent modules for expanding the number of lexical alternatives, thus allowing the enrichment of the realization process with lexical choice.
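A minimal sketch of this kind of lexical choice follows. For the sake of a self-contained example it uses a small hand-written synonym table; in the actual system the alternatives would be supplied by a lexical resource such as WordNet, and the entries below are purely illustrative.

```python
import random

# Toy lexical-choice step: instead of mapping each concept tag to a
# single fixed word, pick among several lexical alternatives, which
# reduces repetition in the generated text. The lexicon here is a
# hypothetical stand-in for a resource like WordNet.
SYNONYMS = {
    "princess": ["princess", "maiden", "young lady"],
    "castle": ["castle", "fortress", "stronghold"],
}

def lexicalise(concept, rng=random):
    # Fall back to the concept tag itself when no alternatives are
    # known, mirroring the original one-tag-one-word behaviour.
    options = SYNONYMS.get(concept, [concept])
    return rng.choice(options)
```

With such a table in place, successive mentions of the same concept need not surface as the same word, which is precisely the kind of lexical diversity the added modules aim for.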
Several strategies were envisaged and explored to provide those alternatives, from simple synonym substitution, or its combination with other kinds of semantic relations such as hyponymy/hypernymy, up to resorting to rhetorical figures. A requirement for these modules was the availability of adequate lexical sources. If synonym substitution could be carried out by means of a simple dictionary, more elaborate substitutions required richer