A case based reasoning model for multilingual language generation in dialogues

Víctor López Salazar, Eduardo M. Eisman Cabeza, Juan Luis Castro Peña, Jose Manuel Zurita López

Dept. Computer Science and Artificial Intelligence, ETSIIT, University of Granada, C/Periodista Daniel Saucedo Aranda, s/n, Granada, Spain

Keywords: Dialogue; Speech acts; Conversational Agents; Natural Language Generation

Abstract

The process of Natural Language Generation for a Conversational Agent translates some semantic language to its surface form expressed in natural language. In this paper, we present a Case Based Reasoning technique that is easily extensible and adaptable to multiple domains and languages, and that generates coherent phrases and produces natural output in the context of a Conversational Agent that maintains a dialogue with the user.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Industry has a growing interest in natural language interfaces that let users interact naturally with the devices they use. These interfaces are usually Embodied Conversational Agents (ECAs) or, more generally, Conversational Systems. In education there are many opportunities to employ such systems, e.g. tutoring systems that give curricular advice or lessons on a particular subject (Graesser, Chipman, Haynes, & Olney, 2005), conversational games that test emotional abilities (Rehm & Wissner, 2005), and embodied agents that simulate different roles in a professional environment (Kopp, Gesellensetter, Kramer, & Wachsmuth, 2005). Broadly, these agents carry out three main tasks: Natural Language Understanding, Dialogue Management, and Natural Language Generation (NLG).
For NLG, although there is agreement on the global subtasks that an NLG process should carry out (Reiter & R, 1997), there is no standard technique for performing them, because the choice depends in many ways on the selected problem domain. Basically, there are three approaches to the NLG problem, in ascending order of complexity and generality: canned text, templates, and symbolic approaches that employ knowledge representations at different linguistic levels together with rules to manipulate them. Canned text has the advantage of simplicity: it needs only the final text to be generated, but it is not reusable. Templates take a more abstract view of generating Natural Language (NL), mixing fixed text with variable text. A classical system using this approach is ELIZA (Weizenbaum, 1966), which inserts part of the user input into the system's answers to simulate a psychotherapist conducting a therapy session. This technique is more general because it does not require all the system answers to be written in advance, although the templates cannot be reused in situations other than those for which they were initially created. The last approach usually employs linguistic knowledge such as grammars or rhetorical operators (Mann & Thompson, 2005) to describe the part of the language used by the system. This makes it more generic, at the cost of greater system complexity.

Conversational agents usually base their NLG methodology on templates. A well-known language for developing conversational agents is AIML (Wallace, 2000). It follows a stimulus–response scheme for answer generation, using pattern matching to recognize the user input and templates to generate natural language. The semantics of the contents of the agent's answer can be specified by two simple string tags, "topic" and "that".
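The pairing of an input pattern with an answer template can be sketched in a few lines of Python. This is only an illustration of the stimulus–response idea, not an AIML interpreter: the rule table `RULES`, the handling of the `*` wildcard, and the function names `normalize`, `pattern_to_regex` and `respond` are all assumptions made for this example.

```python
import re

# Hypothetical rule table: each rule pairs an input pattern with an answer
# template. "*" is a wildcard, loosely modelled on AIML categories; the
# rules themselves are invented for illustration.
RULES = [
    ("I FEEL *", "Why do you feel {0}?"),
    ("MY NAME IS *", "Nice to meet you, {0}."),
    ("*", "Please tell me more."),
]


def normalize(text):
    """Strip punctuation and collapse whitespace, keeping the original case."""
    cleaned = re.sub(r"[^\w\s]", "", text)
    return " ".join(cleaned.split())


def pattern_to_regex(pattern):
    """Turn a wildcard pattern into an anchored, case-insensitive regex."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + "(.+)".join(parts) + "$", re.IGNORECASE)


def respond(user_input):
    """Return the filled template of the first rule whose pattern matches."""
    text = normalize(user_input)
    for pattern, template in RULES:
        match = pattern_to_regex(pattern).match(text)
        if match:
            # Variable text captured from the input fills the template slots.
            return template.format(*match.groups())
    return ""  # nothing matched (only possible for empty input here)


if __name__ == "__main__":
    print(respond("I feel tired."))
    print(respond("My name is Ana"))
```

In this sketch every answer is selected by the user input alone: the rule order decides which template wins, and the only variable content is whatever the wildcard captured.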
This stimulus–response scheme is clearly insufficient to establish the contents of the answer, because the agent cannot say what it wants to say, only the answer that matches a user input. One proposal that develops this scheme is given by Kimura and Kitamura (2006), who extend the AIML language to incorporate SPARQL queries that extract sentences from web pages annotated with RDF, making the agent more dynamic. Lim and Cho (2005) use a genetic programming algorithm to make the answers of a conversational agent more varied, using Sentence Plan Trees (SPTs), which contain the structure of the answer. SPTs are binary trees with templates in their leaves and joint operators in their parent nodes that join these sentences. The algorithm works by crossing and mutating these operators, creating new sentences.

Expert Systems with Applications 39 (2012) 7330–7337. doi:10.1016/j.eswa.2012.01.085
Corresponding author. E-mail addresses: victor@decsai.ugr.es (V. López Salazar), eisman@decsai.ugr.es (E.M. Eisman Cabeza), castro@decsai.ugr.es (J.L. Castro Peña), zurita@decsai.ugr.es (J.M. Zurita López).

ProtoPropp (Gervas, Diaz-Agudo, Peinado, & Hervas, 2005) is a story plot generation program that uses a Case Based Reasoning (CBR) technique to build a story from an initial description of its plot, using Propp functions to organize the tale and an ontology to capture the domain entities and to set all the entities relevant to the generation task. Cases are complete plot tales composed of related movements. A movement is a kind of procedure related to several Propp functions that allows