Corpus-driven conversational agents: tools and resources for multimodal dialogue systems development

Maria Di Maro
Department of Humanities
University of Naples ‘Federico II’, Italy
maria.dimaro2@unina.it

Abstract

In this paper, we describe how tools made available through CLARIN can be applied for research purposes in the development of corpus-driven conversational agents. The starting point will be the description of a standard architecture for multimodal dialogue systems. For some of its parts, specific available tools will be briefly described, according to their suitability for multimodal dialogue systems development.

1 Introduction

The present paper gives an overview of tools and resources available within the CLARIN infrastructure which can be exploited in the development of conversational agents, especially as far as language and dialogue modelling are concerned. Spoken dialogue systems are nowadays in the spotlight in different commercial, academic and industrial sectors: it will suffice to consider the success and popularity of tools like Amazon Alexa and Google Home [López et al., 2017], or of the widespread in-car dialogue systems [Becker et al., 2006, Kousidis et al., 2014].

Conversational agents are computer systems capable of conversing with humans. These dialogue systems are currently among the most actively researched fields in Artificial Intelligence, since the ability to communicate one's understanding by means of language is one possible way to manifest intelligence. In the Macmillan Dictionary¹, intelligence is defined as the ability to understand and think about things, and to gain and use knowledge. In this definition, one concept draws particular attention: ‘knowledge’. Building the knowledge base for such systems is the first step towards giving them intelligence. For this particular goal, the use of suitable tools facilitates the work of interaction designers, such as linguists.
At one extreme of the learning continuum, we find deterministic rules given to the system to interpret particular signals and react to them appropriately [McGlashan et al., 1992]. At the other extreme, we have end-to-end dialogue systems, which make no distinction between the abilities the system should exhibit at different levels, but are instead provided with data from which tendencies are statistically extracted [Ritter et al., 2010, Vinyals and Le, 2015, Serban et al., 2016, Bordes et al., 2016]. In the middle lies the possibility of training different modules by applying different strategies and tools. Overall, the corpus-driven approach is becoming increasingly important for inferring knowledge and communicative strategies in the field of spoken language understanding and generation through the application of different statistical and machine learning algorithms [Serban et al., 2018]. This means that an appropriate collection of data, in combination with specific tools, is required to model one's own system.

In this work, we will concentrate on multimodal dialogue systems, which not only make use of spoken language, but also use other communication channels to understand and express intents [Lucignano et al., 2013]. For this reason, the knowledge to be constructed will comprise different linguistic and paralinguistic levels.

¹ Macmillan Dictionary Online: https://www.macmillandictionary.com/ [last consultation on the 24th January 2019]

This work is licenced under a Creative Commons Attribution 4.0 International Licence. Licence details: http://creativecommons.org/licenses/by/4.0/

Maria Di Maro 2019. Corpus-driven conversational agents: tools and resources for multimodal dialogue systems development. Selected papers from the CLARIN Annual Conference 2018. Linköping Electronic Conference Proceedings 159: 39–45.

The standard architecture for a multimodal dialogue system consists of different modules, which serve one another to build the interaction (Figure 1). The user's input is first processed by a module that takes the audio produced by the user and transforms it into a string to