$6SHHFK$UFKLWHFWXUHIRU3HUVRQDO$VVLVWDQWVLQD .QRZOHGJH0DQDJHPHQW&RQWH[W (PHUVRQ&DEUHUD3DUDLVR and-HDQ3DXO$%DUWKqV and&HVDU$7DFOD $EVWUDFW This paper describes the design of a speech and natural language dialog interface for Personal Assistants. We present such an architecture in a multi-agent system and apply it to knowledge management. As a clear result of this conversational speech interface, we expect an improvement in the quality of assistance.  ,1752’8&7,21  Conversational interfaces as defined by Kölzer [1] let users state what they want in their own terms, just as they would do, speaking to another person. In particular, interfacing humans to computer systems using Personal Assistants (PA) agents is a good candidate for a conversational approach. Indeed, PAs are agents that help human users (often referenced as PDVWHUV) to do their daily work. We are convinced that a speech and conversational interface would improve the quality of assistance form a PA. We developed a spoken dialog system infrastructure for building interfaces to be used by PAs. We are applying our approach to a knowledge management (KM) multi-agent system (MAS) used in the context of research and development projects, as explained by Tacla and Barthès in [2]. The MAS has been developed to support cooperative projects, where each participant shares documents, exchanges information, and contributes to building a distributed organizational memory. To this purpose, each user is given a PA and can use plain English to control it or to ask it to perform tasks, like retrieving a document from a Lotus Notes® database or looking for knowledge in the organizational memory. The user and her PA use practical dialogs—which means that they are pursuing specific goals or tasks cooperatively as defined by Allen et al [3]. The dialog system is task-oriented. Tasks range from simple tasks like “locate a document” to more complex tasks that must be decomposed into subtasks. 
The nature of the application allows us to restrict the space of dialogs to those containing only Directive Speech Act statements (e.g., inform, request, or answer). We now describe our intelligent speech architecture and how it works. We also briefly describe how the system is being used for KM and report preliminary results on the increased quality of assistance.

1 Laboratoire Heudiasyc, Université de Technologie de Compiègne, BP 20529, 60.205, Compiègne, France, email: {eparaiso,barthes}@utc.fr
2 Centro Federal de Educação Tecnologica do Parana, CEP: 80.230-901, Curitiba, PR, Brazil, email: tacla@dainf.cefetpr.br

ARCHITECTURE FOR AN INTELLIGENT SPEECH INTERFACE

The global architecture is shown in Figure 1. It has three parts: (i) graphical and speech user interface (GSUI) modules; (ii) linguistic modules; and (iii) agency modules. GSUI modules produce outputs or collect the user's inputs, like capturing voice and handling GUI events. Linguistic modules are responsible for lexical and syntactical analysis and context verification. Agency modules are directly connected to the agent kernel, which can "intelligently" manage the dialog and the interface with the help of an ontology.

Interface diagram

The utterances are captured using a commercial automatic speech recognition engine that returns the recognized result for each word. The Utterance Capturing module concatenates all the words forming an utterance. A process running independently analyzes each utterance. Due to local noise interference or bad pronunciation, the utterance may be lexically and/or syntactically different from the words actually said. Initially, we use the utterance as it is, extracting a list of known disfluencies. The process of interpreting an utterance is done in two steps: (i) parsing and syntactic analysis; and (ii) ontology application. The results are sent continuously to the dialog manager, or back to the user if the utterance is nonsensical.
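The capture-and-interpret loop described above can be sketched as follows. This is a simplified illustration under stated assumptions: the disfluency list, the toy lexicon, and the concept labels are invented for the example and do not reflect the paper's actual parser or ontologies.

```python
# Hypothetical list of known disfluencies to extract from the raw utterance.
DISFLUENCIES = {"uh", "um", "er", "hmm"}

def capture_utterance(recognized_words):
    """Concatenate per-word ASR results into a single utterance string."""
    return " ".join(recognized_words)

def strip_disfluencies(utterance):
    """Remove known disfluencies, keeping the utterance otherwise as-is."""
    words = [w for w in utterance.lower().split() if w not in DISFLUENCIES]
    return " ".join(words)

def interpret(utterance, lexicon):
    """Two-step interpretation: (i) lexical/syntactic filtering,
    (ii) ontology application. Returns a result for the dialog manager,
    or None when the utterance is nonsensical (routed back to the user)."""
    words = strip_disfluencies(utterance).split()
    known = [w for w in words if w in lexicon]        # step (i), toy version
    if not known:
        return None                                   # nonsensical: back to user
    concepts = [lexicon[w] for w in known]            # step (ii): ontology lookup
    return {"utterance": utterance, "concepts": concepts}

# Toy lexicon mapping words to (hypothetical) ontology concepts:
lexicon = {"locate": "task:locate", "document": "domain:document"}

utterance = capture_utterance(["uh", "locate", "the", "document"])
print(interpret(utterance, lexicon))
```

A real implementation would replace the lexical filter with the grammar-based parser and the dictionary lookup with queries against the domain and task ontologies, but the control flow (concatenate, clean, parse, apply ontology, route) is the same.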
Spoken sentences contain many more pronouns than written sentences, and they are shorter, consisting of fragments or phrases [4]. We designed grammar rules to handle such specificities. Although our interface uses a list of specialized grammars, the latter are not restrictive. We limited the space of dialog utterances to the Directive Speech Act classes (inform, request, or answer), since they define the type of expected utterances in a master-slave relationship. The grammar rules were divided so as to classify an utterance into one of the three categories. After classification, it is possible to start the domain treatment, with the help of a domain ontology and of WordNet. Domain knowledge is used here to further process the user's statements and for reasoning. According to the taxonomy proposed by Guarino [5], these are domain and task ontologies. We use a set of task and domain ontologies, distinguishing domain and task models for reasoning. As suggested by Allen, this