Demonstration of interactive dialog teaching for learning a practical end-to-end dialog manager

Jason D. Williams
Microsoft Research
jason.williams@microsoft.com

Lars Liden
Microsoft
laliden@microsoft.com

1 Introduction

This is a demonstration of a platform for building practical, task-oriented, end-to-end dialog systems. Whereas traditional dialog systems consist of a pipeline of components such as intent detection, state tracking, and action selection, an end-to-end dialog system is driven by a machine learning model which takes the observable dialog history as input and directly outputs a distribution over dialog actions. The benefit of this approach is that intermediate quantities such as intent or dialog state do not need to be labeled; rather, learning can be done directly on example dialogs.

In practice, purely end-to-end methods can require large amounts of data to learn seemingly simple behaviors, such as sorting database results. This is problematic because when building a new dialog system, typically no in-domain dialog data exists, so data efficiency is crucial. Moreover, machine-learned models alone cannot guarantee that practical constraints are followed; for example, a bank would require that a user be logged in before they are allowed to transfer funds. For these reasons, in past work we introduced Hybrid Code Networks (HCNs) (Williams et al., 2017). HCNs make end-to-end learning practical by combining a recurrent neural network (RNN) with domain-specific software provided by the developer; domain-specific action templates; and a conventional entity extraction module for identifying entity mentions in text. Experiments on the public bAbI corpus (Bordes et al., 2017) have shown that HCNs can reduce the number of training dialogs required by an order of magnitude compared to state-of-the-art end-to-end learning methods which do not employ domain knowledge.
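At a glance, the per-turn action selection just described can be sketched as: featurized input feeds an RNN, which produces a distribution over the developer's action templates, with a developer-supplied action mask zeroing out disallowed actions. The following is a minimal illustration only; all names, sizes, and parameters are hypothetical and do not reflect the actual HCN implementation.

```python
# Toy sketch of masked action selection in an HCN-style model.
# Parameters are random here; in the real system they are learned from dialogs.
import math
import random

random.seed(0)
FEATURE_DIM, HIDDEN_DIM = 6, 8
ACTIONS = ["ask_city", "give_forecast", "transfer_funds", "say_goodbye"]

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

W_x = rand_matrix(HIDDEN_DIM, FEATURE_DIM)   # input-to-hidden weights
W_h = rand_matrix(HIDDEN_DIM, HIDDEN_DIM)    # hidden-to-hidden weights
W_y = rand_matrix(len(ACTIONS), HIDDEN_DIM)  # hidden-to-action weights

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def hcn_step(features, hidden, action_mask):
    """One dialog turn: update RNN state, output a masked action distribution."""
    pre = [a + b for a, b in zip(matvec(W_x, features), matvec(W_h, hidden))]
    hidden = [math.tanh(p) for p in pre]
    logits = matvec(W_y, hidden)
    # Developer code rules out actions that violate business logic,
    # e.g. "transfer_funds" before the user has logged in.
    exps = [math.exp(l) if ok else 0.0 for l, ok in zip(logits, action_mask)]
    total = sum(exps)
    return [e / total for e in exps], hidden

features = [1, 0, 1, 0, 1, 0]        # e.g. word indicators + "location seen" flag
hidden = [0.0] * HIDDEN_DIM
mask = [True, True, False, True]     # transfer_funds masked off this turn
probs, hidden = hcn_step(features, hidden, mask)
best_action = ACTIONS[probs.index(max(probs))]
```

Because the mask is applied before normalization, a disallowed action receives exactly zero probability regardless of what the RNN has learned, which is how hard business constraints coexist with end-to-end learning.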
This demonstration shows a practical implementation of HCNs, as a web service for building task-oriented dialog systems. Once the developer has provided their domain-specific software, they can add training dialogs in several ways. First, the developer can simply upload dialogs to the training set. Second, the developer can interactively teach the HCN, making on-the-spot corrections. Finally, as the HCN interacts with end users, the developer can inspect logged dialogs, make corrections if needed, and add the dialogs to the training set.

The next section describes the architecture and operation of the platform, and the final section describes how the developer uses the service, i.e., what the demonstration shows.

2 Dialog learning platform

The practical operation of the HCN is shown in Figure 1, where the left-hand block in white shows a messaging client used by an end user, the center block in blue shows a web service implemented by the system developer that hosts domain-specific logic, and the right-hand block in green is the HCN web service. A software development kit (SDK) facilitates using the HCN web service.

When interacting with end users, the process begins when the end user provides input text, such as "What's the 5 day forecast for Seattle?", shown as item 1 in Figure 1. This text is passed to the developer's web service, which in turn calls the HCN service to perform entity extraction (item 2). The HCN service then returns entity mentions detected in the text, such as "location=Seattle" (3). Domain-specific code on the developer's service then runs to resolve entity mentions to a canonical form, such as a latitude/longitude pair, and to store entities for use in later turns in the dialog (4). The developer's code then calls the HCN service again, optionally passing in context which can include which entities have been recognized so far in the dialog, as well as an action mask that limits which action