Dialogue Control Algorithm for Ambient Intelligence based on Partially Observable Markov Decision Processes

Yasuhiro Minami, Akira Mori, Toyomi Meguro, Ryuichiro Higashinaka, Kohji Dohsaka, and Eisaku Maeda

NTT Communication Science Laboratories, NTT Corporation
2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0237 Japan
{minami,akira,meguro,rh,dohsaka,maeda}@cslab.kecl.ntt.co.jp
http://www.kecl.ntt.co.jp/

Abstract. To support users' natural dialogue communication with conversational agents, dialogue management has to determine the agent's actions using probabilistic methods that cope with noisy sensor data from the real world. We believe Partially Observable Markov Decision Processes (POMDPs) are well suited to such action control systems. The agents must flexibly choose their actions to reach a state suitable for the users while retaining as many statistical characteristics of the data as possible. We offer two technical contributions to resolve this issue: one is the automatic acquisition of POMDP state transition probabilities through DBNs trained on a large amount of dialogue data, and the other is the incorporation of rewards derived from the emission probabilities of agent actions into the POMDP's reinforcement learning. This paper proposes a method that simultaneously achieves purpose-oriented and stochastic naturalness-oriented action control. Our experimental results demonstrate the effectiveness of our framework, showing that the agent can generate both types of actions without being locked into either of them.
Keywords: Partially Observable Markov Decision Process (POMDP), dialogue management, multi-modal interaction, Dynamic Bayesian Network (DBN), agent, reinforcement learning (RL), Hidden Markov Model (HMM), Expectation-Maximization (EM) algorithm

1 Introduction

To activate communication between users and agents, the agents have to conversationally acquire adequate tips while recognizing and understanding the situations available through person-to-person dialogues. The systems must create and establish behavioral strategies based on a large amount of communication data. Markov Decision Processes (MDPs) are ordinarily applied to the acquisition of strategies with reinforcement learning (RL) if the state transitions caused by the agents occur stochastically, depending on their current states and