Learning to Reproduce Fluctuating Behavioral Sequences Using a Dynamic Neural Network Model with Time-Varying Variance Estimation Mechanism

Shingo Murata 1, Jun Namikawa 2, Hiroaki Arie 1, Jun Tani 3, and Shigeki Sugano 1

1 Department of Modern Mechanical Engineering, Waseda University, Tokyo, Japan
2 Brain Science Institute, RIKEN, Saitama, Japan
3 Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea

Abstract—This study shows that a novel type of recurrent neural network model can learn to reproduce fluctuating training sequences by inferring their stochastic structures. The network learns to predict not only the mean of the next input state, but also its time-varying variance. The network is trained through maximum likelihood estimation using the gradient descent method, where the likelihood function is expressed as a function of both the predicted mean and variance. In a numerical experiment designed to evaluate the performance of the model, we first tested its ability to reproduce fluctuating training sequences generated by a known dynamical system and perturbed by Gaussian noise with state-dependent variance. Our analysis showed that the network can reproduce the sequences by predicting the variance correctly. Furthermore, a second experiment showed that a humanoid robot equipped with the network can learn to reproduce fluctuating tutoring sequences by inferring latent stochastic structures hidden in the sequences.

I. INTRODUCTION

The ability to learn to predict perceptual outcomes of intended actions has been considered essential for the developmental learning of actions in both infants [1] and artificial agents [2], [3]. Meanwhile, temporal developments in our everyday life are not always predictable, but are often varied or stochastic.
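As a concrete illustration of the kind of data the numerical experiment deals with, the following sketch generates a fluctuating sequence from a simple deterministic oscillation perturbed by Gaussian noise whose standard deviation depends on the current state. This toy generator is an assumption for illustration only; the paper's actual dynamical system and noise profile may differ.

```python
import numpy as np

def generate_sequence(length=200, seed=0):
    """Generate a fluctuating training sequence: a deterministic
    oscillation perturbed by Gaussian noise whose standard deviation
    depends on the current state (larger noise near the peaks).
    Illustrative only; not the paper's exact experimental setup."""
    rng = np.random.default_rng(seed)
    t = np.arange(length)
    mean = np.sin(2 * np.pi * t / 50)      # underlying deterministic dynamics
    std = 0.05 + 0.1 * np.abs(mean)        # state-dependent noise level
    x = mean + rng.normal(0.0, std)        # observed fluctuating sequence
    return x, mean, std

x, mean, std = generate_sequence()
print(x.shape)  # (200,)
```

A network that only predicts the mean would treat the state-dependent noise as unexplained error; the model discussed here additionally predicts `std**2` at each step.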
For example, in the process of skill acquisition through imitation learning, perceptual experiences are noisy and slightly different every time, so learners need to extract both the common information and its fluctuation level from these experiences.

Recurrent neural networks (RNNs) have been intensively investigated for their suitability for prediction by learning [4]–[6]. In the context of behavior learning for robots, Tani and colleagues have shown that RNN-based models can learn to predict perceptual consequences of actions in navigation problems [7], as well as to predict perceptual sequences for sets of action intentions in object manipulation tasks [8], [9]. RNN-based models, however, are considerably limited by the deterministic nature of their prediction mechanism. As deterministic dynamical systems, RNNs cannot learn to reproduce stochastic structures hidden in noisy temporal sequence data used for training. If RNNs are forced to learn such temporal sequence data, the learning process tends to become unstable as errors accumulate.

To address this problem, Namikawa and colleagues recently proposed a novel continuous-time RNN (CTRNN) model that can learn to predict not only the next mean state, but also the variance of the observable variables at each time step [10]. The predicted variance functions as an inverse weighting factor for the prediction error that is back-propagated during learning. The formulation of the model is analogous to the free-energy minimization principle proposed by Friston [11], [12], in which learning, generation, and recognition of stochastic sequences are formulated by means of likelihood maximization.

This study shows that a novel CTRNN, referred to as a stochastic CTRNN (S-CTRNN), can learn to reproduce fluctuating training sequences generated by a dynamical system by inferring their stochastic structures.
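The role of the predicted variance as an inverse weighting factor can be seen directly in the per-step Gaussian negative log-likelihood. The sketch below is a generic formulation under the standard Gaussian assumption, not the paper's exact loss: dividing the squared prediction error by the predicted variance down-weights errors at steps the network judges to be noisy.

```python
import numpy as np

def gaussian_nll(y, mean, var):
    """Per-step negative log-likelihood of target y under N(mean, var).
    The squared prediction error is divided by the predicted variance,
    so steps predicted to be noisy contribute less to the loss
    (and hence smaller back-propagated gradients)."""
    return 0.5 * (np.log(2 * np.pi * var) + (y - mean) ** 2 / var)

# The same prediction error is penalized less when the predicted
# variance is large:
nll_small_var = gaussian_nll(1.0, 0.0, 0.01)
nll_large_var = gaussian_nll(1.0, 0.0, 1.0)
print(nll_small_var > nll_large_var)  # True
```

The log-variance term prevents the network from trivially inflating the variance everywhere, so maximizing the likelihood forces a genuine trade-off between fitting the mean and estimating the noise level.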
Furthermore, we describe how the S-CTRNN can be successfully applied to robot learning problems involving fluctuating behavioral sequences by conducting an experiment on sensory-guided robot behavior demonstrated to a robot by a human trainer.

Different approaches for estimating and utilizing variance for robot behavior learning have been proposed, including combinations of Gaussian mixture models (GMMs) and Gaussian mixture regression (GMR) [13], [14]. We consider that implementing a method for estimating or predicting variance in the CTRNN model can serve as an alternative to these previously proposed approaches.

The next section presents details of the forward dynamics, training, and generation methods of the S-CTRNN.

II. NEURAL NETWORK MODEL

A. Overview

The S-CTRNN makes use of a novel feature called "variance prediction units" allocated in the output layer. By utilizing these units, the network predicts not only the mean of the next input, but also its variance. In this method, the mean and the variance are obtained by maximizing the likelihood function for the sequence data. Furthermore, upon achieving convergence of the likelihood, the network can

2013 The Third IEEE International Conference on Development and Learning and on Epigenetic Robotics
978-1-4799-1036-6/13/$31.00 ©2013 IEEE
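A minimal sketch of the output layer described above can clarify how dedicated variance prediction units work. Here the variance units pass their activations through `exp()` to keep the predicted variance strictly positive; this parameterization and the weight shapes are illustrative assumptions, not the S-CTRNN's exact formulation.

```python
import numpy as np

def output_layer(h, W_mean, W_var):
    """Map a hidden state h to a predicted mean and variance.
    The variance units use exp() so the prediction is always
    positive (one common choice; the actual S-CTRNN has its own
    specific formulation)."""
    mean = np.tanh(W_mean @ h)   # bounded mean prediction
    var = np.exp(W_var @ h)      # strictly positive variance prediction
    return mean, var

rng = np.random.default_rng(0)
h = rng.normal(size=10)                       # hypothetical hidden state
W_mean = rng.normal(size=(3, 10))             # 3 output dimensions
W_var = rng.normal(size=(3, 10))
mean, var = output_layer(h, W_mean, W_var)
print(np.all(var > 0))  # True
```

During training, both weight matrices would be updated by gradient descent on the same likelihood function, so the variance units learn to track the time-varying noise level of the training data.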