Learning a musical sequence by observation: A robotics implementation of a dynamic neural field model

Flora Ferreira*, Wolfram Erlhagen*§, Emanuel Sousa‡, Luís Louro‡ and Estela Bicho‡
* Dept. of Mathematics and Applications, Center of Mathematics, University of Minho, Portugal
‡ Dept. of Industrial Electronics, Center Algoritmi, University of Minho, Portugal
§ Email: wolfram.erlhagen@math.uminho.pt

Abstract—We tested in a robotics experiment a dynamic neural field model for learning a precisely timed musical sequence. Based on neuro-plausible processing mechanisms, the model implements the idea that the order and relative timing of events are stored in an integrated representation, whereas the onset of sequence production is controlled by a separate process. Dynamic neural fields provide a rigorous theoretical framework to analyze and implement the neural computations that bridge the gaps between sensation and action in order to mediate working memory, action planning, and decision making. The robot first memorizes a short musical sequence performed by a human teacher by watching color-coded keys on a screen, and then tries to execute the piece of music on a keyboard from memory, without any external cues. The experimental results show that the robot is able to correct initial sequencing and timing errors within very few demonstration-execution cycles.

I. INTRODUCTION

Learning sequential activities such as music, sports or speech requires the ability to represent the order of component actions and the intervals separating them. In many situations, ordinal and timing information must be unified for smooth and skillful performance. Playing a recognizable melody on a piano, for instance, requires a series of precisely timed finger movements. The neuro-cognitive mechanisms supporting an efficient acquisition of the interval and ordinal properties of complex sequences like music are still a matter of debate [1].
It has been suggested that a single learning system might be responsible for integrating sequencing and timing information [2]. Experimental support for this integrated view comes from studies with the classical serial reaction time paradigm (SRT, [3]), in which subjects learn the associations between a series of spatial cues and corresponding response keys. Learning appears to be facilitated when the stimuli are presented in a fixed order compared to a random order. The dependent measure of skill acquisition is a gradual reduction in response time across the sequential trials, indicating that participants develop a temporal expectation of the subsequent stimulus and/or the associated motor response without becoming aware of it. Moreover, a variant of the SRT paradigm in which subjects are exposed to sequences with temporal structure, ordinal structure, or both showed that learning of a temporal pattern does not occur independently of the ordinal dimension ([4], [5]). However, since in the SRT protocol responses are made as quickly as possible to external cues and no precise timing is needed for accuracy, the question of to what extent order and timing information are integrated in the memory of musical sequences remains unresolved. In fact, several observations on the acquisition and performance of music have been used as arguments against the fully integrated view. When learning a melody, the pitch sequence is typically acquired first, irrespective of temporal constraints (intervals and rate, [6]; see also [7]). Once learned, a piece of music can be easily recognized and performed across a whole range of production rates (for a discussion see [8]). Substantial changes in the temporal structure of a musical sequence may thus occur with no or only little impact on serial order.
Here we address the problem of the neural representations supporting the learning and production of a novel melody in an approach that combines theoretical modeling and testing in a real-world robotics experiment. The model, based on the theoretical framework of dynamic neural fields, implements three key processing principles that are in line with neurophysiological findings. First, the memory of the sensory cues defining the sequential order is represented by self-sustained activity of neural populations tuned to a continuous stimulus dimension (e.g., pitch or color). The persistent stimulus-dependent activity is not static, however, but increases monotonically as a function of the time elapsed since stimulus onset ([9], [10]). As a result, the neural field dynamics establishes an activation gradient over sub-populations that encodes not only the content but also the relative timing of stimulus events. Second, sequence planning starts from a subthreshold activation of all sequence elements in a decision field which mirrors the activation gradient of the sequence memory [11]. Third, sequence recall from memory is associated with a release of pro-active global inhibition in the decision field, which leads to a monotonic buildup of activity in all sub-populations [12]. When a sub-population reaches a fixed activation threshold, the motor response generating the planned musical event is initiated [13]. To test the various hypotheses of the dynamic field model under the real-time constraints of sensing and acting, we conducted an experiment with the humanoid robot ARoS [14]. We used a learning-by-demonstration paradigm with color-coded events in which ARoS learns to perform the
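The recall principle described above (a graded memory, a release of global inhibition, and threshold-triggered responses) can be illustrated with a minimal sketch. This is not the paper's implementation: it collapses each continuous neural sub-population to a single scalar activation, assumes a linear buildup at a common rate after inhibition release, and uses illustrative parameter values (threshold, growth rate, time step) that are not taken from the model.

```python
# Minimal sketch of gradient-based sequence recall (illustrative only).
# Assumption: each sequence element is one scalar activation u_i, not a
# continuous dynamic neural field as in the actual model.

def recall_sequence(memory_gradient, threshold=1.0, growth_rate=0.1, dt=0.01):
    """Simulate release of global inhibition: every sub-population starts
    at its memorized subthreshold level and builds up at a common rate.
    Elements fire in the order of their initial activation, and the gaps
    between threshold crossings reflect the memorized relative timing.
    Returns a list of (crossing_time, element_index) in firing order."""
    u = list(memory_gradient)           # subthreshold initial activations
    fired = [False] * len(u)
    events = []
    t = 0.0
    while not all(fired):
        t += dt
        for i in range(len(u)):
            if not fired[i]:
                u[i] += growth_rate * dt        # common-rate buildup
                if u[i] >= threshold:           # threshold crossing
                    fired[i] = True             # triggers the motor event
                    events.append((round(t, 3), i))
    return events

# Higher memorized activation -> earlier recall; the equal-rate buildup
# translates activation differences directly into inter-onset intervals.
gradient = [0.9, 0.6, 0.75, 0.3]        # element 0 earliest, element 3 last
events = recall_sequence(gradient)
print([i for _, i in events])           # firing order follows the gradient
```

Because all sub-populations grow at the same rate, rescaling that rate changes the overall tempo of the reproduced sequence while leaving the serial order and the ratios of the inter-onset intervals intact, which is consistent with the rate-invariance of musical performance discussed in the introduction.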