Emotional interactions as a way to structure learning

P. Gaussier 1,2, S. Boucenna 2, J. Nadel 3
1 Member of the Institut Universitaire de France
2 Neuro-cybernetic team, Image and Signal Processing Lab., Cergy-Pontoise University / CNRS UMR 8051 / ENSEA
3 LVAP, Hôpital de la Pitié-Salpêtrière, Paris, France

For several years, we have been interested in understanding how babies learn to recognize facial expressions without a teaching signal that would allow them to associate, for instance, a "happy face" with their own internal emotional state of happiness (Gergely and Watson, 1999). Using the cognitive system algebra (Gaussier, 2001), we showed that a simple sensori-motor architecture based on a classical conditioning paradigm can solve the task if and only if we suppose that the baby first produces facial expressions according to his/her internal emotional state, and that the parents then imitate the facial expression of their baby, allowing the baby in return to associate these expressions with his/her internal state (Gaussier et al., 2004). If the adult's facial expressions are not synchronized with the baby's, the task cannot be learned. Psychological experiments (Nadel et al., 2006) have shown that humans involuntarily "reproduce" the facial expressions of our robot face. This low-level resonance to the facial expression of the other can bootstrap the baby's learning.

In this work, the first goal was to show that our theoretical model can control a real robot. Next, we wanted to verify, in dynamic conditions, whether humans naturally enter into emotional resonance with the robotic head, allowing the robot to perform online learning without any explicit supervision (or predefined communication format). In a first series of experiments, we thought that an ad hoc mechanism focusing on the face, in order to obtain information invariant to translation and scale, could simplify the problem (we tried numerous more or less biologically plausible algorithms based on color detection, Hough or Haar transforms...). A single neuron (perceptron) was able to learn offline (using a database of face / non-face examples) to reject the non-face examples if the human was always facing the robot head. Like others, we found that performance was limited by the ability of the face detection mechanism to always focus on the face in the same way. In the end, however, it was disappointing to obtain autonomous learning of facial expression recognition with a system that relies on a supervised recognition of the human face! Moreover, it was clear that face recognition alone cannot be learned autonomously, since we were unable to propose a criterion to bootstrap the autonomous learning of the face / non-face discrimination. Hence, we decided to suppress this first step of face detection, since it was not necessary in the mathematical model.

Figure 1: Examples of robot facial expressions: a) sadness, b) surprise, c) happiness. d) Example of a typical human / robot interaction game (here the human imitating the robot).

In order to obtain a limited number of local views to be learned and analyzed, a Difference of Gaussians (DoG) filter is applied to the gradient image to determine stable focus points associated with angular or curved areas (these points remain stable under small perspective or scale variations). An inhibition-of-return mechanism then allows the system to explore the image sequentially around each focus point.
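To make the focus-point mechanism concrete, a minimal sketch in Python (NumPy/SciPy) is given below, assuming a grayscale input image; the DoG scales, the number of points, and the inhibition radius are illustrative values, not the parameters used on the robot.

    import numpy as np
    from scipy import ndimage

    def focus_points(image, sigma1=1.0, sigma2=3.0, n_points=10, inhib_radius=20):
        """Select sequential focus points on a grayscale image.

        A Difference of Gaussians (DoG) band-pass filter is applied to the
        gradient magnitude image; its local maxima mark angular or curved
        areas. An inhibition-of-return mask suppresses a disk around each
        selected point so that the next strongest point is explored next.
        """
        image = np.asarray(image, dtype=float)
        # Gradient magnitude (Sobel approximation of the image gradient).
        gx = ndimage.sobel(image, axis=1)
        gy = ndimage.sobel(image, axis=0)
        gradient = np.hypot(gx, gy)
        # DoG band-pass filtering of the gradient image (sigma1 < sigma2).
        dog = (ndimage.gaussian_filter(gradient, sigma1)
               - ndimage.gaussian_filter(gradient, sigma2))
        saliency = dog.copy()
        yy, xx = np.mgrid[0:image.shape[0], 0:image.shape[1]]
        points = []
        for _ in range(n_points):
            y, x = np.unravel_index(np.argmax(saliency), saliency.shape)
            if saliency[y, x] <= 0:
                break  # no salient area left to explore
            points.append((y, x))
            # Inhibition of return: mask a disk around the chosen point.
            saliency[(yy - y) ** 2 + (xx - x) ** 2 <= inhib_radius ** 2] = -np.inf
        return points

In such a scheme, the movement information described in the next paragraph can simply be added to the saliency map before each maximum is taken, biasing exploration toward the moving parts of the image.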
To increase the analysis speed, movement information is used to amplify the activity of the focus points in the moving parts of the image (presumably at the position of the human partner's face). Local views are then obtained by applying a log-polar transform to the input image, centered on each focus point (this increases the robustness of the extracted local views to small rotations and scale variations).
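As an illustration of the local-view extraction, here is a minimal log-polar sampler in the same NumPy style, assuming a grayscale image and nearest-neighbor sampling; the ring and wedge counts and the radius range are illustrative assumptions.

    import numpy as np

    def log_polar_view(image, center, n_rings=24, n_wedges=32, r_min=2.0, r_max=40.0):
        """Sample a log-polar local view around a focus point (y, x).

        Radii are spaced logarithmically, so a small change of scale of the
        underlying pattern becomes a shift along the ring axis, and a small
        rotation becomes a shift along the wedge axis.
        """
        image = np.asarray(image, dtype=float)
        cy, cx = center
        radii = np.geomspace(r_min, r_max, n_rings)  # log-spaced rings
        angles = np.linspace(0.0, 2.0 * np.pi, n_wedges, endpoint=False)
        h, w = image.shape
        view = np.empty((n_rings, n_wedges))
        for i, r in enumerate(radii):
            ys = np.clip(np.round(cy + r * np.sin(angles)).astype(int), 0, h - 1)
            xs = np.clip(np.round(cx + r * np.cos(angles)).astype(int), 0, w - 1)
            view[i] = image[ys, xs]  # nearest-neighbor sampling on each ring
        return view

Each view sampled around a focus point returned by focus_points can then be fed to the learning architecture as a small pattern that is roughly tolerant to scale and rotation changes.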