Understanding human actions and intentions from observation

Serge Thill, Paul E. Hemeren and Tom Ziemke 1

I. INTRODUCTION

Robots, in particular humanoid (service) robots, are artificial systems that can reasonably be expected to interact with non-expert humans in their application domains. The onus therefore has to be on the design of the robot to facilitate this interaction (as opposed to human users having to learn how to interact with a robot). To facilitate robots' understanding of human intentions, actions, etc., robots require at least a rudimentary form of what in psychology is called "theory of mind" (ToM), i.e. the ability to create a model of other agents' mental states. Here, we discuss relevant work on inferring human intentions from observation alone.

II. FROM OBSERVING AN AGENT TO A COMPUTATIONAL MODEL OF MENTAL STATES

Despite considerable research activity on human-robot social interaction in recent years, endowing robots with a ToM remains relatively unexplored, and "internal" states of a human are normally only considered insofar as they refer to the goals of actions (e.g. [1], [2]).

Gestures, particularly those involving the arm/hand and the head (e.g. pointing with a finger, nodding), and whole-body movements are important for the ability to detect mental states from observable data. Studies in this direction explore, for instance, attentional behaviours as non-verbal communicative signals in virtual environments [3] or a range of gestures in dialogue with an embodied robot [4]. Others suggest that humans are able to identify the intentions of others based on motion information (e.g., [5], [6]). Being able to visually perceive intention-governed actions allows us to successfully interact with the people around us. Some of our own recent work [7], [8] has shown that human observers consistently segment simple hand and arm actions on the basis of movement kinematics.
The general research findings in this area indicate that our ability to understand the actions of others relies importantly on perceiving the moving parts of human bodies. A further aspect of interpersonal interaction concerns the perception of emotions. Basic emotions (e.g., anger, disgust, fear, happiness and sadness) also appear to be conveyed by human movement presented in point-light displays [9]. Simpler movements such as knocking and drinking can also reliably convey emotion [10].

Beyond studies that infer human intentions without developing a robotic ToM, there is also work indicating that data on human intentions and movements obtained from observation alone can be used in the creation of a ToM. Early work demonstrated that robots can develop precursors to a ToM from observation of others [11]. More recent efforts describe how a ToM obtained from the observation of others (and inference of their intentions) can be used in decision making through the use of Markov Random Fields [12].

III. CONCLUSIONS

One obstacle on the road towards a robotic ToM is that even humanoid robots naturally have bodies that are quite different from human bodies. Yet theories of embodied cognition posit that one's own body functions as an important reference point for understanding others; however, the required degree of similarity remains unclear. On-body sensors could be used to provide additional information to the robot but are rarely practical. What we show in this contribution is that a substantial body of research indicates that body movement observation alone can go a long way in providing data upon which to build a useful robotic ToM.

1 The authors are with the Interaction Lab, Informatics Research Centre, University of Skövde, 54128 Skövde, Sweden. {serge.thill; paul.hemeren; tom.ziemke} at his.se

REFERENCES

[1] W. Erlhagen, A. Mukovskiy, F. Chersi, and E. Bicho, "On the development of intention understanding for joint action tasks," in Proceedings of the 6th IEEE International Conference on Development and Learning. Imperial College London, 2007, pp. 140–145.
[2] S. Thill, H. Svensson, and T. Ziemke, "Modeling the development of goal-specificity in mirror neurons," Cognitive Computation, vol. 3, no. 4, pp. 525–538, 2011.
[3] Y. Nakano and T. Nishida, "Attentional behaviours as nonverbal communicative signals in situated interactions with conversational agents," Conversational Informatics, pp. 85–102, 2007.
[4] C. Sidner, C. Lee, and N. Lesh, "The role of dialog in human robot interaction," in International Workshop on Language Understanding and Agents for Real World Interaction, 2003.
[5] J. Decety and J. Grèzes, "Neural mechanisms subserving the perception of human actions," Trends in Cognitive Sciences, vol. 3, no. 5, pp. 172–178, 1999.
[6] M. Iacoboni, I. Molnar-Szakacs, V. Gallese, G. Buccino, J. Mazziotta, and G. Rizzolatti, "Grasping the intentions of others with one's own mirror neuron system," PLoS Biology, vol. 3, no. 3, p. e79, 2005.
[7] P. E. Hemeren and S. Thill, "Deriving motion primitives through action segmentation," Frontiers in Psychology, vol. 1, no. 243, 2011.
[8] S. Thill, P. E. Hemeren, and B. Durán, "Prediction of human action segmentation based on end-effector kinematics using linear models," in European Perspectives on Cognitive Science: Proceedings of the European Conference on Cognitive Science 2011, B. Kokinov, A. Karmiloff-Smith, and N. J. Nersessian, Eds. Sofia: NBU Press, 2011.
[9] A. Atkinson, W. Dittrich, A. Gemmell, and A. Young, "Emotion perception from dynamic and static body expressions in point-light and full-light displays," Perception, vol. 33, pp. 717–746, 2004.
[10] F. Pollick, H. Paterson, A. Bruderlin, and A. Sanford, "Perceiving affect from arm movement," Cognition, vol. 82, no. 2, pp. B51–B61, 2001.
[11] B. Scassellati, "Theory of mind for a humanoid robot," Autonomous Robots, vol. 12, no. 1, pp. 13–24, 2002.
[12] J. Butterfield, O. C. Jenkins, D. M. Sobel, and J. Schwertfeger, "Modeling aspects of theory of mind with Markov random fields," International Journal of Social Robotics, vol. 1, pp. 41–51, 2009.