Journal of Intelligent & Robotic Systems
https://doi.org/10.1007/s10846-018-0815-7

Predicting Human Actions Taking into Account Object Affordances

Vibekananda Dutta 1 · Teresa Zielinska 1

Received: 18 November 2017 / Accepted: 14 March 2018
© The Author(s) 2018

Abstract
Anticipating human intentional actions is essential for many applications involving service robots and social robots. Assistive robots must reason beyond the present by predicting future actions. This is difficult due to the non-Markovian nature of human behavior and the rich contextual information involved. The task requires capturing the subtle details inherent in human movements that may imply a future action. This paper presents a probabilistic method for action prediction in human-object interactions. The key idea of our approach is the description of the so-called object affordance, a concept that allows us to deliver a trajectory visualizing a possible future action. Extensive experiments were conducted to show the effectiveness of our method in action prediction. For evaluation we applied a new RGB-D activity video dataset recorded with Senz3D depth sensors. The dataset contains several human activities composed of different actions.

Keywords Intention recognition · Human-object relation · Object affordance · Action prediction · Feature extraction · Probability distribution

1 Introduction

In everyday life a human performs various actions. The ability to detect and anticipate which action is going to be performed in a complex environment is important for assistive robots, social robots, and healthcare assistants. Such an ability requires reasoning tools and methods. With this capability [20], a robot can plan ahead, produce reactive responses, and avoid potential accidents. When only a partial observation is available, we should be able to predict what is going to happen next (e.g., that a person is about to open the door, as shown in Fig. 1).
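Such anticipation from a partial observation can be illustrated, in a highly simplified form, by Bayesian filtering over a set of candidate actions. The following sketch is not the method developed in this paper; the action classes, observation symbols, and likelihood values are hypothetical, chosen only to show how a posterior over future actions can be updated as more of an action is observed.

```python
# Toy Bayesian action prediction from a partially observed sequence.
# All action names, observation symbols, and probabilities below are
# illustrative assumptions, not values from the paper.

ACTIONS = ["open_door", "pick_object", "pour_water"]
PRIOR = {a: 1.0 / len(ACTIONS) for a in ACTIONS}

# Hypothetical per-action likelihoods of each discrete observation
# (e.g., a quantized hand-object relation feature per frame).
LIKELIHOOD = {
    "open_door":   {"approach": 0.6, "grasp": 0.3, "tilt": 0.1},
    "pick_object": {"approach": 0.4, "grasp": 0.5, "tilt": 0.1},
    "pour_water":  {"approach": 0.2, "grasp": 0.3, "tilt": 0.5},
}

def predict(observations):
    """Posterior over actions after seeing a prefix of the sequence."""
    posterior = dict(PRIOR)
    for obs in observations:
        for a in ACTIONS:
            posterior[a] *= LIKELIHOOD[a].get(obs, 1e-6)
        total = sum(posterior.values())
        posterior = {a: p / total for a, p in posterior.items()}
    return posterior

# Prediction from a partial observation (e.g., only part of the action seen):
post = predict(["approach", "approach", "grasp"])
print(max(post, key=post.get))  # most likely upcoming action
```

The posterior is renormalized after each observation, so the prediction can be queried at any point during the action rather than only after it completes.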
Predictive models are also useful for detecting abnormal actions in surveillance videos and alerting emergency responders [38]. A reliable prediction should be made at an early stage of an action, e.g., when only 60% of the whole action has been observed.

Recent research focuses on the action recognition problem [16, 24, 32]. Although a few recent works have addressed the problem of recognizing an ongoing activity from partial information [31, 36], they do not answer how to perform activity prediction. Reliable action prediction relies on selecting and processing the crucial information, e.g., the scene context, object properties (affordance, object texture), and the relative human-object posture. The action prediction problem has two characteristic features: anticipating human actions requires identifying the subtle details inherent in human movements that would lead to a future action, and the prediction must focus on the temporal interactions of the human with the environment (e.g., interaction with objects or with other people).

In this work, we discuss the problem of action prediction in natural scenarios using a collection of examples of real-world human actions sampled from video recordings (the WUT-ZTMiR 1 dataset and the CAD-60 2 dataset). We investigate how the user's behavior evolves dynamically over a short time. Our goal

The preliminary version of this paper was presented during the "International Workshop on Robot Motion Control (RoMoCo)", 2017, Poland.

Vibekananda Dutta vibek@meil.pw.edu.pl
Teresa Zielinska teresaz@meil.pw.edu.pl
1 Institute of Aeronautics and Applied Mechanics, Warsaw University of Technology, ul. Nowowiejska 24, 00-665 Warsaw, Poland

1 Warsaw University of Technology, Division of Theory of Machines and Robots. https://ztmir.meil.pw.edu.pl/web/eng/Pracownicy/Vibekananda-Dutta-M.Sc
2 Cornell Activity Dataset. http://pr.cs.cornell.edu/humanactivities/data.php

(2019) 93:745–761 / Published online: 4 April 2018