Journal of Intelligent & Robotic Systems
https://doi.org/10.1007/s10846-018-0815-7
Predicting Human Actions Taking into Account Object Affordances
Vibekananda Dutta¹ · Teresa Zielinska¹
Received: 18 November 2017 / Accepted: 14 March 2018
© The Author(s) 2018
Abstract
Anticipating human intentional actions is essential for many applications involving service robots and social robots.
Today's assistive robots must reason beyond the present by predicting future actions. This is difficult due to the
non-Markovian property of human behavior and the rich contextual information involved. The task requires capturing
the subtle details inherent in human movements that may imply a future action. This paper presents a probabilistic
method for action prediction in human-object interactions. The key idea of our approach is the description of the
so-called object affordance, a concept which allows us to deliver a trajectory visualizing a possible future action.
Extensive experiments were conducted to show the effectiveness of our method in action prediction. For the evaluation
we applied a new RGB-D activity video dataset recorded with Senz3D depth sensors. The dataset contains several human
activities composed of different actions.
Keywords Intention recognition · Human-object relation · Object affordance · Action prediction · Feature extraction ·
Probability distribution
1 Introduction
In everyday life a human performs various actions. Being
able to detect and anticipate which action is going to
be performed in a complex environment is important for
assistive robots, social robots and healthcare assistants.
Such an ability requires reasoning tools and methods.
With such a capability [20], a robot is able to plan ahead,
respond reactively, and avoid potential accidents. When a
partial observation is available, we should be able to predict
what is going to happen next (e.g., that a person is about to
open the door, as shown in Fig. 1). Predictive models are
also useful for detecting abnormal actions in surveillance
videos and alerting emergency responders [38]. It is necessary
that a reliable prediction is made at an early stage of an
action, e.g., when only 60% of the whole action has been
observed.
A preliminary version of this paper was presented during the International
Workshop on Robot Motion and Control (RoMoCo), 2017, Poland.
Vibekananda Dutta
vibek@meil.pw.edu.pl
Teresa Zielinska
teresaz@meil.pw.edu.pl
¹ Institute of Aeronautics and Applied Mechanics, Warsaw University
of Technology, ul. Nowowiejska 24, 00-665 Warsaw, Poland
Recent research focuses on the action recognition
problem [16, 24, 32]. Although a few recent works have
addressed the problem of ongoing activity recognition with
partial information available [31, 36], they do not answer
how to perform activity prediction. Reliable action
prediction relies on selecting and processing the crucial
information, e.g., scene context, object properties
(affordance, object texture) and the relative human-object
posture. Action prediction has two characteristic features:
– anticipating human actions requires identifying the
subtle details inherent in human movements that would
lead to a future action,
– the prediction must focus on the temporal interactions
of the human with the environment (e.g., interactions
with objects or with other people).
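The role of such contextual cues can be illustrated with a toy probabilistic predictor. The sketch below is illustrative only: the single hand-to-object distance feature, the action set, the priors and the Gaussian parameters are our own assumptions, not the method of this paper. It ranks candidate actions by a naive Bayes posterior computed over a partial observation sequence:

```python
import math

# Hypothetical per-action Gaussian models of one affordance feature
# (hand-to-object distance in metres); values are illustrative only.
MODELS = {
    "drinking": (0.10, 0.05),   # (mean, std) when reaching for a cup
    "placing":  (0.35, 0.10),
    "reaching": (0.60, 0.15),
}
PRIORS = {"drinking": 0.4, "placing": 0.3, "reaching": 0.3}

def gaussian_loglik(x, mean, std):
    """Log-likelihood of x under a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) \
           - (x - mean) ** 2 / (2 * std ** 2)

def predict_action(observed_distances):
    """Posterior over candidate actions given a partial observation."""
    scores = {}
    for action, (mean, std) in MODELS.items():
        loglik = sum(gaussian_loglik(d, mean, std)
                     for d in observed_distances)
        scores[action] = math.log(PRIORS[action]) + loglik
    # Normalise log-scores into a posterior distribution
    m = max(scores.values())
    total = sum(math.exp(s - m) for s in scores.values())
    return {a: math.exp(s - m) / total for a, s in scores.items()}

# Partial observation: the hand steadily approaches the object
posterior = predict_action([0.18, 0.13, 0.09])
print(max(posterior, key=posterior.get))  # most likely action so far
```

Even this crude model shows why early prediction is possible: as the observation grows, the posterior concentrates on one action well before the action completes. The actual method of the paper additionally exploits object affordances and relative human-object posture.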
In this work, we discuss the problem of action prediction
in natural scenarios using a collection of examples of human
actions in the real world sampled from video recordings
(the WUT-ZTMiR 1 dataset and the CAD-60 2 dataset). We
investigate how user behavior evolves dynamically over a
short time. Our goal

1 Warsaw University of Technology, Division of Theory of Machines and
Robots. https://ztmir.meil.pw.edu.pl/web/eng/Pracownicy/Vibekananda-Dutta-M.Sc
2 Cornell Activity Dataset. http://pr.cs.cornell.edu/humanactivities/data.php
(2019) 93:745–761 / Published online: 4 April 2018