Pattern Recognition 110 (2021) 107631
Sparse motion fields for trajectory prediction
Catarina Barata a,∗, Jacinto C. Nascimento a, João M. Lemos b, Jorge S. Marques a
a Institute for Systems and Robotics, Instituto Superior Técnico, Universidade de Lisboa, Portugal
b INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal
Article info
Article history:
Received 22 January 2020
Revised 9 August 2020
Accepted 6 September 2020
Available online 10 September
Keywords:
Human motion analysis
Trajectory prediction
Sparse motion fields
Abstract
Trajectory prediction is a crucial element of many automated tasks, such as autonomous navigation or
video surveillance. To automatically predict the motion of an agent (e.g., pedestrian or car), the model
needs to efficiently represent human motion and “understand” the external stimuli that may influence
human behavior. In this work we propose a methodology to model the motion of agents in a video
scene. Our method is based on space-varying sparse motion fields, which simultaneously characterize
diverse motion patterns in the scene and implicitly learn contextual cues about the static environment,
namely obstacles and semantic constraints. The sparse motion fields are applied to the task of long-term
trajectory prediction using a probabilistic generative approach. Several benchmark data sets are used to
demonstrate the potential of the proposed approach and show that our method achieves performance
competitive with the state of the art.
© 2020 Elsevier Ltd. All rights reserved.
1. Introduction
1.1. Motivation
The ability to describe and interpret the behavior of the various
agents in a scene is key to understanding that scene. This ability
is required in areas such as video surveillance, sports analysis,
and robot or autonomous car navigation, where the extracted
information may be used to address several tasks (e.g., tracking,
activity recognition, and detection of abnormal behaviors) [1,2]. All
of these tasks rely on assessing the motion performed by the agent.
Trajectory data, i.e., the set of consecutive 2D positions of an
agent, are known to provide relevant cues for understanding human
motion behavior. They have thus been adopted by several works in
the literature, in particular those devoted to short- and long-term
path prediction.
Human motion is governed by a variety of factors, namely
agent-specific cues (e.g., intended destination or preferred
velocity) and environment characteristics [4]. The latter can be
divided into: i) the dynamic environment, which accounts for the
interactions with other agents (e.g., neighboring pedestrians or
cars) [5–7]; and ii) the static environment, which characterizes the
physical constraints of the scene, i.e., its semantics (e.g.,
buildings, roads, and sidewalks) and/or individual obstacles [8,9].
Most recent approaches place significant emphasis on characterizing
the dynamic environment, adopting methodologies based on neural
networks [7,10,11]. Despite the undeniable importance of the dynamic
environment in very crowded scenes, where interactions such as
collision avoidance are prone to occur, the relevance of the static
environment should not be disregarded. For one, motion models based
solely on dynamic cues have been shown to underperform when the
static environment strongly influences the trajectories [11,12]. In
this case, the motion models are able to capture the influence of the
surrounding agents, but they have no information about the semantics
of the scene (e.g., walkable and forbidden regions) or about the
presence of static obstacles. When applied to the task of trajectory
prediction, such models can generate unrealistic trajectories that do
not comply with the physical constraints of the scene. Methods based
on agent interactions are also unsuitable for scenes where the agent
density is low, since the motion will be mostly guided by the static
environment.
∗ Corresponding author.
E-mail address: ana.c.fidalgo.barata@tecnico.ulisboa.pt (C. Barata).
Recently, a few works have demonstrated that the static environment
may play a very relevant role (e.g., [13–15]). However, these methods
are unable to learn the physical structure of a scene without
additional information, such as semantic maps or image features
extracted from video frames. In this work we argue that such data are
not required, since the movement of the agents in a scene already
conveys information about the static environment (e.g., pedestrians
will tend to move on sidewalks, cars will not enter buildings, and
obstacles will be avoided). We assume that the physical properties of
a scene can be learned in an unsupervised
way, directly from trajectory data. To achieve this goal, we pro-
https://doi.org/10.1016/j.patcog.2020.107631
0031-3203/© 2020 Elsevier Ltd. All rights reserved.