A Recurrent Attention and Interaction Model for Pedestrian Trajectory Prediction

Xuesong Li, Yating Liu, Kunfeng Wang, Senior Member, IEEE, and Fei-Yue Wang, Fellow, IEEE

Abstract: The movement of pedestrians involves temporal continuity, spatial interactivity, and random diversity. As a result, pedestrian trajectory prediction is rather challenging. Most existing trajectory prediction methods tend to focus on just one aspect of these challenges, ignoring the temporal information of the trajectory and making too many assumptions. In this paper, we propose a recurrent attention and interaction (RAI) model to predict pedestrian trajectories. The RAI model consists of a temporal attention module, a spatial pooling module, and a randomness modeling module. The temporal attention module assigns different weights to the input sequence of a target and reduces the speed deviation of different pedestrians. The spatial pooling module models not only the social information of neighbors in historical frames, but also the intention of neighbors at the current time. The randomness modeling module models the uncertainty and diversity of trajectories by introducing random noise. We conduct extensive experiments on several public datasets. The results demonstrate that our method outperforms many state-of-the-art methods.

Index Terms: Deep learning, long short-term memory (LSTM), recurrent attention and interaction (RAI) model, trajectory prediction.

I. Introduction

Pedestrian trajectory prediction is defined as predicting the future trajectory of a pedestrian based on his/her positions over a past period of time, and it is usually treated as a sequence generation task. Pedestrian trajectory prediction is crucial for path planning of autonomous devices [1], [2].
For example, in an autonomous driving scenario, an autonomous vehicle needs to accurately predict the motion trajectories of pedestrians from their positions in order to make its next decision. Moreover, research on pedestrian trajectory prediction yields models of pedestrian behavior, which can be used for crowd evacuation [3], abnormal target detection [4], and other specific tasks. The predicted object can also be a vehicle, an animal, or another target, but most research has been performed with pedestrians. Perhaps this is because the prediction of pedestrian trajectories is more difficult and has more uses. Therefore, it is necessary to conduct in-depth research on pedestrian trajectory prediction.

Pedestrian trajectory prediction has attracted much attention, and many researchers have proposed methods for it. Although pedestrian movement is full of randomness, it has a certain regularity. Generally, pedestrian trajectory prediction methods are divided into model-driven methods and data-driven methods. The model-driven methods usually predict external behavior according to underlying principles, while data-driven methods mainly model internal correlation through statistical analysis of data.

In early research on pedestrian trajectory prediction, many works focused on model-driven methods, which typically include the social force model [5] and the hidden Markov model [6]. The social force method predicts the behavior of pedestrians according to attraction and repulsion forces: attraction forces draw a pedestrian toward a target, and repulsion forces prevent collisions between pedestrians. The hidden Markov method predicts the trajectory of pedestrians in terms of spatio-temporal probability. Nevertheless, these methods are too sensitive to parameters and are unable to describe the diverse social behavior of pedestrians, e.g., that people walk in groups.
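To make the attraction and repulsion idea concrete, the following is a minimal sketch of one social-force-style update step. It is a simplified illustration and not the exact formulation of [5]: the function names, the exponential form of the repulsion, the Euler integration, and all constants (`desired_speed`, `relax_time`, `repulse_strength`, `repulse_range`, `dt`) are assumptions chosen for readability.

```python
import math


def social_force_step(pos, vel, goal, neighbors, dt=0.4,
                      desired_speed=1.3, relax_time=0.5,
                      repulse_strength=2.0, repulse_range=0.3):
    """Advance one pedestrian by one step under a simplified social force.

    All default constants are illustrative assumptions, not values from [5].
    """
    # Attraction: relax the current velocity toward the desired velocity,
    # which points from the pedestrian to the goal.
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    dist_goal = math.hypot(gx, gy) or 1e-9  # avoid division by zero
    ax = (desired_speed * gx / dist_goal - vel[0]) / relax_time
    ay = (desired_speed * gy / dist_goal - vel[1]) / relax_time

    # Repulsion: every neighbor pushes the pedestrian away, with a
    # magnitude that decays exponentially with distance.
    for nx, ny in neighbors:
        dx, dy = pos[0] - nx, pos[1] - ny
        d = math.hypot(dx, dy) or 1e-9
        push = repulse_strength * math.exp(-d / repulse_range)
        ax += push * dx / d
        ay += push * dy / d

    # Explicit Euler integration: update velocity first, then position.
    new_vel = (vel[0] + ax * dt, vel[1] + ay * dt)
    new_pos = (pos[0] + new_vel[0] * dt, pos[1] + new_vel[1] * dt)
    return new_pos, new_vel


def rollout(pos, vel, goal, neighbors, steps=12):
    """Predict a trajectory by repeating the update (neighbors held fixed)."""
    trajectory = []
    for _ in range(steps):
        pos, vel = social_force_step(pos, vel, goal, neighbors)
        trajectory.append(pos)
    return trajectory


if __name__ == "__main__":
    # One pedestrian walking toward (10, 0) past a static neighbor.
    for point in rollout((0.0, 0.0), (0.0, 0.0), (10.0, 0.0), [(1.0, 0.1)]):
        print(point)
```

In this sketch the predicted path depends directly on hand-set constants such as `repulse_strength` and `repulse_range`, which gives some intuition for the parameter sensitivity noted above.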
Even worse, the representation ability of these methods is limited.

In recent years, with the development of deep learning, data-driven methods have become a research hotspot. Such approaches usually treat pedestrian trajectory prediction as a time-series prediction problem that takes into account the interaction of pedestrians. Some recent works have used recurrent neural networks (RNNs) to solve this problem, such as social long short-term memory (S-LSTM) [7] and group LSTM [8]. The S-LSTM model presents a social pooling module that divides space into a grid to capture the interactive information of adjacent pedestrians. The group LSTM is an improved version of S-LSTM; it uses motion coherence to cluster trajectories with similar movement trends and thereby group pedestrians. But S-LSTM and group LSTM are not sufficient to assume that the effect on an individual is determined by its

Manuscript received December 10, 2019; revised February 24, 2020; accepted April 13, 2020. This work was supported by the National Natural Science Foundation of China (U1811463) and the Fundamental Research Funds for the Central Universities (12060093192). Recommended by Associate Editor Qinglai Wei. (Corresponding author: Kunfeng Wang.)

Citation: X. S. Li, Y. T. Liu, K. F. Wang, and F.-Y. Wang, “A recurrent attention and interaction model for pedestrian trajectory prediction,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 5, pp. 1361–1370, Sept. 2020.

X. S. Li and Y. T. Liu are with the State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, and also with the University of Chinese Academy of Sciences, Beijing 100049, China (e-mail: lixuesong2017@ia.ac.cn; liuyating2015@ia.ac.cn).

K. F. Wang is with the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China (e-mail: wangkf@mail.buct.edu.cn).

F.-Y. Wang is with the State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (e-mail: feiyue.wang@ia.ac.cn).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JAS.2020.1003300

IEEE/CAA JOURNAL OF AUTOMATICA SINICA, VOL. 7, NO. 5, SEPTEMBER 2020 1361