Citation: Riaz, W.; Chenqiang, G.; Azeem, A.; Saifullah; Bux, J.A.; Ullah, A. Traffic Anomaly Prediction System Using Predictive Network. Remote Sens. 2022, 14, 447. https://doi.org/10.3390/rs14030447

Academic Editor: Lefei Zhang
Received: 16 December 2021
Accepted: 13 January 2022
Published: 18 January 2022
Article
Traffic Anomaly Prediction System Using Predictive Network
Waqar Riaz 1,*, Gao Chenqiang 1, Abdullah Azeem 2, Saifullah 1, Jamshaid Allah Bux 3 and Asif Ullah 4

1 School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; gaocq@cqupt.edu.cn (G.C.); saif07.786@gmail.com (S.)
2 School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400030, China; abdullahazeem06@outlook.com
3 Department of Computer Science, Indus University, Karachi 75300, Pakistan; jsoomro@hec.gov.pk
4 Institute of Control Science and Engineering, Zhejiang University, Hangzhou 321001, China; asifkh@zju.edu.cn
* Correspondence: l201810020@stu.cqupt.edu.cn
Abstract: Anomaly anticipation in traffic scenarios is one of the primary challenges in action recognition. Greater accuracy is believed to be attainable by using semantic details and motion information alongside the input frames. Most state-of-the-art models extract semantic details and pre-computed optical flow from RGB frames and combine them using deep neural networks. Many previous models, however, failed to extract motion information from pre-processed optical flow. Our study shows that optical flow improves object detection in streaming video, an essential feature for subsequent accident prediction. To address this issue, we propose a model built on a recurrent neural network that instantaneously propagates predictive coding errors across layers and time steps. By assessing, over time, the representations produced by a pre-trained action recognition model for a given video, pre-processed optical flow becomes redundant as an input. Based on the final predictive score, we demonstrate the effectiveness of the proposed model on three anomaly classes, Speeding Vehicle, Vehicle Accident, and Close Merging Vehicle, drawn from the state-of-the-art KITTI, D2City and HTA datasets.
Keywords: anomaly anticipation; optical flow; feature extraction; Predictive Network
1. Introduction
Anything that deviates radically from normal behavior may be considered anomalous, such as the appearance of cars on footpaths, an abrupt dispersal of people in a crowd, a person unexpectedly slipping while walking, careless driving, or running signals at a traffic junction. The availability of public video datasets has significantly improved research outcomes in video processing and anomaly detection [1]. Anomaly detection systems are usually trained by learning the expected behavior of the traffic environment. Anomalies are typically categorized as point anomalies [2], contextual anomalies [3], and collective anomalies [4].
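As context for how such systems operate, the following is a minimal, illustrative sketch (not the model proposed in this paper): a detector fits the statistics of "normal" behavior and flags observations that deviate strongly, which is the simplest form of point-anomaly detection. The function names and the 1-D feature representation are assumptions made purely for illustration.

```python
# Toy point-anomaly detector: learn the statistics of normal behavior,
# then flag observations whose normalized deviation exceeds a threshold.

def fit_normal_model(normal_values):
    """Learn mean and spread of normal behavior from 1-D feature values."""
    n = len(normal_values)
    mean = sum(normal_values) / n
    var = sum((v - mean) ** 2 for v in normal_values) / n
    return mean, var ** 0.5

def anomaly_scores(values, mean, std):
    """Score each observation by its deviation in units of std."""
    return [abs(v - mean) / (std + 1e-8) for v in values]

def detect(values, mean, std, threshold=3.0):
    """Return indices of observations whose score exceeds the threshold."""
    scores = anomaly_scores(values, mean, std)
    return [i for i, s in enumerate(scores) if s > threshold]
```

Contextual and collective anomalies require richer models (e.g., the temporal context captured by recurrent predictive networks), since an observation that is normal in isolation may be anomalous in its surrounding context.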
Development towards driverless vehicles has drawn increasing attention and made significant progress over the past decade [5,6]. While this advancement provides convenience to people and addresses emerging needs of industry, it also raises concerns about traffic accidents. As a result, further advances are needed in accident prediction using the temporal and frame components of video clips. Given this objective, our work seeks to demonstrate the power of PredNet (Predictive Network) [7] for accident anticipation on the HTA (Highway Traffic Anomaly), KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) and D2City (DiDi Dashcam City) [8–10] datasets. Specifically, these datasets consist of dashcam videos captured from vehicles driving in several traffic scenarios. The videos show that not only is the camera moving, but other vehicles and background features also vary. The datasets consist