Traffic Anomaly Prediction System Using Predictive Network

Waqar Riaz 1,*, Gao Chenqiang 1, Abdullah Azeem 2, Saifullah 1, Jamshaid Allah Bux 3 and Asif Ullah 4

1 School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; gaocq@cqupt.edu.cn (G.C.); saif07.786@gmail.com (S.)
2 School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400030, China; abdullahazeem06@outlook.com
3 Department of Computer Science, Indus University, Karachi 75300, Pakistan; jsoomro@hec.gov.pk
4 Institute of Control Science and Engineering, Zhejiang University, Hangzhou 321001, China; asifkh@zju.edu.cn
* Correspondence: l201810020@stu.cqupt.edu.cn

Abstract: Anticipating anomalies in traffic scenarios is one of the primary challenges in action recognition. Greater accuracy is believed to be attainable by combining semantic details and motion information with the input frames. Most state-of-the-art models extract semantic details and pre-defined optical flow from RGB frames and fuse them using deep neural networks, yet many earlier models fail to recover motion information from the pre-processed optical flow. Our study shows that optical flow improves object detection in streaming video, which is an essential ingredient for subsequent accident prediction. To address this, we propose a model built on a recurrent neural network that propagates predictive coding errors across layers and time steps. Because representations from a pre-trained action recognition model are assessed over time for a given video, supplying pre-processed optical flow as input becomes redundant. Based on the final predictive score, we demonstrate the effectiveness of the proposed model on three anomaly classes, namely Speeding Vehicle, Vehicle Accident, and Close Merging Vehicle, drawn from the state-of-the-art KITTI, D2City, and HTA datasets.

Keywords: anomaly anticipation; optical flow; feature extraction; Predictive Network

1. Introduction

Anything that deviates radically from normal behavior may be considered anomalous, such as cars appearing on footpaths, an abrupt dispersal of people in a crowd, a person unexpectedly slipping while walking, careless driving, or bypassing signals at a traffic junction. The availability of public video datasets has significantly improved research outcomes in video processing and anomaly detection [1]. Anomaly detection systems are usually trained by learning the expected behavior of the traffic environment. Anomalies are typically categorized as point anomalies [2], contextual anomalies [3], and collective anomalies [4].
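To make the predictive coding mechanism described in the abstract concrete, the sketch below shows a single predictive layer in the spirit of PredNet [7]: the layer predicts its own input, splits the prediction error into rectified positive and negative parts, and carries a recurrent representation across time steps. The PyTorch framing, layer names, channel sizes, and the simplified convolutional update (the original formulation in [7] uses a ConvLSTM) are illustrative assumptions, not the implementation evaluated in this work.

```python
# Minimal sketch of one predictive-coding layer (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PredictiveLayer(nn.Module):
    """Predicts its own input, emits rectified prediction errors, and
    carries a recurrent representation from one time step to the next."""
    def __init__(self, in_ch, hid_ch):
        super().__init__()
        # Recurrent representation update (a ConvLSTM in [7]; a single
        # convolution is used here purely to keep the sketch short).
        self.update = nn.Conv2d(2 * in_ch + hid_ch, hid_ch, 3, padding=1)
        # Prediction of the layer's own input from its representation.
        self.predict = nn.Conv2d(hid_ch, in_ch, 3, padding=1)

    def forward(self, target, hidden):
        pred = F.relu(self.predict(hidden))
        # Positive and negative rectified error parts, as in [7].
        err = torch.cat([F.relu(target - pred), F.relu(pred - target)], dim=1)
        # Update the representation from the error and the previous state.
        hidden = torch.tanh(self.update(torch.cat([err, hidden], dim=1)))
        return err, hidden

# Errors accumulated over the frames of a clip would be pooled into an
# anomaly score in a full model; shapes below are arbitrary examples.
layer = PredictiveLayer(in_ch=3, hid_ch=32)
hidden = torch.zeros(1, 32, 64, 64)
clip = torch.rand(8, 1, 3, 64, 64)  # 8 frames of 64x64 RGB video
for frame in clip:
    err, hidden = layer(frame, hidden)
```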
Development of driverless vehicles has drawn increasing attention and made significant progress over the past decade [5,6]. While this advancement provides convenience to people and addresses emerging needs from industry, it also raises concerns about traffic accidents. As a result, further advances are needed towards accident prediction that exploit both the temporal and frame components of video clips. Given this objective, our work seeks to demonstrate the power of PredNet (Predictive Network) [7] for accident anticipation on the HTA (Highway Traffic Anomaly), KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute), and D2City (Didi Dashcam City) datasets [8–10]. Specifically, these datasets consist of dashcam videos captured from vehicles driving in several traffic scenarios. The videos show that not only is the camera moving, but other vehicles and background features are also varying. The datasets consist