979-8-3315-4899-5/25/$31.00 ©2025 IEEE
4th International Conference on Communication, Computing and Digital Systems

Conv-LSTM for Real-Time Spatio-Temporal Analysis of Crowd Behavior in Public Spaces

Muhammad Junaid Asif¹,², Shazia Saqib³, Rana Fayyaz Ahmad¹, Mujtaba Asad⁴, and Syed Tahir Hussain Rizvi⁵

¹ Artificial Intelligence Technology Centre (AITeC), National Centre for Physics (NCP), Islamabad 44000, Pakistan
² Faculty of IT and Computer Sciences (FoIT&CS), University of Central Punjab (UCP), Lahore 54000, Pakistan
³ School of Informatics and Robotics, Institute of Arts and Culture (IAC), Lahore 54000, Pakistan
⁴ Institute of Image Processing and Pattern Recognition, Department of Automation, SJTU, Shanghai 200240, China
⁵ Department of Electrical Engineering and Computer Sciences, University of Stavanger, Stavanger 4021, Norway
junaid.asif@ncp.edu.pk

Abstract— With recent advances in deep learning and computer vision, crowd scene analysis has become an essential research area for ensuring public safety. The United Nations (UN) predicts world population growth of 0.82% by 2035, driving people to cities for better lifestyles and social events such as concerts, shopping, political gatherings, and educational conferences. Manual monitoring of such gatherings, however, is often laborious and error-prone, which underscores the importance of automated solutions. In this paper, we propose a spatio-temporal framework that combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM) to automatically classify the behavior of people in a crowd as normal or abnormal. VGG-19 extracts spatial features from two consecutive frames at time steps t and t-1. These spatial features are then concatenated using the Wide Dense Residual Block (WDRB), and the concatenated features are fed to an LSTM block to capture temporal features.
The proposed method is evaluated on two datasets: Hockey Fight and Real-Life Violence Situations (RLVS). It achieves accuracies of 91.0% and 93.5% on the Hockey Fight and RLVS datasets, respectively, indicating that it provides an efficient and reliable framework for crowd anomaly detection that can enhance public security and safety. The implementation code is available at https://github.com/junaid2066/CAD

Keywords— Crowd scene analysis, Behavior analysis, Anomaly detection, Conv-LSTM, VGG19, LSTM, Wide Dense Residual block

I. INTRODUCTION

A crowd is a large group of people gathered for a particular purpose, such as a sporting event, a music concert, or a similar occasion. An anomaly, by contrast, is an entity or behavior that deviates from or disturbs the normal behavior within a crowd, frequently standing out as an outlier in the broader distribution. Crowd anomaly detection is the process of identifying and flagging unusual or abnormal behaviors within a crowded scene [1], [2]. It draws on computer vision, machine learning, and data analysis to examine how crowds move and to identify possible dangers or disruptive events. Various types of anomalies (as shown in Fig. 1) [3] can be detected, including the presence of weapons, fights, stampedes, panic situations, violence, looting, mob mentality, trampling, and crush incidents.

Crowd anomaly detection plays a vital role in real-life applications by supporting public safety and security. It may be used to identify safety hazards, spot potential threats, or monitor signs of criminal activity in crowded areas, transit hubs, or places with essential infrastructure. By notifying authorities or security personnel, it also enables a quick response, reducing risks and maintaining order.

Fig. 1. Illustrative examples of anomalous activities in a pedestrian crowd: (a)
a person running among walking individuals, (b) a cyclist navigating through the crowd, (c) an individual using a wheelchair amidst pedestrians, and (d) a vehicle intruding into a pedestrian-only zone.

Crowd violence detection is very important for keeping people safe, especially at crowded events, protests, and places with high crime rates. By finding possible violent events early and alerting police or security personnel, it supports quick responses and interventions, which helps maintain order and reduce harm. Such technology improves public safety and security by automatically detecting and monitoring fights, aggressive behavior, and other forms of violence in real time.

Anomaly detection can draw on different types of data, such as video, audio, and physical patterns. Video can reveal movement patterns, fights, aggressive body language, or weapons, whereas audio can capture sounds that signal violence, such as yelling or screaming.

Before deep learning techniques became popular, crowd anomalies were typically detected with traditional methods. These included optical flow, background subtraction, crowd density estimation, statistical methods, hybrid methods, and trajectory-based methods. These traditional methods, however, had several problems and limitations. One of the biggest
2025 4th International Conference on Communication, Computing and Digital Systems (C-CODE) | 979-8-3315-4899-5/25/$31.00 ©2025 IEEE | DOI: 10.1109/C-CODE67372.2025.11204064