979-8-3315-4899-5/25/$31.00 ©2025 IEEE
4th International Conference on Communication, Computing and Digital Systems
Conv-LSTM for Real-Time Spatio-Temporal Analysis of
Crowd Behavior in Public Spaces
Muhammad Junaid Asif¹,², Shazia Saqib³, Rana Fayyaz Ahmad¹, Mujtaba Asad⁴, and Syed Tahir Hussain Rizvi⁵
¹ Artificial Intelligence Technology Centre (AITeC), National Centre for Physics (NCP) Islamabad 44000, Pakistan
² Faculty of IT and Computer Sciences (FoIT&CS), University of Central Punjab (UCP) Lahore 54000, Pakistan
³ School of Informatics and Robotics, Institute of Arts and Culture (IAC), Lahore 54000, Pakistan
⁴ Institute of Image Processing and Pattern Recognition, Department of Automation, SJTU, Shanghai 200240, China
⁵ Department of Electrical Engineering and Computer Sciences, University of Stavanger, Stavanger 4021, Norway
junaid.asif@ncp.edu.pk
Abstract— With the recent advancement in the field of deep
learning and computer vision, crowd scene analysis has become
an essential research area for ensuring public safety. The United
Nations (UN) predicts world population growth of 0.82% by
2035, driving people to cities for better lifestyles and social
events like concerts, shopping, political gatherings, and
educational conferences. Manual monitoring, however, is often
laborious and error-prone, underscoring the importance of
automated solutions. In this paper, we propose a spatio-
temporal framework based on the combination of
Convolutional Neural Networks and Long Short-Term Memory
(LSTM) for automatic classification of normal vs. abnormal
behavior of people in a crowd. VGG-19 is used to extract
spatial features from two consecutive frames at time steps
“t” and “t-1”. These extracted spatial features are then
concatenated using the Wide Dense Residual Block (WDRB).
The concatenated features are then fed to an LSTM block to
capture the temporal features. The proposed method is
evaluated on two datasets: the Hockey Fight dataset and
Real-Life Violent Situation (RLVS). Evaluation shows
that the proposed method provides an efficient and reliable
framework for crowd anomaly detection, thereby enhancing
public security and safety. The results show that the models
achieve accuracies of 91.0% and 93.5% on the Hockey Fight and
RLVS datasets, respectively. The implementation code is
available at https://github.com/junaid2066/CAD
Keywords— Crowd scene analysis, Behavior analysis,
Anomaly detection, Conv-LSTM, VGG19, LSTM, Wide Dense
Residual block
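The data flow summarized in the abstract (spatial features from frames "t-1" and "t", fusion, then a recurrent block) can be illustrated with a minimal, standard-library-only sketch. Everything named here is hypothetical and is not taken from the authors' repository: `spatial_features` is a toy stand-in for VGG-19, plain list concatenation stands in for the WDRB, `TinyLSTMCell` is an untrained LSTM cell with biases omitted, and the final readout is illustrative only, so the predicted label carries no meaning.

```python
import math
import random


def consecutive_pairs(frames):
    """Yield (frame at t-1, frame at t) for every t >= 1."""
    for t in range(1, len(frames)):
        yield frames[t - 1], frames[t]


def spatial_features(frame):
    """Toy stand-in for the VGG-19 extractor (assumption): per-row mean intensity."""
    return [sum(row) / len(row) for row in frame]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMCell:
    """Untrained LSTM cell with input, forget, output gates and a candidate state."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        n = input_size + hidden_size
        # One weight row per hidden unit for each gate: i, f, o, g (biases omitted).
        self.w = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifog"}
        self.hidden_size = hidden_size

    def step(self, x, h, c):
        z = x + h  # concatenate the input with the previous hidden state

        def gate(name, act):
            return [act(sum(w * v for w, v in zip(row, z))) for row in self.w[name]]

        i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
        g = gate("g", math.tanh)
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def classify_clip(frames, cell, threshold=0.5):
    """Run the cell over fused features of consecutive frame pairs."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for prev, curr in consecutive_pairs(frames):
        # Plain concatenation replaces the WDRB fusion step (assumption).
        fused = spatial_features(prev) + spatial_features(curr)
        h, c = cell.step(fused, h, c)
    score = sigmoid(sum(h))  # untrained readout, for illustration only
    return score, ("abnormal" if score >= threshold else "normal")


# Five synthetic 4x4 "frames"; each pair yields 4 + 4 = 8 fused features.
frames = [[[float((t + r + col) % 7) for col in range(4)] for r in range(4)]
          for t in range(5)]
cell = TinyLSTMCell(input_size=8, hidden_size=3)
score, label = classify_clip(frames, cell)
```

In a real system the stand-ins would be replaced by a pretrained convolutional backbone and a trained recurrent layer; the sketch only shows how consecutive frames are paired, fused, and passed through a recurrent state.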
I. INTRODUCTION
A crowd is a large group of people who gather for a particular
purpose, such as attending a sporting event, a music concert,
or a similar event. An anomaly, in contrast, is an abnormal
entity or behavior that deviates from or disturbs the normal
behavior within a crowd, frequently standing out as an outlier
in the broader distribution. Crowd anomaly detection is the
process of identifying and flagging unusual or abnormal
behaviors within a crowded scene [1], [2]. It uses techniques
from computer vision, machine learning, and data analysis to
examine how crowds move and to find possible dangers or
disruptive events. Various types of anomalies (as shown in
Fig. 1) [3] can be detected, including the presence of
weapons, fights, stampedes, panic situations, violence,
looting, mob mentality, trampling, and crush incidents. Crowd
anomaly detection plays a vital role in different real-life
applications by ensuring public safety and security.
It may be utilised to identify safety hazards, spot potential
threats, or monitor signs of criminal activity in crowded
areas, transit hubs, or places with essential infrastructure. By
notifying authorities or security personnel, it also enables us
to act quickly, reducing risks and maintaining order.
Fig. 1. Illustrative examples of anomalous activities in a
pedestrian crowd: (a) a person running among walking
individuals, (b) a cyclist navigating through the crowd, (c) an
individual using a wheelchair amidst pedestrians, and (d) a
vehicle intruding into a pedestrian-only zone.
Crowd violence detection technology is important for keeping
people safe, especially at crowded events, protests, and in
high-crime areas. By detecting possible violent events early
and alerting police or security personnel, these technologies
support quick response and intervention, which maintains order
and reduces harm. They improve public safety and security by
automatically detecting and monitoring fights, aggressive
behavior, and other forms of violence in real time. Anomaly
detection can use different types of data, such as video,
audio, and physical patterns. Video can be used to find
distinctive patterns, fights, aggressive body language, or
weapons, while audio can capture sounds that signal violence,
such as yelling or screaming.
Before deep learning techniques became popular, researchers
often used traditional methods to detect crowd anomalies. These
methods included optical flow, background subtraction, crowd
density estimation, statistical methods, hybrid methods, and
trajectory-based methods. However, these traditional methods
had several problems and limitations. One of the biggest
2025 4th International Conference on Communication, Computing and Digital Systems (C-CODE) | 979-8-3315-4899-5/25/$31.00 ©2025 IEEE | DOI: 10.1109/C-CODE67372.2025.11204064