Citation: Jebur, S.A.; Hussein, K.A.; Hoomod, H.K.; Alzubaidi, L. Novel Deep Feature Fusion Framework for Multi-Scenario Violence Detection. Computers 2023, 12, 175. https://doi.org/10.3390/computers12090175
Academic Editors: Hussain Mohammed Dipu Kabir, Syed Bahauddin Alam, Subrota Kumar Mondal and Jeremy Straub
Received: 8 August 2023
Revised: 26 August 2023
Accepted: 31 August 2023
Published: 5 September 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
computers
Article
Novel Deep Feature Fusion Framework for Multi-Scenario
Violence Detection
Sabah Abdulazeez Jebur 1,2, Khalid A. Hussein 3, Haider Kadhim Hoomod 3 and Laith Alzubaidi 4,5,*
1 Department of Computer Sciences, University of Technology, Baghdad 00964, Iraq; sabah.abdulazeez@alkadhum-col.edu.iq
2 Department of Computer Techniques Engineering, Imam Al-Kadhum College (IKC), Baghdad 00964, Iraq
3 Department of Computer Science, College of Education, Mustansiriyah University, Baghdad 00964, Iraq; dr.khalid.ali68@gmail.com (K.A.H.); drhjnew@gmail.com (H.K.H.)
4 School of Mechanical, Medical and Process Engineering, Queensland University of Technology, Brisbane, QLD 4000, Australia
5 Centre for Data Science, Queensland University of Technology, Brisbane, QLD 4000, Australia
* Correspondence: l.alzubaidi@qut.edu.au
Abstract: Detecting violence in various scenarios is a difficult task that requires a high degree of
generalisation. Such scenarios include fights in different environments such as schools, streets, and
football stadiums. However, most current research on violence detection focuses on a single scenario,
limiting its ability to generalise across multiple scenarios. To tackle this issue, this paper offers a
new multi-scenario violence detection framework that operates in two environments: fighting in
various locations and rugby stadiums. This framework has three main steps. Firstly, it uses transfer
learning by employing three models pre-trained on the ImageNet dataset: Xception, Inception, and
InceptionResNet. This approach enhances generalisation and helps to reduce overfitting, as these models
have already learned valuable features from a large and diverse dataset. Secondly, the framework
combines features extracted from the three models through feature fusion, which improves feature
representation and enhances performance. Lastly, the concatenation step combines the features of the
first violence scenario with the second scenario to train a machine learning classifier, enabling the
classifier to generalise across both scenarios. This concatenation framework is highly flexible, as it
can incorporate multiple violence scenarios without requiring training from scratch when
scenarios are added. The Fusion model, which incorporates feature fusion from multiple models, obtained
an accuracy of 97.66% on the RLVS dataset and 92.89% on the Hockey dataset. The Concatenation
model accomplished an accuracy of 97.64% on the RLVS and 92.41% on the Hockey datasets with
just a single classifier. This is the first framework that allows for the classification of multiple violent
scenarios within a single classifier. Furthermore, this framework is not limited to violence detection
and can be adapted to different tasks.
Keywords: deep learning; feature fusion; transfer learning; violence detection
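The three steps summarised in the abstract (feature extraction with pre-trained models, feature fusion, and concatenation of scenarios for a single classifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three backbones are replaced by hypothetical fixed random projections, the frames are synthetic, the feature sizes are arbitrary, and a simple nearest-centroid rule stands in for the machine learning classifier, which the abstract does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: hypothetical stand-ins for the three pre-trained backbones.
# Each "model" is a fixed random projection; the sizes are illustrative only.
FRAME_DIM = 64
FEATURE_DIMS = {"xception": 16, "inception": 16, "inception_resnet": 12}
PROJECTIONS = {
    name: rng.normal(size=(FRAME_DIM, dim)) / np.sqrt(FRAME_DIM)
    for name, dim in FEATURE_DIMS.items()
}


def extract_features(frames, name):
    """Step 1 (stand-in): map flattened frames to one backbone's features."""
    return np.tanh(frames @ PROJECTIONS[name])


def fuse_features(frames):
    """Step 2: feature fusion, joining the three feature vectors per sample."""
    parts = [extract_features(frames, name) for name in FEATURE_DIMS]
    return np.concatenate(parts, axis=1)


def make_scenario(n_samples, shift):
    """ASSUMPTION: synthetic data; violent samples are offset by `shift`."""
    labels = rng.integers(0, 2, size=n_samples)
    frames = rng.normal(size=(n_samples, FRAME_DIM)) + shift * labels[:, None]
    return frames, labels


def fit_centroids(features, labels):
    """Stand-in classifier: one mean feature vector per class (0 and 1)."""
    return np.stack([features[labels == c].mean(axis=0) for c in (0, 1)])


def predict(centroids, features):
    """Assign each sample to the class of its nearest centroid."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


# Two synthetic "scenarios" with separate training and held-out samples.
train_a, y_train_a = make_scenario(200, shift=2.0)
train_b, y_train_b = make_scenario(200, shift=1.5)
test_a, y_test_a = make_scenario(100, shift=2.0)
test_b, y_test_b = make_scenario(100, shift=1.5)

fused_train_a = fuse_features(train_a)
fused_train_b = fuse_features(train_b)

# Step 3: stack the fused features of both scenarios (sample axis) and
# train a single classifier on the combined set.
X_train = np.concatenate([fused_train_a, fused_train_b], axis=0)
y_train = np.concatenate([y_train_a, y_train_b])
centroids = fit_centroids(X_train, y_train)

acc_a = float((predict(centroids, fuse_features(test_a)) == y_test_a).mean())
acc_b = float((predict(centroids, fuse_features(test_b)) == y_test_b).mean())
print(f"scenario A accuracy: {acc_a:.3f}, scenario B accuracy: {acc_b:.3f}")
```

Note the two different joins in this sketch: fusion joins along the feature axis, so each sample gets a longer descriptor, whereas the scenario step stacks along the sample axis, so one classifier sees examples from both scenarios. The accuracies printed here describe only the synthetic data and bear no relation to the RLVS or Hockey results reported above.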
1. Introduction
Surveillance cameras are widely employed in supermarkets, gas stations, streets, roads,
cafes, and similar areas. They are commonly used to monitor suspicious activities, known
specifically as anomaly behaviours. These behaviours cover a wide range of actions, such as
attacks, harassment, fights, robberies, and vandalism. Anomaly behaviour refers to actions
that deviate from the usual norms within a given context. In computer vision (CV),
anomalies are identified via data patterns showing significant deviations from normal
data [1]. Regrettably, significant amounts of time and money are dedicated to monitoring
and detecting these activities without the support of automated systems [2]. This situation
emphasises the growing necessity for automated systems to comprehend and evaluate
these actions. Machine learning (ML) techniques are crucial in providing efficient solutions