Article

Novel Deep Feature Fusion Framework for Multi-Scenario Violence Detection

Sabah Abdulazeez Jebur 1,2, Khalid A. Hussein 3, Haider Kadhim Hoomod 3 and Laith Alzubaidi 4,5,*

1 Department of Computer Sciences, University of Technology, Baghdad 00964, Iraq; sabah.abdulazeez@alkadhum-col.edu.iq
2 Department of Computer Techniques Engineering, Imam Al-Kadhum College (IKC), Baghdad 00964, Iraq
3 Department of Computer Science, College of Education, Mustansiriyah University, Baghdad 00964, Iraq; dr.khalid.ali68@gmail.com (K.A.H.); drhjnew@gmail.com (H.K.H.)
4 School of Mechanical, Medical and Process Engineering, Queensland University of Technology, Brisbane, QLD 4000, Australia
5 Centre for Data Science, Queensland University of Technology, Brisbane, QLD 4000, Australia
* Correspondence: l.alzubaidi@qut.edu.au

Citation: Jebur, S.A.; Hussein, K.A.; Hoomod, H.K.; Alzubaidi, L. Novel Deep Feature Fusion Framework for Multi-Scenario Violence Detection. Computers 2023, 12, 175. https://doi.org/10.3390/computers12090175

Academic Editors: Hussain Mohammed Dipu Kabir, Syed Bahauddin Alam, Subrota Kumar Mondal and Jeremy Straub

Received: 8 August 2023
Revised: 26 August 2023
Accepted: 31 August 2023
Published: 5 September 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: Detecting violence in various scenarios is a difficult task that requires a high degree of generalisation. This includes fights in different environments such as schools, streets, and football stadiums. However, most current research on violence detection focuses on a single scenario, limiting its ability to generalise across multiple scenarios.
To tackle this issue, this paper proposes a new multi-scenario violence detection framework that operates in two environments: fighting in various locations and rugby stadiums. This framework has three main steps. Firstly, it uses transfer learning by employing three models pre-trained on the ImageNet dataset: Xception, Inception, and InceptionResNet. This approach enhances generalisation and reduces the risk of overfitting, as these models have already learned valuable features from a large and diverse dataset. Secondly, the framework combines the features extracted from the three models through feature fusion, which improves feature representation and enhances performance. Lastly, the concatenation step combines the features of the first violence scenario with those of the second scenario to train a machine learning classifier, enabling the classifier to generalise across both scenarios. This concatenation framework is highly flexible, as it can incorporate additional violence scenarios without requiring training from scratch. The Fusion model, which incorporates feature fusion from multiple models, obtained an accuracy of 97.66% on the RLVS dataset and 92.89% on the Hockey dataset. The Concatenation model achieved an accuracy of 97.64% on the RLVS dataset and 92.41% on the Hockey dataset with just a single classifier. This is the first framework that allows for the classification of multiple violent scenarios within a single classifier. Furthermore, this framework is not limited to violence detection and can be adapted to different tasks.

Keywords: deep learning; feature fusion; transfer learning; violence detection

1. Introduction

Surveillance cameras are widely employed in supermarkets, gas stations, streets, roads, cafes, and similar areas. They are commonly used to monitor suspicious activities, specifically known as anomaly behaviours. These behaviours cover a wide range of actions, such as attacks, harassment, fights, robberies, and vandalism.
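
The three steps summarised in the abstract (per-backbone feature extraction, feature fusion, and cross-scenario concatenation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the extractors are synthetic stand-ins rather than the real pre-trained networks, fusion is taken to be plain vector concatenation, the concatenation step is interpreted as pooling the samples of both scenarios into one training set, and a nearest-centroid rule is swapped in for the machine learning classifier. The names `extract_features`, `fuse`, `build_scenario`, `train_centroids`, and `predict` are hypothetical.

```python
import random

# Hypothetical stand-ins for the three ImageNet-pre-trained backbones.
# Real extractors would return pooled CNN activations; here each returns a
# small synthetic vector so the sketch runs without a deep learning library.
FEATURE_DIMS = {"xception": 4, "inception": 3, "inception_resnet": 5}


def extract_features(backbone, label, rng):
    """Return a synthetic feature vector whose mean depends on the label."""
    centre = 1.0 if label == 1 else -1.0
    return [rng.gauss(centre, 0.5) for _ in range(FEATURE_DIMS[backbone])]


def fuse(label, rng):
    """Feature fusion, assumed here to be simple vector concatenation."""
    fused = []
    for backbone in FEATURE_DIMS:
        fused.extend(extract_features(backbone, label, rng))
    return fused


def build_scenario(n_samples, rng):
    """Build fused features and labels (1 = violent, 0 = non-violent)."""
    labels = [i % 2 for i in range(n_samples)]
    features = [fuse(label, rng) for label in labels]
    return features, labels


def train_centroids(features, labels):
    """Nearest-centroid classifier (a stand-in): one mean vector per class."""
    centroids = {}
    for cls in set(labels):
        rows = [f for f, y in zip(features, labels) if y == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, vector):
    """Assign the class whose centroid is closest in squared distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))
    return min(centroids, key=lambda cls: dist(centroids[cls]))


rng = random.Random(0)

# Two scenarios, standing in for an RLVS-like and a Hockey-like dataset.
feats_a, labels_a = build_scenario(40, rng)
feats_b, labels_b = build_scenario(40, rng)

# Concatenation step (as interpreted here): pool both scenarios' samples
# into one training set so that a single classifier sees both.
train_x = feats_a + feats_b
train_y = labels_a + labels_b
model = train_centroids(train_x, train_y)

fused_dim = len(train_x[0])
correct = sum(predict(model, x) == y for x, y in zip(train_x, train_y))
accuracy = correct / len(train_y)
```

Under these assumptions, adding a further scenario would only mean appending its fused features to `train_x` and refitting the lightweight classifier, while the backbones stay untouched, which is one way to read the flexibility claim in the abstract.
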
Anomaly behaviour refers to actions that deviate from the usual norms within a given context. In computer vision (CV), anomalies are identified as data patterns that deviate significantly from normal data [1]. Regrettably, significant amounts of time and money are dedicated to monitoring and detecting these activities without the support of automated systems [2]. This situation underscores the growing need for automated systems that can comprehend and evaluate these actions. Machine learning (ML) techniques are crucial in providing efficient solutions