Computers
Article

Novel Deep Feature Fusion Framework for Multi-Scenario Violence Detection

Sabah Abdulazeez Jebur 1,2, Khalid A. Hussein 3, Haider Kadhim Hoomod 3 and Laith Alzubaidi 4,5,*

1 Department of Computer Sciences, University of Technology, Baghdad 00964, Iraq; sabah.abdulazeez@alkadhum-col.edu.iq
2 Department of Computer Techniques Engineering, Imam Al-Kadhum College (IKC), Baghdad 00964, Iraq
3 Department of Computer Science, College of Education, Mustansiriyah University, Baghdad 00964, Iraq; dr.khalid.ali68@gmail.com (K.A.H.); drhjnew@gmail.com (H.K.H.)
4 School of Mechanical, Medical and Process Engineering, Queensland University of Technology, Brisbane, QLD 4000, Australia
5 Centre for Data Science, Queensland University of Technology, Brisbane, QLD 4000, Australia
* Correspondence: l.alzubaidi@qut.edu.au

Citation: Jebur, S.A.; Hussein, K.A.; Hoomod, H.K.; Alzubaidi, L. Novel Deep Feature Fusion Framework for Multi-Scenario Violence Detection. Computers 2023, 12, 175. https://doi.org/10.3390/computers12090175

Academic Editors: Hussain Mohammed Dipu Kabir, Syed Bahauddin Alam, Subrota Kumar Mondal and Jeremy Straub

Received: 8 August 2023. Revised: 26 August 2023. Accepted: 31 August 2023. Published: 5 September 2023.

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: Detecting violence in various scenarios is a difficult task that requires a high degree of generalisation. This includes fights in different environments such as schools, streets, and football stadiums. However, most current research on violence detection focuses on a single scenario, limiting its ability to generalise across multiple scenarios.
To tackle this issue, this paper proposes a new multi-scenario violence detection framework that operates in two environments: fighting in various locations and rugby stadiums. The framework has three main steps. Firstly, it uses transfer learning by employing three models pre-trained on the ImageNet dataset: Xception, Inception, and InceptionResNet. This approach enhances generalisation and helps prevent overfitting, as these models have already learned valuable features from a large and diverse dataset. Secondly, the framework combines the features extracted from the three models through feature fusion, which improves feature representation and enhances performance. Lastly, the concatenation step combines the features of the first violence scenario with those of the second scenario to train a machine learning classifier, enabling the classifier to generalise across both scenarios. This concatenation framework is highly flexible: it can incorporate multiple violence scenarios, and adding a scenario does not require training from scratch. The Fusion model, which incorporates feature fusion from multiple models, obtained an accuracy of 97.66% on the RLVS dataset and 92.89% on the Hockey dataset. The Concatenation model achieved an accuracy of 97.64% on the RLVS dataset and 92.41% on the Hockey dataset with just a single classifier. This is the first framework that allows for the classification of multiple violent scenarios within a single classifier. Furthermore, this framework is not limited to violence detection and can be adapted to different tasks.

Keywords: deep learning; feature fusion; transfer learning; violence detection

1. Introduction

Surveillance cameras are widely employed in supermarkets, gas stations, streets, roads, cafes, and similar areas. They are commonly used to monitor suspicious activities, known specifically as anomaly behaviours. These behaviours cover a wide range of actions, such as attacks, harassment, fights, robberies, and vandalism.
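The three steps summarised in the abstract (pre-trained feature extraction, feature fusion, and scenario concatenation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the backbone features are synthetic random vectors standing in for the outputs of frozen pre-trained networks, the feature sizes in `FEATURE_DIMS` are assumed values, and the nearest-centroid rule is only a placeholder for whichever machine learning classifier the framework actually trains. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed pooled feature sizes for the three backbones (illustrative only).
FEATURE_DIMS = {"xception": 2048, "inception": 2048, "inception_resnet": 1536}


def fake_backbone_features(n_clips, label, dim):
    """Stand-in for a frozen pre-trained CNN: random features shifted by class."""
    return rng.normal(loc=float(label), scale=1.0, size=(n_clips, dim))


def fuse_features(per_model_features):
    """Feature fusion: join each model's feature vectors along the feature axis."""
    return np.concatenate(per_model_features, axis=1)


def build_scenario(n_per_class):
    """Return fused features and labels (0 = non-violent, 1 = violent)."""
    xs, ys = [], []
    for label in (0, 1):
        feats = [
            fake_backbone_features(n_per_class, label, dim)
            for dim in FEATURE_DIMS.values()
        ]
        xs.append(fuse_features(feats))
        ys.append(np.full(n_per_class, label))
    return np.concatenate(xs), np.concatenate(ys)


# Two synthetic scenarios standing in for the two violence datasets.
x_a, y_a = build_scenario(20)
x_b, y_b = build_scenario(15)

# Concatenation step: stack the samples of both scenarios into one training set.
x_all = np.concatenate([x_a, x_b], axis=0)
y_all = np.concatenate([y_a, y_b], axis=0)

# Placeholder classifier (an assumption): one centroid per class.
centroids = np.stack([x_all[y_all == c].mean(axis=0) for c in (0, 1)])


def predict(x):
    """Assign each sample to the class with the nearest centroid."""
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


train_accuracy = float((predict(x_all) == y_all).mean())
print(x_all.shape, train_accuracy)
```

In this sketch a further scenario would only require extracting and fusing its features and appending them to `x_all` and `y_all` before refitting the lightweight classifier; the backbones themselves stay frozen.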
Anomaly behaviour refers to actions that deviate from the usual norms within a given context. In computer vision (CV), anomalies are identified as data patterns that deviate significantly from normal data [1]. Regrettably, significant amounts of time and money are dedicated to monitoring and detecting these activities without the support of automated systems [2]. This situation emphasises the growing need for automated systems to comprehend and evaluate these actions. Machine learning (ML) techniques are crucial in providing efficient solutions