IoT Network Intrusion Detection with Ensemble Learners

Sulyman Age Abdulkareem*, Chuan Heng Foh*, Haeyoung Lee*, François Carrez*, Klaus Moessner*†
* 5GIC & 6GIC, Institute for Communication Systems (ICS), University of Surrey, Guildford, Surrey, U.K.
† Faculty of Electronics and Information Technology, Technical University Chemnitz, Germany.
Email: {s.abdulkareem, c.foh, Haeyoung.Lee, f.carrez, k.moessner}@surrey.ac.uk, klaus.moessner@etit.tu-chemnitz.de

Abstract—Protecting information systems against intruders’ attacks requires utilising intrusion detection systems. Over the past several years, many open-source intrusion datasets have been made available so that academics and researchers can analyse and assess the effectiveness of various detection classifiers. These datasets are made available with a full complement of illustrative network features. In this research, we investigate the issue of Network Intrusion Detection (NID) by utilising an Internet of Things (IoT) dataset called Bot-IoT to evaluate the detection efficiency and effectiveness of five different Ensemble Learning Classifiers (ELCs). Our experimental results show that, although all ELCs recorded high classification metric scores, CatBoost performed best in terms of Accuracy, Precision, F1-Score, Training Time, and Test Time.

Index Terms—Network Intrusion Detection, Machine Learning, Ensemble Learning Classifiers, CatBoost, IoT.

I. INTRODUCTION

Our everyday lives are becoming increasingly intertwined with vast amounts of data thanks to the fast expansion of information technology. Cisco has projected that IP traffic will grow from 120 exabytes per month in 2017 to 400 exabytes per month in 2022 [1]. Increased network traffic has led to a growth in the number of cyberattack-related risks, which have also become more diverse.
The word “cyberattack” is often used to describe an uninvited attempt to threaten, disable, damage, steal, or otherwise compromise another party’s information assets. Many businesses now rely on network intrusion detection systems (NIDS) to keep their networks safe. Security measures such as firewalls, virus protection, data encryption, and user authentication are essential but not sufficient to protect computers and networks from today’s threats. To address these issues, Machine Learning (ML) based intrusion detection systems (IDS) can work together with the aforementioned security measures [2]. An IDS monitors and analyses network traffic in real time to detect latent data anomalies.

IDSs may be divided into two categories based on their detection philosophy [3]. The first is signature-based intrusion detection, which uses predefined attack signatures to characterise intrusion attempts on the network. As such, this approach cannot detect new attacks [4]. Anomaly-based detection, on the other hand, may uncover previously unknown attacks by analysing network data for anomalies using machine learning algorithms. An anomaly is an incident or behaviour that is out of the ordinary. Many studies [5]–[8] have focused on enhancing the accuracy and efficiency of IDSs. Anomaly-based IDS has been widely implemented and is now the primary focus of IDS research due to its promising efficacy.

In recent years, machine learning algorithms such as Decision Tree, Random Forest, Support Vector Machine (SVM), and Neural Networks have been applied to intrusion detection. However, each algorithm has its advantages and downsides. A classifier that works well at detecting one type of attack may not work well for another. According to several previous research publications [9]–[11], certain drawbacks remain regardless of the pre-processing or feature selection methods used alongside the classifiers.
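The contrast drawn above between signature-based and anomaly-based detection can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the flow fields (`proto`, `dport`, `pkts_per_sec`), the signature set, and the traffic values are hypothetical and are not taken from the Bot-IoT dataset, and a simple z-score threshold stands in for the machine learning algorithms an anomaly-based IDS would normally use.

```python
from statistics import mean, stdev

# Hypothetical signature database: known-bad (protocol, destination port) pairs.
KNOWN_ATTACK_SIGNATURES = {("tcp", 23), ("udp", 1900)}


def signature_detect(flow):
    """Flag a flow only if it matches a predefined attack signature."""
    return (flow["proto"], flow["dport"]) in KNOWN_ATTACK_SIGNATURES


def fit_baseline(normal_rates):
    """Learn the mean and standard deviation of packet rates in normal traffic."""
    return mean(normal_rates), stdev(normal_rates)


def anomaly_detect(flow, baseline, threshold=3.0):
    """Flag a flow whose packet rate lies more than `threshold` standard
    deviations away from the baseline mean (a z-score stand-in for an ML model)."""
    mu, sigma = baseline
    return abs(flow["pkts_per_sec"] - mu) > threshold * sigma


# Hypothetical packet rates observed during normal operation.
normal_rates = [10, 12, 11, 9, 10, 13, 11, 10]
baseline = fit_baseline(normal_rates)

# Three illustrative flows (all values are made up).
known_attack = {"proto": "tcp", "dport": 23, "pkts_per_sec": 11}
novel_attack = {"proto": "tcp", "dport": 8080, "pkts_per_sec": 500}
benign = {"proto": "tcp", "dport": 443, "pkts_per_sec": 12}

for name, flow in [("known", known_attack), ("novel", novel_attack), ("benign", benign)]:
    print(name, signature_detect(flow), anomaly_detect(flow, baseline))
```

In this toy setting the signature check catches only the flow that matches a stored signature and misses the unseen attack, whereas the anomaly check flags the unseen attack because its packet rate departs from the learned baseline. The anomaly check in turn misses the known attack, whose rate looks normal, which is one reason the two approaches are often treated as complementary.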
The ML architecture for IDS continues to evolve towards increasingly complex classifiers in order to increase its efficacy. Ensemble Learning Classifiers (ELCs) are one example of this effort to improve the coherence and competence of intrusion detection. The approach has become more popular than the use of single classifiers: it combines weak classifiers into a powerful learner that compensates for their individual deficiencies. ELCs are therefore a better option than single classifiers, since they produce better performance.

This paper evaluates the performance of five Ensemble Learning Classifiers on the Bot-IoT dataset [12], an IoT dataset, to test multiclass classification performance. The ELCs employed in this investigation include CatBoost, Random Forest, LightGBM, and XGBoost. The two primary focuses of this study are detection effectiveness and speed. Smart homes, smart cities, and smart transportation systems are among the many applications of the Internet of Things (IoT) considered in our research. However, IoT security protection is still lacking compared to traditional network applications. We test the detection abilities of the five ELCs on the IoT dataset’s network categories [12]. Our experimental findings indicate that CatBoost gives the best classification result among the ELCs compared. CatBoost also recorded the shortest training and test times on the dataset, making it the superior ELC in our study.

The remainder of this paper is organised as follows. Section II reviews some of the most recent NID ensemble ML classifier experiments. Section III introduces our experiment’s IoT dataset, the Ensemble Learning Classifiers, and our research strategy and experiment design. Section IV compares the performance outcomes of the ELCs. The latter