IoT Network Intrusion Detection with Ensemble Learners

Sulyman Age Abdulkareem*, Chuan Heng Foh*, Haeyoung Lee*, François Carrez*, Klaus Moessner*†
*5GIC & 6GIC, Institute for Communication Systems (ICS), University of Surrey, Guildford, Surrey, U.K.
†Faculty of Electronics and Information Technology, Technical University Chemnitz, Germany.
Email: {s.abdulkareem, c.foh, Haeyoung.Lee, f.carrez, k.moessner}@surrey.ac.uk, klaus.moessner@etit.tu-chemnitz.de

Abstract—Protecting information systems against attacks by intruders requires intrusion detection systems. Over the past several years, many open-source intrusion datasets have been released so that researchers can analyse and assess the effectiveness of various detection classifiers. These datasets come with a full complement of representative network features. In this research, we investigate Network Intrusion Detection (NID) using an Internet of Things (IoT) dataset called Bot-IoT to evaluate the detection efficiency and effectiveness of five Ensemble Learning Classifiers (ELCs). Our results show that, although all ELCs achieved high classification scores, CatBoost performed best in terms of Accuracy, Precision, F1-Score, and training and test time.

Index Terms—Network Intrusion Detection, Machine Learning, Ensemble Learning Classifiers, CatBoost, IoT.

I. INTRODUCTION

Our everyday lives are becoming increasingly intertwined with vast amounts of data owing to the rapid expansion of information technology. Cisco projected that IP traffic would grow from 120 exabytes per month in 2017 to 400 exabytes per month in 2022 [1]. This increase in network traffic has led to more cyberattack-related risks, which have also become more diverse.
The word “cyberattack” is commonly used to describe an unauthorised attempt to threaten, disable, damage, steal, or otherwise compromise another party’s information assets. Many businesses now rely on network intrusion detection systems (NIDS) to keep their networks safe. Security measures such as firewalls, virus protection, data encryption, and user authentication are essential but not sufficient to protect computers and networks from today’s threats. To address this gap, intrusion detection systems (IDSs), which increasingly rely on Machine Learning (ML), can work together with these security measures [2]. An IDS monitors and analyses network traffic in real time to detect latent anomalies in the data.

IDSs can be divided into two categories based on their detection philosophy [3]. The first is signature-based intrusion detection, which uses predefined attack signatures to characterise intrusion attempts on the network; as a result, this approach cannot detect new attacks [4]. The second, anomaly-based detection, can uncover previously unseen attacks by using machine learning algorithms to search network data for anomalies, that is, incidents or behaviours that are out of the ordinary. Many studies [5]–[8] have focused on improving the accuracy and efficiency of IDSs. Owing to its promising efficacy, anomaly-based detection has been widely implemented and is now the primary focus of IDS research.

In recent years, machine learning algorithms such as Decision Tree, Random Forest, Support Vector Machine (SVM), and Neural Networks have been applied to intrusion detection. However, each algorithm has its own advantages and drawbacks: a classifier that detects one type of attack well may perform poorly on another. According to several previous research publications [9]–[11], certain drawbacks remain regardless of the pre-processing or feature-selection methods used alongside the classifiers.
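To make the contrast between the signature-based and anomaly-based detection philosophies described above more concrete, the following is a minimal illustrative sketch in Python, not the method used in this paper. The signature set, the packets-per-second feature, and the 3-sigma threshold are all hypothetical assumptions; practical anomaly-based IDSs typically train ML models on many traffic features rather than thresholding a single one.

```python
from __future__ import annotations

from statistics import mean, stdev

# Hypothetical signature database: byte patterns of *known* attacks (illustrative only).
KNOWN_SIGNATURES = {b"GET /cgi-bin/../../etc/passwd", b"\x90\x90\x90\x90"}


def signature_detect(payload: bytes) -> bool:
    """Signature-based: flag the payload only if it contains a known attack pattern."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)


def fit_baseline(normal_rates: list[float]) -> tuple[float, float]:
    """Learn the mean and sample standard deviation of a traffic feature
    (assumed here to be packets per second) from attack-free traffic."""
    return mean(normal_rates), stdev(normal_rates)


def anomaly_detect(rate: float, baseline: tuple[float, float], k: float = 3.0) -> bool:
    """Anomaly-based: flag observations more than k standard deviations
    away from the learned normal mean, whether or not the attack is known."""
    mu, sigma = baseline
    return abs(rate - mu) > k * sigma


if __name__ == "__main__":
    baseline = fit_baseline([100, 110, 95, 105, 98, 102])
    # A novel flood carries no known signature but deviates strongly from normal traffic.
    print(signature_detect(b"SYN"), anomaly_detect(5000, baseline))
```

In this toy setting, the signature check misses the previously unseen flood, while the baseline-deviation check flags it, mirroring the trade-off described in the text.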
To increase their efficacy, ML architectures for IDS have grown into increasingly complex classifiers. Ensemble Learning Classifiers (ELCs) are one example, improving the reliability and effectiveness of intrusion detection, and the approach has become more popular than the use of single classifiers. An ELC combines multiple weak classifiers into a strong learner so that the deficiencies of each individual classifier are compensated for by the others. ELCs are therefore often preferred over single classifiers because they tend to achieve better performance.

This paper evaluates the multiclass classification performance of five ELCs on the Bot-IoT dataset [12], an IoT dataset. The ELCs employed in this investigation include CatBoost, Random Forest, LightGBM, and XGBoost. The two primary focuses of this study are detection effectiveness and speed. The Internet of Things (IoT), the focus of our research, supports many applications, such as smart homes, smart cities, and smart transportation systems. However, IoT security protection still lags behind that of traditional network applications. We test the detection abilities of the five ELCs on the network categories of the IoT dataset [12]. Our experimental findings indicate that CatBoost gives the best classification results of all the ELCs evaluated. CatBoost also had the shortest training and test times, making it the best-performing ELC in our study.

The remainder of this paper is organised as follows. Section II overviews some of the most recent ensemble ML classifier experiments for NID. Section III introduces the IoT dataset used in our experiments, the Ensemble Learning Classifiers, and our research strategy and experiment design. Section IV compares the performance of the ELCs. The latter