2022 IEEE NIGERCON 978-1-6654-7978-3/22/$31.00 ©2022 IEEE

Combating Network Intrusions using Machine Learning Techniques with Multilevel Feature Selection Method

Tosin Comfort OLAYINKA, Department of Computer Science, Wellspring University, Benin City, Edo State, Nigeria, tcolayinka@gmail.com
Adebayọ Olusọla ADETUNMBI, Department of Computer Science, Federal University of Technology, Akure, Nigeria, aoadetunmbi@futa.edu.ng
Chukwuemeka Christian UGWU, Department of Computer Science, Federal University of Technology, Akure, Nigeria, ugwucc@futa.edu.ng
Olugbemiga Solomon POPOỌLA, Department of Computer Science, Ọsun State College of Education, Ila-Ọrangun, Nigeria, popsol7@yahoo.com
Omoibu Joseph OKHUOYA, ICT/CRPU International Training Center, University of Benin, Benin City, Nigeria, joseph.okhuoya@uniben.edu.ng

Abstract— The heavy dependence on the internet and other emerging technologies for accessing, storing, and sharing information has triggered a proportional increase in cyberattacks, making the network intrusion detection system (NIDS) a crucial component of security systems. A NIDS is employed to monitor a network for abnormal activities. However, low accuracy and high false-positive rates remain prevalent among NIDSs. To improve the prediction of network intrusions, this paper applies four (4) machine learning models in parallel: k-Nearest Neighbor (k-NN), Naïve Bayes (NB), Logistic Regression (LR), and Artificial Neural Network (ANN), each combined with a multilevel feature selection method, to determine which model has the best detection capability in terms of accuracy, positive predictive value (PPV), recall, F1-score, and the Receiver Operating Characteristic (ROC) curve. The models were validated on the NSL-KDD intrusion dataset, and the results show that k-NN performed best, with an accuracy of 79.1%, a recall of 66.5%, a PPV of 96.7%, and an F1-score of 78.1%.
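For readers less familiar with the metrics named in the abstract, the following minimal Python sketch shows how accuracy, PPV (precision), recall, and F1 are conventionally derived from confusion-matrix counts. The function names and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn = true positives, false positives, true negatives, false negatives,
    where "positive" means "traffic classified as an attack".
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0        # positive predictive value
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # detection rate
    f1 = f1_from(ppv, recall)
    return {"accuracy": accuracy, "ppv": ppv, "recall": recall, "f1": f1}


def f1_from(ppv, recall):
    """F1 is the harmonic mean of PPV and recall."""
    return 2 * ppv * recall / (ppv + recall) if (ppv + recall) else 0.0


# Hypothetical counts, for illustration only (not from the paper).
example = classification_metrics(tp=80, fp=5, tn=90, fn=25)
print(example)

# Harmonic mean of the PPV and recall values quoted in the abstract.
print(round(f1_from(0.967, 0.665), 3))
```

Applied to the PPV (96.7%) and recall (66.5%) quoted in the abstract, the harmonic mean comes to roughly 0.788, which is close to, though not identical with, the reported F1 of 78.1%. The abstract does not say how the scores were averaged or rounded, so the small gap may simply reflect that.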
Keywords— Intrusion Detection, Classification, Machine Learning, Network Traffic, Feature Selection, Anomaly.

I. INTRODUCTION

The prevalent use of computers and networks in recent years, together with the development of new technologies such as the Internet of Things (IoT), cloud computing, and big data, has raised serious security concerns for both corporate and social networks [1]. New attack types with complex attack strategies emerge daily, and their effect on networks poses a challenge to information technology (IT) security experts equipped only with traditional defense mechanisms. Nowadays, network security is strengthened using multiple defensive tools, and in most cases intrusion detection systems (IDSs) are used as complementary defense tools. An IDS is a software or hardware tool that monitors activities in a network or on a device for unusual or malicious occurrences. A network intrusion detection system (NIDS) collects information from several key nodes in the computer network, checks for violations of security policies and signs of attack, identifies threats in the network, and generates alarms, thereby providing real-time protection against internal attacks, external attacks, and mis-operations [2].

The two main detection methods for NIDS are misuse detection and anomaly detection. The misuse method is best suited to detecting known attacks, but its performance degrades when attacks are unknown. Anomaly-based methods, on the other hand, are suitable for detecting unknown attacks but are highly susceptible to false positives [3], [4]. Several approaches have been proposed to improve NIDS performance, including the hybridization of signature-based and anomaly-based approaches [5], the addition of a false alarm filter to both signature-based and anomaly-based NIDSs, and, most recently, the application of machine learning (ML) algorithms.
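The contrast between the two detection methods can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the signature strings, the single "bytes per connection" feature, the z-score rule, and the threshold of 3.0 are hypothetical and are not drawn from the cited works or from any particular NIDS.

```python
from statistics import mean, stdev

# Hypothetical signature database: payload fragments of already-known attacks.
KNOWN_SIGNATURES = {"' OR 1=1", "../../etc/passwd"}


def misuse_detect(payload):
    """Misuse (signature) detection: flag only traffic matching a known signature."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)


def fit_normal_profile(values):
    """Summarise normal traffic by the mean and sample standard deviation of one feature."""
    return mean(values), stdev(values)


def anomaly_detect(value, profile, threshold=3.0):
    """Anomaly detection: flag traffic that deviates strongly from the normal profile."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical bytes-per-connection values observed during normal operation.
normal_traffic = [480, 510, 495, 505, 500, 490, 520]
profile = fit_normal_profile(normal_traffic)

print(misuse_detect("GET /index.php?id=1' OR 1=1"))  # matches a stored signature
print(misuse_detect("GET /never-seen-exploit"))      # unknown attack goes unnoticed
print(anomaly_detect(900, profile))                  # far from the normal profile
print(anomaly_detect(545, profile))                  # unusual but possibly legitimate
```

The last call hints at the false-positive problem: a connection that is merely larger than usual can cross the threshold even if it is benign, whereas the signature check never fires on traffic it has no signature for.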
The need for ML arose from several challenges faced by existing NIDSs, such as processing large volumes of data, low detection accuracy, and high false-positive rates. ML learns useful patterns from existing data and uses them as reference profiles of normal and attack traffic behaviour for the subsequent classification of network traffic [4]. ML methods proposed for intrusion detection problems are broadly classified into supervised and unsupervised detection. Unsupervised IDSs learn patterns of possible network intrusions from unlabelled training data [6], while supervised models detect possible intrusions by training on already labelled intrusion datasets. Creating supervised training samples for an ML-based IDS can be challenging, but the results are highly accurate and reliable, which makes the approach popular among intrusion detection experts.

This paper centers on supervised ML-based NIDS: we analyze k-NN, Naïve Bayes, ANN, and Logistic Regression on one of the commonly used network intrusion datasets (NSL-KDD), using standard performance metrics such as accuracy, recall, and precision, among others.

The remaining sections of this paper are organized as follows. Section II reviews related work. The system architecture, methods, and materials are discussed in Section

2022 IEEE Nigeria 4th International Conference on Disruptive Technologies for Sustainable Development (NIGERCON) | 978-1-6654-7978-3/22/$31.00 ©2022 IEEE | DOI: 10.1109/NIGERCON54645.2022.9803098