Improving the Detection Rate of Rarely Appearing Intrusions in Network-Based Intrusion Detection Systems

Eunmok Yang 1, Gyanendra Prasad Joshi 2 and Changho Seo 3,*

1 Department of Financial Information Security, Kookmin University, Seoul, 02707, Korea
2 Department of Computer Science and Engineering, Sejong University, Seoul, 05006, Korea
3 Department of Convergence Science, Kongju National University, Gongju, 32588, Korea
* Corresponding Author: Changho Seo. Email: chseo@kongju.ac.kr
Received: 29 July 2020; Accepted: 11 September 2020

Abstract: In network-based intrusion detection practice, regular instances outnumber intrusion instances. Because of this persistent statistical imbalance, it is difficult to train the intrusion detection system effectively. In this work, we compare intrusion detection performance by increasing the rarely appearing instances rather than by eliminating the frequently appearing duplicate instances. Our technique mitigates the statistical imbalance in these instances. We also carried out an experiment on the training model in which the attack instances were increased step by step, up to 13 levels. The experiments included not only known attacks but also unknown new intrusions. The results are compared with existing studies from the literature and show an improvement in accuracy, sensitivity, and specificity over previous studies. The detection rates for the remote-to-user (R2L) and user-to-root (U2R) categories are improved significantly by adding fewer instances. For many intrusions, the detection rate rises from very low to very high. The detection of newer attacks that had not been used in training improved from 9% to 12%. This study has practical applications in network administration to protect from known and unknown attacks.
If network administrators have too few instances of some attacks, they can increase the number of these rarely appearing instances, thereby improving the detection of both known and unknown new attacks.

Keywords: Intrusion detection; statistical imbalance; SMO; machine learning; network security

Computers, Materials & Continua (Tech Science Press). Article. DOI:10.32604/cmc.2020.013210
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 Introduction

Network security is becoming a matter of global interest and importance, as evidenced by the fact that network intruders now regularly make the headlines. As more and more devices of different kinds are connected to the network, the network administrator needs a way to determine whether the data passing through the network constitute an intrusion. Intrusion detection systems (IDSs) can be classified into host-based and network-based detection systems. A host-based IDS monitors and analyzes intrusions within a machine. A