International Journal of INTELLIGENT SYSTEMS AND APPLICATIONS IN ENGINEERING
ISSN: 2147-6799
www.ijisae.org
Original Research Paper
IJISAE, 2024, 12(4), 3833–3857

A Review of Various Datasets for Machine Learning Algorithm-Based Intrusion Detection System: Advances and Challenges

Sudhanshu Sekhar Tripathy 1, Dr. Bichitrananda Behera 2

Submitted: 15/03/2024 Accepted: 29/04/2024 Accepted: 06/05/2024

Abstract: An intrusion detection system (IDS) aims to protect computer networks from security threats by detecting malicious activity, raising alerts, and taking appropriate action to prevent unauthorized access and protect confidential information. As the world becomes increasingly dependent on technology and automated processes, securing systems, applications, and networks has become one of the most significant problems of this era. The Internet and digital technology have significantly accelerated the evolution of the modern world, making telecommunications and data-transfer platforms indispensable. Researchers are improving the effectiveness of IDSs by training machine learning algorithms on popular benchmark datasets. An IDS equipped with machine learning classifiers improves attack-detection accuracy by classifying network traffic as normal or abnormal. This paper examines how intrusion detection data are captured, reviews intrusion detection systems, and evaluates the challenges that existing datasets face. Over the past ten years, a large body of research on intrusion detection techniques based on machine learning (ML) and deep learning (DL) architectures has been conducted on a variety of cybersecurity datasets, including KDDCUP'99, NSL-KDD, UNSW-NB15, CICIDS-2017, and CSE-CIC-IDS2018. We conducted a literature review and present an in-depth analysis of intrusion detection methods that use SVM, KNN, DT, LR, NB, RF, XGBoost, AdaBoost, and ANN.
We give an overview of each technique, explaining the function of each classifier listed above and of every other algorithm used in the reviewed research. Additionally, each method is analyzed comprehensively in tabular form, highlighting the dataset used, the classifiers employed, the attacks detected, the evaluation metrics (including accuracy), and the conclusions drawn. This article provides a comprehensive overview of recent research on developing reliable IDSs using five distinct datasets, as a basis for future research. The findings of this investigation were carefully analyzed and compared with those of numerous other studies.

Keywords: Intrusion Detection System, ML classifiers, Different IDS datasets, Evaluation metrics with accuracy, Detected attacks

1. Introduction

The rapid growth of the information technology field over the last 10 years has made building reliable computer networks a crucial task for IT managers. However, this task is challenging because numerous threats can compromise the confidentiality, integrity, and availability of these networks, exposing them to various risks [1]. The Internet is a crucial tool in everyday life, used in commerce, education, healthcare, entertainment, and many other fields. As technology advances, networks are used in ever more aspects of life; this very popularity, however, makes network attacks a growing risk. An IDS is a software component that monitors an entire network or system infrastructure for malicious behavior or policy violations. People now depend heavily on web access for practically every facet of daily life, as the web has transformed communication and the way we live. As a result, online privacy has emerged as one of the most important and pressing problems of our day.
Increasingly frequent and powerful digital attacks, crimes, and hacking incidents have resulted from our growing reliance on digital infrastructure and software applications. Numerous security solutions have been explored and deployed over the years to defend against them, including firewalls, intrusion detection systems, cryptography, and encryption and decryption approaches. Because it can detect, track, and prevent intrusions by exploiting known concepts and patterns, intrusion detection [2] is regarded as the first line of defense against complex and dynamic attacks [3]. An intrusion is an attempt to gain unauthorized access to networks or services by tampering with the infrastructure and rendering it vulnerable. Information security rests on three core principles: confidentiality, integrity, and availability. Confidentiality restricts unauthorized access to and sharing of personal data, integrity ensures that data remain accurate and unaltered, and availability ensures that data are accessible to authorized users. Intrusion detection systems identify intruders but are susceptible to false alarms, so organizations must tune IDS products after deployment to reduce false alarms [4]. This literature review examines various IDS computational algorithms, including Support Vector Machine (SVM), K-Nearest Neighbour (KNN), Decision Tree Classifier (DT), Logistic Regression (LR), Naive Bayes Classifier (NB), Random Forest Classifier (RF), Extreme Gradient Boosting Classifier

1 C V Raman Global University, Bhubaneswar–752054, Odisha, ORCID ID: 0009-0003-5567-458X
2 C V Raman Global University, Bhubaneswar–752054, Odisha, ORCID ID: 0000-0002-9362-7691
*Corresponding Author Email: tripathysudhanshu6@gmail.com