I. J. Computer Network and Information Security, 2022, 1, 81-90
Published Online February 2022 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijcnis.2022.01.07
Copyright © 2022 MECS

Ensem_SLDR: Classification of Cybercrime using Ensemble Learning Technique

Hemakshi Pandey 1
1 Department of Computer Science Engineering, Bhagwan Parshuram Institute of Technology, New Delhi-110089, India
E-mail: mahe173fas@gmail.com

Riya Goyal 2, Deepali Virmani 3 and Charu Gupta 4
2, 3, 4 Department of Computer Science Engineering, Bhagwan Parshuram Institute of Technology, New Delhi-110089, India
E-mail: riya.goyal2599@gmail.com, deepalivirmani@gmail.com, charugupta@bpitindia.com

Received: 08 June 2021; Accepted: 14 October 2021; Published: 08 February 2022

Abstract: With the advancement of technology, cybercrimes are surging at an alarming rate as miscreants exploit the modern world's reliance on virtual platforms. Because an enormous quantity of cybercrime data has accumulated, there is huge potential to analyze and segregate this data with the help of Machine Learning. The focus of this research is to construct a model, Ensem_SLDR, which can predict the relevant sections of IT Act 2000 from the complaint text/subjects with the aid of Natural Language Processing, Machine Learning, and Ensemble Learning methods. The objective of this paper is to implement a robust technique to categorize cybercrime into two sections, 66 and 67 of IT Act 2000, with high precision using an ensemble learning technique. In the proposed methodology, the Bag of Words approach is applied for feature engineering, and the resulting features are given as input to the hybrid model Ensem_SLDR.
The proposed model is implemented with the help of model stacking, comprising Support Vector Machine (SVM), Logistic Regression, Decision Tree, and Random Forest, and achieved 96.55% accuracy, which is higher and more reliable than past models implemented using a single learning algorithm and some of the existing hybrid models. Ensemble learning techniques enhance model performance and robustness. This research is beneficial for cybercrime cells in India, which have a repository of detailed information on cybercrime, including complaints and investigations. Hence, there is a need for models and automated systems empowered by artificial intelligence technologies for the analysis of cybercrimes and their classification into the relevant sections.

Index Terms: Cybercrime, Bag of Words, Ensemble Learning, Machine Learning, Natural Language Processing.

1. Introduction

With dynamic technological development, the dependency on cyberspace has increased [28]. Concepts and terminologies that barely existed years ago have now been infused into our day-to-day life, such as cybercrime, computer-related crime, information crime, or internet crime. A crime which occurs with the aid of a computer, the internet, or any device is known as cybercrime. Today, people all over the world are connected through social media networks, which are vulnerable to cyber terrorism. Because a colossal amount of cybercrime data has accumulated, which may include complaint text or investigation descriptions, there is huge potential to analyze the data with the help of Artificial Intelligence (AI), Machine Learning (ML), Ensemble Learning [29], and Natural Language Processing (NLP) [24, 25, 26, 27, 30]. As a result of extensive research in the field of NLP, there are multifarious applications in law enforcement, such as text summarization, relationship extraction, prediction of crime, and criminal intelligence gathering [1].
In India, cybercrime is handled by cybercrime cells, and the criminal acts committed may involve cyber terrorism, hacking, online stalking, online fraud, identity theft, the sending of offensive messages, or the circulation of obscene or toxic material. The IT Act 2000 originally contained 94 sections; however, two sections deal with the majority of these punishable offenses: section 66 involves computer-related offenses, while section 67 involves punishment for transmitting obscene or toxic content [2]. The objective of this research is to implement a robust technique to categorize cybercrime into two sections, 66 and 67 of IT Act 2000, with high precision using ensemble learning techniques on the collected and processed data. To develop such classification frameworks, features may be extracted from the text; this step is crucial for identifying the characteristics defined in the sections covering the various punishable offenses. For instance, a description containing words such as ‘fraud’ or ‘terrorism’ would be classified under section 66, while a description containing words such as ‘child’ or ‘obscene’ would be classified under section 67.
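
To make the feature extraction idea concrete, the following is a minimal sketch of a Bag of Words representation in plain Python. It is an illustration under stated assumptions, not the implementation used in this work: the two complaint texts and their section labels are invented toy examples, and the helper names `tokenize`, `build_vocabulary`, and `vectorize` are hypothetical. Each complaint is turned into a vector of word counts over a shared vocabulary, so that indicative words such as ‘fraud’ or ‘obscene’ become numeric features a classifier could use.

```python
import re
from collections import Counter

# Toy complaint texts with illustrative section labels.
# These are invented examples, not data from the paper.
complaints = [
    ("online fraud drained my bank account", 66),
    ("received obscene images on my phone", 67),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(texts):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: index for index, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Return the word-count vector of `text` over `vocabulary`.

    Tokens that are not in the vocabulary are ignored.
    """
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, count in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = count
    return vector


texts = [text for text, _ in complaints]
labels = [label for _, label in complaints]

vocabulary = build_vocabulary(texts)
features = [vectorize(text, vocabulary) for text in texts]

for vector, label in zip(features, labels):
    print(label, vector)
```

In this sketch the column for ‘fraud’ is non-zero only for the first complaint and the column for ‘obscene’ only for the second, which is the kind of signal a downstream classifier can learn from. A production pipeline would typically also handle stop words and much larger vocabularies.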