I. J. Computer Network and Information Security, 2022, 1, 81-90
Published Online February 2022 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijcnis.2022.01.07
Copyright © 2022 MECS
Ensem_SLDR: Classification of Cybercrime using Ensemble Learning Technique
Hemakshi Pandey¹
¹ Department of Computer Science Engineering, Bhagwan Parshuram Institute of Technology, New Delhi-110089, India
E-mail: mahe173fas@gmail.com
Riya Goyal², Deepali Virmani³ and Charu Gupta⁴
², ³, ⁴ Department of Computer Science Engineering, Bhagwan Parshuram Institute of Technology, New Delhi-110089, India
E-mail: riya.goyal2599@gmail.com, deepalivirmani@gmail.com, charugupta@bpitindia.com
Received: 08 June 2021; Accepted: 14 October 2021; Published: 08 February 2022
Abstract: With the advancement of technology, cybercrimes are surging at an alarming rate as miscreants exploit the
world's growing reliance on virtual platforms. The accumulation of an enormous quantity of cybercrime data creates
huge potential to analyze and segregate that data with the help of Machine Learning. The focus of this research
is to construct a model, Ensem_SLDR, which can predict the relevant sections of the IT Act 2000 from the complaint
text/subjects with the aid of Natural Language Processing, Machine Learning, and Ensemble Learning methods. The
objective of this paper is to implement a robust technique that categorizes cybercrime into two sections, 66 and 67 of
the IT Act 2000, with high precision using an ensemble learning technique. In the proposed methodology, the Bag of
Words approach is applied for feature engineering, and the resulting features are given as input to the hybrid model
Ensem_SLDR. The proposed model is implemented through model stacking, comprising Support Vector Machine (SVM),
Logistic Regression, Decision Tree, and Random Forest, and achieves 96.55% accuracy, which is higher and more
reliable than that of past models implemented using a single learning algorithm and of some existing hybrid models.
Ensemble learning techniques enhance model performance and robustness. This research is beneficial for cybercrime
cells in India, which hold a repository of detailed information on cybercrime, including complaints and
investigations. Hence, there is a need for models and automation systems empowered by artificial intelligence
technologies for analyzing cybercrimes and classifying them into the applicable sections.
Index Terms: Cybercrime, Bag of Words, Ensemble Learning, Machine Learning, Natural Language Processing.
1. Introduction
With dynamic technological development, the dependency on cyberspace has increased [28]. Concepts and
terminologies which seldom existed years ago, such as cybercrime, computer-related crime, information crime, or
internet crime, have now become part of our day-to-day life. A crime that is committed with the aid of a computer, the
internet, or any other device is known as a cybercrime. Today, people all over the world are connected through social
media networks, which are vulnerable to cyber terrorism.
Due to the accumulation of a colossal amount of cybercrime data, which may include complaint text or
investigation descriptions, there is huge potential to analyze the data with the help of Artificial Intelligence (AI),
Machine Learning (ML), Ensemble Learning [29], and Natural Language Processing (NLP) [24, 25, 26, 27, 30]. As a
result of extensive research in the field of NLP, there are numerous applications in law enforcement, such as
text summarization, relationship extraction, prediction of crime, and criminal intelligence gathering [1].
In India, cybercrime is handled by cybercrime cells, and the criminal acts committed may involve cyber
terrorism, hacking, online stalking, online fraud, identity theft, the sending of offensive messages, or the circulation
of obscene or toxic material. The IT Act 2000 originally contained 94 sections; however, two sections deal with the
majority of these punishable offenses: section 66 covers computer-related offenses, while section 67 covers
punishment for transmitting obscene or toxic content [2]. The objective of this research is to implement a robust
technique that categorizes cybercrime into two sections, 66 and 67 of the IT Act 2000, with high precision using
ensemble learning techniques on the collected and processed data. To develop such classification frameworks, features
must be extracted, as they are crucial for identifying the characteristics defined in the sections covering the various
punishable offenses. For instance, a description containing words like ‘fraud’ or ‘terrorism’ will be classified under
section 66, while a description containing words like ‘child’ or ‘obscene’ will be classified under section 67.
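The keyword example above can be made concrete with a small sketch. It is only an illustration of how word counts carry the signal, not the Ensem_SLDR model itself, which feeds Bag of Words features to a stacked ensemble. The keyword sets contain only the four example words from the text, and the names SECTION_KEYWORDS, bag_of_words, keyword_scores, and suggest_section are hypothetical helpers introduced here.

```python
import re
from collections import Counter

# Hypothetical keyword sets, seeded only with the example words from the text.
# A learned classifier would derive such associations from labelled complaints.
SECTION_KEYWORDS = {
    "66": {"fraud", "terrorism"},
    "67": {"child", "obscene"},
}


def bag_of_words(text):
    """Lowercase the text, split it into alphabetic tokens, and count each token."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def keyword_scores(text):
    """For each section, sum the counts of its keywords found in the text."""
    bag = bag_of_words(text)
    return {
        section: sum(bag[word] for word in words)
        for section, words in SECTION_KEYWORDS.items()
    }


def suggest_section(text):
    """Return the section with the highest keyword score.

    Returns None when no keyword matches or when the top score is tied.
    """
    scores = keyword_scores(text)
    best = max(scores.values())
    winners = [section for section, score in scores.items() if score == best]
    if best > 0 and len(winners) == 1:
        return winners[0]
    return None
```

A fixed keyword lookup like this fails on descriptions that use none of the listed words or that mix words from both sections, which is one reason to learn the mapping from labelled data instead.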