International Journal of Electrical and Computer Engineering (IJECE)
Vol. 15, No. 3, June 2025, pp. 3139~3148
ISSN: 2088-8708, DOI: 10.11591/ijece.v15i3.pp3139-3148
Journal homepage: http://ijece.iaescore.com

Enhancing cyberbullying detection with advanced text preprocessing and machine learning

Rakesh Bapu Dhumale¹, Ajay Kumar Dass², Amit Umbrajkaar³, Pradeep Mane¹
¹Department of Electronics and Telecommunication Engineering, AISSMS Institute of Information Technology, Pune, India
²Department of Electronics and Telecommunication Engineering, Sinhgad College of Engineering, Pune, India
³School of Mechanical Engineering, D. Y. Patil International University, Pune, India

Article Info
Article history:
Received Aug 24, 2024
Revised Feb 7, 2025
Accepted Mar 4, 2025

ABSTRACT
The use of social media and the internet has increased dramatically in recent years. Cyberbullying is the misuse of social media by people who post threatening comments. It has a devastating influence on people's lives, especially those of children and teenagers, and can lead to depression and suicidal thoughts. The methodology proposed in this paper identifies cyberbullying in four steps: preprocessing, feature extraction, classification, and evaluation. The process begins by creating a labeled, varied dataset. In feature extraction, Word2Vec and term frequency-inverse document frequency (TF-IDF) transform text into high-dimensional vectors. Word2Vec creates word embeddings with the skip-gram and continuous bag-of-words (CBOW) models, while TF-IDF weights each term by its relevance to the text. The model uses support vector machine (SVM) classifiers, and their effectiveness is compared with that of logistic regression and naïve Bayes.
The maximum accuracy was 95% for the support vector classifier with skip-gram embeddings and 93% with continuous bag-of-words embeddings. Precision, recall, and F1-score were computed for each sentiment category. The average precision and recall were 0.77 and 0.79, respectively.

Keywords:
Cyberbullying
Detection
Online threats
Social media
Social media misuse
Support vector machines
Text classification

This is an open access article under the CC BY-SA license.

Corresponding Author:
Rakesh Bapu Dhumale
Department of Electronics and Telecommunication Engineering, AISSMS Institute of Information Technology
Pune, India
Email: rbd.scoe@gmail.com

1. INTRODUCTION
Social media addiction has increased globally in step with the phenomenal growth in the availability of data services. As in other nations, cyberbullying has increased sharply [1]. With so much of daily life now conducted on digital and online platforms in the Web 4.0 era, safeguarding society against the startling increase in cybercrime is extremely challenging [2]. Teenagers have been found to be the main victims of cyberbullying [3]. Cyberbullying is a dangerous and damaging behavior that can drive victims to attempt suicide and leave them with lasting consequences [4]. Cyberbullying detection can be framed as a classification problem in which online posts are labeled as either bullying or non-bullying [5]. A variety of computer vision (CV), machine learning (ML), and natural language processing (NLP) techniques can be used to build a system that detects cyberbullying more effectively [6].
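The abstract describes preprocessing followed by TF-IDF feature extraction, but it gives neither the exact cleaning rules nor the TF-IDF formula. The Python sketch below assumes a simple pipeline: lowercasing, removing URLs and @mentions, keeping alphabetic tokens, and dropping a small stop-word list. It then applies the textbook weighting tf(t, d) · log(N / df(t)). Here tf is the relative frequency of term t in document d, N is the number of documents, and df(t) is the number of documents containing t. The names `preprocess`, `tf_idf`, and `STOP_WORDS` are hypothetical, and the stop-word list is illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the paper does not specify one.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of", "for"}


def preprocess(text: str) -> list[str]:
    """Lowercase, strip URLs and @mentions, keep alphabetic tokens, drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    tokens = re.findall(r"[a-z]+", text)       # alphabetic tokens only
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one sparse {term: weight} vector per tokenized document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), df(t) = number of documents containing t
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append(
            {term: (c / length) * math.log(n_docs / df[term]) for term, c in counts.items()}
        )
    return vectors


posts = ["You are such a LOSER @sam", "Have a great day, see you soon!", "Loser! http://x.co"]
docs = [preprocess(p) for p in posts]
vectors = tf_idf(docs)
print(docs[0])  # ['you', 'such', 'loser']
```

Under this variant, a term that occurs in every document receives a weight of zero. Library implementations differ. For example, scikit-learn's `TfidfVectorizer` smooths the idf and L2-normalizes each row by default, so its weights will not match this sketch exactly.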
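The two Word2Vec architectures named in the abstract differ in what each one predicts. Skip-gram uses a centre word to predict each word in its context window. Continuous bag-of-words (CBOW) instead uses the context words together to predict the centre word. The sketch below only builds these training examples from a token list and does not train embeddings. The function names `skipgram_pairs` and `cbow_examples` and the window size are illustrative assumptions, not the paper's settings.

```python
def skipgram_pairs(tokens: list[str], window: int = 2) -> list[tuple[str, str]]:
    """Skip-gram: the centre word predicts each word within `window` positions."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((centre, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


def cbow_examples(tokens: list[str], window: int = 2) -> list[tuple[list[str], str]]:
    """CBOW: the context words within `window` positions jointly predict the centre word."""
    examples = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, centre))
    return examples


tokens = ["nobody", "likes", "you", "loser"]
print(skipgram_pairs(tokens, window=1))
# [('nobody', 'likes'), ('likes', 'nobody'), ('likes', 'you'), ('you', 'likes'), ('you', 'loser'), ('loser', 'you')]
print(cbow_examples(tokens, window=1))
# [(['likes'], 'nobody'), (['nobody', 'you'], 'likes'), (['likes', 'loser'], 'you'), (['you'], 'loser')]
```

In practice, the embeddings would be trained with a library such as gensim. Its `Word2Vec` class selects skip-gram with `sg=1` and CBOW with `sg=0`, which is the default. The abstract does not state the window size or vector dimension used.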
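The abstract names support vector machines as the main classifier but does not specify a kernel or solver. As one possible reading, the sketch below trains a linear SVM on sparse TF-IDF-style feature dictionaries. It minimizes the L2-regularized hinge loss with a Pegasos-style subgradient method. The sketch departs from the original Pegasos algorithm in three ways: it visits examples in a fixed order rather than sampling them at random, and it omits both the bias term and the projection step. The names `train_linear_svm` and `predict` and the settings `lam` and `epochs` are hypothetical. Labels are assumed to be +1 (bullying) and -1 (non-bullying).

```python
def dot(w: dict[str, float], x: dict[str, float]) -> float:
    """Inner product of a weight vector and a sparse feature vector."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(
    X: list[dict[str, float]], y: list[int], lam: float = 0.01, epochs: int = 20
) -> dict[str, float]:
    """Pegasos-style subgradient descent on the L2-regularized hinge loss
    (lam / 2) * ||w||^2 + mean(max(0, 1 - y * <w, x>)).

    Labels must be +1 or -1. Examples are visited in a fixed order each epoch,
    and the bias term and optional projection step are omitted.
    """
    w: dict[str, float] = {}
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)            # decreasing step size
            margin = label * dot(w, x)       # evaluated before the update
            for k in w:                      # shrinkage from the regularizer
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:                 # hinge loss active: step toward label * x
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * label * v
    return w


def predict(w: dict[str, float], x: dict[str, float]) -> int:
    """Return +1 (bullying) if the score is positive, otherwise -1."""
    return 1 if dot(w, x) > 0 else -1


# Toy TF-IDF-style vectors; +1 = bullying, -1 = non-bullying (illustrative data only).
X = [
    {"loser": 0.5, "ugly": 0.5},
    {"stupid": 0.7, "loser": 0.2},
    {"great": 0.6, "day": 0.4},
    {"thanks": 0.8, "help": 0.3},
]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, -1, -1]
```

A library implementation such as scikit-learn's `LinearSVC` would normally be preferred over a hand-written solver. This sketch is meant only to make the optimization objective concrete.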
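Naïve Bayes is one of the baselines the abstract compares against. A minimal multinomial naïve Bayes classifier with add-one (Laplace) smoothing over token counts can be sketched as follows. The abstract does not say which naïve Bayes variant or smoothing scheme was used, so both are assumptions, as are the names `train_nb` and `predict_nb`. Scores are kept in log space to avoid numerical underflow on long posts.

```python
import math
from collections import Counter


def train_nb(docs: list[list[str]], labels: list[str], alpha: float = 1.0):
    """Multinomial naive Bayes with add-alpha smoothing; returns log priors and log likelihoods."""
    vocab = {t for doc in docs for t in doc}
    log_prior, log_lik = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, lab in zip(docs, labels) if lab == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for d in class_docs for t in d)
        denom = sum(counts.values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((counts[t] + alpha) / denom) for t in vocab}
    return log_prior, log_lik


def predict_nb(model, doc: list[str]) -> str:
    """Pick the class with the highest log posterior; tokens unseen in training are ignored."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][t] for t in doc if t in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


docs = [["you", "loser"], ["stupid", "loser"], ["great", "day"], ["thanks", "help"]]
labels = ["bully", "bully", "normal", "normal"]
model = train_nb(docs, labels)
print(predict_nb(model, ["loser"]))  # bully
```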
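The abstract reports accuracy, per-category precision, recall, and F1-score, and the average precision and recall, but it does not say how the averages were taken. The sketch below computes each class's figures from its true positives, false positives, and false negatives, using F1 = 2PR / (P + R). It assumes macro (unweighted) averaging across classes. The function name `evaluate` and the sentiment labels in the example are hypothetical.

```python
def evaluate(y_true: list[str], y_pred: list[str]):
    """Accuracy, per-class (precision, recall, F1), and macro averages of the three."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)
    # Macro average: every class counts equally, regardless of its support.
    macro = tuple(sum(s[i] for s in per_class.values()) / len(classes) for i in range(3))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, per_class, macro


y_true = ["pos", "pos", "neg", "neg", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "pos"]
accuracy, per_class, macro = evaluate(y_true, y_pred)
print(accuracy)          # 0.6
print(per_class["neg"])  # approximately (0.667, 1.0, 0.8)
```

A weighted average, which weights each class by its number of true examples, would generally give different numbers on imbalanced data. This matters for cyberbullying datasets, where bullying posts are often the minority class.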