IJNMT (International Journal of New Media Technology), Vol. 8, No. 1 | June 2021, p. 57
ISSN 2355-0082

Cyberbullying Sentiment Analysis with Word2Vec and One-Against-All Support Vector Machine

Lionel Reinhart Halim 1, Alethea Suryadibrata 2
1,2 Department of Informatics, Universitas Multimedia Nusantara, Tangerang, Indonesia
1 lionel.halim@student.umn.ac.id, 2 alethea@umn.ac.id

Accepted on May 19, 2021
Approved on June 09, 2021

Abstract—Depression and social anxiety are the two main negative impacts of cyberbullying. Unfortunately, a survey conducted by UNICEF on 3 September 2019 showed that 1 in 3 young people in 30 countries had been victims of cyberbullying. This research applies sentiment analysis to detect comments that contain cyberbullying. The cyberbullying dataset is the Toxic Comment Classification Challenge dataset, obtained from the Kaggle website. Pre-processing consists of four stages: comment generalization (converting text to lowercase and removing punctuation), tokenization, stop-word removal, and lemmatization. Word embedding, implemented with Word2Vec, is used to conduct the sentiment analysis. The One-Against-All (OAA) method with a Support Vector Machine (SVM) model is then used to make multi-label predictions. The SVM model is tuned with a hyperparameter search using Randomized Search CV. Evaluation uses the Micro Averaged F1 Score to assess prediction accuracy and Hamming Loss to assess the proportion of sample-label pairs that are incorrectly classified. The Word2Vec and OAA SVM implementation gives its best result on data pre-processed with comment generalization, tokenization, stop-word removal, and lemmatization, and represented by 100 features in the Word2Vec model. The Micro Averaged F1 and Hamming Loss produced by the tuned model are 83.40% and 15.13% respectively.
Index Terms—One-Against-All; multi labelling; sentiment analysis; Toxic Comment Classification Challenge; Word2Vec; word embedding

I. INTRODUCTION

Cyberbullying refers to bullying that uses electronic technology such as smartphones and the internet. Being a victim of cyberbullying may increase the risk of low self-esteem [1]. Low self-esteem can cause anxiety and depression [2]. These impacts are supported by statistics from Broadband Search on the mental health effects of cyberbullying, in which depression and social anxiety rank as the top two [3]. Unfortunately, 1 out of 3 young people in 30 countries has been a victim of cyberbullying [4].

To prevent cyberbullying from happening, detection is needed. Detection can be achieved with Natural Language Processing (NLP) techniques, which focus on the interactions between computers and human (natural) languages for text processing [5]. One such technique is sentiment analysis, whose ultimate task is emotion identification [6].

Sentiment analysis will be performed with the Word Embedding approach. This approach represents words in a vector space and will be implemented using Word2Vec with the Continuous Bag-of-Words (CBoW) model architecture. The model takes words as input and generates vectors as output. Word2Vec can also capture semantic relationships between the words in a sentence [7]. Thus, Word2Vec plays a major role in performing sentiment analysis.

Cyberbullying will be detected by combining the Word2Vec-based sentiment analysis with Multi-label Classification. Six classes will be used: toxic, severe toxic, obscene, threat, insult, and identity hate. A Support Vector Machine (SVM) model will be used for classification because it has been reported to perform better in text processing [8]. The One-Against-All (OAA) strategy will then be used to implement Multi-label Classification on the SVM.

II. LITERATURE REVIEW

A. Pre-processing

Pre-processing is an important step that transforms text into a cleaner form in preparation for the next step. Pre-processing steps include [9]:
• Converting all letters to lower case
• Removing stop words
• Removing punctuation
• Converting words into their root forms (lemmatization)
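
The pre-processing steps listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word set `STOP_WORDS` and the lemma lookup table `LEMMAS` are small hypothetical stand-ins for a full stop-word list and a real lemmatizer, and the function name `preprocess` is chosen here only for illustration. The steps are applied in the order given in the abstract (lowercasing and punctuation removal, tokenization, stop-word removal, lemmatization).

```python
import string
from typing import List

# Assumption: a tiny illustrative stop-word list. A real pipeline would use
# a complete list from an NLP library.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of", "you"}

# Assumption: a toy lookup table standing in for a real lemmatizer.
LEMMAS = {
    "comments": "comment",
    "insults": "insult",
    "threats": "threat",
}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(comment: str) -> List[str]:
    """Return the cleaned tokens of a single comment."""
    # 1. Comment generalization: lowercase, then strip punctuation.
    generalized = comment.lower().translate(_PUNCT_TABLE)
    # 2. Tokenization: split on whitespace.
    tokens = generalized.split()
    # 3. Stop-word removal.
    kept = [token for token in tokens if token not in STOP_WORDS]
    # 4. Lemmatization: map each token to its root form when one is known.
    return [LEMMAS.get(token, token) for token in kept]


example = preprocess("The comments are FULL of insults, and threats!")
print(example)  # ['comment', 'full', 'insult', 'threat']
```

The token lists produced this way are the kind of input a word-embedding model such as Word2Vec consumes, one list of tokens per comment.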