Indonesian Journal of Electrical Engineering and Computer Science
Vol. 15, No. 2, August 2019, pp. 1046~1053
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v15.i2.pp1046-1053
Journal homepage: http://iaescore.com/journals/index.php/ijeecs
Video spam comment features selection using machine
learning techniques
Nabilah Alias, Cik Feresa Mohd Foozy, Sofia Najwa Ramli
Applied Computing Technology (ACT), Faculty of Computer Science and Information Technology,
Universiti Tun Hussein Onn Malaysia, Malaysia
Article Info
Article history:
Received Sep 12, 2018
Revised Jan 20, 2019
Accepted Mar 2, 2019
ABSTRACT
Nowadays, social media platforms (e.g., YouTube and Facebook) let people
connect and interact by posting comments or videos. Comments, however, are
a part of a website's content that can attract spammers who spread phishing
links, malware, or advertising. Because malicious users can spread malware
or phishing through comments, this work proposes a technique for selecting
features to detect spam comments on video sharing sites. The first phase of
the methodology is dataset collection; for this experiment, a dataset from
the UCI Machine Learning Repository is used. The next phase is the
development of the framework and experimentation. The dataset is
pre-processed using tokenization and lemmatization. After that, the features
for detecting spam are selected, and classification experiments are
performed using six classifiers: Random Tree, Random Forest, Naïve Bayes,
KStar, Decision Table, and Decision Stump. The results show that the highest
accuracy is 90.57% and the lowest is 58.86%.
Keywords:
Video spam comment
Machine learning
Feature selection
Copyright © 2019 Institute of Advanced Engineering and Science.
All rights reserved.
Corresponding Author:
Naqliyah Zainuddin,
CyberSecurity Malaysia, Level 4 Block C,
Bangunan MINES Waterfront Business Park, No. 3 Jalan Tasek,
43300 Seri Kembangan, Selangor, Malaysia.
Email: naqliyah@cybersecurity.my
1. INTRODUCTION
At present, the worldwide spread of broadband has increased the number of Internet users. With
faster connections, hosting and video sharing services are becoming popular among users [1]. The
availability of resources over the Internet and broadband connections enables the emergence of sophisticated
new platforms. Among these, YouTube is one well-known video content publishing platform with social
networking features, such as support for posting text comments that allow interaction between producers
(channel owners) and viewers [2].
Recently, YouTube has used a monetization system to reward producers, stimulating them to
produce high-quality original content and increase the number of views. Since the introduction of this system,
the platform has been flooded with unwanted content, typically low-quality information known as spam. Spam is the
use of an electronic messaging system to send unsolicited messages, especially advertisements, as well as
repeated messages on the same website. Social spam can take many forms, including mass
messaging, cruelty, humiliation, hate speech, malicious links, fake reviews, fake hints, and
personal information [3].
Indeed, this problem could become critical. It has led users to disable comments on their
videos because most of the comments are spam. To date, research on detecting YouTube spam
comments using machine learning techniques is still lacking.
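As a rough illustration of what machine-learning-based comment spam detection can look like, the following minimal sketch tokenizes comments and classifies them with a hand-written multinomial Naïve Bayes model with add-one smoothing. It is not the implementation used in this work: the toy comments, the "spam"/"ham" labels, and the function names `tokenize`, `train_naive_bayes`, and `predict` are all hypothetical, lemmatization and feature selection are omitted, and only one of the six classifiers is represented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a comment and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train_naive_bayes(samples):
    """Count classes and per-class token frequencies from (text, label) pairs."""
    class_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for text, label in samples:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-posterior for the comment."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokenize(text):
            if token in vocabulary:  # tokens never seen in training are ignored
                # add-one (Laplace) smoothed log likelihood
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented for illustration only (not from the UCI dataset).
TOY_COMMENTS = [
    ("check out my channel and subscribe for free gifts", "spam"),
    ("click this link to win a free phone", "spam"),
    ("subscribe to my channel please", "spam"),
    ("this song brings back so many memories", "ham"),
    ("great video, the guitar solo is amazing", "ham"),
    ("i love this song so much", "ham"),
]

model = train_naive_bayes(TOY_COMMENTS)
print(predict(model, "free gifts if you subscribe to my channel"))
print(predict(model, "i love the guitar in this song"))
```

On this toy data the first comment is labeled as spam and the second as legitimate. A realistic experiment would train on a labeled corpus and report accuracy on held-out comments.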