Indonesian Journal of Electrical Engineering and Computer Science
Vol. 15, No. 2, August 2019, pp. 1046~1053
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v15.i2.pp1046-1053
Journal homepage: http://iaescore.com/journals/index.php/ijeecs

Video spam comment features selection using machine learning techniques

Nabilah Alias, Cik Feresa Mohd Foozy, Sofia Najwa Ramli
Applied Computing Technology (ACT), Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Malaysia

Article Info
Article history: Received Sep 12, 2018; Revised Jan 20, 2019; Accepted Mar 2, 2019
Keywords: Video spam comment; Machine learning; Feature selection

ABSTRACT
Nowadays, social media platforms (e.g., YouTube and Facebook) connect people and let them interact by posting comments or videos. Comments are part of a website's content, and they can attract spammers who spread phishing links, malware, or advertising. Because malicious users can spread malware or phishing through comments, this work proposes a feature selection technique for detecting spam comments on video-sharing sites. The first phase of the methodology is dataset collection; for this experiment, a dataset from the UCI Machine Learning Repository is used. The next phase covers framework development and experimentation. The dataset is pre-processed through tokenization and lemmatization. After that, the features used to detect spam are selected, and classification experiments are performed with six classifiers: Random Tree, Random Forest, Naïve Bayes, KStar, Decision Table, and Decision Stump. The results show a highest accuracy of 90.57% and a lowest accuracy of 58.86%.

Copyright © 2019 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author:
Naqliyah Zainuddin,
CyberSecurity Malaysia,
Level 4 Block C, Bangunan MINES Waterfront Business Park,
No. 3 Jalan Tasek, 43300 Seri Kembangan, Selangor, Malaysia.
Email: naqliyah@cybersecurity.my

1. INTRODUCTION
At present, the worldwide spread of broadband has increased the number of Internet users. With faster connections, hosting and video-sharing services are becoming popular among users [1]. The availability of resources over the Internet and broadband connections has enabled the emergence of sophisticated new platforms. YouTube, for example, is a well-known video publishing platform with social networking features, such as text comments that support interaction between producers (channel owners) and viewers [2]. Recently, YouTube has adopted monetization systems to reward producers, encouraging them to produce high-quality original content and to increase their number of views. Since these systems were introduced, the platform has been flooded with unwanted content, typically low-quality information known as spam.

Spam is the use of an electronic messaging system to send unsolicited messages, especially advertisements, or to post the same message repeatedly on a website. Social spam takes many forms, including mass messaging, cruelty, humiliation, hate speech, malicious links, fake reviews, fake hints, and personal information [3]. The problem can become critical: some users have disabled comments on their videos because most of the comments they receive are spam. To date, research on detecting spam YouTube comments with machine learning techniques remains limited.
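The methodology summarized in the abstract (tokenization and lemmatization, then feature selection, then classification) can be illustrated with the minimal Python sketch below. It is not the authors' implementation and does not reproduce their experiments. The lemma table `LEMMAS`, the helper names `preprocess` and `select_features`, the threshold `min_df=2`, and the six training comments are all hypothetical, invented only for illustration. Document-frequency thresholding is a stand-in for whatever feature selection the paper applies, and a from-scratch multinomial Naïve Bayes (one of the six classifiers the paper lists) stands in for the full classifier comparison.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL toy lemma table standing in for a real lemmatizer
# (e.g. a WordNet-based one); dictionary lookup is one simple form of lemmatization.
LEMMAS = {
    "videos": "video", "songs": "song", "subscribing": "subscribe",
    "subscribed": "subscribe", "is": "be", "are": "be",
    "loved": "love", "loves": "love", "channels": "channel",
}


def preprocess(text):
    """Lowercase, tokenize into alphanumeric runs, then map each token to its lemma."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [LEMMAS.get(tok, tok) for tok in tokens]


def select_features(docs, min_df=2):
    """Document-frequency feature selection: keep lemmas that occur in >= min_df comments."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each lemma at most once per comment
    return {tok for tok, count in df.items() if count >= min_df}


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing over a fixed vocabulary."""

    def fit(self, docs, labels, vocab):
        self.vocab = vocab
        self.classes = sorted(set(labels))
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in d if t in vocab)
            total = sum(counts.values())
            self.log_lik[c] = {
                t: math.log((counts[t] + 1) / (total + len(vocab))) for t in vocab
            }
        return self

    def predict(self, tokens):
        """Return the class with the highest log posterior; out-of-vocabulary lemmas are ignored."""
        def score(c):
            return self.log_prior[c] + sum(
                self.log_lik[c][t] for t in tokens if t in self.vocab
            )
        return max(self.classes, key=score)


# HYPOTHETICAL toy comments for illustration only (1 = spam, 0 = not spam).
train = [
    ("Check out my channel and subscribe http://example.com", 1),
    ("Subscribe to my channel for free gifts", 1),
    ("Free gifts if you are subscribing to my channel", 1),
    ("I loved this song so much", 0),
    ("This song is amazing, loved the video", 0),
    ("Best song of the year", 0),
]
docs = [preprocess(text) for text, _ in train]
labels = [y for _, y in train]
vocab = select_features(docs, min_df=2)
model = MultinomialNB().fit(docs, labels, vocab)

print(model.predict(preprocess("subscribe to my channel")))  # 1 (spam)
print(model.predict(preprocess("what a great song")))  # 0 (not spam)
```

With these toy comments, the selected vocabulary keeps only lemmas shared by at least two comments (such as "subscribe", "channel", and "song"), which is enough to separate the two classes. A real experiment would instead train on the UCI dataset mentioned in the abstract and compare all six classifiers, for example by cross-validated accuracy.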