Journal on Advanced Research in Electrical Engineering, Vol. 4, No. 2, Oct. 2020 123

Clustering Data National Examinations Based on Social Media Using K-Means Method

Chandra Eko Wahyudi Utomo
Department of Electrical Engineering, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
Universitas Jember, Jember, Indonesia
chandra15@mhs.ee.its.ac.id

Mochamad Hariadi
Department of Computer Engineering, Department of Electrical Engineering, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
mochar@ee.its.ac.id

Surya Sumpeno
Department of Computer Engineering, Department of Electrical Engineering, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
surya@ee.its.ac.id

Abstract: The development of social media as a source of data is increasingly interesting to study. The social media platform studied in this research is Twitter, one of the top-ranked social media platforms accessed by people in Indonesia. People's behavior can be learned by collecting and processing data, including their sentiments or opinions about national examinations in Indonesia. Here, Twitter user behavior is observed through users' comments about the national exam in Indonesia. This study aims to analyze the public sentiment of social media users about the National Examination in Indonesia. Data are retrieved by crawling via the Twitter API. The data are first preprocessed, and features are extracted using TF-IDF. However, because text data on Twitter are unstructured and highly varied, a grouping stage must be carried out first. Grouping is performed using K-Means clustering on Spark, which is used to handle the clustering of very large and complex data. The clustering process on Spark produced 3 clusters: with the number of clusters varied between 2 and 50, the elbow was detected at 3 clusters.
The clustering results, in the form of 3 large groups, were further processed (with classification techniques) to obtain a comparison of positive and negative sentiment in social media users' comments about the national exam. These results then serve as recommendations and new knowledge about community behavior regarding the National Exam, based on social media.

Keywords: clustering, sentiment analysis, national exam, social media, K-Means

I. INTRODUCTION

In computer science, sentiment analysis is one of the fields that studies emotion in relation to social media. Sentiment analysis is a way to obtain public sentiment through data processing or machine learning, so that it can be used to assess, for example, whether a product is accepted by the community or not [1]. According to some studies, sentiment analysis can also influence people's behavior, for example by raising public sentiment towards Jokowi's candidacy for President of Indonesia in 2014 [2].

Sentiment on social media is important because it provides insight into people and supports customer service (in this case, the customer service of government agencies). It can also inform the messaging of an institution or company. With sentiment analysis tools such as Hootsuite Insights, public relations (PR) staff can see when conversations around their brands turn negative. This tool recognizes unusual spikes in conversation volume and measures tone.

Within the topic of sentiment analysis, this research studies the public sentiment of Twitter users towards the National Examination in Indonesia. Twitter was chosen because it is one of the most popular social media platforms and, with the rapid growth of information technology, is considered to represent the upper-middle-class Indonesian community. Data on social media can be used as research material, and Twitter is one of the most widely used social media platforms.
Twitter, with more than 313 million monthly active users and more than 500 million tweets per day [4], is a gold mine for organizations and individuals who have strong social, political or economic interests in maintaining and increasing their influence and reputation. Twitter is a micro-blogging social network that has emerged very quickly as a platform for users to express their views on politics, sports, products, and so on. These views are useful for businesses, governments and individuals. For this reason, tweets can be a valuable source for mining public opinion [1]. Tweets usually consist of incomplete, noisy and unstructured sentences, irregular expressions, misspelled words and non-dictionary terms. Before feature selection, a sequence of pre-processing steps (e.g., deleting stop words, removing URLs, replacing negations) is applied to reduce the amount of noise in tweets [5].

The implementation of the National Examinations in Indonesia has drawn both support and opposition in the last few years, from state officials as well as from the community. The Minister of Education and Culture of the Republic of Indonesia for 2014-2019, Prof. Muhajir Efendi, briefly floated the idea of a moratorium on the National Examinations. The reason given was the negative impact of the National Examinations, which are said to erode the essence of education and tempt many education actors to act dishonestly. This view is reinforced by the decision of the Constitutional Court, which states that the National Examinations cannot be used as the benchmark for student graduation and that graduation is instead determined by the school. However, the Vice President of the Republic of Indonesia for 2014-2019, HM Yusuf Kalla, rejected the idea, arguing that the National Examinations are very important for controlling the quality of national education and for standardizing the measurement of student achievement.
Even in developed countries such as Britain, Japan, China and Singapore, a national examination is still used as the final evaluation of the development of education in the country. Finally, in the Cabinet Session on 7 December 2016, it was decided that the 2017 National Examination would still be held [6]. Nevertheless, the National Examination remains a topic of debate in the community, with all its advantages and disadvantages.
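
The tweet pre-processing steps mentioned earlier in this section (removing URLs, deleting stop words, replacing negations) can be illustrated with a minimal Python sketch. The word lists, the `NOT_` prefix convention, the function name `preprocess`, and the use of English tokens are illustrative assumptions, not the resources or implementation used in this study.

```python
import re

# Hypothetical, tiny word lists for illustration only; a real pipeline
# would use a full stop-word list for the language of the tweets.
STOP_WORDS = {"the", "is", "a", "an", "of", "to", "and", "this"}
NEGATIONS = {"not", "no", "never"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Crude tokenizer: keeps runs of letters, digits, underscores, apostrophes.
TOKEN_PATTERN = re.compile(r"[a-z0-9_']+")


def preprocess(tweet):
    """Return a list of cleaned tokens for one tweet."""
    text = URL_PATTERN.sub(" ", tweet.lower())  # remove URLs
    tokens = TOKEN_PATTERN.findall(text)
    cleaned = []
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True   # mark the next content word as negated
            continue
        if token in STOP_WORDS:
            continue        # drop stop words
        cleaned.append("NOT_" + token if negate else token)
        negate = False
    return cleaned
```

For example, `preprocess("The exam is not fair http://t.co/abc")` returns `['exam', 'NOT_fair']`. Token lists of this kind would then be passed to a feature extraction step such as TF-IDF.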