(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 2, 2020, p. 352, www.ijacsa.thesai.org

An Attribution of Cyberattack using Association Rule Mining (ARM)

Md Sahrom Abu¹, Aswami Ariffin⁴
Malaysian Computer Emergency Response Team, Cybersecurity Malaysia, Cyberjaya, Selangor DE, Malaysia

Siti Rahayu Selamat², Robiah Yusof³
Faculty of Information Technology and Communication, Universiti Teknikal Malaysia Melaka, Durian Tunggal, Melaka, Malaysia

Abstract: With the rapid development of computer networks and information technology, attackers have taken the opportunity to launch increasingly complicated cyberattacks. Such attacks cause many problems for organizations, because mitigating them and reducing the infection rate requires effective cyberattack attribution. Cyber Threat Intelligence (CTI) has gained wide media coverage due to its capability to provide CTI feeds from various data sources that can be used for cyberattack attribution. In this paper, we study the relationships among basic Indicators of Compromise (IOCs) in a network traffic dataset using a data mining approach. The dataset is obtained using a crawler deployed to pull security feeds from Shadowserver. An association analysis method using the Apriori algorithm is then applied to extract rules that reveal interesting relationships within large sets of data items. Finally, the extracted rules are evaluated with the interestingness measures of support, confidence and lift to quantify the value of the association rules generated by the Apriori algorithm. By applying the Apriori algorithm to the Shadowserver dataset, we discover association rules among several IOCs that can help attribute a cyberattack.

Keywords: CTI; association rule mining; Apriori Algorithm; attribution; interestingness measures

I. INTRODUCTION

With the rapid development of computer networks and information technology such as internet connectivity, cloud storage and social media, various devices can easily connect to the internet. While this improvement has helped internet users access the latest information quickly, it also has a negative consequence: attackers can improve their tactics, techniques and procedures (TTP) to launch more complicated cyberattacks. According to statistics released by the Malaysian Computer Emergency Response Team (MyCERT), shown in Fig. 1, malicious network activity in Malaysia, specifically botnet activity, has on average surpassed 1 million unique IP infections per year [1]. This infection rate has caused growing concern among internet users in Malaysia because cybercriminals can use the infected devices for illegal activities. Infected machines can be used to deploy malware, initiate attacks on websites, steal personal information and mine cryptocurrencies. The infection rate is alarming, and it causes many problems for organizations, because mitigating and reducing it requires effective cyberattack attribution.

Alongside this growing concern among internet users in Malaysia, CTI has gained wide media coverage due to its capability to provide CTI feeds from various data sources that can be used for cyberattack attribution. However, the voluminous data available in CTI must be processed properly to achieve effective cyberattack attribution. Hence, the objective of this paper is to learn more about the relationships among basic Indicators of Compromise (IOCs) in a network traffic dataset using a data mining approach. The network traffic dataset is obtained from the Shadowserver feed using a crawler.
Rules that reveal interesting relationships within large sets of data items are then extracted using an association analysis method. As a result, applying association analysis with the Apriori algorithm to the Shadowserver dataset can help attribute a cyberattack based on the useful information carried by the association rules among several IOCs.

The remainder of the paper is organized as follows. Section II presents the research background and related work on association rule mining in CTI. Section III describes the proposed methodology, which includes data collection using CTI feeds, data preprocessing and association analysis using the Apriori algorithm. Section IV elaborates the rule extraction methods and presents the outcome of using interestingness measures to evaluate the generated rules. Finally, Section V provides a brief conclusion.

Fig. 1. Statistics of Botnet Infection in Malaysia.
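
To make the interestingness measures named above concrete, the following minimal Python sketch computes support, confidence and lift for a single candidate rule. It is an illustration only and makes several assumptions: the transactions and item labels (such as `port=445` or `family=A`) are hypothetical toy values, not records from the Shadowserver dataset, and the helper names `support`, `confidence` and `lift` are illustrative. It uses the standard textbook definitions of the three measures and does not implement Apriori candidate generation itself.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]

# Hypothetical toy transactions: each one is the set of IOC attribute values
# seen in a single record. All values are invented for illustration.
TRANSACTIONS: List[Itemset] = [
    frozenset({"port=445", "family=A", "geo=MY"}),
    frozenset({"port=445", "family=A", "geo=MY"}),
    frozenset({"port=80", "family=B", "geo=MY"}),
    frozenset({"port=445", "family=A", "geo=SG"}),
    frozenset({"port=80", "family=A", "geo=MY"}),
]


def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        raise ValueError("transactions must not be empty")
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: Itemset, consequent: Itemset,
               transactions: List[Itemset]) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return support(antecedent | consequent, transactions) / base


def lift(antecedent: Itemset, consequent: Itemset,
         transactions: List[Itemset]) -> float:
    """confidence(antecedent -> consequent) / support(consequent)."""
    cons = support(consequent, transactions)
    if cons == 0:
        raise ValueError("consequent never occurs; lift is undefined")
    return confidence(antecedent, consequent, transactions) / cons


if __name__ == "__main__":
    a = frozenset({"port=445"})
    c = frozenset({"family=A"})
    print("support   :", support(a | c, TRANSACTIONS))
    print("confidence:", confidence(a, c, TRANSACTIONS))
    print("lift      :", lift(a, c, TRANSACTIONS))
```

A lift above 1 indicates that the antecedent and consequent occur together more often than would be expected if they were independent, while a lift near 1 suggests the rule carries little information beyond the consequent's overall frequency.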