Predicting Hacker Adoption on Darkweb Forums Using Sequential Rule Mining
Ericsson Marin¹, Mohammed Almukaynizi¹, Eric Nunes¹, Jana Shakarian², Paulo Shakarian¹,²
¹Arizona State University ²Cyber Reconnaissance, Inc.
esmarin@asu.edu, malmukay@asu.edu, enunes1@asu.edu, jana@cyr3con.ai, shak@asu.edu
Abstract—In recent years, there has been a notable rise in demand for proactive, intelligence-driven cyber defense mechanisms. In response to this demand, we study how the spread of adoption behavior among individuals can be used to predict their posts on darkweb hacking forums, where that behavior is driven by the influential activities of their peers. We formulate our problem as a sequential rule mining task. The goal is to discover user posting rules from sequences of user posts and then use those rules to make predictions for the near future. We run our experiments using multiple post time granularities and time-windows for obtaining rules. We observe precision of up to 0.78 and precision gains of up to 837% when compared to the prior probabilities of hackers' posts. Our approach is an additional step in the fight against cyber-attacks.
Index Terms—social influence, user adoption, hacking forums, darkweb, rule-learning.
I. INTRODUCTION
The significant rise in cyber-attacks over the last decade has made cyber threat intelligence more important for organizations. No major operating system or platform seems to be immune, so these organizations are embracing intelligence that goes beyond situational awareness. They are trying to move from reactive security to more proactive, intelligence-driven security [1]. One line of study that can aid in the production of threat intelligence is the spread of adoption behavior among hackers on the darkweb.
Due to influence effects [2], values, ideas and techniques are transmitted from one person to another. Examples appear in economics [3], psychology [4], sociology [5], [6], business [7], public health [8] and politics [9], and the same behavior holds for malicious hackers [10]–[13]. Cyber-security can therefore benefit from applications that study adoption behavior and mine information to anticipate cyber-attacks [14]. Consider, for instance, the prediction of online hacktivist campaigns [15]. These activities would become traceable if we could predict which users join a campaign, taking the existing peer influence into account. Another application of this study is predicting which hackers will buy a specific hacking product or service offered on a darkweb forum [16]. Standard hackers are often influenced by reputable ones and rely on darkweb hacking forums to improve their skills and capabilities [17], so this interaction can be anticipated. Finally, adoption behavior can also be used to address online cascade prediction [18]. Its primary goal is to detect early-stage posts that have the potential to "go viral" and generate multiple subsequent adoptions, which increases the chance of cyber-attacks [14].
In this work, we study adoption behavior in order to predict in which topic of a darkweb hacking forum users will post in the near future, given the influence of their peers. We formulate our problem as a sequential rule mining task [19], where the goal is to discover user posting rules from sequences of user posts. We then use the mined rules to predict which users will post in a particular forum topic. In general, sequential rule mining is an important data mining technique. It uses patterns mined from sequences to predict event(s) that are likely to follow other event(s) with a given probability [20].
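The following sketch makes the notion of a sequential rule concrete. It computes the support and confidence of a rule X ⇒ Y over a database of sequences of itemsets, using one common definition from the sequential rule mining literature. Under that definition, X ⇒ Y occurs in a sequence if all items of X appear before all items of Y. Confidence is the fraction of sequences containing X in which the rule occurs. This definition is an illustrative assumption and is not necessarily the exact formalization used in this work. The function names and the toy database are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# A sequence is an ordered list of itemsets; a database is a list of sequences.
Itemset = FrozenSet[str]
SequenceDB = List[List[Itemset]]


def contains_all(seq: List[Itemset], items: Itemset) -> bool:
    """True if every item of `items` appears somewhere in `seq`."""
    return items.issubset(set().union(*seq))


def rule_occurs(seq: List[Itemset], x: Itemset, y: Itemset) -> bool:
    """True if some split point k puts all of X in seq[:k+1] and all of Y in seq[k+1:]."""
    prefix: set = set()
    for k in range(len(seq) - 1):
        prefix |= seq[k]
        if x.issubset(prefix):
            # The earliest k that covers X leaves the longest suffix for Y.
            return y.issubset(set().union(*seq[k + 1:]))
    return False


def support_confidence(db: SequenceDB, x: Itemset, y: Itemset) -> Tuple[float, float]:
    """Support: fraction of sequences where X => Y occurs.
    Confidence: occurrences of X => Y divided by sequences containing X."""
    n_x = sum(1 for s in db if contains_all(s, x))
    n_rule = sum(1 for s in db if rule_occurs(s, x, y))
    support = n_rule / len(db) if db else 0.0
    confidence = n_rule / n_x if n_x else 0.0
    return support, confidence


def to_seq(*itemsets: str) -> List[Itemset]:
    """Build a sequence from strings, one character per item, e.g. to_seq("a", "bc")."""
    return [frozenset(s) for s in itemsets]


# Toy database (hypothetical).
DB: SequenceDB = [
    to_seq("a", "b", "c"),
    to_seq("ab", "c"),
    to_seq("a", "c", "b"),
    to_seq("b", "a"),
]

print(support_confidence(DB, frozenset("a"), frozenset("c")))   # (0.75, 0.75)
print(support_confidence(DB, frozenset("ab"), frozenset("c")))  # (0.5, 0.5)
```

Adding items to the antecedent X can lower confidence. In the toy database, {a} ⇒ {c} holds in 3 of the 4 sequences containing a. In contrast, {a, b} ⇒ {c} holds in only 2 of the 4 sequences containing both a and b.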
To adapt the problem to our context, we restrict each rule of the form X ⇒ Y to contain only the users responsible for the posts. A rule is interpreted as "if X (a set of users) engages in a given forum topic, Y (a single user) is likely to engage in the same topic afterward (i.e., adopt it) with a given confidence, mainly because of the influence of X". Previous research demonstrated that influence decreases over time because of time constraints such as the Forgettable Span [21]. We therefore only consider rules occurring within defined time-windows. We also examine how the precision of our model changes under two post time granularities (day and hour). Finally, we compare our results with those produced by the prior probabilities of hackers' posts and show that our predictions achieve considerably higher precision, with large precision gains.
This work makes the following main contributions:
1) We collect more than 330,000 hacker posts from a popular darkweb forum to create a sequential rule mining model capable of predicting future posts of hackers;
2) We consider 2 post time granularities (day and hour) and 10 different time-windows for each time granularity;
3) During training, we mine more than 362,617 sequential rules of different sizes (number of users);
4) During testing, we obtain prediction precision of up to 0.78, while a baseline model reaches up to 0.18;
5) We observe the highest precision gains, [785%, 837%], for time-windows in [3,5] days, which confirms that the Forgettable Span [21] effects also apply to darkweb hacking forums.
This paper is organized as follows: Section II defines the darkweb and our dataset. Section III formalizes our sequential rule mining task applied to hacker adoption prediction. Section IV presents our experiments and results. Section V discusses related work. Finally, Section VI concludes the work.
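The user-based, time-windowed rules can be sketched as follows. This is a minimal illustration under our own assumptions and not the authors' implementation. The assumptions are:

- A post is a hypothetical (user, topic, timestamp) tuple.
- A user's adoption of a topic is that user's first post in it.
- Timestamps are bucketed into day or hour units.
- X ⇒ y holds in a topic when y first posts strictly after the last user of X has adopted the topic, and no more than `window` units later.
- Precision gain is the relative improvement of a precision over a prior probability. This may differ from the exact definition used in Section IV.

All function names and sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical record layout: (user, topic, timestamp).
Post = Tuple[str, str, datetime]


def time_unit(ts: datetime, granularity: str) -> int:
    """Bucket a timestamp into an integer day index or hour index."""
    day = ts.date().toordinal()
    if granularity == "day":
        return day
    if granularity == "hour":
        return day * 24 + ts.hour
    raise ValueError(f"unknown granularity: {granularity}")


def first_posts(posts: Iterable[Post], granularity: str) -> Dict[str, Dict[str, int]]:
    """Per topic, map each user to the time unit of their first post (adoption)."""
    firsts: Dict[str, Dict[str, int]] = defaultdict(dict)
    for user, topic, ts in posts:
        t = time_unit(ts, granularity)
        if user not in firsts[topic] or t < firsts[topic][user]:
            firsts[topic][user] = t
    return firsts


def rule_confidence(firsts: Dict[str, Dict[str, int]], x: FrozenSet[str],
                    y: str, window: int) -> float:
    """Confidence of X => y: among topics adopted by every user in X (X non-empty),
    the fraction where y adopts strictly after the last X adoption and at most
    `window` time units later."""
    n_x = n_rule = 0
    for adopters in firsts.values():
        if not x.issubset(adopters):
            continue
        n_x += 1
        t_x = max(adopters[u] for u in x)
        t_y = adopters.get(y)
        if t_y is not None and t_x < t_y <= t_x + window:
            n_rule += 1
    return n_rule / n_x if n_x else 0.0


def precision_gain(precision: float, prior: float) -> float:
    """Relative improvement, in percent, of a precision over a prior probability."""
    return (precision - prior) / prior * 100.0


# Toy posts (hypothetical users, topics and dates).
POSTS = [
    ("alice", "t1", datetime(2017, 1, 1, 10)),
    ("bob", "t1", datetime(2017, 1, 2, 9)),
    ("carol", "t1", datetime(2017, 1, 4, 8)),
    ("alice", "t2", datetime(2017, 1, 1, 10)),
    ("bob", "t2", datetime(2017, 1, 1, 12)),
    ("carol", "t2", datetime(2017, 1, 10, 0)),
    ("alice", "t3", datetime(2017, 1, 5, 0)),
]

by_day = first_posts(POSTS, "day")
print(rule_confidence(by_day, frozenset({"alice", "bob"}), "carol", window=3))   # 0.5
print(rule_confidence(by_day, frozenset({"alice", "bob"}), "carol", window=10))  # 1.0
print(precision_gain(0.75, 0.25))  # 200.0
```

Widening the window lets later adoptions count, which is why the confidence of {alice, bob} ⇒ carol rises from 0.5 to 1.0 above. Switching from days to hours changes the unit of `window`, and it also changes which posts fall into the same bucket. In this sketch, posts in the same unit as the last X adoption are not counted as adoption "afterward".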