Intelligenza Artificiale 14 (2020) 103–114
DOI 10.3233/IA-190038
IOS Press
1724-8035/20/$35.00 © 2020 – IOS Press and the authors. All rights reserved

ActorNode2Vec: An Actor-based solution for Node Embedding over large networks

Gianfranco Lombardo and Agostino Poggi
Department of Engineering and Architecture, University of Parma, Italy

Corresponding author: Gianfranco Lombardo, Department of Engineering and Architecture, University of Parma, Italy. E-mail: gianfranco.lombardo@unipr.it.

Abstract. The application of Machine Learning techniques over networks, such as prediction tasks over nodes and edges, is becoming increasingly crucial in the analysis of complex systems in a wide range of research fields. One of the enabling technologies in this respect is Node Embedding, which makes it possible to learn features automatically over the network. Among the different approaches proposed in the literature, the most promising are DeepWalk and Node2Vec, where the embedding is computed by combining random walks and neural language models. However, these techniques are characteristically limited by their memory requirements and time complexity. In this paper, we propose a distributed and scalable solution, named ActorNode2vec, that keeps the main advantages of Node2Vec and overcomes its limitations by adopting the actor model to distribute the computational load. We demonstrate the efficacy of this approach on a large network by analyzing the sensitivity to the walk length and number of walks parameters, and we also compare it with DeepWalk and with an Apache Spark distributed implementation of Node2Vec. Results show that ActorNode2vec drastically reduces computation times without losing embedding quality, while also overcoming memory issues.

Keywords: Network science, embedding, node embedding, Node2vec, actodes, distributed systems, data mining, complex systems, actor model

1. Introduction

In a wide range of disciplines it is possible to find real systems characterized by heterogeneous or similar entities that interact with each other in a complex way. These so-called complex systems are pervasive in several research fields, such as Sociology, Biology, Genetics, Physics, Computer Science and Finance, and their analysis for knowledge discovery is still challenging. Complex system analysis often involves the use of graphs (or networks) to model the behavior of the system, with the basic idea of representing entities as nodes and their interactions and dynamics as (un)directed edges. For decades the study of graph data has been limited to the analysis of the network topology, using structural metrics that are capable of extracting connectivity patterns among the system components. More recently, with the progress of Machine Learning techniques, the idea has emerged of taking advantage of this kind of structure to perform prediction or clustering tasks as well. For example, in [1] the authors modeled the interactions between proteins as a network, with the aim of automatically predicting a correct label for each protein describing its functionalities. In [2] the authors use a temporal network to model the US stock market in order to discover correlations among the dynamics of stock clusters and to predict economic crises. In [3] the authors analyze a social group of patients to extract new knowledge about their emotional state and the temporal pattern of their disease by modeling them in two attributed networks: an interaction network and a friendship network. However, the application of Machine Learning directly on graph data is made difficult by the necessary manual feature