NETHIC: A System for Automatic Text Classification using Neural Networks and Hierarchical Taxonomies

Andrea Ciapetti 1, Rosario Di Florio 1, Luigi Lomasto 1, Giuseppe Miscione 1, Giulia Ruggiero 1 and Daniele Toti 1,2,a

1 Innovation Engineering S.r.l., Rome, Italy
2 Department of Sciences, Roma Tre University, Rome, Italy

Keywords: Machine Learning, Neural Networks, Taxonomies, Text Classification.

Abstract: This paper presents NETHIC, a software system for the automatic classification of textual documents based on hierarchical taxonomies and artificial neural networks. This approach combines the advantages of highly structured hierarchies of textual labels with the versatility and scalability of neural networks, thus bringing about a textual classifier that displays high levels of performance in terms of both effectiveness and efficiency. The system has first been tested as a general-purpose classifier on a generic document corpus, and then applied to the specific domain tackled by DANTE, a European project that is meant to address criminal and terrorist-related online content, showing consistent results across both application domains.

1 INTRODUCTION

With the increasing use of social networks and with the digitalization of governmental structures, computer scientists are facing new challenges and needs. Usually, official communications and documentation are stored in the form of electronic textual documents. The rest of the personal communications exchanged by single individuals is represented by chat messages, tweets, e-mails, blog entries, etc. This leads to an increased volume of textual information that may consequently bring about increasing confusion and hinder the effectiveness of the communication itself. Understanding the subject category each data item falls into and the topics discussed has become paramount for the effective management and analysis of this deluge of information.
Indeed, in recent years, significant effort and considerable resources have been spent to satisfy this need within the context of governmental and commercial projects (Dalal and Zaveri, 2011). One of the potential techniques to be used in this regard is automatic text classification, which falls into the category of supervised machine learning tasks. Such a process is meant to automatically assign pre-defined classes to documents by using a machine learning technique (Sebastiani, 2002).

This paper describes NETHIC, an automatic text classification system based on a hierarchical taxonomy and artificial neural networks (ANNs). Taxonomies represent knowledge in a structured and human-readable manner. Their hierarchical structure enables an efficient and automated content classification (Wetzker et al., 2008). Artificial neural networks, on the other hand, have some interesting properties that make this family of machine learning algorithms very appealing when facing difficult pattern-discovery tasks. This combined approach is especially useful when a large amount of data is used during the training phase, and can be easily implemented in parallel architectures (i.e., with multi-core processors or systems with dedicated GPUs). This may drastically reduce the processing time compared to other kinds of algorithms, while achieving similar results in terms of effectiveness (Hermundstad et al., 2011).

In this work, the NETHIC system is detailed, showing how it achieves a significant level of performance with different taxonomies. First, a generic taxonomy is used in order to obtain a general-purpose text classifier. Then, a specific taxonomy to tackle domain-specific texts and concepts from the DANTE Horizon 2020 project is introduced, so that the system can also be used to classify documents dealing with terrorist and criminal activities, which are the very focus of the DANTE project.

a https://orcid.org/0000-0002-9668-6961
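To make the idea of combining a hierarchical taxonomy with per-level classifiers more concrete, the following is a minimal sketch of top-down classification, in which each internal taxonomy node owns a scorer that selects one of its children. It is an illustration under stated assumptions and not the NETHIC implementation: the `Node`, `keyword_scorer` and `classify` names, the toy taxonomy and its labels are all hypothetical, and the keyword-counting scorer is only a stand-in for a trained neural network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A scorer maps a text to one score per child label. In a real system this
# would be a trained neural network; here it is a keyword-count stand-in.
Scorer = Callable[[str], Dict[str, float]]


def keyword_scorer(keywords: Dict[str, List[str]]) -> Scorer:
    """Build a toy scorer that counts keyword hits for each child label."""
    def score(text: str) -> Dict[str, float]:
        tokens = text.lower().split()
        return {label: float(sum(tokens.count(w) for w in words))
                for label, words in keywords.items()}
    return score


@dataclass
class Node:
    """A taxonomy node; leaves have no scorer and no children."""
    label: str
    scorer: Optional[Scorer] = None
    children: Dict[str, "Node"] = field(default_factory=dict)


def classify(root: Node, text: str) -> List[str]:
    """Walk the taxonomy top-down, following the best-scoring child."""
    path: List[str] = []
    node = root
    while node.children:
        scores = node.scorer(text)
        best = max(scores, key=scores.get)  # ties resolve to the first label
        path.append(best)
        node = node.children[best]
    return path


# Hypothetical two-level taxonomy, used only for illustration.
taxonomy = Node(
    "root",
    keyword_scorer({"sport": ["match", "player"],
                    "science": ["experiment", "theory"]}),
    {
        "sport": Node(
            "sport",
            keyword_scorer({"football": ["goal"], "tennis": ["racket"]}),
            {"football": Node("football"), "tennis": Node("tennis")},
        ),
        "science": Node(
            "science",
            keyword_scorer({"physics": ["particle"], "biology": ["cell"]}),
            {"physics": Node("physics"), "biology": Node("biology")},
        ),
    },
)

print(classify(taxonomy, "the player scored a goal in the match"))
# prints ['sport', 'football']
```

In this sketch, each scorer only has to discriminate among the children of a single node, which is one way a taxonomy can break a large labelling problem into smaller ones. Replacing `keyword_scorer` with a trained model would leave the traversal logic unchanged.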
Ciapetti, A., Di Florio, R., Lomasto, L., Miscione, G., Ruggiero, G. and Toti, D. NETHIC: A System for Automatic Text Classification using Neural Networks and Hierarchical Taxonomies. DOI: 10.5220/0007709702960306. In Proceedings of the 21st International Conference on Enterprise Information Systems (ICEIS 2019), pages 296-306. ISBN: 978-989-758-372-8. Copyright © 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.

This paper is structured as follows. In Section 2,