Semantic HMC: Ontology-described hierarchy maintenance in Big Data context

Rafael Peixoto 1,2, Christophe Cruz 2, Nuno Silva 1

1 GECAD - ISEP, Polytechnic of Porto, Porto, Portugal
{rafpp,nps}@isep.ipp.pt
2 LE2I UMR6306, CNRS, Univ. Bourgogne Franche-Comté, F-21000 Dijon, France
christophe.cruz@u-bourgogne.fr

Abstract. One of the biggest challenges in Big Data is exploiting Value from large volumes of constantly changing data. To exploit this value, one must focus on extracting knowledge from these Big Data sources. To extract knowledge and value from unstructured text, we propose a Hierarchical Multi-Label Classification process called Semantic HMC, which uses ontologies to describe the predictive model, including the label hierarchy and the classification rules. To avoid overloading the user, this process automatically learns the ontology-described label hierarchy from a very large set of text documents. This paper presents a process for maintaining the relations of the ontology-described label hierarchy as a stream of unstructured text documents arrives in a Big Data context, without relearning the whole hierarchy.

Keywords. Maintenance, multi-label classification, hierarchy induction, ontology, machine learning

1 Introduction

The exponential growth of the amount of data available on the web requires new forms of processing to enable enhanced decision making, insight discovery and optimization. The term Big Data is mainly used to describe datasets that cannot be processed using traditional tools. To extract knowledge from Big Data sources, we propose using the Semantic HMC process [1, 2], which is capable of hierarchically multi-classifying a large Variety and Volume of unstructured data items. Hierarchical Multi-Label Classification (HMC) is the combination of multi-label classification and hierarchical classification [13].
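To make this combination concrete: in multi-label classification an item may receive several labels at once, and in hierarchical classification the labels are organized in a hierarchy. HMC settings commonly add a hierarchy constraint, under which an item assigned a label is also assigned all of that label's ancestors. The Python sketch below illustrates only that constraint. The label names and the parent-map representation (PARENT, ancestors, close_under_hierarchy, is_consistent) are hypothetical and do not reflect the Semantic HMC ontology or its classification rules.

```python
from typing import Dict, Optional, Set

# Hypothetical label hierarchy (illustration only): each label maps to its
# parent label, or to None if it is a root.
PARENT: Dict[str, Optional[str]] = {
    "Sport": None,
    "Football": "Sport",
    "Tennis": "Sport",
    "Economy": None,
    "Stock Market": "Economy",
}


def ancestors(label: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return all ancestors of `label` by following parent links up to a root."""
    result: Set[str] = set()
    current = parent.get(label)
    while current is not None:
        result.add(current)
        current = parent.get(current)
    return result


def close_under_hierarchy(labels: Set[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Extend a flat multi-label assignment so every label's ancestors are included."""
    closed = set(labels)
    for label in labels:
        closed |= ancestors(label, parent)
    return closed


def is_consistent(labels: Set[str], parent: Dict[str, Optional[str]]) -> bool:
    """Check the hierarchy constraint: no label appears without all of its ancestors."""
    return all(ancestors(label, parent) <= labels for label in labels)


if __name__ == "__main__":
    # A flat multi-label prediction for one document, ignoring the hierarchy.
    flat = {"Football", "Stock Market"}
    print(is_consistent(flat, PARENT))                      # False
    print(sorted(close_under_hierarchy(flat, PARENT)))
    # ['Economy', 'Football', 'Sport', 'Stock Market']
```

Representing the hierarchy as a child-to-parent map keeps the sketch short. It assumes a tree, whereas an ontology-described hierarchy may allow a label to have several parents, in which case each label would map to a set of parents.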
The Semantic HMC process is unsupervised: it requires neither previously labelled examples nor enrichment rules relating the data items to the labels. The label hierarchy and the enrichment rules are learned automatically from the data through scalable Machine Learning techniques. Automatically extracting the concept (label) hierarchy from unstructured documents is not a trivial process, and it requires proper techniques for document analysis and representation. In the context of Big Data, this task is even more challenging due to Big