24 IEEE Network • January/February 2016 0890-8044/16/$25.00 © 2016 IEEE

Data mining is the process of extracting useful information from a set of data, or dataset [1]. A dataset usually contains a large amount of data, popularly referred to as “big data.” Data mining techniques can be effectively applied to any discipline, including physics, biology, engineering, finance, and environmental sciences. Communication networks produce large amounts of data that network operators can use to manage network operation and to support the design of new networks. As a consequence, the area of communication networks engineering can benefit from data mining solutions.

Besides the problem definition phase, which identifies the purpose of the activity and the expected results, the data mining process includes the data collection phase, the data preprocessing phase, the model building phase, and the model evaluation phase (Fig. 1). The data collection phase involves gathering useful data, storing it, and updating it online. The stored historical data are the only source of information driving the whole data mining process, so the results reflect the data currently available. The data preprocessing phase involves data management and transformation, discretization, outlier and missing value management, dimensionality reduction, data normalization, feature extraction, and feature selection. The model building phase concerns the development of learning methods for descriptive and predictive data analysis using data collected from a real network or extracted from network simulations. The model evaluation phase assesses the performance of the developed data mining models. Finally, the results of the overall data mining process are used to optimize the system, to gain knowledge about the system, or simply to visualize its behavior.
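
As a rough illustration of the data preprocessing phase described above, the following Python sketch applies four of the listed steps (missing value management, outlier management, data normalization, and discretization) to a short series of measurements. The function name `preprocess`, the median-based imputation, the clipping rule based on the median absolute deviation, the equal-width binning, and the sample delay values are all assumptions made for illustration; the article does not prescribe any of them.

```python
from statistics import median


def preprocess(samples, n_bins=3, k=3.0):
    """Hypothetical helper: impute, clip outliers, normalize, discretize."""
    # Missing value management: replace None with the median of observed values.
    observed = [x for x in samples if x is not None]
    med = median(observed)
    filled = [med if x is None else x for x in samples]

    # Outlier management: clip to median +/- k * MAD (median absolute deviation).
    # When the MAD is zero the clipping step is skipped.
    mad = median([abs(x - med) for x in filled])
    if mad > 0:
        lo, hi = med - k * mad, med + k * mad
        filled = [min(max(x, lo), hi) for x in filled]

    # Data normalization: min-max scaling to the interval [0, 1].
    vmin, vmax = min(filled), max(filled)
    span = vmax - vmin
    normalized = [(x - vmin) / span if span > 0 else 0.0 for x in filled]

    # Discretization: equal-width bins labelled 0 .. n_bins - 1.
    bins = [min(int(v * n_bins), n_bins - 1) for v in normalized]
    return normalized, bins


# Made-up link delay measurements in milliseconds; None marks a missing sample.
delays_ms = [10.0, 12.0, None, 11.0, 13.0, 500.0]
normalized, bins = preprocess(delays_ms)
print(normalized)
print(bins)
```

In this sketch the 500 ms sample is clipped rather than dropped, so it still falls in the highest bin without stretching the scale for the remaining values. Other choices, such as discarding outliers or using equal-frequency bins, would be equally valid depending on the dataset.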
In the rest of the article we focus on the model building phase, discussing both descriptive and predictive data mining methods for communication networks control.

The task of communication networks control aims to support the efficient operation of communication networks. On one hand, depending on the specific problem, this can be achieved by optimizing the network performance using a priori information and methods based on, for example, game theory, Markov decision processes, genetic algorithms, and simulated annealing. On the other hand, learning algorithms for data mining make it possible to follow and understand the network behavior, so that control functions and parameters can be updated during network operation to achieve optimal performance in any real condition. In principle, using real-time performance measurement tools, we can infer an association rule between performance, network status, and network control parameters, and hence learn the set of parameters that probabilistically maximizes the network performance in any given network condition.

In this article we provide a structured description of the applications of data mining techniques to communication networks control (optimization and management), then review recent works on this topic, and finally provide guidelines for future applications.

The article is organized as follows. We introduce data mining methods for descriptive and predictive data analysis. Regarding descriptive methods, we discuss association rule mining, clustering, and sequential pattern mining algorithms. Regarding predictive methods, we discuss classification and regression algorithms. We provide guidelines for future applications of data mining methods in communication networks control. Finally, conclusions are drawn.

Abstract
The control of communication networks is an important aspect from both the service provider and user points of view.
There are several approaches to communication network control, including game theory, genetic algorithms, and Markov decision processes. Data mining methods have been successfully used to discover optimized solutions to this problem, and have the capability to learn the network behavior under different network conditions and during operation, so that complete knowledge of the network behavior is not required a priori. This article identifies the concepts behind the idea of using data mining for communication network control, provides a structured survey of the results in this area, and discusses the guidelines for future applications.

Data Mining Algorithms for Communication Networks Control: Concepts, Survey and Guidelines

Mauro De Sanctis, Igor Bisio, and Giuseppe Araniti

Mauro De Sanctis is with the University of Roma Tor Vergata. Igor Bisio is with the University of Genova. Giuseppe Araniti is with the University “Mediterranea” of Reggio Calabria.