24 IEEE Network • January/February 2016 0890-8044/16/$25.00 © 2016 IEEE
Data mining is the process of extracting useful information
from a set of data, or dataset [1]. A dataset
usually contains a large amount of data, popularly referred
to as “big data.” Data mining techniques can be effectively
applied to any discipline, including physics, biology, engineer-
ing, finance, environmental sciences, and so on. Communi-
cation networks produce large amounts of data that can be
used by network operators to manage network operation and
to support the design of new networks. As a consequence, the
area of communication networks engineering can benefit from
data mining solutions.
After the problem definition phase, which identifies the
purpose of the activity and the expected results, the data mining
process comprises the data collection phase, the data preprocessing
phase, the model building phase, and the model
evaluation phase (Fig. 1). The data collection phase involves
gathering useful data, storing it, and updating it online.
The stored historical data are the only source of
information driving the whole data mining process, so the
results reflect the data currently available. The data
preprocessing phase involves data management and trans-
formation, discretization, outlier and missing value manage-
ment, dimensionality reduction, data normalization, feature
extraction, and feature selection. The model building phase
concerns the development of learning methods for descrip-
tive and predictive data analysis using data collected from
a real network or extracted from network simulations. The
model evaluation phase allows the performance of the devel-
oped data mining models to be assessed. Finally, the results
of the overall data mining process are used with the aim of
optimizing the system, gaining knowledge about the system,
or simply visualizing its behavior.
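As an illustration only, the phases described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the workflow of any specific system: the function names (collect, preprocess, build_model, evaluate), the synthetic load/delay records, and the linear delay model are all hypothetical and chosen only to make each phase visible.

```python
import random
import statistics

def collect(n=200, seed=0):
    """Data collection: synthetic (load, delay_ms) records.
    Assumption: these stand in for real or simulated network measurements."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        load = rng.uniform(0.0, 1.0)
        delay = 10.0 + 40.0 * load + rng.gauss(0.0, 2.0)
        # Simulate occasional missing measurements (about 5% of records).
        records.append((load, None if rng.random() < 0.05 else delay))
    return records

def preprocess(records):
    """Preprocessing: drop missing values, then min-max normalize the load."""
    clean = [(x, y) for x, y in records if y is not None]
    lo = min(x for x, _ in clean)
    hi = max(x for x, _ in clean)
    return [((x - lo) / (hi - lo), y) for x, y in clean]

def build_model(train):
    """Model building: least-squares fit of delay = a * load + b."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    a = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def evaluate(model, test):
    """Model evaluation: mean absolute error on held-out records."""
    a, b = model
    return statistics.fmean(abs(a * x + b - y) for x, y in test)

data = preprocess(collect())
split = int(0.8 * len(data))          # 80% for training, 20% for evaluation
model = build_model(data[:split])
mae = evaluate(model, data[split:])
print(f"slope={model[0]:.2f} intercept={model[1]:.2f} MAE={mae:.2f} ms")
```

In a real deployment each function would be replaced by the corresponding measurement, cleaning, learning, and validation tools; the point here is only the order and separation of the phases.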
In the rest of the article we focus on the model building
phase, discussing both descriptive and predictive data mining
methods for communication networks control.
The task of communication networks control aims to support
the efficient operation of communication networks. On
one hand, depending on the specific problem, this can be
achieved by optimizing the network performance using a priori
information and methods based on, for example, game theory,
Markov decision processes, genetic algorithms, and simulated
annealing. On the other hand, learning algorithms for data
mining make it possible to track and understand the network
behavior, so that control functions and parameters can be updated
during network operation to pursue optimal performance under
real operating conditions.
In principle, using real-time performance measurement
tools, we can infer an association rule between performance,
network status, and network control parameters, and hence
learn the set of parameters that probabilistically maximize the
network performance in any given network condition.
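The idea in the preceding paragraph can be illustrated with a small sketch. It assumes a hypothetical measurement log of (network status, control parameter, performance outcome) records, and it scores each rule "status and parameter imply good performance" by its empirical confidence. The log entries, the parameter names, and the best_parameters helper are invented for illustration and do not come from the article.

```python
from collections import defaultdict

# Hypothetical measurement log: (network_status, control_parameter, performance_ok)
log = [
    ("low_load", "window=8", True),
    ("low_load", "window=8", True),
    ("low_load", "window=32", True),
    ("low_load", "window=32", False),
    ("high_load", "window=8", False),
    ("high_load", "window=8", True),
    ("high_load", "window=8", False),
    ("high_load", "window=32", True),
    ("high_load", "window=32", True),
]

def best_parameters(log, min_support=2):
    """For each status, pick the parameter whose rule
    (status, parameter) -> good performance has the highest confidence."""
    counts = defaultdict(lambda: [0, 0])  # (status, param) -> [good, total]
    for status, param, ok in log:
        counts[(status, param)][0] += int(ok)
        counts[(status, param)][1] += 1
    best = {}
    for (status, param), (good, total) in counts.items():
        if total < min_support:
            continue  # skip rules backed by too few observations
        confidence = good / total  # empirical P(good | status, param)
        if status not in best or confidence > best[status][1]:
            best[status] = (param, confidence)
    return best

policy = best_parameters(log)
for status, (param, conf) in sorted(policy.items()):
    print(f"{status}: use {param} (confidence {conf:.2f})")
```

The min_support threshold is a design choice: it keeps a rule observed only once from dominating the choice, at the cost of ignoring rarely tried parameters.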
In this article we provide a structured description of
the applications of data mining techniques to communication
networks control (optimization and management), then
review recent works on this topic, and finally provide
guidelines for future applications.
The article is organized as follows. We first introduce data
mining methods for descriptive and predictive data analysis.
Regarding descriptive methods, we discuss association rule
mining, clustering, and sequential pattern mining algorithms.
Regarding predictive methods, we discuss classification
and regression algorithms. We then provide guidelines for
future applications of data mining methods in communication
networks control. Finally, conclusions are drawn.
Abstract
The control of communication networks is an important aspect from both the service
provider and user points of view. There are several approaches to communication
network control including game theory, genetic algorithms and Markov decision
processes. Data mining methods have been successfully used to discover optimized
solutions to this problem, and have the capability to learn the network behavior
under different network conditions and during operation so that complete knowl-
edge of the network behavior is not required a priori. This article identifies the
concepts behind the idea of using data mining for communication network control,
provides a structured survey of the results in this area, and discusses the guidelines
for future applications.
Data Mining Algorithms for
Communication Networks Control:
Concepts, Survey and Guidelines
Mauro De Sanctis, Igor Bisio, and Giuseppe Araniti
Mauro De Sanctis is with the University of Roma Tor Vergata.
Igor Bisio is with the University of Genova.
Giuseppe Araniti is with the University “Mediterranea” of Reggio Calabria.