Low complexity, high performance neuro-fuzzy
system for Internet traffic flows early classification
Antonello Rizzi, Silvia Colabrese, Andrea Baiocchi
DIET - University of Roma “Sapienza” - Via Eudossiana 18 - 00184 Roma, Italy
Email: antonello.rizzi@diet.uniroma1.it, andrea.baiocchi@uniroma1.it
Abstract—Traffic flow classification to identify applications and
activity of users is widely studied both to understand privacy
threats and to support network functions such as usage policies
and QoS. For those needs, real time classification is required
and the classifier’s complexity is as important as its accuracy, especially
given the increasing link speeds, including in the access section of
the network. We propose the application of a highly efficient
classification system, specifically Min-Max neurofuzzy networks
trained by the PARC algorithm, and show on two traffic data sets
collected in different epochs and places that it achieves very high
accuracy, in line with the best performing algorithms available in Weka.
The required classification model complexity turns out to be
much lower with Min-Max networks than with SVM
models, enabling the implementation of effective classification
algorithms in real time on inexpensive platforms.
Index Terms—Traffic flow classification; machine learning;
neurofuzzy networks; feature selection; classifier complexity
I. INTRODUCTION
Traffic Analysis is the main technique used to exploit
information leakage offered by observable features of packet
traffic in a ciphered channel and infer as much as possible
about the content of the traffic flow. In [1], Raymond et al.
provide an overview of all possible attacks that can be carried
out using traffic analysis. A large body of literature has grown
on the problem of application layer traffic classification by
means of traffic analysis and several methods of classification
based on statistical analysis of traffic patterns and machine-
learning techniques have been proposed and analyzed. For gen-
eral reviews see [2][3][4][5][6][7]. Besides being an obvious
attack on privacy, Traffic Classification can have useful and
legitimate goals, as pointed out in [2], such as: identification
of user activities in order to enforce traffic filtering and
to support quality of service mechanisms; development of
diagnostic tools for anomalous network behaviors, in order to
identify possible worms or Denial of Service (DoS) attacks.
In [8] traffic classification is deemed a key component of
automated QoS management.
There is a vast literature presenting techniques that can
identify traffic classes based solely on the use of traffic and
packet features that remain observable even after encryption,
e.g., see [9], [10] for general approaches to traffic flow
classifications, [11] for the identification of encrypted Skype
traffic within an aggregate traffic stream, [12], [13], [14] for
classification of flows carried inside SSH connections, [15],
[16] for classification of encrypted web pages among a set of
pre-defined alternatives.
Among recent works on traffic classification, Li et al. [17]
develop a Semi-Supervised Support Vector Machine (SVM)
based on flow statistics to identify and classify network
applications. They use a radial basis function (RBF) as the kernel
function of the SVM and co-training as the semi-supervised
technique. The algorithm is implemented by procedures based
on Weka 3.7 [18]. Wang et al. in [19] propose a token-based
approach that applies machine learning techniques to statistical
features of traffic. They first look for common substrings in
the first N bytes of the flow payload for each class, and then
apply a feature selection algorithm to reduce the size of the
token set. Their proposal achieves high classification accuracy
with low computational complexity, but it requires payloads
and it is not suitable for encrypted flows. In [20] Szabó and
Szüle propose a novel framework that takes an incremental
approach, whereby new features are exploited as packets of a
flow are observed and different flows are also correlated, thus
adapting the abstraction level of the traffic analysis to different
purposes. In the case of strictly real time, single flow classification,
this approach brings no special advantage.
We propose the use of a neuro-fuzzy machine learning
system, specifically Min-Max networks trained by the PARC
algorithm, for real time traffic flow classification relying only
on some simple features extracted from the first few packets
of each flow. Since the observed features are not impaired by
encryption, this classification technique can be applied whenever
it is possible to delineate individual flows. The key points
shown in this paper are: i) accuracy as high as 99%, and
in any case above 90%, can be achieved with only a few initial
packets, never using more than the first ten
packets of a flow; ii) the complexity of the classification models is
significantly lower than that of the best performing alternatives,
notably Support Vector Machines (SVM). It is important to underline that
low structural complexity is a fundamental requirement in
classification model synthesis, enabling the implementation of
effective flow classification systems in real time on inexpensive
platforms, such as FPGA based embedded systems.
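As an illustration of what classifying on the first few packets of a flow can look like in practice, the following minimal Python sketch groups packets into flows and builds one fixed-length feature vector per flow. It is not the feature extraction procedure of this paper (that is described in Section II). The `Packet` record, the function `early_flow_features`, the directional 5-tuple flow key, the choice of packet sizes and inter-arrival times as features, and the zero-padding of short flows are all assumptions made only for the example.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

# Hypothetical flow key: (src IP, src port, dst IP, dst port, protocol).
FlowKey = Tuple[str, int, str, int, str]


class Packet(NamedTuple):
    """Hypothetical packet record; only header-level fields are used."""
    timestamp: float  # capture time in seconds
    key: FlowKey      # 5-tuple identifying the flow
    size: int         # packet length in bytes


def early_flow_features(packets: List[Packet],
                        n_first: int = 10) -> Dict[FlowKey, List[float]]:
    """Build one feature vector per flow from its first n_first packets.

    Assumed layout: n_first packet sizes followed by n_first - 1
    inter-arrival times; flows shorter than n_first are zero-padded.
    """
    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for pkt in sorted(packets, key=lambda p: p.timestamp):
        # Keep only the first n_first packets of each flow.
        if len(flows[pkt.key]) < n_first:
            flows[pkt.key].append(pkt)

    features: Dict[FlowKey, List[float]] = {}
    for key, pkts in flows.items():
        sizes = [float(p.size) for p in pkts]
        gaps = [b.timestamp - a.timestamp for a, b in zip(pkts, pkts[1:])]
        sizes += [0.0] * (n_first - len(sizes))      # pad short flows
        gaps += [0.0] * (n_first - 1 - len(gaps))
        features[key] = sizes + gaps
    return features
```

Both quantities used here are read from packet headers and timing only, so a vector of this kind remains available when the payload is encrypted. With `n_first = 10` the vector has 19 components, which is the sort of small, fixed input size a low complexity classifier can process in real time.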
The rest of the paper is organized as follows. Section
II gives a concise description of the feature extraction procedure
applied to the traffic data sets. Section III gives a brief account of
the Min-Max classification neurofuzzy network. Performance
results are introduced and discussed in Section IV. Final
remarks are given in Section V.
978-1-4673-2480-9/13/$31.00 ©2013 IEEE 77