Network Traffic Classification based on
Class Weight based K-NN Classifier (CWK-NN)
Mohamad Hijazi
School of Arts and Sciences
Lebanese International University
Nabatieh, Lebanon
MohamadOsamaHijazi@Gmail.com
Jawad Khalife
School of Arts and Sciences
Lebanese International University
Beirut, Lebanon
jkhalife.khalife@liu.edu.lb
Hussein Al-ghor
Faculty of Technology
Lebanese University
Saida, Lebanon
Hussein.ghor@ul.edu.lb
Jesus Diaz Verdejo
School of IT and Telecom. Eng.
University of Granada
Granada, Spain
jedv@ugr.es
Abstract— Network traffic identification is the first and
most important step in network management and
security. Researchers have introduced numerous methods.
One solution depends on processing both the packet header
and the payload, which is costly in both time and
processing. Another solution depends on flow-level
statistical information such as packet header length and
flow duration. Such blind classifiers are less accurate, yet
very fast, and they do not violate privacy. Machine learning
bridges the gap between accuracy and speed by using the blind
classifier method, comparing the results with the ground
truth, and then adapting to increase accuracy. K-NN is
used widely for its effectiveness and simplicity. However,
a major drawback of K-NN is its dependency on the training
set, being a lazy classification algorithm with no
classification model to build. In this work, we first aim at
assessing the K-NN algorithm in traffic classification. Then
we introduce a new deficiency related to the distribution of
the training samples in the n-dimensional space, measure it,
and propose an enhancement for K-NN that adapts to the new
problem and outperforms the native K-NN classifier. We
weight the classes, not the instances, based on the
intersections of class clusters in the dataset. Finally, we
propose a new Class Weight based K-NN Classifier (CWK-NN),
an enhanced K-NN algorithm that can easily adapt to the
newly explored training set deficiency.
Keywords— K-NN, weighted K-NN, traffic classification,
computer network, traffic identification, training dataset
I. INTRODUCTION
The ability to identify network applications is central to many
network management and security tasks, including quality of
service assignment, traffic engineering, content-dependent
pricing, resource allocation, traffic shaping, and others. With
the proliferation of applications, many of them using different
kinds of obfuscation, traditional port-based classification has
long become obsolete.
Numerous methods have been proposed for traffic classification
in the last decade, as in [1]. These methods have different
characteristics at many levels, including the analyzed input,
the applied techniques, and the classified target objects.
Deciding upon which classification features to use is a
strategic choice for any traffic classifier. Deep packet
inspection (DPI), as in [2] and [3], evaluates both the data
part and the header of a packet transmitted through an
inspection point. Because DPI goes beyond examining IP packet
headers, it raises many privacy concerns and is not
applicable when the traffic is encrypted or tunneled.
However, DPI techniques are considered in the literature to be
the most accurate and are therefore used as reference
classifiers to build the ground truth, or reference results.
On the other hand, blind classifiers do not inspect the payload
and can potentially deal with these obstacles, at the expense
of an acceptable sacrifice in accuracy. Although less accurate,
the so-called blind methods are preferred in most environments
because they guarantee the users’ privacy, have the potential
to classify encrypted communications, and usually require less
computational power.
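To make the notion of a blind classifier concrete, the following minimal sketch computes a few payload-free statistics for a single flow, of the kind a blind method could use as input features. It is an illustration only, not the feature set used in this work: the function name `flow_features`, the input format (a list of timestamp and packet-size pairs), and the chosen statistics are all assumptions introduced here.

```python
from statistics import mean


def flow_features(packets):
    """Compute payload-free statistics for one flow.

    `packets` is a list of (timestamp_seconds, packet_size_bytes) tuples.
    This input format is hypothetical and chosen only for this sketch.
    """
    if not packets:
        raise ValueError("flow has no packets")
    packets = sorted(packets)  # order packets by timestamp
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    # Inter-arrival times are the gaps between consecutive packets.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "packet_count": len(packets),
        "mean_packet_size": mean(sizes),
        "flow_duration": times[-1] - times[0],
        "mean_inter_arrival": mean(gaps) if gaps else 0.0,
    }


# Usage: three packets observed over 1.5 seconds.
example = flow_features([(0.0, 60), (0.5, 1500), (1.5, 40)])
```

Only header-level information (arrival time and size) is read, so the payload is never inspected, which is what allows such features to be computed on encrypted traffic.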
Most of these techniques are based on traffic attributes at the
network and transport layers, such as packet sizes and inter-
arrival times. Due to the problem dimensionality, Machine
Learning (ML) techniques can be used in the classification
context. ML classification is considered an instance of
supervised learning, i.e., learning where a training set of
Copyright © 2019 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).