Network Traffic Classification based on Class Weight based K-NN Classifier (CWK-NN)

Mohamad Hijazi
School of Arts and Sciences
Lebanese International University
Nabatieh, Lebanon
MohamadOsamaHijazi@Gmail.com

Jawad Khalife
School of Arts and Sciences
Lebanese International University
Beirut, Lebanon
jkhalife.khalife@liu.edu.lb

Hussein Al-ghor
Faculty of Technology
Lebanese University
Saida, Lebanon
Hussein.ghor@ul.edu.lb

Jesus Diaz Verdejo
School of IT and Telecom. Eng.
University of Granada
Granada, Spain
jedv@ugr.es

Abstract— Network traffic identification is the first and most important step in network management and security. Researchers have introduced numerous methods. One solution processes both the packet header and the payload, which is costly in both time and processing. Another solution relies on flow-level statistical information, such as packet header length and flow duration. Such blind classifiers are less accurate, yet they are very fast and do not violate privacy. Machine learning fills the gap between accuracy and time by using the blind classifier method, comparing the results with the ground truth, and then adapting to increase accuracy. K-NN is widely used for its effectiveness and simplicity. However, a major drawback of K-NN is its dependency on the training set, since it is a lazy classification algorithm with no classification model to build. In this work, we first assess the K-NN algorithm in traffic classification. Then we introduce a new deficiency, related to the distribution of the training samples in the n-dimensional space, which we measure, and we propose an enhancement for K-NN that adapts to this problem and outperforms the native K-NN classifier. We weight the classes, not the instances, based on the intersections of class clusters in the dataset. Finally, we propose a new Class Weight based K-NN Classifier (CWK-NN), an enhanced K-NN algorithm that can easily adapt to the newly explored training set deficiency.
Keywords— K-NN, weighted K-NN, traffic classification, computer network, traffic identification, training dataset

I. INTRODUCTION

The ability to identify network applications is central to many network management and security tasks, including quality of service assignment, traffic engineering, content-dependent pricing, resource allocation, traffic shaping, and others. With the proliferation of applications, many of them using different kinds of obfuscation, traditional port-based classification has long been obsolete. Numerous methods for traffic classification have been proposed in the last decade, as in [1]. These methods differ at many levels, including the analyzed input, the applied techniques, and the classified target objects. Deciding which classification features to use is a strategic choice for any traffic classifier. Deep packet inspection (DPI), as in [2] and [3], evaluates both the data part and the header of a packet that is transmitted through an inspection point. Because DPI goes beyond examining IP packet headers, it raises many privacy concerns and is not applicable when the traffic is encrypted or tunneled. However, DPI techniques are considered in the literature to be the most accurate techniques and are therefore used as reference classifiers to build the Ground Truth, or reference results. On the other hand, blind classifiers do not inspect the payload and have the potential to deal with these obstacles, at the expense of an acceptable sacrifice in accuracy. Although less accurate, the so-called blind methods are preferred in most environments because they guarantee the users’ privacy, have the potential to classify encrypted communications, and usually require less computational power. Most of these techniques are based on traffic attributes at the network and transport layers, such as packet sizes and inter-arrival times.
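As an illustration of the kind of attributes a blind classifier relies on, the following minimal Python sketch derives a few flow-level statistics from packet timestamps and sizes only, without touching the payload. The function name `flow_features`, the input layout (a list of `(timestamp, size)` tuples), and the particular set of statistics are assumptions made for this example; they are not the feature set used in this work.

```python
from statistics import mean


def flow_features(packets):
    """Compute payload-free statistics for one flow.

    `packets` is a hypothetical input format: a list of
    (timestamp_in_seconds, size_in_bytes) tuples in arrival order.
    """
    if not packets:
        raise ValueError("a flow must contain at least one packet")

    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    # Inter-arrival times are the gaps between consecutive packets.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]

    return {
        "packet_count": len(packets),
        "mean_packet_size": mean(sizes),
        "max_packet_size": max(sizes),
        "duration": times[-1] - times[0],
        # A single-packet flow has no gaps; report 0.0 in that case.
        "mean_inter_arrival": mean(gaps) if gaps else 0.0,
    }


# Illustrative flow with three packets (values are made up).
example_flow = [(0.0, 60), (0.5, 1500), (1.5, 40)]
features = flow_features(example_flow)
```

Each flow is thereby reduced to a fixed-length numeric vector, which is the form of input a distance-based classifier such as K-NN expects.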
Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Due to the problem dimensionality, Machine Learning (ML) techniques can be used in the classification context. ML classification is considered an instance of supervised learning, i.e., learning where a training set of