International Journal of Electrical and Computer Engineering (IJECE)
Vol. 8, No. 4, August 2018, pp. 2521~2530
ISSN: 2088-8708, DOI: 10.11591/ijece.v8i4.pp2521-2530
Journal homepage: http://iaescore.com/journals/index.php/IJECE

Impact of Packet Inter-arrival Time Features for Online Peer-to-Peer (P2P) Classification

Bushra Mohammed Ali Abdalla 1, Mosab Hamdan 2, Mohammed Sultan Mohammed 3, Joseph Stephen Bassi 4, Ismahani Ismail 5, Muhammad Nadzir Marsono 6

1,2,3,5,6 Department of Electronic and Computer Engineering, Faculty of Electronic Engineering, Universiti Teknologi Malaysia, 81310, Johor Bahru, Malaysia
4 Department of Computer Engineering, Faculty of Engineering, University of Maiduguri, Borno State, Nigeria

Article Info

Article history: Received Apr 12, 2018; Revised Jul 20, 2018; Accepted Jul 26, 2018

Keywords: Feature selection; Machine learning; Online features; P2P

ABSTRACT

Identifying bandwidth-heavy Internet traffic is important because it allows network administrators to throttle high-bandwidth application traffic. Flow-feature-based classification has previously been proposed as a promising method to identify Internet traffic from packet statistical features. The selection of statistical features plays an important role in accurate and timely classification. In this work, we investigate the impact of the packet inter-arrival time (IAT) feature on online P2P classification in terms of accuracy, Kappa statistic, and time. Simulations were conducted using available traces from the University of Brescia, the University of Aalborg, and the University of Cambridge. Experimental results show that including IAT as an online feature increases simulation time and decreases classification accuracy and Kappa statistic.

Copyright © 2018 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Muhammad Nadzir Marsono,
Department of Electronic and Computer Engineering, Faculty of Electronic Engineering,
Universiti Teknologi Malaysia, 81310, Johor Bahru, Malaysia.
Email: nadzir@fke.utm.my

1. INTRODUCTION

Today, peer-to-peer (P2P) is an architecture for sharing a wide range of media on the Internet. P2P traffic represents about 27% to 60% of total Internet traffic, depending on geographic location [1], [2]. The high volume of P2P traffic is due to file sharing, video streaming, online gaming, and other activities that the client-server architecture cannot accomplish as quickly or as efficiently as the P2P architecture. The rapid growth of P2P traffic volume over the years has resulted in deteriorated network performance and congestion, owing to the high bandwidth consumption of P2P applications [3]. Therefore, traffic identification is required to improve traffic management.

First-generation P2P application traffic was relatively easy to identify because it used fixed port numbers. However, current P2P applications are able to circumvent port-based identification by using anonymous port numbers or port disguise [4], [2]. Methods that rely on inspecting application payload signatures have also been proposed [5]. However, privacy concerns and practical limitations make this approach ineffective. The ineffectiveness of the port-based and payload-based methods prompted the use of flow statistics as features for traffic identification. These strategies offer more flexibility in detecting P2P traffic than signature-based and port-based methods. Several techniques proposed over the last two decades have focused on the identification accuracy attainable with various machine learning (ML) algorithms. However, the effect of distinct sets of statistical features has not been researched in depth. Work in [6] has reported that