IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 11, No. 3, September 2022, pp. 1175~1183
ISSN: 2252-8938, DOI: 10.11591/ijai.v11.i3.pp1175-1183
Journal homepage: http://ijai.iaescore.com

Features analysis of internet traffic classification using interpretable machine learning models

Erick A. Adje¹, Vinasetan Ratheil Houndji², Michel Dossou³
¹ Ecole Doctorale des Sciences de l’Ingénieur, Université d’Abomey-Calavi, Abomey-Calavi, Bénin
² Institut de Formation et de Recherche en Informatique, Université d’Abomey-Calavi, Abomey-Calavi, Bénin
³ Ecole Polytechnique d’Abomey-Calavi, Université d’Abomey-Calavi, Abomey-Calavi, Bénin

Article Info
Article history: Received Jul 16, 2021; Revised Mar 14, 2022; Accepted Apr 12, 2022
Keywords: Classification algorithm; Internet traffic; Machine learning; Traffic classification; Traffic internet discriminators

ABSTRACT
Internet traffic classification is a fundamental task for network services and management. Machine learning models already identify traffic classes well. However, finding the most discriminating features remains essential for building efficient models. In this paper, we use interpretable machine learning algorithms, namely decision tree, random forest, and eXtreme gradient boosting (XGBoost), to find the most discriminating features for internet traffic classification. The dataset used contains 377,526 traffic flows, each described by 248 features. From these features, we propose a 12-feature model with an accuracy of up to 99.76%. We tested it on another dataset with 19,626 flows and obtained an accuracy of 98.40%. This shows the efficiency and stability of our model. We also identify a set of 14 important features for internet traffic classification, including two that are crucial: port number (server) and minimum segment size (client to server).

This is an open access article under the CC BY-SA license.
Corresponding Author:
Vinasetan Ratheil Houndji
Institut de Formation et de Recherche en Informatique, Université d’Abomey-Calavi
01 BP 526 Abomey-Calavi, Bénin
Email: ratheil.houndji@uac.bj

1. INTRODUCTION

Internet traffic has increased significantly over the last decade due to new technologies, industries, and applications. This growth poses an interesting challenge for network management. Accurate classification of internet traffic is fundamental for better management of network traffic, from monitoring to security, and from quality of service (QoS) to the provision of the right resources.

Automatic traffic classification is an automated process that classifies network traffic according to various parameters (e.g., port number, protocol, and the number of packets exchanged) into various traffic classes (e.g., web, multimedia, database, e-mail, games, and file transfer). It consists of examining internet protocol (IP) packets to extract specific characteristics that help answer questions about their origin, such as the content or the user’s intentions. Typically, it deals with packet flows, defined as sequences of packets uniquely identified by the source IP address, source port, destination IP address, destination port, and transport-layer protocol, among other attributes.

Although research on traffic classification is quite specific, authors’ motivations are not always the same [1]. Some approaches classify traffic according to its category, i.e., whether the traffic represents file transfer, peer-to-peer (P2P), games, multimedia, web, or attacks [2]–[8]. Others try to identify the application-level protocol involved, such as file transfer protocol (FTP), hypertext transfer protocol (HTTP), secure shell (SSH), or Telnet [9]–[14]. One particular study reviewed current traffic classification methods by classifying them into five categories: statistics-based, correlation-based, behaviour-based, payload-based,