IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, VOL. *, NO. *, MONTH 2023

Benchmarking Class Incremental Learning in Deep Learning Traffic Classification

Giampaolo Bovenzi, Alfredo Nascita, Lixuan Yang, Alessandro Finamore, Giuseppe Aceto, Domenico Ciuonzo, Senior Member, IEEE, Antonio Pescapé, Senior Member, IEEE, and Dario Rossi, Senior Member, IEEE

Abstract: Traffic Classification (TC) is experiencing renewed interest, fostered by the growing popularity of Deep Learning (DL) approaches. In exchange for their proven effectiveness, DL models require a computationally intensive training procedure that is poorly suited to the fast-paced release of new (mobile) applications, which significantly limits the efficiency of model updates. To address this shortcoming, in this work we systematically explore Class Incremental Learning (CIL) techniques, which aim to add new apps/services to pre-existing DL-based traffic classifiers without a full retraining, hence speeding up the model update cycle. We investigate a large corpus of state-of-the-art CIL approaches for the DL-based TC task, and delve into their working principles to highlight relevant insights, aiming to understand whether there is a case for CIL in TC. We evaluate and discuss their performance while varying the number of incremental learning episodes and the number of new apps added in each episode. To foster reproducibility, our evaluation is based on the publicly available MIRAGE19 dataset, which comprises traffic of 40 popular Android applications. Although our analysis reveals their infancy, CIL techniques are a promising research area on the roadmap towards automated DL-based traffic analysis systems.

Index Terms: Class-incremental learning, Deep Learning, Mobile Applications, Traffic Classification.

Manuscript received on 15th November 2022; revised on 16th March 2023 and 5th June 2023; accepted on 15th June 2023.
G. Bovenzi, A. Nascita, G. Aceto, D. Ciuonzo and A. Pescapé are with the Department of Electrical Engineering and Information Technologies (DIETI) at University of Naples Federico II, Italy. E-mail: {name.surname}@unina.it.
L. Yang, A. Finamore and D. Rossi are with Huawei Technology France. E-mail: {name.surname}@huawei.com.

I. INTRODUCTION

Traffic Classification (TC) is at the core of any network traffic monitoring system, and a pillar for traffic management, cybersecurity, quality-of-experience monitoring, and other strategic activities for network operators. It is also a mature research topic, with many surveys on the subject [1–4].

From a chronological standpoint, we can categorize the TC literature into two “waves”. The first wave began in the early 2000s and centered on Machine Learning (ML) methods that take per-packet (e.g., packet size, packet inter-arrival time) or per-flow (e.g., total bytes, packets, ports) features as input, targeting the classification of a handful of applications. Several works demonstrated that the classification was accurate even when just a few packets of a flow were observed [5–7], and that it could be sustained at line-rate speed [8]: “early” TC was born.

However, over the years, early TC has become increasingly challenging. The main causes have been the growing adoption of traffic encryption, the extreme dynamism of Internet traffic due to new usage patterns, and the heterogeneity of devices connecting to the Internet, especially mobile ones (with ecosystems of tools that ease the installation of new apps and their updates) [9].

The need for more evolved TC technologies has been answered with Deep Learning (DL) techniques, propelled by the success these have shown in the field of Computer Vision (CV).
Hence, a new wave of interest in TC started, producing several proposals of DL-based classifiers that use as input either raw payload bytes or the same traffic features discovered during the first wave [9–14].

Despite the significant efforts spent and the improvements achieved in recent years, all TC methodologies in the literature focus on “static” scenarios, i.e., scenarios where model updates are not contemplated. In other words, the aforementioned pool of techniques focuses only on the problem of creating the most accurate classifier given a dataset where (a) the set of classes (e.g., applications or services) and (b) the characterizing properties/fingerprint of each class are both immutable. This evidently clashes with the nature of TC, and is a limitation of current ML/DL-based TC methodologies that implicitly discourage model updates. Indeed, TC systems based on literature approaches use amend-and-retrain policies: to add new applications or new traffic behavior to a model, the designer needs to (i) create a new training set (or expand the existing one), and (ii) train a new model from scratch. Model updates are therefore not incremental.

In contrast, the dynamic nature of Internet traffic calls for the adoption of continuous data pipelines that adapt models to traffic changes. Accordingly, to closely track the network traffic landscape and sustain the required traffic monitoring operations, an effective TC system should support continuous model updates, as sketched in Fig. 1.

Incremental Learning (IL), also known as continuous or online learning [15], is a discipline studying how to update models to accommodate the new knowledge required to perform the target task well (e.g., a new class needs to be added to a classifier). This fits practical TC needs well, and thus a few traffic classifiers suited to incremental updates have been designed according to this philosophy.
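To make the contrast between amend-and-retrain and incremental updates concrete, the following is a minimal sketch. It is not one of the CIL techniques evaluated in this work: it assumes a toy nearest-class-mean classifier, and the class labels (`app_a`, `app_b`, `app_c`) and the two features per sample are hypothetical values chosen only for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


class NearestClassMean:
    """Toy classifier (an assumption of this sketch): one mean vector per class."""

    def __init__(self):
        self.sums = {}    # class label -> per-feature running sum
        self.counts = {}  # class label -> number of samples seen

    def add_samples(self, label, samples):
        """Incremental step: only the statistics of `label` are touched."""
        for x in samples:
            if label not in self.sums:
                self.sums[label] = [0.0] * len(x)
                self.counts[label] = 0
            self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
            self.counts[label] += 1

    def prototype(self, label):
        """Mean feature vector of the samples seen for `label`."""
        return [s / self.counts[label] for s in self.sums[label]]

    def predict(self, x):
        """Return the label whose prototype is closest to x."""
        return min(self.sums, key=lambda lab: sq_dist(self.prototype(lab), x))


def retrain_from_scratch(dataset):
    """Amend-and-retrain: build a new model from the whole training set."""
    model = NearestClassMean()
    for label, samples in dataset.items():
        model.add_samples(label, samples)
    return model


# Hypothetical features: (mean packet size in bytes, mean inter-arrival time in ms)
old_data = {
    "app_a": [(1200.0, 5.0), (1100.0, 6.0)],
    "app_b": [(200.0, 50.0), (250.0, 40.0)],
}
new_data = {"app_c": [(600.0, 200.0), (650.0, 180.0)]}

# Steps (i) and (ii): expand the training set, then train a new model on all of it.
full_model = retrain_from_scratch({**old_data, **new_data})

# Incremental alternative: keep the existing model and feed it only the new class.
inc_model = retrain_from_scratch(old_data)
inc_model.add_samples("app_c", new_data["app_c"])
```

For this toy model the two routes produce the same prototypes, because each class is summarized independently of the others. That equivalence is a property of the toy model only and is not claimed for DL-based classifiers.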
While the change of class characteristics (and thus the need to update the model for known classes) seems better understood [16, 17], the need to progressively add new network apps/services to available classifiers has been considered only recently [18–21],