IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, VOL. *, NO. *, MONTH 2023

Benchmarking Class Incremental Learning in Deep Learning Traffic Classification

Giampaolo Bovenzi, Alfredo Nascita, Lixuan Yang, Alessandro Finamore, Giuseppe Aceto, Domenico Ciuonzo, Senior Member, IEEE, Antonio Pescapé, Senior Member, IEEE, and Dario Rossi, Senior Member, IEEE

Abstract—Traffic Classification (TC) is experiencing a renewed interest, fostered by the growing popularity of Deep Learning (DL) approaches. In exchange for their proven effectiveness, DL models are characterized by a computationally intensive training procedure that is poorly matched to the fast-paced release of new (mobile) applications, which significantly limits the efficiency of model updates. To address this shortcoming, in this work we systematically explore Class Incremental Learning (CIL) techniques, aimed at adding new apps/services to pre-existing DL-based traffic classifiers without a full retraining, hence speeding up the model update cycle. We investigate a large corpus of state-of-the-art CIL approaches for the DL-based TC task, and delve into their working principles to highlight relevant insights, aiming to understand whether there is a case for CIL in TC. We evaluate and discuss their performance while varying the number of incremental learning episodes and the number of new apps added in each episode. Our evaluation is based on the publicly available MIRAGE19 dataset, which comprises traffic of 40 popular Android applications, fostering reproducibility. Although our analysis reveals their infancy, CIL techniques are a promising research area on the roadmap towards automated DL-based traffic analysis systems.

Index Terms—Class-incremental learning, Deep Learning, Mobile Applications, Traffic Classification.

I. INTRODUCTION

TRAFFIC CLASSIFICATION (TC) is at the core of any network traffic monitoring system, and a pillar for traffic management, cybersecurity, quality-of-experience monitoring, and other strategic activities for network operators. It is also a mature research topic, with many surveys on the subject [1–4]. From a chronological standpoint, we can categorize the TC literature into two “waves”. The first wave began in the early 2000s and centered on Machine Learning (ML) methods that take per-packet (e.g., packet size, packet inter-arrival time) or per-flow (e.g., total bytes, packets, ports) features as input, targeting the classification of a handful of applications. Several works demonstrated that classification was accurate even when only the first few packets of a flow were observed [5–7], and that it could be sustained at line rate [8]: “early” TC was born.

Manuscript received on 15th November 2022; revised on 16th March 2023 and 5th June 2023; accepted on 15th June 2023. G. Bovenzi, A. Nascita, G. Aceto, D. Ciuonzo and A. Pescapé are with the Department of Electrical Engineering and Information Technologies (DIETI) at University of Naples Federico II, Italy. E-mail: {name.surname}@unina.it. L. Yang, A. Finamore and D. Rossi are with Huawei Technology France. E-mail: {name.surname}@huawei.com.

However, over the years, early TC has become increasingly challenging. The main causes have been the growing adoption of traffic encryption, the extreme dynamism of Internet traffic due to new usage patterns, and the heterogeneity of devices connecting to the Internet, especially mobile ones (with ecosystems of tools that ease the installation of new apps and their updates) [9]. The need for more evolved TC technologies has been answered with Deep Learning (DL) techniques, propelled by the success these have shown in the field of Computer Vision (CV).
Hence, a new wave of interest in TC started, producing several proposals of DL-based classifiers using as input either raw payload bytes or the same traffic features discovered during the first wave [9–14].

Despite the significant efforts spent and the improvements achieved in recent years, all TC methodologies in the literature focus on “static” scenarios, i.e., scenarios where model updates are not contemplated. In other words, the aforementioned pool of techniques focuses only on the problem of creating the most accurate classifier given a dataset where (a) the set of classes (e.g., applications or services) and (b) the characterizing properties/fingerprint of each class are both immutable. This evidently clashes with the nature of TC, and is a limitation of current ML/DL-based TC methodologies, which implicitly discourage model updates. Indeed, TC systems based on literature approaches use amend-and-retrain policies: to add new applications or new traffic behavior to a model, the designer needs to (i) create a new training set (or expand the existing one), and (ii) train a new model from scratch. In short, model updates are not incremental. On the contrary, the dynamic nature of Internet traffic calls for the adoption of continuous data pipelines that adapt models to traffic changes. Accordingly, to closely track the network traffic landscape and sustain the required traffic monitoring operations, an effective TC system should support continuous model updates, as sketched in Fig. 1.

Incremental Learning (IL), also known as continuous or online learning [15], is a discipline studying how to update models to accommodate the new knowledge required to perform the target task well (e.g., a new class needs to be added to a classifier). This fits the practical needs of TC well, and thus a few traffic classifiers amenable to incremental updates have been designed according to this philosophy.
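The contrast between an amend-and-retrain policy and an incremental update can be illustrated with a deliberately simple toy model. The sketch below assumes a nearest-class-mean classifier over hypothetical per-flow features; it is not one of the CIL approaches benchmarked in this work, and all function names, class labels, and feature values are illustrative assumptions.

```python
def class_mean(samples):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(col) for col in zip(*samples)]

def retrain_from_scratch(dataset):
    """Amend-and-retrain: rebuild every class prototype from the full training set."""
    return {label: class_mean(samples) for label, samples in dataset.items()}

def add_classes(model, new_data):
    """Class-incremental update: compute prototypes for the new classes only."""
    updated = dict(model)  # prototypes of known classes are reused untouched
    for label, samples in new_data.items():
        updated[label] = class_mean(samples)
    return updated

def predict(model, x):
    """Return the label whose prototype is closest to x (squared Euclidean distance)."""
    def sq_dist(proto):
        return sum((a - b) ** 2 for a, b in zip(x, proto))
    return min(model, key=lambda label: sq_dist(model[label]))

# Hypothetical features: (mean packet size in bytes, mean inter-arrival time in ms).
old_data = {
    "app_a": [[100.0, 10.0], [120.0, 12.0]],
    "app_b": [[900.0, 2.0], [1100.0, 4.0]],
}
new_data = {"app_c": [[500.0, 50.0], [520.0, 54.0]]}

base_model = retrain_from_scratch(old_data)
# Incremental path: only the samples of the new app are processed.
incremental_model = add_classes(base_model, new_data)
# Amend-and-retrain path: old and new samples are all processed again.
full_model = retrain_from_scratch({**old_data, **new_data})
```

For this toy model the two paths yield identical prototypes, since each class is summarized independently of the others. This equivalence does not generally hold for DL classifiers, whose parameters are shared across classes, which is why dedicated CIL techniques are needed.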
While the change of class characteristics (and thus the need to update the model for known classes) seems better understood [16, 17], the need to progressively add new network apps/services to available classifiers has been considered only recently [18–21],