Unveiling MIMETIC: Interpreting Deep Learning
Traffic Classifiers via XAI Techniques
Alfredo Nascita, Antonio Montieri, Giuseppe Aceto, Domenico Ciuonzo, Valerio Persico, Antonio Pescapè
University of Napoli “Federico II” (Italy)
a.nascita@studenti.unina.it, {antonio.montieri, giuseppe.aceto, domenico.ciuonzo, valerio.persico, pescape}@unina.it
Abstract—The widespread use of powerful mobile devices has
deeply affected the mix of traffic traversing both the Internet
and enterprise networks (with bring-your-own-device policies).
Traffic encryption has become extremely common, and the quick
proliferation of mobile apps and their easy distribution and
update have created a particularly challenging scenario for traffic
classification and its uses, especially network-security-related
ones. The recent rise of Deep Learning (DL) has responded to
this challenge by removing the time-consuming and expert-bound
handcrafted feature design and by delivering better classification
performance. The counterpart of these advantages is the
lack of interpretability of these black-box approaches, which limits
or prevents their adoption in contexts where the reliability
of results, or the interpretability of policies, is necessary. To cope
with these limitations, eXplainable Artificial Intelligence (XAI)
techniques have seen recent intensive research. Along these lines,
our work applies XAI-based techniques (namely, Deep SHAP) to
interpret the behavior of a state-of-the-art multimodal DL traffic
classifier. As opposed to the sample-based explanations commonly
produced in XAI, we aim at a global interpretation. The results
quantify the importance of each modality (payload- or header-
based), and of specific subsets of inputs (e.g., TLS SNI and TCP
Window Size) in determining the classification outcome, down
to per-class (viz. application) level. The analysis is based on a
publicly-released recent dataset focused on mobile app traffic.
Index Terms—traffic classification; encrypted traffic; explain-
able artificial intelligence; deep learning; multimodal learning.
I. INTRODUCTION
The knowledge of the mix of traffic traversing a network
is instrumental to several management activities: Traffic
Classification (TC) plays a key role in defining a “normal”
traffic profile for the purpose of anomaly detection, and in
extracting (or inferring) fingerprints for intrusion detection
and attack identification. Moreover, TC can also be exploited
for defining technical boundaries for censorship enforceability,
and assessing the effectiveness of surveillance and blocking
countermeasures. For these reasons, TC has seen consistent
research and field adoption over the years, and is now seeing
a renewed blossoming of interest due to the recent evolution of
network usage. Indeed, the widespread availability of well-
equipped smartphones has impacted both the Internet and
enterprise networks (due to bring-your-own-device policies),
presenting a highly dynamic and extensively encrypted mix of
traffic. On the other hand, new powerful Artificial Intelligence
techniques (namely Deep Learning, “DL” in the following)
have become available to face the new classification challenges.
DL approaches are characterized by a fully-automated
feature extraction phase (with a reduced need for human experts
in the loop) and a greater ability to learn from huge
volumes of data, which yields better performance than
traditional Machine Learning (ML) approaches.
The highly desirable characteristics of DL come at the cost
of lack of interpretability of their results, as the black-box
nature of DL techniques hides the reason behind specific
classification outcomes. This impacts the understanding of
classification errors and the evaluation of the resilience against
adversarial manipulation of traffic to impair identification.
Moreover, by understanding the behavior of the learned model,
performance enhancements can be pursued with much more
focused and efficient research, compared with a less-informed
exploration of the (typically huge) hyper-parameters space. In
fact, DL approaches inherently hide the answers to basic
questions like “which parts of a complex architecture mostly
contribute to the final decision?”, “which specific fields,
packets, protocols are the most important in the classification
process?”, or “which ones are responsible for classification
errors or circumvention?”.
The field of eXplainable Artificial Intelligence (XAI) con-
stitutes the answer to these needs, as it provides approaches
and techniques able to relate the structure of the model and
the input to the respective classification outcome, partially
opening what was formerly a complete black box. The adoption of
DL and (consequently) of XAI is relatively new, especially in
the field of network traffic classification: with this work we
contribute to this step forward in the understanding of DL-
based network traffic classifiers.
To this aim, we interpret the behavior of
a state-of-the-art DL architecture for TC we recently proposed [1],
analyzing the relative importance of inputs at fine
grain (i.e., per-class) in the challenging task of classifying
mobile apps. More specifically, we apply state-of-the-art XAI
tools (namely, Deep SHAP [2]) to quantify and understand
the importance of payload-derived and header-based inputs,
further deepening the analysis to specific subsets of the inputs
(i.e., the TLS SNI in the payload, and the TCP Window Size,
Payload Length, packet Inter-Arrival Time, and Direction
for the header-based inputs). To perform our experimental evaluation,
we leverage the public traffic dataset MIRAGE-2019 that
focuses on mobile-app traffic and is human-generated [3].
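As a rough illustration of the aggregation step just described (not the paper's actual code), a global, per-modality importance score can be obtained by averaging the absolute value of sample-level attributions, such as those returned by SHAP's `DeepExplainer`. The array names (`payload_attr`, `header_attr`), their shapes, and the random placeholder values below are assumptions for the sketch, not the real MIMETIC dimensions or MIRAGE-2019 data:

```python
import numpy as np

# Placeholder per-sample attributions standing in for Deep SHAP output
# (e.g., from shap.DeepExplainer(model, background).shap_values(X));
# shapes and values are illustrative only.
rng = np.random.default_rng(0)
payload_attr = rng.normal(size=(100, 512))  # payload-based modality
header_attr = rng.normal(size=(100, 48))    # header-based modality

def global_importance(attributions: np.ndarray) -> float:
    """Collapse signed, sample-level attributions into one global score
    by averaging their absolute value over samples and input positions."""
    return float(np.mean(np.abs(attributions)))

# Relative share of each modality in driving the classification outcome.
scores = {"payload": global_importance(payload_attr),
          "header": global_importance(header_attr)}
total = sum(scores.values())
shares = {name: score / total for name, score in scores.items()}
```

The same averaging can be restricted to the samples of a single class to obtain the per-class (viz. per-app) view discussed in the paper.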
The paper is organized as follows. Section II surveys first

978-1-7281-5684-2/20/$31.00 ©2021 IEEE