Robust Network Flow Classification against Malicious Feature Manipulation

Yupeng Li
Dept. of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
yupeng.li@utoronto.ca

Ben Liang
Dept. of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
liang@ece.utoronto.ca

Ali Tizghadam
Technology Strategy and Business Transformation, TELUS Communications, Toronto, Canada
ali.tizghadam@telus.com

Abstract—Network flow classification is essential to proper provisioning of Quality of Service (QoS). Conventional machine-learning based flow classification methods assume reliable knowledge of the flow features. However, in practice, malicious flow generators can manipulate the flow features to increase the likelihood of certain learning outcomes, e.g., in terms of the QoS requirement label. Training a classifier that is robust to such feature manipulation is imperative. In this work, we present a study on robust flow classification against malicious feature manipulation. We leverage a detailed system model to capture the relation between the classifier and malicious flow generators and propose a Stackelberg-game based solution framework to train a robust classifier. We conduct extensive experimentation using real-world traces. For flows with manipulated features, the Stackelberg classifier trained by our solution framework significantly outperforms a non-robust classifier that is oblivious to manipulation, achieving accuracy close to that of the non-robust classifier on unmanipulated flows. Furthermore, the Stackelberg classifier on manipulated test flows is no worse than the non-robust classifier on unmanipulated flows.

I. INTRODUCTION

Network flow classification is crucial for network resource management, especially to improve Quality of Service (QoS) [1]. Classical port-based or payload-based approaches are severely ineffective, especially for encrypted traffic [2]–[4].
A series of recent works have proposed methods that employ machine learning techniques and shown promising results [2]–[10]. These methods typically use only the observable flow features, such as the minimum, mean, maximum, and standard deviation of packet lengths and packet inter-arrival times.

A common assumption made in these methods is reliable knowledge of the flow feature values. However, this assumption may not hold, especially when malicious flow generators exist. Such generators have a vested interest in the classification outcome. They manipulate the features of their flows to game the classifier, increasing the likelihood of outcomes favorable to themselves. For example, a malicious flow generator can change the packet inter-arrival times and the packet sizes in a flow in an attempt to disguise itself and evade being blocked [11], or to be prioritized for more network bandwidth so that the flow is completed faster. Though feature manipulation can incur a cost [12], [13], the overall benefit to a malicious generator may be positive.

This work has been funded by grants from TELUS and the Natural Sciences and Engineering Research Council (NSERC) of Canada.

Such malicious behavior can render conventional statistics-based methods ineffective. Specifically, malicious flow generators may be able to manipulate the flow features to best respond to the classification model committed by the classifier. Therefore, a flow from a malicious generator can be misclassified, e.g., in terms of the QoS requirement level. For example, as shown in Sec. V, our experiments with real-world traces suggest that the classification accuracy of a classifier that is oblivious to such malicious behavior can drop below 40%. Thus, a classifier that is robust to feature manipulation is imperative.

To the best of our knowledge, none of the existing flow classification methodologies was designed against malicious feature manipulation.
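To make the notion of a best response concrete, the following is a minimal sketch and not the formulation used in this paper. It assumes a binary linear classifier that labels a flow as high priority when `w·x + b >= 0`, a quadratic manipulation cost, and a fixed gain from being labelled high priority. The function names, the cost form, and the parameters `gain`, `unit_cost`, and `margin` are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def score(w: List[float], b: float, x: List[float]) -> float:
    """Linear score; the flow is labelled high priority when the score is >= 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def best_response(
    w: List[float],
    b: float,
    x: List[float],
    gain: float,
    unit_cost: float,
    margin: float = 1e-6,
) -> Tuple[List[float], bool]:
    """Features reported by a hypothetical malicious generator.

    Assumption (not from the paper): the cost of reporting x' instead of x is
    unit_cost * ||x' - x||^2, and the generator earns `gain` if it is
    labelled high priority.
    """
    s = score(w, b, x)
    if s >= 0:
        return list(x), False  # already labelled favorably, nothing to do
    norm_sq = sum(wi * wi for wi in w)
    if norm_sq == 0.0:
        return list(x), False  # no perturbation can change the score
    # The smallest L2 perturbation that crosses the boundary moves along w.
    step = (margin - s) / norm_sq
    delta = [step * wi for wi in w]
    cost = unit_cost * sum(d * d for d in delta)
    if cost >= gain:
        return list(x), False  # manipulation does not pay off
    return [xi + di for xi, di in zip(x, delta)], True
```

Under these assumptions, a generator manipulates its features only when the cheapest boundary-crossing perturbation costs less than the gain. This is why the reported features depend on the committed model `(w, b)` itself.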
In this work, we study the open problem of robust flow classification. The task is to classify flows into multiple classes corresponding to different QoS levels, aiming to map each flow to its true required QoS level. For simplicity in this initial investigation, we consider the linear classification model, which can be executed efficiently and is commonly used for flow classification in practice [14]. Our goal is to obtain a flow classifier that is robust to malicious manipulation.

Obtaining such a robust flow classifier is challenging. First, the feature manipulation of a malicious flow generator is given as a best response to the classification model. Thus, the presented features might be a function of the classification model itself, which complicates the design space. Second, the features are manipulated after the classifier commits to a model. Such an ex ante model can hardly best respond to any malicious manipulation. Third, no training data with manipulated features are available for training the classifier.

In this work, we present a system model to capture traffic flows, classifiers, and feature manipulation. We propose a solution framework based on the Stackelberg game to train a robust network flow classifier (see Fig. 1), which we term the Stackelberg classifier. The framework supposes that the flow features can be manipulated during model training. The classifier, after solving a carefully formulated multi-player Stackelberg game, commits to a classification model