Padding and fragmentation for masking packet length statistics

Alfonso Iacovazzi and Andrea Baiocchi
Dept. of Information Engineering, Electronics and Telecommunications (DIET), University of Roma Sapienza, Via Eudossiana 18, 00184 Roma, Italy
iacovazzi@infocom.uniroma1.it, andrea.baiocchi@uniroma1.it

Abstract. We aim to understand whether, and at what cost, the traffic features exploited by statistical traffic flow classification tools can be obfuscated. We address packet length masking and define perfect masking as an optimization problem that minimizes overhead. We give an explicit, efficient algorithm to compute the optimum masking sequence. Numerical results are provided, based on measured traffic traces. We find that fragmenting requires about the same overhead as padding does.

1 Introduction

In this work we investigate the protection of privacy against traffic analysis (see [1][2]) and, at the same time, how much effort must be devoted to fooling traffic analysis tools. Recent works [4][5] have carefully analyzed the information conveyed by the various features used to classify different application classes. Their conclusion is that packet lengths are the most valuable feature for application-level traffic classifiers. Moreover, the authors of [3] show experimentally that they can obtain partial transcripts of encrypted VoIP transactions by exploiting, among other things, the statistical information leaked by message lengths. This is why we investigate how packet length information can be concealed from statistical traffic analysis, in addition to encryption. We term this packet length masking.

In [6] the authors propose a technique that changes packet lengths by optimally morphing one class of traffic to look like another class. They use convex optimization to modify the packet lengths so as to minimize the introduced overhead. However, they account neither for packet fragmentation overhead nor for residual correlations among masked packet lengths. Shui Yu et al. [7] implement a strategy that introduces dummy packet padding into a flow in order to guarantee perfect anonymity in web browsing. They replace dummy packets with prefetched data to address the extra delay and additional bandwidth cost caused by padding.

We formally define the traffic classification problem and assess the achievable performance bounds for a masking algorithm by defining the optimal solution to the stated masking problem. Through many numerical examples based on real traffic traces, we show that fragmenting brings only a marginal overhead benefit compared with much simpler approaches that use padding alone.
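To make the two overhead terms concrete, the following is a toy Python sketch of the crudest masking possible: every packet on the wire gets the same constant length, by padding short packets and fragmenting long ones. It is not the optimal masking algorithm of this paper. The per-packet header cost `header`, the helper names (`masked_bytes`, `overhead_ratio`, `best_constant_target`) and the sample payload sizes are all hypothetical. Padding-only masking is the special case in which the constant length is at least as large as the largest packet, so no packet is ever split.

```python
import math
from typing import Sequence


def masked_bytes(payload: int, target: int, header: int) -> int:
    """Bytes on the wire when `payload` travels in packets of exactly `target` bytes.

    Hypothetical model: each packet spends `header` bytes on headers, so it carries
    at most target - header payload bytes; the last fragment is padded up to `target`.
    """
    capacity = target - header
    if capacity <= 0:
        raise ValueError("target must exceed the header size")
    fragments = max(1, math.ceil(payload / capacity))  # an empty payload still needs one packet
    return fragments * target


def overhead_ratio(payloads: Sequence[int], target: int, header: int) -> float:
    """Extra bytes sent, relative to the unmasked flow (one packet per payload)."""
    original = sum(p + header for p in payloads)
    masked = sum(masked_bytes(p, target, header) for p in payloads)
    return (masked - original) / original


def best_constant_target(payloads: Sequence[int], header: int, max_len: int) -> int:
    """Constant packet length in (header, max_len] that minimizes total overhead."""
    return min(range(header + 1, max_len + 1),
               key=lambda t: overhead_ratio(payloads, t, header))


if __name__ == "__main__":
    payloads = [100, 1460]  # hypothetical payload sizes in bytes, not measured data
    header = 40             # assumed header cost per packet (e.g. IPv4 + TCP, no options)
    pad_only = max(payloads) + header  # smallest constant length that needs no fragmentation
    best = best_constant_target(payloads, header, max_len=1500)
    print(pad_only, round(overhead_ratio(payloads, pad_only, header), 3))  # 1500 0.829
    print(best, round(overhead_ratio(payloads, best, header), 3))          # 284 0.212
```

The numbers come from two made-up packets and say nothing about the measured traces studied in this paper. Note also that fragmenting increases the number of packets in the flow, a feature this sketch makes no attempt to hide.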