Padding and fragmentation for masking packet length statistics

Alfonso Iacovazzi and Andrea Baiocchi

Dept. of Information Engineering, Electronics and Telecommunications (DIET), University of Roma Sapienza, Via Eudossiana 18, 00184 Roma, Italy
iacovazzi@infocom.uniroma1.it, andrea.baiocchi@uniroma1.it

Abstract. We aim to understand whether it is possible, and how complex it is, to obfuscate the traffic features exploited by statistical traffic flow classification tools. We address packet length masking and define perfect masking as an optimization problem whose objective is to minimize overhead. An explicit, efficient algorithm is given to compute the optimum masking sequence. Numerical results are provided, based on measured traffic traces. We find that fragmenting requires about the same overhead as padding does.

1 Introduction

In this work we investigate the protection of privacy against traffic analysis (see [1][2]) and, at the same time, how much effort must be devoted to fooling traffic analysis tools. Some recent works [4][5] have carefully analyzed the information conveyed by the various features used to classify different application classes: their conclusion is that packet lengths are the most valuable feature for traffic classifiers at the application level. Moreover, in [3] the Authors show experimentally that they can obtain partial transcripts of encrypted VoIP transactions, also by exploiting statistical information leaked by message lengths. This is why we investigate how packet length information can be concealed from statistical traffic analysis, in addition to ciphering. We term this packet length masking.

In [6] the Authors propose a technique for changing the packet lengths by optimally morphing one class of traffic to look like another class.
They make use of convex optimization techniques to modify the packet lengths so as to minimize the overhead introduced; however, they do not account for packet fragmentation overhead or for residual correlations among masked packet lengths. Shui Yu et al. [7] implement a strategy that introduces dummy packet padding into a flow in order to guarantee perfect anonymity in web browsing. They replace dummy packets with prefetched data in order to avoid the extra delay and the additional bandwidth cost due to packet padding.

We formally define the traffic classification problem and assess the achievable performance bounds for a masking algorithm by defining the optimal solution to the stated masking problem. We show, with many numerical examples based on real traffic traces, that fragmenting entails only a marginal benefit in terms of overhead with respect to much simpler approaches where only padding is used.
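
To make the notion of overhead concrete, the following minimal sketch compares two naive masking strategies on a toy list of payload lengths. It is not the optimal masking algorithm of this paper: it assumes that every masked packet has the same fixed length, and the function names, the 40-byte header cost and the sample lengths are hypothetical values chosen only for illustration.

```python
# Toy overhead accounting for fixed-length masking (illustrative assumptions only).

HEADER_BYTES = 40  # assumed cost of each extra packet header; hypothetical value


def pad_only_overhead(lengths, target):
    """Bytes added when every payload is padded up to `target` bytes."""
    if max(lengths) > target:
        raise ValueError("padding alone cannot shorten a packet")
    return sum(target - n for n in lengths)


def fragment_and_pad_overhead(lengths, target, header=HEADER_BYTES):
    """Bytes added when each payload is split into `target`-byte fragments.

    The last fragment is padded up to `target`, and every fragment beyond
    the first one pays for an additional header.
    """
    total = 0
    for n in lengths:
        fragments = max(1, -(-n // target))  # integer ceiling of n / target
        padding = fragments * target - n
        total += padding + (fragments - 1) * header
    return total


sample = [100, 1400, 600]  # hypothetical payload lengths in bytes
print(pad_only_overhead(sample, 1400))
print(fragment_and_pad_overhead(sample, 700))
```

In this toy model fragmentation trades padding bytes for extra headers, so which strategy costs less depends on the chosen target length and on the length distribution of the flow.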