RESEARCH PAPERS
SCIENCE CHINA Information Sciences
March 2010 Vol. 53 No. 3: 659–676
doi: 10.1007/s11432-010-0053-5
© Science China Press and Springer-Verlag Berlin Heidelberg 2010
info.scichina.com www.springerlink.com

Identifying heavy hitters in high-speed network monitoring

ZHANG Yu¹, FANG BinXing¹,² & ZHANG YongZheng²

¹ Research Center of Computer Network and Information Security Technology, Harbin Institute of Technology, Harbin 150001, China;
² Research Center of Information Security, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

Received July 15, 2009; accepted December 19, 2009

Abstract Identifying heavy hitters in a network traffic stream is important for a variety of network applications, ranging from traffic engineering to anomaly detection such as the detection of denial-of-service attacks. Existing methods generally examine newly arriving items in the stream, perform a small number of operations using a small amount of memory, and still provide guarantees on identification accuracy. In high-speed network monitoring, the per-item update speed is critical. However, to the best of our knowledge, no existing identification algorithm provides constant (O(1)) update time on a weighted data stream. In this paper, we present an algorithm named Weighted Lossy Counting (WLC), which identifies heavy hitters in a high-speed weighted data stream with constant update time. WLC employs a novel, efficient, partially ordered data structure that provides fast per-item updates while keeping the memory cost relatively low. We compare WLC with state-of-the-art algorithms for finding heavy hitters on real traffic traces. The experimental results show that WLC matches the other algorithms in accuracy (recall, precision and average relative error) and achieves a much higher update speed, at the cost of relatively more memory.
A theoretical worst-case memory bound for WLC is also derived in this paper; however, experiments with long real traffic traces show that in practice WLC requires much less space than the theoretical bound.

Keywords network traffic monitoring, heavy hitter, weighted data streams

Citation Zhang Y, Fang B X, Zhang Y Z. Identifying heavy hitters in high-speed network monitoring. Sci China Inf Sci, 2010, 53: 659–676, doi: 10.1007/s11432-010-0053-5

Corresponding author (email: yuzhanghit@gmail.com)

1 Introduction

Accurately measuring and monitoring network traffic is the basis of managing large-scale networks. In traffic measurement, flow-level approaches provide a reasonable tradeoff between the volume of information and its level of detail [1]. Many studies [2–5] have shown that flow statistics exhibit strongly heavy-tailed behavior in various networks: a small percentage of flows accounts for a large percentage of the traffic. For example, Fang and Peterson [6] indicate that 9% of the flows between autonomous system (AS) pairs account for 90% of the byte traffic between all AS pairs. This characteristic is often referred to as the elephant (i.e., heavy hitter) and mice phenomenon. Identifying flows which are responsible for most bytes