PREPRINT Friday 3rd November, 2017, 15:15

A Single-FPGA Architecture for Detecting Heavy Hitters in 100 Gbit/s Ethernet Links

Jose Fernando Zazo*, Sergio Lopez-Buedo*†, Mario Ruiz, Gustavo Sutter*

* NAUDIT HPCN, Calle Faraday 7, 28049 Madrid, Spain
† High-Performance Computing and Networking Research Group, Universidad Autonoma de Madrid, Ciudad Universitaria de Cantoblanco, 28049 Madrid, Spain

Abstract—In network traffic monitoring, a very important analysis is finding heavy hitters, that is, the flows that use most resources in a given network link. This information can be very useful for security or traffic management purposes. Although this analysis might seem easy to implement, since it is essentially based on counting, doing it at 100 Gbit/s rates is far from trivial. In 100 Gbit/s Ethernet (100 GbE), up to 148 million packets per second can be received, making it very difficult to parse packets and maintain counters at such a rate. In this paper, we leverage the integrated 100G Ethernet Subsystem available in Xilinx UltraScale devices to implement a heavy hitter detector for 100 GbE on a VCU108 evaluation kit. Thanks to the integration of the Count Sketch algorithm with a priority list and a network packet parser, the proposed architecture is able to work at line rate for average packet sizes greater than 215 bytes. The work presents a theoretical analysis of the error, as well as the technical details of the proposed solution. The implementation has been validated using real-world traces, obtaining an average error of 1.29%.

I. INTRODUCTION

Counting elements is an everyday activity. It is the basic building block for generating statistics, monitoring physical phenomena or studying problems in a diverse range of fields. Computer networks are no exception. Commercial switches and routers use counters to provide aggregated information about the number of packets, the number of connected users or the bandwidth usage.
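As a sanity check on the figure quoted in the abstract, the 148 million packets per second bound follows directly from standard Ethernet framing constants (minimum 64-byte frame, 8-byte preamble, 12-byte inter-frame gap); this short Python snippet, added here for illustration, reproduces the arithmetic:

```python
# Worst-case packet rate on a fully loaded 100 GbE link.
# Every minimum-size Ethernet frame (64 B) also occupies
# 8 B of preamble and a 12 B inter-frame gap on the wire.
LINK_RATE_BPS = 100e9
WIRE_BYTES_PER_PKT = 64 + 8 + 12          # 84 bytes of line occupancy
pps = LINK_RATE_BPS / (WIRE_BYTES_PER_PKT * 8)
print(f"{pps / 1e6:.1f} Mpps")            # prints "148.8 Mpps"
```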
Such simple metrics are, however, very important to understand the networking infrastructure and ensure its proper functioning.

Solutions for counting that involve heuristics are not unusual. Sampling and hashing are two well-known candidates in the race to achieve a trade-off between accuracy and resource consumption. Structures that are static in terms of memory usage are always beneficial for hardware developments. Sketches feature such static structures and belong to the group of hashing techniques. Initially proposed in 1995 [1], they are an elegant alternative composed of a two-dimensional array of d rows and w columns, both of which are defined during the design of the structure.

Drawbacks arise when our concerns become more demanding, when we move from mere counting to item classification. "Which items are the most influential?" or "what are the current trends?" are the kind of questions that this paper aims to answer. The goal, from a network analytics perspective, is to determine which users generate most of the traffic, or to find out which servers are most heavily loaded. However, determining the most frequent elements in a dataset is not a straightforward task, and it becomes even more difficult when working at line rate on 100 GbE links. At such speeds, items cannot be inspected more than once, and the packet rate of a fully loaded link can reach 148 million packets per second.

Network analytics is not the only vibrant field where determining the number of occurrences of an item is a key aspect. Natural language processing or webpage indexing by a search engine are other instances where the huge volume of information cannot be counted at runtime with limited memory consumption, unless a fair balance between accuracy and resource utilization is established.

M. Charikar [2] suggested a software approach for counting the most frequent elements: combine a sketch structure with a fixed-size heap.
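The sketch-plus-ranking-list idea can be sketched in software. The following Python model is purely illustrative and is not the paper's hardware design: the class and function names (`CountSketch`, `top_k`), the hashing scheme based on salted built-in hashes, and the default parameters d, w and k are assumptions chosen for clarity. It shows the two ingredients the text describes: a d × w array of signed counters queried by median, and a small fixed-size ranking structure updated on every arrival.

```python
import random

class CountSketch:
    """Count Sketch: d rows of w signed counters, with a per-row
    bucket hash h_i and a per-row +/-1 sign hash s_i."""

    def __init__(self, d=4, w=256, seed=0):
        rng = random.Random(seed)
        self.d, self.w = d, w
        self.table = [[0] * w for _ in range(d)]
        # Per-row salts for the bucket hash and the sign hash.
        self.salts = [(rng.getrandbits(32), rng.getrandbits(32))
                      for _ in range(d)]

    def _hashes(self, item, row):
        bucket_salt, sign_salt = self.salts[row]
        bucket = hash((bucket_salt, item)) % self.w
        sign = 1 if hash((sign_salt, item)) & 1 else -1
        return bucket, sign

    def update(self, item, count=1):
        for i in range(self.d):
            b, s = self._hashes(item, i)
            self.table[i][b] += s * count

    def estimate(self, item):
        # Median of the per-row signed counters cancels collision noise.
        vals = sorted(s * self.table[i][b]
                      for i in range(self.d)
                      for b, s in [self._hashes(item, i)])
        m = len(vals) // 2
        return vals[m] if len(vals) % 2 else (vals[m - 1] + vals[m]) // 2

def top_k(stream, k=3, d=4, w=256):
    """Sketch plus a fixed-size ranking list: on each arrival, update
    the sketch, re-estimate the item, and keep the k largest."""
    cs = CountSketch(d, w)
    ranking = {}  # item -> last estimated count, at most k entries
    for item in stream:
        cs.update(item)
        est = cs.estimate(item)
        if item in ranking or len(ranking) < k:
            ranking[item] = est
        else:
            victim = min(ranking, key=ranking.get)
            if est > ranking[victim]:
                del ranking[victim]
                ranking[item] = est
    return sorted(ranking.items(), key=lambda kv: -kv[1])
```

A software model like this uses a dict-based ranking list for brevity; the hardware version described later replaces it with a priority list whose comparisons are carried out in parallel.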
For the purposes of this work, tree-based structures, though optimal for software implementations, may not be so attractive for a hardware design. Indeed, implementing a priority queue (PQ) is a demanding task in itself [3]. Fortunately, the size of the ranking list is generally on the order of tens of elements, so certain simplifications to the design fit within the scope of this application. The reduction in memory requirements achieved by limiting the size of the ranking list makes the design ideal for FPGA implementation. The Count Sketch (CS) algorithm as well as a PQ can fit in BRAM memories, thus obtaining a high speedup because of the tangible reduction in latency when compared with conventional memories. Additionally, dedicated hardware can fully exploit the parallelism of the algorithms (multiple computations of hash functions or simultaneous comparisons to look for the greatest values). These two facts allowed us to scale the framework up to the demanding scenario of 100 GbE network analytics.

II. RELATED WORK

Finding the top-n elements is solvable by counter-based techniques. Using a dedicated counter for every possible candidate in the data stream, and sorting the different counters by value, is an efficient solution when the requirements of the problem are well defined. One of the main drawbacks of this approach is that the total number of counters increases linearly with the number of monitored