OP4T: Bringing Advanced Network Packet Timestamping into the Field

Mohammed Hawari *†, Thomas Clausen †
* Cisco Systems, France, mhawari@cisco.com
† École Polytechnique, France, {mohammed.hawari,thomas.clausen}@polytechnique.edu

Abstract: Because it is very bursty, the microsecond-scale temporal behaviour of network traffic in data-centres is challenging to measure and understand. To bring observability into data-centre networks, this paper introduces the Open Platform for Programmable Precise Packet Timestamping (OP4T), a hardware architecture targeting Field-Programmable Gate Arrays (FPGAs), integrated into data-centre servers as a Smart Network Interface Card (SmartNIC), and flexible enough to enable advanced latency diagnosis.

In this paper, OP4T is specified, and an open-source implementation of that architecture is proposed, targeting the NetFPGA SUME prototyping board. By leveraging the P4 programming language and partial reconfiguration, that open-source implementation is experimentally shown to enable in-band, precise packet timestamping without sacrificing the achievable throughput. As an illustration, OP4T is shown to be usable to measure fine-grained properties of a software packet forwarder, e.g., packet batching.

I. INTRODUCTION

The accurate measurement of latency is a key tool for qualifying the performance of networked systems. Previous work has shown that traffic patterns in data-centre networks include packet bursts [1], [2], observed as a heavy-tailed distribution of packet interarrival times [3]. Bursts are responsible for an increased buffer occupancy in packet switches, eventually leading to queuing and, if buffers are undersized, packet drops. The corresponding additional delays, even when they are at the microsecond scale, are in turn responsible for observable, application-level performance impairments [4], [2].
Moreover, packet bursts in data-centre networks are transient, appear at time-scales in the order of a few dozen microseconds [5], and are difficult to detect by coarse measurements. That is different from traffic patterns occurring in wide-area networks, which are observable by methods such as tomographic inference [6], themselves based on coarse metrics. To understand such transient network traffic patterns, and to diagnose transient latency spikes, instrumentation enabling accurate packet timestamping on selected flows is, therefore, crucial.

Such instrumentation is already available as a part of network testers, i.e., systems capable of generating predefined traffic patterns and monitoring the latency introduced by networked Devices Under Test (DUTs). Network testers already exist both as commercial hardware appliances (e.g., Ixia PerfectStorm or Spirent TestCenter) and as open-source hardware designs (e.g., the Open Source Network Tester (OSNT) [7] or FlueNT10G [8]). Nevertheless, the cost, programmability and/or performance of those solutions are subject to limitations, described in this paper. More fundamentally, those network testers are only designed to be used during the qualification phase of a DUT, and not in situ, i.e., for understanding latency issues in a real deployment.

This paper goes beyond network testers by introducing the Open Platform for Programmable Precise Packet Timestamping (OP4T). While network testers are external to a DUT, and are responsible both for generating traffic patterns and for monitoring a temporal response, OP4T exposes a deliberately different semantic: OP4T belongs to the category of Smart Network Interface Cards (SmartNICs) and exposes the same services as a regular network interface. Used in a data-centre server in place of a commodity Network Interface Card (NIC), OP4T enables in-band packet timestamping, with minimal disruption of normal application operations.
The OP4T architecture is designed according to five guiding principles.

1) Openness: Primarily targeted towards the research community, OP4T must be compatible with an affordable network prototyping Field-Programmable Gate Array (FPGA) board and, as much as possible, must reuse existing open-source hardware designs.

2) Programmability: OP4T must allow programmable packet timestamping and payload alteration, to enable selecting the packet flows to monitor and, potentially, the ones to alter with in-band timestamps. As such programmability must be accessible to network operators, who are not necessarily specialised in field-programmable logic design, OP4T must provide a programming abstraction adapted to packet parsing, matching, and alteration, i.e., equivalent to the one exposed by the P4 programming language [9].

3) Precision: Intended to diagnose transient latency issues at small timescales, OP4T must be able to perform timestamping with a precision in the order of one microsecond at worst.

4) Performance: When used to replace a regular server NIC, OP4T must not introduce any performance limitation in terms of achievable throughput or packet rate.

5) Flexibility: Like most debugging, understanding transient latency spikes in a data-centre can be a complex, and