Cache Behavior of Network Protocols

Erich Nahum, David Yates, Jim Kurose, and Don Towsley
Department of Computer Science
University of Massachusetts
Amherst, MA 01003
{nahum,yates,kurose,towsley}@cs.umass.edu

Abstract

In this paper we present a performance study of memory reference behavior in network protocol processing, using an Internet-based protocol stack implemented in the x-kernel running in user space on a MIPS R4400-based Silicon Graphics machine. We use the protocols to drive a validated execution-driven architectural simulator of our machine. We characterize the behavior of network protocol processing, deriving statistics such as cache miss rates and percentage of time spent waiting for memory. We also determine how sensitive protocol processing is to the architectural environment, varying factors such as cache size and associativity, and predict performance on future machines.

We show that network protocol cache behavior varies widely, with miss rates ranging from 0 to 28 percent, depending on the scenario. We find that instruction cache behavior has the greatest effect on protocol latency in most cases, and that cold cache behavior is very different from warm cache behavior. We demonstrate the upper bounds on performance that can be expected by improving memory behavior, and the impact of features such as associativity and larger cache sizes. In particular, we find that TCP is more sensitive to cache behavior than UDP, gaining larger benefits from improved associativity and bigger caches. We predict that network protocols will scale well with CPU speeds in the future.

1 Introduction

Cache behavior is a central issue in contemporary computer system performance. The large gap between CPU and memory speeds is well-known, and is expected to continue for the foreseeable future [17]. Cache memories are used to bridge this gap, and multiple levels of cache memories are typical in contemporary systems.
Many studies have examined the memory reference behavior of application code, and work has recently appeared studying the cache behavior of operating systems. However, little work has been done to date exploring the impact of memory reference behavior on network protocols. As networks become ubiquitous, it is important to understand the interaction of network protocol software and computer hardware. Thus, rather than examining an application suite such as the SPEC 95 benchmarks, the workload that we study is network protocol software.

This research was supported in part by NSF under grant NCR-9206908, and by ARPA under contract F19628-92-C-0089. Erich Nahum was supported by a Computer Measurement Group Fellowship and is currently with the IBM T.J. Watson Research Center. David Yates was the recipient of a Motorola Codex University Partnership in Research Grant and is currently with the Boston University Computer Science Department.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGMETRICS '97, Seattle, Washington, USA
© 1997 ACM ..$3.50

We wish to address the following research issues:

• What is the memory reference behavior of network protocol code? What are the cache hit rates? How much time is spent waiting for memory?
• Which has a more significant impact on performance, instruction references or data references?
• How sensitive are network protocols to the cache organization? How do factors such as cache size and associativity affect performance?
• What kind of impact will future architectural trends have on network protocol performance?

We use execution-driven simulation to answer these questions, by using an actual network protocol implementation that we run both on a real system and on a simulator. We have constructed a simulator for our MIPS R4400-based Silicon Graphics machines, and have taken great effort to validate it, i.e., to ensure that it models the performance costs of our platform accurately. We use the simulator to analyze a suite of Internet-based protocol stacks implemented in the x-kernel [20], which we ported to user space on our SGI machine. We characterize the behavior of network protocol processing, deriving statistics such as cache miss rates, instruction use, and percentage of time spent waiting for memory. We also determine how sensitive protocol processing is to the architectural environment, varying factors such as cache size and associativity, and we predict performance on future machines.

We show that network protocol software is very sensitive to cache behavior, and quantify this sensitivity in terms of performance under various conditions. We find that protocol memory reference behavior varies widely, and that instruction cache behavior has the greatest effect on protocol latency in most cases. We present the upper bounds on performance improvements that can