USENIX Association 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’13) 385 Scaling Memcache at Facebook Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung, Venkateshwaran Venkataramani {rajeshn,hans}@fb.com, {sgrimm, marc}@facebook.com, {herman, hcli, rm, mpal, dpeek, ps, dstaff, ttung, veeve}@fb.com Facebook Inc. Abstract: Memcached is a well known, simple, in- memory caching solution. This paper describes how Facebook leverages memcached as a building block to construct and scale a distributed key-value store that supports the world’s largest social network. Our system handles billions of requests per second and holds tril- lions of items to deliver a rich experience for over a bil- lion users around the world. 1 Introduction Popular and engaging social networking sites present signiﬁcant infrastructure challenges. Hundreds of mil- lions of people use these networks every day and im- pose computational, network, and I/O demands that tra- ditional web architectures struggle to satisfy. A social network’s infrastructure needs to (1) allow near real- time communication, (2) aggregate content on-the-ﬂy from multiple sources, (3) be able to access and update very popular shared content, and (4) scale to process millions of user requests per second. We describe how we improved the open source ver- sion of memcached [14] and used it as a building block to construct a distributed key-value store for the largest so- cial network in the world. We discuss our journey scal- ing from a single cluster of servers to multiple geograph- ically distributed clusters. To the best of our knowledge, this system is the largest memcached installation in the world, processing over a billion requests per second and storing trillions of items. This paper is the latest in a series of works that have recognized the ﬂexibility and utility of distributed key- value stores [1, 2, 5, 6, 12, 14, 34, 36]. This paper fo- cuses on memcached—an open-source implementation of an in-memory hash table—as it provides low latency access to a shared storage pool at low cost. These quali- ties enable us to build data-intensive features that would otherwise be impractical. For example, a feature that issues hundreds of database queries per page request would likely never leave the prototype stage because it would be too slow and expensive. In our application, however, web pages routinely fetch thousands of key- value pairs from memcached servers. One of our goals is to present the important themes that emerge at different scales of our deployment. While qualities like performance, efﬁciency, fault-tolerance, and consistency are important at all scales, our experi- ence indicates that at speciﬁc sizes some qualities re- quire more effort to achieve than others. For exam- ple, maintaining data consistency can be easier at small scales if replication is minimal compared to larger ones where replication is often necessary. Additionally, the importance of ﬁnding an optimal communication sched- ule increases as the number of servers increase and net- working becomes the bottleneck. This paper includes four main contributions: (1) We describe the evolution of Facebook’s memcached- based architecture. (2) We identify enhancements to memcached that improve performance and increase memory efﬁciency. (3) We highlight mechanisms that improve our ability to operate our system at scale. (4) We characterize the production workloads imposed on our system. 2 Overview The following properties greatly inﬂuence our design. First, users consume an order of magnitude more con- tent than they create. This behavior results in a workload dominated by fetching data and suggests that caching can have signiﬁcant advantages. Second, our read op- erations fetch data from a variety of sources such as MySQL databases, HDFS installations, and backend services. This heterogeneity requires a ﬂexible caching strategy able to store data from disparate sources. Memcached provides a simple set of operations (set, get, and delete) that makes it attractive as an elemen- tal component in a large-scale distributed system. The open-source version we started with provides a single- machine in-memory hash table. In this paper, we discuss how we took this basic building block, made it more ef- ﬁcient, and used it to build a distributed key-value store that can process billions of requests per second. Hence-