D-Zipfian: A Decentralized Implementation of Zipfian

Sumita Barahmand and Shahram Ghandeharizadeh

Database Laboratory Technical Report 2012-04
Computer Science Department, USC
Los Angeles, California 90089-0781
December 8, 2012

Abstract

The Zipfian distribution is used extensively to generate workloads for evaluating systems. With scalable multi-node database management systems, a centralized, single-node benchmarking framework that embodies Zipfian may exhaust its resources and be unable to generate work at a rate high enough to evaluate its target system. The benchmarking framework must therefore become decentralized and scalable. BG is one such framework. This paper presents BG's decentralized, parallel implementation of Zipfian, named D-Zipfian. D-Zipfian employs multiple nodes that reference data items independently. This scalable technique strives to produce a distribution that is independent of its degree of parallelism, i.e., the number of employed nodes. Moreover, it supports heterogeneous nodes that reference data items at different rates. We characterize the behavior of D-Zipfian with different degrees of parallelism and skewness, population sizes, and heterogeneity of its employed nodes.

A Introduction

Benchmarking frameworks strive to model reality as closely as possible. A uniformly random distribution of accesses to data items is typically not realistic due to Zipf's law [17]. This law states that, given some collection of data items, the frequency of any data item is inversely proportional to its rank in its frequency table. This means the most frequently referenced data item occurs more often than the second most frequent data item, the second most frequent data item occurs more often than the third, and so on.
By manipulating the exponent that characterizes the Zipfian distribution (see Equation 1 in Section B), one may emulate different rules of thumb, such as: 80% of requests (ticket sales [5], frequency of words [17], profile look-ups) reference 20% of data items (movies opening on a weekend, words uttered in natural language, members of a social networking site).
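
To make the effect of the exponent concrete, the following minimal sketch computes, for a few exponent values, the share of requests that falls on the top 20% of data items. It assumes the textbook form of Zipf's law, in which the probability of the item of rank i is proportional to 1/i^s; the parameterization used by Equation 1 in Section B may differ. The function names, the population size, and the exponent values are illustrative assumptions and are not taken from BG.

```python
def zipf_probabilities(n_items, exponent):
    """Return reference probabilities for ranks 1..n_items.

    Assumption: textbook Zipf form, p(rank) proportional to 1 / rank**exponent.
    An exponent of 0 yields a uniform distribution; larger values add skew.
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_share(probabilities, fraction):
    """Share of all requests that go to the most popular `fraction` of items.

    `probabilities` must be ordered by rank, most popular item first.
    """
    k = max(1, int(round(len(probabilities) * fraction)))
    return sum(probabilities[:k])


if __name__ == "__main__":
    n_items = 10_000  # hypothetical population size
    for exponent in (0.0, 0.5, 1.0):  # hypothetical skew settings
        probs = zipf_probabilities(n_items, exponent)
        share = top_share(probs, 0.20)
        print(f"exponent={exponent:.1f}: top 20% of items receive "
              f"{share:.1%} of requests")
```

Sweeping the exponent in this way is one simple method of finding a setting that approximates a target rule of thumb, such as 80% of requests referencing 20% of data items, for a given population size.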