Kinetic Action: Performance Analysis of Integrated Key-Value Storage Devices vs. LevelDB Servers

Manas Minglani, Jim Diehl, Xiang Cao, Bingzhe Li, Dongchul Park, David J. Lilja, and David H.C. Du

University of Minnesota - Twin Cities, Minneapolis, Minnesota, USA
{mingl001, jdiehl, lixx1743, lilja, du}@umn.edu
School of Computing and Information Systems, Grand Valley State University, Allendale, Michigan, USA
caox@gvsu.edu
Computer & Electronic Systems Engineering, Hankuk University of Foreign Studies, South Korea
dpark@hufs.ac.kr

Abstract: With the rise of cloud storage and many data intensive applications, there is an unprecedented growth in the volume of unstructured data. In response, key-value object storage is becoming more popular for the ease with which it can store, manage, and retrieve large amounts of this data. Seagate recently launched Kinetic direct-access-over-Ethernet hard drives which incorporate a LevelDB key-value store inside each drive. In this work, we evaluate these drives using micro as well as macro benchmarks to help understand the performance limits, trade-offs, and implications of replacing traditional hard drives with Kinetic drives in data centers and high performance systems. We perform in-depth throughput and latency benchmarking of these Kinetic drives (each acting as a tiny independent server) from a client machine connected to them via Ethernet. We compare these results to a SATA-based and a faster SAS-based traditional server running LevelDB. Our sample Kinetic drives are CPU-bound, but they still average sequential write throughput of 63 MB/sec and sequential read throughput of 78 MB/sec for 1 MB value sizes. They also demonstrate unique Kinetic features including direct disk-to-disk data transfer. Our macro benchmarking using the Yahoo Cloud Serving Benchmark (YCSB) shows that mid-range LevelDB servers outperform the Kinetic drives for several workloads; however, this is not always the case.
For larger value sizes, even these first generation sample Kinetic drives outperform a full server for several different workloads.

Keywords: Performance Evaluation, Data Center Storage Architecture, Key-Value Store, Cloud Applications

I. INTRODUCTION

The amount of digital data is growing at an extremely rapid pace, and it is estimated that its volume will grow at 40%-50% per year [1]. Most of this data explosion is due to unstructured data. According to predictions from International Data Corporation (IDC), 80% of the 133 exabytes of global data growth in 2017 will be unstructured [2]. Managing such an enormous amount of data is a challenging task.

Most of the data in existing systems is stored and accessed using traditional file-based or block-based systems [3]. Unfortunately, these traditional systems are becoming inefficient as file-based access and hardware requirements limit their scalability. Therefore, there is a need for a data access method that is flexible and capable of horizontal scale-out [4].

Object storage overcomes limitations of file-based systems by offering scalability and being software defined [5]. The data is communicated as objects rather than files or blocks, and additional metadata can be stored alongside the object [3]. Object storage is flat structured and prevents higher-level applications from needing to manipulate data at the lowest level. Therefore, object storage has the flexibility to scale-out horizontally.

When a unique identifier, called a "key," is used to access the object, or "value," this form of storage can be called a key-value store (KV store). There are several key-value stores being deployed to support large websites such as Dynamo at Amazon [6], Redis at GitHub [7], and RocksDB at Facebook [8]. All these systems store ordered <key, value> pairs.
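
As a rough illustration of the ordered <key, value> model described above, the following minimal sketch keeps keys sorted in memory and supports lookups and range scans. It is a toy, not LevelDB or any of the cited systems; the class name OrderedKVStore and its methods are hypothetical names chosen for this example.

```python
import bisect


class OrderedKVStore:
    """Toy in-memory key-value store that keeps its keys in sorted order.

    Hypothetical illustration only; real stores such as LevelDB persist
    data on disk and use very different internal structures.
    """

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value

    def put(self, key: bytes, value: bytes) -> None:
        # Insert the key into the sorted list only the first time it is seen.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes):
        # Returns None when the key is absent.
        return self._values.get(key)

    def delete(self, key: bytes) -> None:
        if key in self._values:
            del self._values[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def get_range(self, start: bytes, end: bytes):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = OrderedKVStore()
store.put(b"user:2", b"bob")
store.put(b"user:1", b"alice")
store.put(b"video:9", b"clip")
users = store.get_range(b"user:", b"user;")  # all keys with the "user:" prefix
```

Because the keys are ordered, a prefix scan such as the one for "user:" reduces to two binary searches and a slice, which is the property that makes ordered stores convenient for range queries.
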
Even though key-value stores address the problem of scaling and managing huge amounts of data for the above systems, existing key-value stores have several limitations. They run on top of multiple layers of legacy software and hardware, such as POSIX, RAID controllers, etc., designed for file-based systems [5]. Also, these huge systems consume significant amounts of power and rack space [9].

To help overcome these hardware and software limitations, Seagate has announced a new class of hard drives called Kinetic drives [5]. These drives have a built-in processor that runs a LevelDB-based key-value store directly on the drive [10]. Rather than the typical SATA or SAS interface, Kinetic drives communicate externally via TCP/IP over Ethernet. Each drive acts as a tiny server in itself. An important function of the drives is direct P2P (Peer-to-Peer) transfer, which moves data from one drive to another via Ethernet without copying it through a storage controller or other server [11]. By replacing hardware and software layers, Kinetic drives can reduce the cost and complexity of a large-scale object storage system.

In this work, we seek to better understand the key functionalities, features, and performance of the drives. We use this information to compare Kinetic drives with other LevelDB-based servers and to derive insights about the possibility of replacing traditional hard drives with Kinetic drives. Furthermore, the specification sheets [12] do not provide a detailed analysis of the throughput, latency, and other features, especially in comparison to other LevelDB-based servers. We also study the ease of programmability of the Kinetic drives and share our experiences. To the best of our knowledge, no prior work evaluates Kinetic drives this thoroughly.
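
The benefit of direct P2P transfer described above can be sketched with a toy model that counts how many bytes pass through the client in each copy strategy. This is an assumption-laden illustration, not the Kinetic API: ToyDrive, p2p_push, and copy_via_client are hypothetical names, and the drives here are plain in-memory objects rather than devices reached over TCP/IP.

```python
class ToyDrive:
    """Stand-in for a network-attached key-value drive (hypothetical, in-memory)."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def put(self, key, value):
        self.store[key] = value

    def get(self, key):
        return self.store[key]

    def p2p_push(self, target, keys):
        """Copy keys straight to another drive.

        Returns the number of value bytes sent drive-to-drive; none of
        them pass through the client in this model.
        """
        sent = 0
        for key in keys:
            value = self.store[key]
            target.put(key, value)
            sent += len(value)
        return sent


def copy_via_client(source, target, keys):
    """Client reads each value and writes it back out.

    Returns the number of value bytes that cross the client, counting
    both the read (drive -> client) and the write (client -> drive).
    """
    through_client = 0
    for key in keys:
        value = source.get(key)        # drive -> client
        through_client += len(value)
        target.put(key, value)         # client -> drive
        through_client += len(value)
    return through_client


src = ToyDrive("src")
src.put(b"k1", b"x" * 1000)
src.put(b"k2", b"y" * 500)

relayed = copy_via_client(src, ToyDrive("dst1"), [b"k1", b"k2"])
direct = src.p2p_push(ToyDrive("dst2"), [b"k1", b"k2"])
```

In this model, relaying through the client moves every value across the client's network link twice, while the direct push moves each value once and never touches the client.
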
2017 IEEE 23rd International Conference on Parallel and Distributed Systems
978-1-5386-2129-5/17/31.00 ©2017 IEEE
DOI 10.1109/ICPADS.2017.00072

To understand several salient features including P2P transfer, "Get," "Put," and others, we develop several tests