Countering Fragmentation in an Enterprise Storage System
RAM KESAVAN, Google (work done while employed at NetApp, Inc.), USA
MATTHEW CURTIS-MAURY, VINAY DEVADAS, and KESARI MISHRA, NetApp, Inc., USA
As a file system ages, it can experience multiple forms of fragmentation. Fragmentation of the free space in
the file system can lower write performance and subsequent read performance. Client operations as well as
internal operations, such as deduplication, can fragment the layout of an individual file, which also impacts
file read performance. File systems that allow sub-block granular addressing can gather intra-block fragmentation,
which leads to wasted free space. Similarly, wasted space can also occur when a file system writes a
collection of blocks out to object storage as a single large object, because the constituent blocks can become
free at different times. The impact of fragmentation also depends on the underlying storage media. This article
studies each form of fragmentation in the NetApp® WAFL® file system, and explains how the file system
leverages a storage virtualization layer for defragmentation techniques that physically relocate blocks efficiently,
including those in read-only snapshots. The article analyzes the effectiveness of these techniques at
reducing fragmentation and improving overall performance across various storage media.
CCS Concepts: • Information systems → Hierarchical storage management; Information storage technologies;
Storage virtualization; • Software and its engineering → File systems management;
Additional Key Words and Phrases: Storage system, file system, fragmentation, file system performance, snapshot,
deduplication
ACM Reference format:
Ram Kesavan, Matthew Curtis-Maury, Vinay Devadas, and Kesari Mishra. 2020. Countering Fragmentation
in an Enterprise Storage System. ACM Trans. Storage 15, 4, Article 25 (January 2020), 35 pages.
https://doi.org/10.1145/3366173
1 INTRODUCTION
File systems typically allocate physically contiguous blocks in storage devices to write out logically
sequential data and metadata. This strategy maximally uses the write bandwidth available
from each storage device, since more blocks can be written to it using fewer write I/Os, and it
allows optimal performance when those data or metadata are later read sequentially. Common file
An earlier version of this article [18] appeared in the proceedings of the File and Storage Technologies Conference (FAST’19)
in Boston, MA, 2019.
Authors’ addresses: R. Kesavan, Google (work done while employed at NetApp, Inc.), 1600 Amphitheatre Pkwy, Mountain
View, CA 94043; email: ram.kesavan@gmail.com; M. Curtis-Maury and V. Devadas, NetApp, 7301 Kit Creek Rd,
Durham, NC, 27709; emails: {mcm, vdevadas}@netapp.com; K. Mishra, 1223 Crescent Terrace, Sunnyvale, CA, 94087; email:
km@netapp.com.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2020 Association for Computing Machinery.
1553-3077/2020/01-ART25 $15.00
https://doi.org/10.1145/3366173
ACM Transactions on Storage, Vol. 15, No. 4, Article 25. Publication date: January 2020.