TBM: Twin Block Management Policy to Enhance the Utilization of Plane-Level Parallelism in SSDs

Arash Tavakkol, Pooyan Mehrvarzy, and Hamid Sarbazi-Azad

Abstract—The internal architecture of an SSD provides channel-, chip-, die-, and plane-level parallelism to concurrently perform multiple data accesses and compensate for the performance gap between a single flash chip and the host interface. Although a good striping strategy can effectively exploit the first three levels, parallel I/O accesses at the plane level can be performed only for operations of the same type and page address. In this work, we propose the Twin Block Management (TBM) policy, which symmetrically conducts the usage and recycling of flash block addresses on the planes of a die, thus enhancing the utilization of plane-level parallelism for reads, writes, and erases. Evaluation results show that TBM improves IOPS and response time by up to 73 and 42 percent, respectively.

Index Terms—Flash memory, solid-state drive, plane-level parallelism, garbage collection

1 INTRODUCTION

In NAND flash solid state drives (SSDs), despite the availability of high-performance NAND flash communication interfaces (up to 800 MB/s [1]), the maximum achievable I/O performance of a single flash chip (package) is restricted by the execution latency of flash operations. In particular, the flash write (program) operation is very slow, and it becomes even slower with VLSI technology shrinking. Consequently, SSDs use a set of flash chips organized in a massively parallel architecture (see Fig. 1) to achieve high I/O performance through the simultaneous execution of multiple flash operations. In this architecture, a multi-channel multi-way bus structure facilitates concurrent accesses to the flash chips. Each flash chip is itself composed of a set of dies that share the chip communication interface and can independently execute flash operations.
At the lowest level, there are multiple planes within a die that can operate in parallel. However, plane-level parallelism has a strict restriction that must be adhered to, i.e., the same operations on the same flash memory addresses are required for simultaneous execution on the planes of a die.

Many recent studies have been proposed to exploit plane-level parallelism more effectively and alleviate its inherent limitations. They generally reorder and reschedule queued I/O operations to increase the chance of parallel execution at the plane level [3], [5], [8]. However, the efficiency of these methods is highly sensitive to the behavior of the SSD flash management policy. Strictly speaking, an out-of-place update policy is used in SSDs to reduce the negative impact of the NAND flash erase-before-write property. To perform an update under this policy, the previous version of the data is marked invalid and the new data is written to a free location. Therefore, a logical-to-physical address mapping scheme is used in conjunction with a garbage collection (GC) mechanism to manage data placement, the consumption of free memory locations, and the recycling of invalid locations. These mechanisms greatly affect whether the I/O queue contains operations that are mapped onto different planes of a die and, at the same time, access identical addresses inside these planes. For instance, fewer queued write operations conform to the plane-level addressing constraint if the memory addresses of neighboring planes are assigned asymmetrically and invalidated memory locations are recycled without any address consideration. However, this critical influence of flash management mechanisms was not considered in previous proposals, and hence their performance gain becomes negligible when random use of page addresses becomes more frequent in the long term.
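The plane-level addressing constraint described above can be made concrete with a small sketch. This is an illustrative check, not code from the letter: the `FlashOp` record and the `require_bac` switch (modeling control logic that additionally demands identical block addresses) are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlashOp:
    kind: str      # "read", "write", or "erase"
    channel: int
    chip: int
    die: int
    plane: int
    block: int
    page: int      # page offset within the block

def multiplane_eligible(a: FlashOp, b: FlashOp,
                        require_bac: bool = True) -> bool:
    """Two queued operations can merge into one multi-plane command only if
    they target different planes of the same die, perform the same kind of
    operation, and share the required address fields."""
    if (a.channel, a.chip, a.die) != (b.channel, b.chip, b.die):
        return False                  # must land on planes of one die
    if a.plane == b.plane:
        return False                  # access within a single plane is serial
    if a.kind != b.kind:
        return False                  # same operation type is mandatory
    if require_bac and a.block != b.block:
        return False                  # some control logic: same block address
    return a.page == b.page           # same page offset is always required
```

For example, two writes to plane 0 and plane 1 of the same die with identical block and page addresses are eligible, while changing only the page offset of one of them breaks eligibility. Relaxing `require_bac` models the products, mentioned later in the letter, whose control logic does not demand identical block addresses.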
In this work, we propose the Twin Block Management (TBM) policy for out-of-place update that is aware of the plane-level addressing constraint. TBM defines new strategies for physical address assignment and GC execution in order to symmetrically conduct the usage and recycling of memory addresses on the planes of a die.

2 SSD INTERNALS

Fig. 1 shows the internal architecture of a NAND flash SSD, composed of four main components: 1) a host interface, which provides communication with the host system and performs I/O request queuing; 2) a controller, composed of a microprocessor and a DRAM memory, which executes a special management firmware called the Flash Translation Layer (FTL); 3) a flash controller, a hardware driver that enables communication with the flash chips; and 4) flash chips, which provide the raw SSD storage capacity. As we mentioned previously, flash chips are organized in a hierarchy of four parallelism levels, i.e., channel-, chip-, die-, and plane-level. Multiple flash I/O operations can be executed simultaneously through striping over the communication channels and pipelining among the set of flash chips connected to each channel. Furthermore, each die of a flash chip has its own command and address registers, and hence the execution of different operations can be interleaved between dies. At the lowest level, the planes of a die share the same control logic. Therefore, the same operations with the same addresses must be available in the I/O queue for parallel-plane (multi-plane) command execution. Access to the internal storage space of a plane is serial, and read/write operations are performed at the unit of a page, which typically holds 4 KB, 8 KB, or a larger volume of data. Updating the content of a previously written flash page requires an erase operation. Due to its very slow execution, a flash erase is performed at the unit of a block, composed of a set of 128, 256, or more pages.
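The four-level hierarchy plus the block/page structure inside each plane means a physical page address can be viewed as a tuple of coordinates. The sketch below shows one conventional way to decompose a flat physical page number into those coordinates; the geometry values are hypothetical examples, not taken from the letter.

```python
# Hypothetical example geometry; real SSDs vary widely.
GEOMETRY = {
    "channels": 8,
    "chips_per_channel": 4,
    "dies_per_chip": 2,
    "planes_per_die": 2,
    "blocks_per_plane": 2048,
    "pages_per_block": 256,
}

def decompose_ppa(ppa: int, g: dict = GEOMETRY) -> dict:
    """Peel off each level of the hierarchy, least-significant field first:
    page within block, block within plane, plane, die, chip, channel."""
    out = {}
    for level, size in [("page", g["pages_per_block"]),
                        ("block", g["blocks_per_plane"]),
                        ("plane", g["planes_per_die"]),
                        ("die", g["dies_per_chip"]),
                        ("chip", g["chips_per_channel"]),
                        ("channel", g["channels"])]:
        ppa, out[level] = divmod(ppa, size)
    return out
```

With this layout, physical page 256 decodes to page offset 0 of block 1 in plane 0, which is exactly the granularity at which the multi-plane constraints of the next section are stated: the page offset (and possibly the block index) must match across planes.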
According to the ONFI standard specification [1], multi-plane commands must access the same page offset within the target blocks (referred to as PAC). Besides, the flash control logic may also require the block addresses to be identical for multi-plane command execution (referred to as BAC). As of 2015, all flash products require PAC, but BAC is relaxed in many products.

In order to emulate the interface of a conventional block device (HDD), the FTL performs the following tasks:

1) Management of the queued I/O requests and address mapping: host I/O requests are segmented into several page-size transactions, each with a specific logical page address (LPA). Due to out-of-place update, LPAs must be translated into physical page addresses (PPAs). The translation procedure follows two different paths for write and read operations. For a write operation, the FTL allocates a free physical page. First, a plane allocation function (PLAlloc) determines the address of the target channel, flash chip, die, and plane according to a predefined allocation strategy. Then, a block allocation function (BLAlloc) assigns a write-frontier within the selected plane. Inside the write-frontier, pages are allocated sequentially, from the first to the last index; thus the PPA is determined, and the (LPA, PPA) pair is stored in a mapping table for future reads. For read operations, translation is performed by searching the mapping table for the LPA entry.

2) GC: the out-of-place update policy quickly consumes free flash pages. Consequently, a GC procedure must be triggered by the FTL to recycle the physical pages holding invalid data. This procedure selects a victim block based on a predetermined policy, moves its valid pages to a new location, and finally triggers the execution of an erase operation.

A. Tavakkol is with the HPCAN Lab, Computer Engineering Department, Sharif University of Technology, Tehran, Iran. E-mail: tavakkol@ce.sharif.edu.
P. Mehrvarzy is with the School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran. E-mail: p.mehrvarzy@ipm.ir.
H. Sarbazi-Azad is with the HPCAN Lab, Computer Engineering Department, Sharif University of Technology, and the School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran. E-mail: azad@ipm.ir.

Manuscript received 8 Feb. 2015; revised 2 July 2015; accepted 13 July 2015. Date of publication 26 July 2015; date of current version 5 Jan. 2017.
Digital Object Identifier no. 10.1109/LCA.2015.2461162

IEEE COMPUTER ARCHITECTURE LETTERS, VOL. 15, NO. 2, JULY-DECEMBER 2016
© 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
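The write path described in Section 2 (PLAlloc, then BLAlloc, then sequential page allocation inside the write-frontier) can be sketched as follows. This is a minimal illustrative sketch, not the letter's implementation: the round-robin PLAlloc policy, the tiny per-plane geometry, and the class layout are assumptions, and GC (victim selection, valid-page migration, erase) is omitted.

```python
class PlaneFTL:
    """Toy FTL write path over the planes of one die (illustration only)."""
    PAGES_PER_BLOCK = 4      # tiny values to keep the example readable
    BLOCKS_PER_PLANE = 4

    def __init__(self, num_planes: int = 2):
        self.num_planes = num_planes
        self.mapping = {}                  # LPA -> (plane, block, page)
        self.free_blocks = [list(range(self.BLOCKS_PER_PLANE))
                            for _ in range(num_planes)]
        self.frontier = [None] * num_planes  # (block, next_page) per plane
        self.next_plane = 0

    def _pl_alloc(self) -> int:
        """PLAlloc: pick the target plane (round-robin here, as an assumption)."""
        p = self.next_plane
        self.next_plane = (self.next_plane + 1) % self.num_planes
        return p

    def _bl_alloc(self, plane: int) -> None:
        """BLAlloc: open a new write-frontier block in the selected plane."""
        self.frontier[plane] = (self.free_blocks[plane].pop(0), 0)

    def write(self, lpa: int):
        """Out-of-place update: always consume the next free frontier page."""
        plane = self._pl_alloc()
        if self.frontier[plane] is None:
            self._bl_alloc(plane)
        block, page = self.frontier[plane]
        self.mapping[lpa] = (plane, block, page)   # record (LPA, PPA) pair
        page += 1
        # pages inside the write-frontier are allocated sequentially
        self.frontier[plane] = None if page == self.PAGES_PER_BLOCK \
                               else (block, page)
        return self.mapping[lpa]

    def read(self, lpa: int):
        """Read-path translation: look the LPA up in the mapping table."""
        return self.mapping[lpa]
```

Rewriting an LPA simply maps it to a fresh frontier page, leaving the old copy invalid for GC to reclaim later; note how a plain round-robin PLAlloc gives no guarantee that the two planes' frontiers stay at matching block and page addresses, which is precisely the asymmetry TBM is designed to remove.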