IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 62, NO. 6, JUNE 2016 3053

BASIC Codes: Low-Complexity Regenerating Codes for Distributed Storage Systems

Hanxu Hou, Kenneth W. Shum, Senior Member, IEEE, Minghua Chen, Senior Member, IEEE, and Hui Li

Abstract— In distributed storage systems, regenerating codes can achieve the optimal tradeoff between storage capacity and repair bandwidth. However, a critical drawback of existing regenerating codes, in general, is the high coding and repair complexity, since the coding and repair processes involve expensive multiplication operations in finite fields. In this paper, we present a design framework for regenerating codes, named BASIC regenerating codes, which employ binary addition and bitwise cyclic shift as the elemental operations. The proposed BASIC regenerating codes can be regarded as a concatenated code, with the outer code being a binary parity-check code and the inner code being a regenerating code utilizing the binary parity-check code as the alphabet. We show that the proposed functional-repair BASIC regenerating codes can asymptotically achieve the fundamental tradeoff curve between storage and repair bandwidth of functional-repair regenerating codes, with less computational complexity. Furthermore, we demonstrate that the existing exact-repair product-matrix construction of regenerating codes can be modified to obtain exact-repair BASIC product-matrix regenerating codes with much lower encoding, repair, and decoding complexity according to the theoretical analysis, and with shorter encoding, repair, and decoding times in the implementation results.

Index Terms— Regenerating codes, distributed storage systems, low complexity, binary parity-check code.

Manuscript received April 3, 2015; revised December 20, 2015; accepted February 28, 2016. Date of publication April 13, 2016; date of current version May 18, 2016. This work was supported in part by the National Basic Research Program of China under Grant 2012CB315904 and Grant 2013CB336700, in part by the Natural Fund of Guangdong Province under Grant S2013020012822, in part by the Shenzhen Basic Research Project under Grant SZJCYJ20150331100723974 and Grant SZJCYJ20140417144423192, and in part by the University Grants Committee, Hong Kong, through the Area of Excellence Grant Project under Grant AoE/E-02/08, Grant 14209115, and Grant CUHK14209515. This paper was presented at the 2014 IEEE International Symposium on Information Theory [1]. (Corresponding author: Hui Li.)

H. Hou is with the Shenzhen Key Lab of Information theory and Future Internet architecture, School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen 518055, China, and the Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong (e-mail: houhanxu@163.com).

K. W. Shum is with the Institute of Network Coding, The Chinese University of Hong Kong, Hong Kong (e-mail: wkshum@inc.cuhk.edu.hk).

M. Chen is with the Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong (e-mail: minghua@ie.cuhk.edu.hk).

H. Li is with the Shenzhen Key Lab of Information theory and Future Internet architecture, Shenzhen Engineering Laboratory of Converged Networks, School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen 518055, China (e-mail: lih64@pkusz.edu.cn).

Communicated by A. G. Dimakis, Associate Editor for Coding Techniques.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIT.2016.2553670

I. INTRODUCTION

Distributed storage systems achieve high reliability by storing the data redundantly in many connected unreliable storage nodes. Maximum-distance-separable (MDS) codes, such as Reed-Solomon (RS) codes, are one common approach to providing redundancy.
With an (n, k) RS code, a data file is encoded and stored across n nodes such that a data collector can retrieve the original data file from any k nodes. Upon the failure of a node, we need to regenerate the data stored in the failed node in order to maintain the same level of reliability. Dimakis et al. [2] formulated the repair problem and proposed regenerating codes (RGC) with the aim of repairing a failed node efficiently. In the pioneering work [2], a data file with B symbols over the finite field $\mathbb{F}_{2^w}$ is encoded into nα symbols and distributed to n nodes, with each node storing α symbols, such that the original data file can be recovered from any k nodes. The requirement that any k nodes suffice to decode the original data file is called the (n, k) recovery property. When a node fails, we replace it by a new node and regenerate the data in the new node by downloading β symbols from each of d surviving nodes. The storage nodes which participate in the repair process are called the helpers. The total number of symbols downloaded from the helpers, dβ, is termed the repair bandwidth in [2].

There are two main versions of repair: exact repair and functional repair. In exact repair, the symbols stored in the failed node are exactly reproduced in the new node. In functional repair, the requirement is relaxed; the new node may contain symbols different from those in the failed node, as long as the repaired system maintains the (n, k) recovery property. We will refer to an encoding scheme in which (i) the (n, k) recovery property holds, and (ii) any node failure can be repaired by contacting d nodes, as an (n, k, d)-distributed storage system (DSS). The minimization of repair bandwidth for functional repair is closely related to the single-source multicast problem in network coding theory.
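As a small illustration of the (n, k) recovery property and of exact repair, the following sketch uses a (3, 2) single parity-check code over bytes. This toy code is not one of the constructions in the paper; it is chosen only because it is the simplest MDS code. The function names `encode`, `decode`, and `repair` are hypothetical. Here α is half the file size, d = 2, and β = α, so the repair bandwidth dβ equals the whole file size B, which is the inefficiency that regenerating codes aim to reduce.

```python
from itertools import combinations


def xor(u: bytes, v: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(u, v))


def encode(data: bytes) -> list[bytes]:
    """Toy (3, 2) code: two data halves plus their XOR parity."""
    if len(data) % 2 != 0:
        raise ValueError("this sketch assumes an even file length")
    half = len(data) // 2
    a, b = data[:half], data[half:]
    return [a, b, xor(a, b)]


def decode(available: dict[int, bytes]) -> bytes:
    """Recover the file from the contents of any two nodes (index -> bytes)."""
    if len(available) < 2:
        raise ValueError("need at least k = 2 nodes")
    if 0 in available and 1 in available:
        a, b = available[0], available[1]
    elif 0 in available:
        a = available[0]
        b = xor(a, available[2])  # b = a XOR parity
    else:
        b = available[1]
        a = xor(b, available[2])  # a = b XOR parity
    return a + b


def repair(failed: int, nodes: list[bytes]) -> bytes:
    """Exact repair: rebuild the failed node from the d = 2 surviving helpers."""
    helpers = [nodes[i] for i in range(3) if i != failed]
    return xor(helpers[0], helpers[1])


if __name__ == "__main__":
    data = b"regenerating"
    nodes = encode(data)
    # (n, k) recovery property: every pair of nodes recovers the file.
    for pair in combinations(range(3), 2):
        assert decode({i: nodes[i] for i in pair}) == data
    # Exact repair reproduces the lost symbols, downloading 2 * alpha = B bytes.
    for failed in range(3):
        assert repair(failed, nodes) == nodes[failed]
```

In this sketch every helper sends its entire content, so repairing one node costs as much traffic as downloading the whole file.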
It is shown in [2] that for an (n, k, d)-DSS, the file size B, the capacity of a storage node α, and the repair bandwidth per helper node β satisfy the following inequality:

$$B \le \sum_{i=1}^{k} \min\{(d - i + 1)\beta, \alpha\}. \tag{1}$$

If we fix the file size B, we have a tradeoff between storage α and repair bandwidth β. The two extreme points in this tradeoff are termed the minimum storage regeneration (MSR) and minimum bandwidth regeneration (MBR) points respec-