IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 62, NO. 6, JUNE 2016 3053
BASIC Codes: Low-Complexity Regenerating
Codes for Distributed Storage Systems
Hanxu Hou, Kenneth W. Shum, Senior Member, IEEE, Minghua Chen, Senior Member, IEEE, and Hui Li
Abstract— In distributed storage systems, regenerating codes
can achieve the optimal tradeoff between storage capacity and
repair bandwidth. However, a critical drawback of existing
regenerating codes, in general, is the high coding and repair com-
plexity, since the coding and repair processes involve expensive
multiplication operations in finite fields. In this paper, we present
a design framework of regenerating codes, which employ binary
addition and bitwise cyclic shift as the elemental operations,
named BASIC regenerating codes. The proposed BASIC regen-
erating codes can be regarded as a concatenated code with the
outer code being a binary parity-check code, and the inner code
being a regenerating code utilizing the binary parity-check code
as the alphabet. We show that the proposed functional-repair
BASIC regenerating codes can asymptotically achieve the fundamental
tradeoff curve between storage and repair bandwidth of
functional-repair regenerating codes, with less computational
complexity. Furthermore, we demonstrate that the existing exact-
repair product-matrix construction of regenerating codes can
be modified to exact-repair BASIC product-matrix regenerating
codes with much less encoding, repair, and decoding complexity
from the theoretical analysis, and with less encoding time, repair
time, and decoding time from the implementation results.
Index Terms— Regenerating codes, distributed storage systems,
low complexity, binary parity-check code.
I. INTRODUCTION
DISTRIBUTED storage systems achieve high reliability
by storing the data redundantly in many connected
Manuscript received April 3, 2015; revised December 20, 2015; accepted
February 28, 2016. Date of publication April 13, 2016; date of current
version May 18, 2016. This work was supported in part by the National
Basic Research Program of China under Grant 2012CB315904 and Grant
2013CB336700, in part by the Natural Fund of Guangdong Province under
Grant S2013020012822, in part by the Shenzhen Basic Research Project under
Grant SZJCYJ20150331100723974 and Grant SZJCYJ20140417144423192,
and in part by the University Grants Committee, Hong Kong, through the
Area of Excellence Grant Project under Grant AoE/E-02/08, Grant 14209115,
and Grant CUHK14209515. This paper was presented at the 2014 IEEE
International Symposium on Information Theory [1]. (Corresponding author:
Hui Li.)
H. Hou is with the Shenzhen Key Lab of Information theory and Future
Internet architecture, School of Electronic and Computer Engineering, Peking
University Shenzhen Graduate School, Shenzhen 518055, China, and the
Department of Information Engineering, The Chinese University of Hong
Kong, Hong Kong (e-mail: houhanxu@163.com).
K. W. Shum is with the Institute of Network Coding, The Chinese University
of Hong Kong, Hong Kong (e-mail: wkshum@inc.cuhk.edu.hk).
M. Chen is with the Department of Information Engineering, The Chinese
University of Hong Kong, Hong Kong (e-mail: minghua@ie.cuhk.edu.hk).
H. Li is with the Shenzhen Key Lab of Information theory and Future Inter-
net architecture, Shenzhen Engineering Laboratory of Converged Networks,
School of Electronic and Computer Engineering, Peking University Shenzhen
Graduate School, Shenzhen 518055, China (e-mail: lih64@pkusz.edu.cn).
Communicated by A. G. Dimakis, Associate Editor for Coding Techniques.
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIT.2016.2553670
unreliable storage nodes. Maximum-distance-separable (MDS)
codes such as Reed-Solomon (RS) codes are one common
approach to providing redundancy. With an (n, k) RS code,
a data file is encoded and stored across n nodes such that
a data collector can retrieve the original data file from any
k nodes.
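The "any k nodes" property can be illustrated with a much simpler MDS code than RS. The sketch below is an illustration only, not a construction from this paper: it assumes a (3, 2) single parity-check code in which two equal-length halves of a file and their bytewise XOR are stored on three nodes. The function names `encode` and `decode` are hypothetical.

```python
from itertools import combinations


def xor_bytes(u: bytes, v: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(u, v))


def encode(a: bytes, b: bytes) -> list[bytes]:
    """Store a, b, and their XOR parity on nodes 0, 1, and 2."""
    if len(a) != len(b):
        raise ValueError("both halves must have the same length")
    return [a, b, xor_bytes(a, b)]


def decode(available: dict[int, bytes]) -> tuple[bytes, bytes]:
    """Recover (a, b) from the contents of any two nodes, keyed by node index."""
    if len(available) < 2:
        raise ValueError("need at least k = 2 nodes")
    if 0 in available and 1 in available:
        return available[0], available[1]
    if 0 in available:  # a and parity are available: b = a XOR parity
        return available[0], xor_bytes(available[0], available[2])
    # b and parity are available: a = b XOR parity
    return xor_bytes(available[1], available[2]), available[1]


nodes = encode(b"BASI", b"C!!!")
# Every choice of k = 2 nodes out of n = 3 recovers the original halves.
for pair in combinations(range(3), 2):
    assert decode({i: nodes[i] for i in pair}) == (b"BASI", b"C!!!")
```

An RS code provides the same guarantee for general n and k, at the cost of finite-field arithmetic in place of the XOR used here.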
Upon the failure of a node, we need to regenerate the data
stored in the failed node in order to maintain the same level of
reliability. Dimakis et al. in [2] formulated the repair problem
and proposed the regenerating codes (RGC) with the aim
of efficient repair of the failed node. In the pioneering work
in [2], a data file with B symbols over the finite field $\mathbb{F}_{2^w}$
is encoded into nα symbols and distributed to n nodes, with
each node storing α symbols such that the original data file
can be recovered from any k nodes. The requirement that any
k nodes are sufficient for decoding the original data file is called
the (n, k) recovery property.
When a node fails, we replace it by a new node and
regenerate the data in the new node by downloading β symbols
from each of d surviving nodes. The storage nodes which
participate in the repair process are called the helpers. The
total number of symbols downloaded from the helpers is
termed the repair bandwidth in [2]. There are two main
versions of repair: exact repair and functional repair. In exact
repair, the symbols stored in the failed node are exactly
reproduced in the new node. In functional repair, the requirement
is relaxed; the new node may contain different symbols
from those in the failed node as long as the repaired system
maintains the (n, k) recovery property. We will refer to an
encoding scheme satisfying (i) the (n, k) recovery property,
and (ii) the property that any node failure can be repaired by
contacting d nodes, as an (n, k, d)-distributed storage system (DSS).
The minimization of repair bandwidth for functional repair
is closely related to the single-source multicast problem
in network coding theory. It is shown in [2] that for an
(n, k, d)-DSS, the file size B, the capacity of a storage node α,
and the repair bandwidth per helper node β satisfy the
following inequality:
$$B \le \sum_{i=1}^{k} \min\{(d - i + 1)\beta,\ \alpha\}. \qquad (1)$$
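The right-hand side of (1) is easy to evaluate numerically. The following is a minimal sketch; the function name `file_size_bound` and the example parameters are illustrative assumptions, and the check k ≤ d reflects the usual parameter range in the regenerating-code literature rather than anything stated above.

```python
def file_size_bound(k: int, d: int, alpha: int, beta: int) -> int:
    """Evaluate the right-hand side of (1): sum_{i=1}^{k} min((d - i + 1) * beta, alpha)."""
    if not 1 <= k <= d:
        raise ValueError("expected 1 <= k <= d")  # assumed parameter range
    return sum(min((d - i + 1) * beta, alpha) for i in range(1, k + 1))


# Example with k = 2, d = 3, beta = 1: the terms are min(3, alpha) and min(2, alpha).
print(file_size_bound(2, 3, alpha=2, beta=1))  # 2 + 2 = 4
print(file_size_bound(2, 3, alpha=3, beta=1))  # 3 + 2 = 5
```

In this example, raising α from 2 to 3 lifts the bound from 4 to 5, while any further increase in α leaves it at 5 because both terms are then limited by β.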
If we fix the file size B, we have a tradeoff between storage α
and repair bandwidth β. The two extreme points in this
tradeoff are termed the minimum storage regeneration (MSR)
and minimum bandwidth regeneration (MBR) points respec-