IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 60, NO. 8, AUGUST 2014 4637
Codes With Local Regeneration
and Erasure Correction
Govinda M. Kamath, N. Prakash, V. Lalitha, and P. Vijay Kumar, Fellow, IEEE
Abstract— Regenerating codes and codes with locality are two
coding schemes that have recently been proposed, which in
addition to ensuring data collection and reliability, also enable
efficient node repair. In a situation where one is attempting
to repair a failed node, regenerating codes seek to minimize
the amount of data downloaded for node repair, while codes
with locality attempt to minimize the number of helper nodes
accessed. This paper presents results in two directions. In one,
this paper extends the notion of codes with locality so as to permit
local recovery of an erased code symbol even in the presence of
multiple erasures, by employing local codes having minimum
distance >2. An upper bound on the minimum distance of such
codes is presented and codes that are optimal with respect to this
bound are constructed. The second direction seeks to build codes
that combine the advantages of both codes with locality as well
as regenerating codes. These codes, termed here as codes with
local regeneration, are codes with locality over a vector alphabet,
in which the local codes themselves are regenerating codes.
We derive an upper bound on the minimum distance of vector-
alphabet codes with locality for the case when their constituent
local codes have a certain uniform rank accumulation property.
This property is possessed by both minimum storage regeneration
(MSR) and minimum bandwidth regeneration (MBR) codes.
We provide several constructions of codes with local regeneration
which achieve this bound, where the local codes are either MSR
or MBR codes. This paper also includes an upper bound on
the minimum distance of a general vector code with locality, as
well as a performance comparison of various code constructions
of fixed block length and minimum distance.
Index Terms— Codes with local regeneration, codes with local-
ity, concatenated codes, distributed storage, locally repairable
codes, minimum distance bound, node repair, pyramid codes,
regenerating codes, uniform rank accumulation, vector codes.
I. INTRODUCTION
Apart from ensuring reliability of stored data, the principal
goals in a distributed storage network relate to
Manuscript received March 13, 2013; revised February 21, 2014; accepted
May 18, 2014. Date of publication June 30, 2014, date of current version
July 10, 2014. This work was supported in part by the National Science
Foundation under Grant 0964507 and in part by the NetApp Faculty Fellow-
ship Program. The work of V. Lalitha was supported by a TCS Research
Scholarship. The material in this paper was presented in part at the 2012
Information Theory and Applications Workshop, in part at the 2012 IEEE
International Symposium on Information Theory [1], in part at the 2012 NSF
Workshop on Frontiers in Stochastic Systems, Networks and Control, Texas
A&M University, in part at the 2012 Workshop on Trends in Coding Theory,
in part at the 2013 Information Theory and Applications Workshop, and in
part at the 2013 IEEE International Symposium on Information Theory [2].
G. M. Kamath is with the Department of Electrical Engineering, Stanford
University, Stanford, CA 94305 USA (e-mail: gkamath@stanford.edu).
N. Prakash, V. Lalitha, and P. V. Kumar are with the Department of
Electrical Communication Engineering, Indian Institute of Science, Bangalore
560012, India (e-mail: prakashn@ece.iisc.ernet.in; lalitha@ece.iisc.ernet.in;
vijay@ece.iisc.ernet.in).
Communicated by M. Langberg, Associate Editor for Coding Theory.
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIT.2014.2329872
data collection and node repair. Our interest is in coding
schemes which store the data across n nodes in such a way
that a data collector can recover the data by connecting to
a small number k of nodes in the network. Node repair is
to be accomplished by connecting to a subset of nodes and
downloading a uniform amount of data from each node. The
number of nodes contacted for repair is termed the repair
degree while the total amount of data downloaded for repair is
called the repair bandwidth. It is of interest to minimize both
the repair degree and the repair bandwidth. It is also desirable to
have multiple options for both data collection and node repair.
Distributed storage systems found in practice include
Windows Azure Storage [3] and the Hadoop-based
systems [4] used at Facebook and Yahoo. Maximum-distance
separable (MDS) codes are commonly used in distributed
storage systems, for example in HDFS RAID [5]. MDS
coding schemes, while optimal in terms of storage overhead,
are inefficient in terms of node repair, as both the
repair degree and the repair bandwidth are large. Two
alternative approaches to coding have recently been advocated
to enable more efficient node repair, namely, regenerating
codes [6] and codes with locality [7].
A. Regenerating Codes
In the regenerating-code framework, there are n nodes in the
network, with each node storing α code symbols drawn from
a finite field $\mathbb{F}_q$. A data collector should be able to download
the data by connecting to any k nodes (see Fig. 1). Node
repair is required to be accomplished by connecting to any
d nodes and downloading β ≤ α symbols from each node.
Thus the repair bandwidth is given by dβ . A regenerating
code may be regarded as a vector code, i.e., a code of block
length n over the vector alphabet $\mathbb{F}_q^{\alpha}$. The parameter set
of a regenerating code will be listed in one of two forms:
((n, k, d), (α, β), B) if the file size or number of message
symbols B is known and relevant, or else as ((n, k, d), (α, β)).
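As an illustration of the parameter set just described, the following minimal Python sketch records ((n, k, d), (α, β)) and evaluates the repair bandwidth dβ. The class name `RegeneratingCodeParams`, the `total_storage` helper, and the numerical values are hypothetical and chosen only for illustration. The check d ≤ n - 1 is an assumption added here (a failed node cannot serve as its own helper) and is not stated in the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegeneratingCodeParams:
    """Hypothetical container for the parameter set ((n, k, d), (alpha, beta))."""
    n: int      # number of storage nodes
    k: int      # nodes a data collector connects to
    d: int      # helper nodes contacted during repair (repair degree)
    alpha: int  # code symbols stored per node
    beta: int   # symbols downloaded from each helper node

    def __post_init__(self) -> None:
        # beta <= alpha is stated in the text.
        if not (0 < self.beta <= self.alpha):
            raise ValueError("expected 0 < beta <= alpha")
        # Assumption made for this sketch: helpers are surviving nodes.
        if not (0 < self.k <= self.n and 0 < self.d <= self.n - 1):
            raise ValueError("expected 0 < k <= n and 0 < d <= n - 1")

    @property
    def repair_bandwidth(self) -> int:
        """Total download for one node repair: d * beta."""
        return self.d * self.beta

    @property
    def total_storage(self) -> int:
        """Total symbols stored across the network: n * alpha."""
        return self.n * self.alpha


# Illustrative values only, not taken from the paper.
params = RegeneratingCodeParams(n=6, k=3, d=4, alpha=4, beta=1)
print(params.repair_bandwidth)  # 4 helpers, 1 symbol each
print(params.total_storage)
```

Under these illustrative values, a repair downloads dβ = 4 symbols in total, while each node stores α = 4 symbols.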
Two notions of node repair exist: functional repair and
exact repair. Under functional repair, the code symbols in
the replacement node are such that data collection and node
repair properties continue to hold. Under exact repair, the
contents of the failed and replacement nodes are identical. The
regenerating codes considered in this paper carry out exact
repair.
A cut-set bound based on network-coding concepts tells
us that, under functional repair, given code parameters
((n, k, d), (α, β), B), the file size B is upper bounded [6] by
$$B \le \sum_{i=0}^{k-1} \min\{\alpha, (d-i)\beta\}. \tag{1}$$
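The right-hand side of (1) is straightforward to evaluate numerically. The short Python sketch below does so; the function name `cutset_file_size_bound` and the parameter values are hypothetical, and the requirement 1 ≤ k ≤ d is an assumption made here so that every term (d - i)β in the sum is positive.

```python
def cutset_file_size_bound(k: int, d: int, alpha: int, beta: int) -> int:
    """Evaluate the right-hand side of (1):
    sum over i = 0, ..., k-1 of min(alpha, (d - i) * beta).
    """
    # Assumption for this sketch: 1 <= k <= d, so each (d - i) * beta > 0.
    if not (1 <= k <= d):
        raise ValueError("expected 1 <= k <= d")
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    return sum(min(alpha, (d - i) * beta) for i in range(k))


# Illustrative parameters only, not taken from the paper.
# Terms: min(4, 4) + min(4, 3) + min(4, 2) = 4 + 3 + 2
print(cutset_file_size_bound(k=3, d=4, alpha=4, beta=1))  # 9
# Terms: min(2, 4) + min(2, 3) + min(2, 2) = 2 + 2 + 2
print(cutset_file_size_bound(k=3, d=4, alpha=2, beta=1))  # 6
```

In the first call the per-node storage α is large enough that each term is limited by (d - i)β; in the second, every term is limited by α, so the bound reduces to kα.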
0018-9448 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.