IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 60, NO. 8, AUGUST 2014 4637

Codes With Local Regeneration and Erasure Correction

Govinda M. Kamath, N. Prakash, V. Lalitha, and P. Vijay Kumar, Fellow, IEEE

Abstract— Regenerating codes and codes with locality are two coding schemes that have recently been proposed, which in addition to ensuring data collection and reliability, also enable efficient node repair. In a situation where one is attempting to repair a failed node, regenerating codes seek to minimize the amount of data downloaded for node repair, while codes with locality attempt to minimize the number of helper nodes accessed. This paper presents results in two directions. In one, this paper extends the notion of codes with locality so as to permit local recovery of an erased code symbol even in the presence of multiple erasures, by employing local codes having minimum distance >2. An upper bound on the minimum distance of such codes is presented and codes that are optimal with respect to this bound are constructed. The second direction seeks to build codes that combine the advantages of both codes with locality as well as regenerating codes. These codes, termed here as codes with local regeneration, are codes with locality over a vector alphabet, in which the local codes themselves are regenerating codes. We derive an upper bound on the minimum distance of vector-alphabet codes with locality for the case when their constituent local codes have a certain uniform rank accumulation property. This property is possessed by both minimum storage regeneration (MSR) and minimum bandwidth regeneration (MBR) codes. We provide several constructions of codes with local regeneration which achieve this bound, where the local codes are either MSR or MBR codes.
Also included in this paper is an upper bound on the minimum distance of a general vector code with locality, as well as a performance comparison of various code constructions of fixed block length and minimum distance.

Index Terms— Codes with local regeneration, codes with locality, concatenated codes, distributed storage, locally repairable codes, minimum distance bound, node repair, pyramid codes, regenerating codes, uniform rank accumulation, vector codes.

Manuscript received March 13, 2013; revised February 21, 2014; accepted May 18, 2014. Date of publication June 30, 2014; date of current version July 10, 2014. This work was supported in part by the National Science Foundation under Grant 0964507 and in part by the NetApp Faculty Fellowship Program. The work of V. Lalitha was supported by a TCS Research Scholarship. The material in this paper was presented in part at the 2012 Information Theory and Applications Workshop, in part at the 2012 IEEE International Symposium on Information Theory [1], in part at the 2012 NSF Workshop on Frontiers in Stochastic Systems, Networks and Control, Texas A&M University, in part at the 2012 Workshop on Trends in Coding Theory, in part at the 2013 Information Theory and Applications Workshop, and in part at the 2013 IEEE International Symposium on Information Theory [2].
G. M. Kamath is with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305 USA (e-mail: gkamath@stanford.edu).
N. Prakash, V. Lalitha, and P. V. Kumar are with the Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore 560012, India (e-mail: prakashn@ece.iisc.ernet.in; lalitha@ece.iisc.ernet.in; vijay@ece.iisc.ernet.in).
Communicated by M. Langberg, Associate Editor for Coding Theory.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIT.2014.2329872

I. INTRODUCTION

Apart from ensuring reliability of stored data, the principal goals in a distributed storage network relate to data collection and node repair. Our interest is in coding schemes which store the data across n nodes in such a way that a data collector can recover the data by connecting to a small number k of nodes in the network. Node repair is to be accomplished by connecting to a subset of nodes and downloading a uniform amount of data from each node. The number of nodes contacted for repair is termed the repair degree, while the total amount of data downloaded for repair is called the repair bandwidth. It is of interest to minimize both the repair degree and the repair bandwidth. It is also desirable to have multiple options for both data collection and node repair.

Distributed storage systems found in practice include Windows Azure Storage [3] and the Hadoop-based systems [4] used in Facebook and Yahoo. Maximum-distance separable (MDS) codes are commonly used in distributed storage systems, for example in HDFS RAID [5]. MDS coding schemes, while optimal in terms of storage overhead, are however inefficient in terms of node repair, as the repair degree and the repair bandwidth are both large. Two alternative approaches to coding have recently been advocated to enable more efficient node repair, namely, regenerating codes [6] and codes with locality [7].

A. Regenerating Codes

In the regenerating-code framework, there are n nodes in the network, with each node storing α code symbols drawn from a finite field $\mathbb{F}_q$. A data collector should be able to download the data by connecting to any k nodes (see Fig. 1). Node repair is required to be accomplished by connecting to any d nodes and downloading β ≤ α symbols from each node. Thus the repair bandwidth is given by dβ. A regenerating code may be regarded as a vector code, i.e., a code of block length n over the vector alphabet $\mathbb{F}_q^{\alpha}$.
The parameter set of a regenerating code will be listed in one of two forms: ((n, k, d), (α, β), B) if the file size, or number of message symbols, B is known and relevant, or else as ((n, k, d), (α, β)). Two notions of node repair exist: functional repair and exact repair. Under functional repair, the code symbols in the replacement node are such that the data collection and node repair properties continue to hold. Under exact repair, the contents of the failed and replacement nodes are identical. The regenerating codes considered in this paper carry out exact repair. A cut-set bound based on network-coding concepts tells us that under functional repair, given code parameters ((n, k, d), (α, β), B), the file size B is upper bounded [6] by

$$B \le \sum_{i=0}^{k-1} \min\{\alpha, (d-i)\beta\}. \tag{1}$$

0018-9448 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
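As a quick numerical illustration of the cut-set bound (1), the following minimal sketch evaluates its right-hand side for given parameters, together with the repair bandwidth dβ. The function names `cut_set_bound` and `repair_bandwidth` and the sample parameter values are illustrative assumptions and do not come from the paper.

```python
def cut_set_bound(k: int, d: int, alpha: int, beta: int) -> int:
    """Right-hand side of the cut-set bound (1):
    sum over i = 0, ..., k-1 of min(alpha, (d - i) * beta).

    Hypothetical helper for illustration only; the parameter names
    follow the ((n, k, d), (alpha, beta), B) notation of the text.
    """
    return sum(min(alpha, (d - i) * beta) for i in range(k))


def repair_bandwidth(d: int, beta: int) -> int:
    """Total download for one node repair: d helpers, beta symbols each."""
    return d * beta


if __name__ == "__main__":
    # Illustrative parameters (assumed, not taken from the paper).
    k, d, beta = 3, 4, 1
    for alpha in (2, 4):
        bound = cut_set_bound(k, d, alpha, beta)
        print(f"alpha={alpha}: B <= {bound}, "
              f"repair bandwidth = {repair_bandwidth(d, beta)}")
```

With k = 3, d = 4 and β = 1, the choice α = 2 gives the terms 2 + 2 + 2, so B ≤ 6, while α = 4 gives 4 + 3 + 2, so B ≤ 9. The repair bandwidth is dβ = 4 in both cases, which shows how the per-node storage α trades off against the file size that can be supported for a fixed repair download.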