Overhead-free In-place Recovery Scheme for
XOR-based Storage Codes
Ximing Fu∗, Zhiqing Xiao†, and Shenghao Yang‡
∗Department of Computer Science and Technology, Tsinghua University, Beijing, China
†Department of Electronic Engineering, Tsinghua University, Beijing, China
‡Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
fxm13@mails.tsinghua.edu.cn, xzq.xiaozhiqing@gmail.com, shyang@tsinghua.edu.cn
Abstract—This paper proposes a novel recovery scheme for
XOR-based storage codes with the increasing-difference property.
For a message of kL bits stored in n storage nodes, a data
collector connects to any k of the n storage nodes to recover
the message. In our scheme, the data collector acquires exactly
L bits from each node, so there is no transmission overhead.
Furthermore, we propose an in-place decoding algorithm that
requires less auxiliary space than the existing decoding algorithm
while having the same decoding computational complexity.
Index Terms—Distributed storage system, maximum distance
separable code, recovery scheme, in-place decoding.
I. INTRODUCTION
In a distributed storage system, a message is divided into k
blocks, each of which consists of L bits. These k blocks are
encoded into n packets using a storage code, each of which is
stored in a distinct node. When a data collector (DC)
wants to recover the message, it requests data from a subset of
the n nodes. A storage code is Maximum Distance Separable
(MDS) when two conditions are satisfied: first, each of the
n nodes stores L bits (perhaps with some overhead); second,
the DC can recover the message from any k of the n nodes.
Reed-Solomon code is the most celebrated MDS code, and
is widely used in storage systems [1]. The encoding and
decoding of Reed-Solomon codes require operations over large
finite fields, whose complexity is high. Therefore, storage
codes that use bitwise exclusive-or (XOR) operations for encoding
and decoding are of interest due to their low computational cost.
The works [2]–[8] proposed MDS codes that can correct
one, two, or three node failures. Later, [9] proposed an MDS
code that can be decoded in $O(k^2 L^3)$ time. To further
reduce the complexity of encoding and decoding, a new type
of storage code was proposed in [10]. These codes use
bit shifting and fewer XOR operations in the encoding and
decoding process, and they are able to recover the message
from any k of the n nodes. Specifically, if the generator
matrix of the storage code satisfies the increasing-difference
property, an associated decoding algorithm, called ZigZag
decoding, can correctly recover the message using $O(k^2 L)$
XORs. However, since both the number of bits stored in each
node and the number of bits transmitted from each node are
larger than L, the codes proposed in [10] are not strictly MDS.
Moreover, ZigZag decoding also requires considerable auxiliary
space.

TABLE I
COMPARISON BETWEEN OUR RESULT AND THE PREVIOUS RESULT

Recovery Scheme       | Recovery Bandwidth for Node i | Extra Decoding Storage | Decoding Time Complexity
ZigZag decoding [10]  | L + i(k - 1)                  | $O(kL)$                | $O(k^2 L)$
Our recovery scheme   | L                             | $O(k \log L)$          | $O(k^2 L)$
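As an illustration of the shift-and-XOR style of encoding discussed above, the following Python sketch builds n packets from k blocks. The shift pattern (node i delays block j by i·j positions, with nodes and blocks indexed from 0), the packing of bits into Python integers, and the function name encode_shift_xor are assumptions made for illustration. They are chosen to be consistent with the L + i(k - 1) per-node figure listed for [10] in Table I and are not taken from [10] itself.

```python
from typing import List


def encode_shift_xor(blocks: List[int], n: int) -> List[int]:
    """Encode k message blocks into n packets using only shifts and XORs.

    Each block is an int whose bit b holds the b-th bit of the block.
    ASSUMPTION (illustrative): node i delays block j by i * j positions.
    """
    packets = []
    for i in range(n):
        packet = 0
        for j, block in enumerate(blocks):
            # Shift block j by i * j bit positions, then XOR it in.
            packet ^= block << (i * j)
        packets.append(packet)
    return packets


L = 4                          # bits per block
blocks = [0b1011, 0b0110]      # k = 2 message blocks
k = len(blocks)
packets = encode_shift_xor(blocks, n=3)

# Under the assumed shift pattern, packet i fits in L + i * (k - 1) bits.
lengths_ok = all(p.bit_length() <= L + i * (k - 1)
                 for i, p in enumerate(packets))
```

Under this assumed pattern, node 0 holds exactly L bits while node i may hold up to L + i(k - 1) bits, which is the kind of overhead that prevents such codes from being strictly MDS.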
This paper considers the XOR-based storage codes in [10],
where the encoding and decoding operations are limited to
bit-shifting and XOR. We propose a novel recovery scheme,
which is different from the ZigZag decoding, for the storage
codes in [10]. The characteristics of this recovery scheme
include:
1) No Transmission Overheads: In order to recover the
message of kL bits, only kL bits need to be transmitted from
the k nodes to the DC. Therefore, our scheme is optimal in terms
of transmission efficiency.
2) In-place Decodable: After the transmissions, the kL bits
of message are stored in k vectors of length L. Then an
in-place decoding algorithm is executed to transform the k
vectors into the recovered message. The algorithm overwrites
the data when it is being executed, and only O (k log L) extra
space is needed to store the auxiliary variables. After the
algorithm execution completes, the k vectors become exactly
the desired recovered message.
3) Exclusive-or Implementable: Exclusive-or of two bits is
one of the fastest operations to implement. Our recovery
algorithm uses no operations other than XOR. Additionally, the
number of XOR operations is $O(k^2 L)$ when L is large, which
is, to the best of our knowledge, the lowest known complexity.
The comparisons between our scheme and the existing
recovery scheme using ZigZag decoding are summarized in
Table I.
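To make the bandwidth column of Table I concrete, the short Python sketch below compares the total number of bits a DC downloads under the two schemes. The function names, the sample parameters, and the node indexing convention (the index i is used directly in L + i(k - 1)) are assumptions for illustration only.

```python
from typing import Iterable


def zigzag_bandwidth(nodes: Iterable[int], k: int, L: int) -> int:
    """Total bits downloaded when node i sends L + i * (k - 1) bits (Table I)."""
    return sum(L + i * (k - 1) for i in nodes)


def overhead_free_bandwidth(k: int, L: int) -> int:
    """Total bits downloaded when each of the k nodes sends exactly L bits."""
    return k * L


k, L = 3, 1000
nodes = [1, 2, 4]  # hypothetical indices of the k contacted nodes
extra = zigzag_bandwidth(nodes, k, L) - overhead_free_bandwidth(k, L)
print(extra)
```

For these sample values the difference is (1 + 2 + 4)(k - 1) = 14 bits. It grows with the indices of the contacted nodes and with k, but not with L.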
The rest of the paper is organized as follows. The distributed
storage system is described in Section II. Section III
presents our recovery scheme. Specifically, Section III-A and
Section III-B introduce the transmission procedure and the
decoding procedure in the recovery scheme, respectively. We
provide a theorem to show the correctness of the recovery
scheme, which is proved in Section V. Moreover, an example
2014 IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications
978-1-4799-6513-7/14 $31.00 © 2014 IEEE
DOI 10.1109/TrustCom.2014.70