Information Sciences 527 (2020) 108–127
Privacy-preserving distributed deep learning based on secret sharing
Jia Duan a, Jiantao Zhou a,∗, Yuanman Li b
a State Key Laboratory of Internet of Things for Smart City, Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, China
b College of Electronics and Information Engineering, Shenzhen University, China
Article info
Article history:
Received 20 September 2019
Revised 21 March 2020
Accepted 23 March 2020
Available online 26 March 2020
Keywords:
Deep neural network
Distributed deep learning
Secure multi-party computation
Privacy preserving
Secret sharing
Abstract
Distributed deep learning (DDL) naturally provides a privacy-preserving solution to enable
multiple parties to jointly learn a deep model without explicitly sharing the local datasets.
However, existing privacy-preserving DDL schemes still suffer from severe information leakage and/or incur a significant increase in communication cost. In this work, we design a privacy-preserving DDL framework in which all participants keep their local datasets private at low communication and computational cost, while still maintaining
the accuracy and efficiency of the learned model. By adopting an effective secret sharing
strategy, we allow each participant to split the intervening parameters in the training pro-
cess into shares and upload an aggregation result to the cloud server. We can theoretically
show that the local dataset of a particular participant can be well protected against the
honest-but-curious cloud server as well as the other participants, even in the challenging case where the cloud server colludes with some participants. Extensive experimental
results are provided to validate the superiority of the proposed secret sharing based dis-
tributed deep learning (SSDDL) framework.
© 2020 Elsevier Inc. All rights reserved.
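The aggregation idea described in the abstract, where each participant splits its local training parameters into shares and the server only ever sees sums of shares, can be illustrated with a minimal additive secret-sharing sketch. This is not the paper's actual protocol, only a simplified single-round simulation; the modulus, function names, and the use of integer values are all illustrative assumptions.

```python
import random

P = 2**61 - 1  # illustrative large prime modulus for the share arithmetic

def make_shares(value, n):
    """Split an integer into n additive shares that sum to value mod P."""
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def aggregate(local_values):
    """Simulate n participants secret-sharing their local values so the
    server recovers only the aggregate, never any individual value."""
    n = len(local_values)
    # shares[i][j]: the share of participant i's value sent to participant j
    shares = [make_shares(v, n) for v in local_values]
    # each participant j uploads only the sum of the shares it received
    uploads = [sum(shares[i][j] for i in range(n)) % P for j in range(n)]
    # the server sums the uploads to obtain the aggregate of all values
    return sum(uploads) % P

assert aggregate([3, 5, 7]) == 15
```

Because every upload is a sum of uniformly random shares, no single upload reveals any participant's value; only the final sum is recoverable, which matches the privacy goal stated in the abstract.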
1. Introduction
Recently, deep neural network (DNN) architectures have obtained impressive performance across a wide variety of fields,
such as face recognition [32,37], machine translation [8,11], object detection [26,36], and object classification [14,19]. As the
size of datasets increases, the computational intensity and memory demands of deep learning grow proportionally. Although
in recent years significant advances have been made in GPU hardware, network architectures and training methods, the
large-scale DNN training often takes an impractically long time on a single machine. Additionally, many accuracy-improving strategies in deep learning, such as scaling up the model parameters [31], utilizing complex models [9], and training on large-scale datasets [21], are also significantly constrained by the available computational power.
Fortunately, the distributed deep learning (DDL) framework provides a practicable and efficient solution to perform learning
over large-scale datasets, especially when some datasets belong to different owners (and hence cannot be shared directly).
To solve complex and time-consuming learning problems, DDL utilizes data parallelism and/or model parallelism [10]. In
∗ Corresponding author.
E-mail addresses: xuelandj@gmail.com (J. Duan), jtzhou@um.edu.mo (J. Zhou), yuanmanx.li@gmail.com (Y. Li).
https://doi.org/10.1016/j.ins.2020.03.074