Shunt connection: An intelligent skipping of contiguous blocks for
optimizing MobileNet-V2
Brijraj Singh a,∗, Durga Toshniwal a, Sharan Kumar Allur b
a Department of CSE, Indian Institute of Technology Roorkee, India
b Samsung Research Institute Bangalore, India
∗ Corresponding author. E-mail address: bsingh1@cs.iitr.ac.in (B. Singh)
Article info
Article history: Received 20 March 2019; Received in revised form 25 May 2019; Accepted 4 June 2019; Available online 27 June 2019.
Keywords: MobileNet; Compressed network; Shunt connection; Residual connections; Encoder; Model optimization
Abstract
Enabling deep neural networks in tightly resource-constrained environments such as mobile phones and cameras is a pressing need. Existing optimized architectures like SqueezeNet and MobileNet serve this purpose by using parameter-friendly operations and components such as point-wise convolution and bottleneck layers. This work focuses on reducing the number of floating point operations required at inference time by an already compressed deep learning architecture. The optimization exploits the advantage of residual connections at a macroscopic level. This paper proposes a novel connection on top of a deep learning architecture: the idea is to locate the blocks of a pretrained network that carry a relatively low knowledge quotient and to bypass those blocks with an intelligent skip connection, named here the Shunt connection. The proposed method replaces computationally expensive blocks with a computation-friendly shunt connection. In a given architecture, up to two vulnerable locations are selected: 6 contiguous blocks are skipped at the first location and 2 contiguous blocks at the second, leveraging 2 shunt connections. The proposed connection is applied to the state-of-the-art MobileNet-V2 architecture in two configurations, which yield from a 33.5% reduction in FLOPs (one connection) up to a 43.6% reduction in FLOPs (two connections) with minimal impact on accuracy.
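As a minimal sketch of the shunt idea, assuming PyTorch and the torchvision MobileNet-V2 layout (the block indices, channel widths, and two-layer shunt design below are illustrative assumptions, not the exact configuration of the paper):

import torch.nn as nn
import torchvision

class Shunt(nn.Module):
    # A cheap stand-in for a run of skipped blocks: one point-wise and one
    # depth-wise convolution matching the input/output shapes of the run.
    def __init__(self, in_ch, out_ch, stride):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),  # point-wise
            nn.BatchNorm2d(out_ch),
            nn.ReLU6(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=stride,
                      padding=1, groups=out_ch, bias=False),      # depth-wise
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        return self.branch(x)

def bypass_blocks(features, start, end, shunt):
    # Hypothetical helper: replace features[start:end] with one shunt module.
    blocks = list(features.children())
    return nn.Sequential(*blocks[:start], shunt, *blocks[end:])

# In torchvision's MobileNet-V2, features[7:13] spans 6 contiguous
# inverted-residual blocks going from 32 to 96 channels at overall stride 2.
net = torchvision.models.mobilenet_v2(pretrained=True)
net.features = bypass_blocks(net.features, 7, 13, Shunt(32, 96, stride=2))

The shunt would then be trained (for example, to mimic the activations of the skipped blocks) and the whole network fine-tuned, so that the accuracy loss stays minimal.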
1. Introduction
Increased hardware capacity in terms of computational speedup, compactness, and portability has rejuvenated artificial neural networks in the recent past, allowing them to set the new state of the art in artificial intelligence across computer vision, natural language processing, and allied areas. A neural network gains learning capacity from the inclusion of more layers, which enables it to learn the complex and nonlinear patterns hidden in data. However, networks that grow ever deeper in their stack of layers are not at their best in applications where resources are restricted and inference time is paramount, such as on mobile devices, which must meet the inevitable expectation of a fast response. Existing optimization approaches suggest how the performance of artificial neural networks can be extended to resource-constrained environments (Dai, Tang, Xie, & Tang, 2018; Wu et al., 2018). One line of work compresses the architecture as a whole, which has produced incremental improvements in successive
networks such as SqueezeNet, ShuffleNet, MobileNet-V1, and MobileNet-V2 (Howard et al., 2017; Iandola et al., 2016; Sandler, Howard, Zhu, Zhmoginov, & Chen, 2018; Yosinski, Clune, Nguyen, Fuchs, & Lipson, 2015; Zhang, Zhou, Lin, & Sun, 2018). Another line of work builds on what already exists: rather than designing an architecture from scratch, it takes an existing architecture and develops an optimal model from it (Hubara, Courbariaux, Soudry, El-Yaniv, & Bengio, 2016). Reducing the space taken by intermediate parameters, by quantizing the number of bits required to store each value, has been a promising idea for optimizing network size (Courbariaux, Bengio, & David, 2015; Han, Mao, & Dally, 2015; Polino, Pascanu, & Alistarh, 2018; Zhang, Li, Kara, Alistarh, Liu, & Zhang, 2017).
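As a minimal sketch of this idea, assuming uniform affine quantization to 8 bits (a generic scheme, not that of any specific cited paper), weights can be stored as integer codes plus a scale and offset:

import numpy as np

def quantize_uint8(w):
    # Map a float tensor to uint8 codes plus (scale, offset); illustrative only.
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_uint8(q, scale, lo):
    # Reconstruct approximate float weights from the integer codes.
    return q.astype(np.float32) * scale + lo

w = np.random.randn(64, 3, 3, 3).astype(np.float32)  # e.g., a conv kernel
q, scale, lo = quantize_uint8(w)
w_hat = dequantize_uint8(q, scale, lo)  # ~4x smaller storage, small error

The stored model keeps only the codes, the scale, and the offset, trading a small reconstruction error for a roughly fourfold reduction in weight storage.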
Since the convolution operation is the major source of computation in any deep learning architecture, several studies target its complexity: one proposed adding auxiliary layers alongside the convolution layer to eliminate unnecessary computations (calculations involving zeros) (Dong, Huang, Yang, & Yan, 2017), and another proposed the use of point-wise convolution and developed the Inception block (Szegedy et al., 2015). Model distillation (Han et al., 2016; Hinton, Vinyals, & Dean, 2015; Polino et al., 2018) is another way of compressing a cumbersome model.
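To make the saving from point-wise convolution concrete, a back-of-the-envelope multiply-accumulate count (the feature-map size and channel widths below are chosen arbitrarily for illustration) compares a 3 × 3 and a 1 × 1 convolution at the same widths:

def conv_macs(h, w, c_in, c_out, k):
    # Multiply-accumulates of a stride-1, 'same'-padded k x k convolution.
    return h * w * c_in * c_out * k * k

h, w, c_in, c_out = 56, 56, 64, 128
print(conv_macs(h, w, c_in, c_out, 3))  # 231,211,008 MACs for 3x3
print(conv_macs(h, w, c_in, c_out, 1))  # 25,690,112 MACs for 1x1 (9x fewer)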