Neural Networks 118 (2019) 192–203
Contents lists available at ScienceDirect: Neural Networks
Journal homepage: www.elsevier.com/locate/neunet

Shunt connection: An intelligent skipping of contiguous blocks for optimizing MobileNet-V2

Brijraj Singh a,*, Durga Toshniwal a, Sharan Kumar Allur b
a Department of CSE, Indian Institute of Technology Roorkee, India
b Samsung Research Institute Bangalore, India

Article history: Received 20 March 2019; Received in revised form 25 May 2019; Accepted 4 June 2019; Available online 27 June 2019

Keywords: MobileNet; Compressed network; Shunt connection; Residual connections; Encoder; Model optimization

Abstract

Enabling deep neural networks in tightly resource-constrained environments such as mobile phones and cameras is a pressing need. Existing optimized architectures such as SqueezeNet and MobileNet serve this purpose by employing parameter-friendly operations and components, such as point-wise convolutions and bottleneck layers. This work focuses on reducing the number of floating-point operations involved in inference through an already compressed deep learning architecture. The optimization exploits the advantage of residual connections in a macroscopic way. This paper proposes a novel connection on top of a deep learning architecture: the idea is to locate the blocks of a pretrained network that carry a relatively low knowledge quotient and then bypass those blocks with an intelligent skip connection, named here the Shunt connection. The proposed method replaces computationally heavy blocks with a computation-friendly shunt connection. In the given architecture, up to two vulnerable locations are selected: 6 contiguous blocks are selected and skipped at the first location and 2 contiguous blocks at the second, leveraging 2 shunt connections.
The proposed connection is used over the state-of-the-art MobileNet-V2 architecture and manifests two cases, which lead from a 33.5% reduction in FLOPs (one connection) up to a 43.6% reduction in FLOPs (two connections) with minimal impact on accuracy.

© 2019 Elsevier Ltd. All rights reserved.
* Corresponding author. E-mail address: bsingh1@cs.iitr.ac.in (B. Singh).

1. Introduction

Increased hardware capacity, in terms of computational speedup, compactness and portability, has rejuvenated artificial neural networks in the recent past, allowing them to set new state-of-the-art results in artificial intelligence across computer vision, natural language processing and other allied areas. A neural network gains learning capacity by including more layers, which enables it to learn complex, nonlinear patterns hidden in the data. However, networks that grow deeper in their layer stacks are not well suited to applications where resources are restricted and inference time is paramount, such as mobile devices, which must meet the inevitable expectation of fast response. Existing optimization approaches suggest how the aforementioned performance of artificial neural networks can be extended to resource-constrained environments (Dai, Tang, Xie, & Tang, 2018; Wu et al., 2018). One line of work compresses the architecture as a whole, yielding incremental improvements in successive networks such as SqueezeNet, ShuffleNet, MobileNet-V1, MobileNet-V2, etc. (Howard et al., 2017; Iandola et al., 2016; Sandler, Howard, Zhu, Zhmoginov, & Chen, 2018; Yosinski, Clune, Nguyen, Fuchs, & Lipson, 2015; Zhang, Zhou, Lin, & Sun, 2018).
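The shunt idea described above can be sketched in a few lines: model the network as an ordered list of blocks with per-inference FLOP costs, replace a contiguous span with a single cheap shunt block, and measure the FLOP reduction. This is a minimal illustrative sketch, not the authors' implementation; all block names, indices and FLOP costs below are made-up placeholders, not MobileNet-V2's real costs.

```python
# Sketch of a shunt connection: a pretrained network is modelled as an
# ordered list of (name, flops) blocks; a contiguous low-knowledge span
# is bypassed by one cheap "shunt" block. All costs are hypothetical.

def total_flops(blocks):
    """Sum the per-inference FLOP costs of all blocks."""
    return sum(cost for _, cost in blocks)

def apply_shunt(blocks, start, end, shunt_cost):
    """Replace the contiguous span blocks[start:end] with one shunt block."""
    return blocks[:start] + [("shunt", shunt_cost)] + blocks[end:]

# A toy 10-block network; every block is assigned 100 FLOPs for illustration.
network = [(f"block{i}", 100) for i in range(10)]

# Skip 6 contiguous blocks (indices 2..7) with a shunt costing 50 FLOPs,
# mirroring the paper's first vulnerable location (6 skipped blocks).
shunted = apply_shunt(network, 2, 8, 50)

reduction = 1 - total_flops(shunted) / total_flops(network)
print(f"FLOP reduction: {reduction:.1%}")
```

The real method additionally has to keep the shunt's output tensor shape compatible with the block it feeds into, and to retrain the shunt so accuracy is preserved; this sketch only captures the bookkeeping side of the idea.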
One subdomain of neural network research takes incremental steps: instead of designing an architecture from scratch, it starts from an already existing architecture and works toward an optimal model (Hubara, Courbariaux, Soudry, El-Yaniv, & Bengio, 2016). Reducing the space taken by intermediate parameters, by quantizing the number of bits required to store each value, has been a promising idea for optimizing network size (Courbariaux, Bengio, & David, 2015; Han, Mao, & Dally, 2015; Polino, Pascanu, & Alistarh, 2018; Zhang, Li, Kara, Alistarh, Liu, & Zhang, 2017). The convolution operation is the major source of computation in any deep learning architecture. Therefore, to reduce its complexity, one study proposed adding extra layers alongside the convolution layer to cut unnecessary computations (eliminating calculations with zero) (Dong, Huang, Yang, & Yan, 2017), and another proposed the use of point-wise convolution and developed the Inception block (Szegedy et al., 2015). Model distillation (Han et al., 2016; Hinton, Vinyals, & Dean, 2015; Polino et al., 2018) is one other way of compressing a cumbersome model

https://doi.org/10.1016/j.neunet.2019.06.006
0893-6080/© 2019 Elsevier Ltd. All rights reserved.
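The computational advantage of point-wise convolution mentioned above can be made concrete with simple FLOP arithmetic: factoring a standard k×k convolution into a depthwise k×k step followed by a point-wise 1×1 step shrinks the cost by roughly a factor of 1/c_out + 1/k². The sketch below works through this arithmetic; the tensor dimensions are illustrative and not taken from any particular network.

```python
# FLOP arithmetic contrasting a standard 3x3 convolution with the
# depthwise-separable factorisation (depthwise 3x3 + point-wise 1x1).
# Counts are multiply-accumulates; shapes are illustrative.

def standard_conv_flops(h, w, c_in, c_out, k):
    # Each of the h*w*c_out output elements needs k*k*c_in MACs.
    return h * w * c_out * c_in * k * k

def separable_conv_flops(h, w, c_in, c_out, k):
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1x1 convolution mixes channels
    return depthwise + pointwise

h = w = 56
c_in, c_out, k = 64, 128, 3
std = standard_conv_flops(h, w, c_in, c_out, k)
sep = separable_conv_flops(h, w, c_in, c_out, k)
print(f"separable/standard cost ratio: {sep / std:.3f}")  # ~ 1/c_out + 1/k^2
```

For a 3×3 kernel the ratio is near 1/9 once c_out is large, which is why MobileNet-style architectures lean so heavily on this factorisation.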