Progressive Gradient Pruning for Classification, Detection and Domain Adaptation

Le Thanh Nguyen-Meidine, Eric Granger, Marco Pedersoli, Madhu Kiran
LIVIA, Dept. of Systems Engineering, École de technologie supérieure
le-thanh.nguyen-meidine.1@ens.etsmtl.ca

Louis-Antoine Blais-Morin
Genetec Inc., Montreal, Canada
lablaismorin@genetec.com

February 26, 2020

Abstract

Although deep neural networks (NNs) have achieved state-of-the-art accuracy in many visual recognition tasks, the growing computational complexity and energy consumption of these networks remains an issue, especially for applications on platforms with limited resources that require real-time processing. Filter pruning techniques have recently shown promising results for the compression and acceleration of convolutional NNs (CNNs). However, these techniques involve numerous steps and complex optimisations, because some only prune after training CNNs, while others prune from scratch during training by integrating sparsity constraints or modifying the loss function. In this paper, we propose a new Progressive Gradient Pruning (PGP) technique for iterative filter pruning during training. In contrast to previous progressive pruning techniques, it relies on a novel filter selection criterion that measures the change in filter weights, uses a new hard and soft pruning strategy, and effectively adapts momentum tensors during the backward propagation pass. Experimental results obtained after training various CNNs on image data for classification, object detection and domain adaptation benchmarks indicate that PGP can achieve a better trade-off between classification accuracy and network (time and memory) complexity than PSFP and other state-of-the-art filter pruning techniques.
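The pruning loop summarized above can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the function names, the use of the per-filter L2 norm of the weight change between checkpoints as the selection criterion, and the fixed pruning ratio are all assumptions made for the sketch.

```python
import numpy as np

def filter_change_scores(weights_now, weights_prev):
    """Score each filter by the L2 norm of its weight change between two
    training checkpoints (a proxy for the gradient accumulated over that
    interval); weights have shape (n_filters, in_channels, k, k)."""
    diff = weights_now - weights_prev
    return np.linalg.norm(diff.reshape(diff.shape[0], -1), axis=1)

def soft_prune(weights, scores, prune_ratio):
    """Zero out the lowest-scoring filters but keep their slots, so the
    filters may recover in later training epochs."""
    n_prune = int(prune_ratio * len(scores))
    idx = np.argsort(scores)[:n_prune]
    pruned = weights.copy()
    pruned[idx] = 0.0
    return pruned, idx

def hard_prune(weights, scores, prune_ratio):
    """Physically remove the lowest-scoring filters, shrinking the layer
    (the corresponding momentum tensor rows would be removed alongside)."""
    n_keep = len(scores) - int(prune_ratio * len(scores))
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    return weights[keep], keep
```

In a training loop, `soft_prune` would typically be applied at the end of each epoch and `hard_prune` only at coarser intervals, progressively reducing the layer width while training continues.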
1 Introduction

Convolutional neural networks (CNNs) learn discriminant feature representations from labeled training data, and have achieved state-of-the-art accuracy across a wide range of visual recognition tasks, e.g., image classification, object detection, and assisted medical diagnosis. Since the breakthrough results achieved with AlexNet in the 2012 ImageNet Challenge [20], CNN accuracy has been continually improved with architectures like VGG [39], ResNet [11] and DenseNet [17], at the expense of growing complexity (deeper and wider networks) that requires more training samples and computational resources [18]. In particular, the speed of CNNs can significantly degrade with such increased complexity.

In order to deploy these powerful CNN architectures on compact platforms with limited resources (e.g., embedded systems, mobile phones, portable devices) and for real-time processing (e.g., video surveillance and monitoring, virtual reality), the time and memory complexity and energy consumption of CNNs should be reduced.

arXiv:1906.08746v4 [cs.LG] 25 Feb 2020