Under Review

The Utility of Feature Reuse: Transfer Learning in Data-Starved Regimes

Edward Verenich (1, 2), Alvaro Velasquez (1), M.G. Sarwar Murshed (2), Faraz Hussain (2)
(1) Air Force Research Laboratory, (2) Clarkson University

Abstract

The use of transfer learning with deep neural networks has become increasingly widespread for deploying well-tested computer vision systems to new domains, especially those with limited datasets. We describe a transfer learning use case for a domain with a data-starved regime, having fewer than 100 labeled target samples. We evaluate the effectiveness of convolutional feature extraction and fine-tuning of overparameterized models with respect to the size of the target training data, as well as their generalization performance on data with covariate shift, or out-of-distribution (OOD) data. Our experiments show that both overparameterization and feature reuse contribute to the successful application of transfer learning in training image classifiers in data-starved regimes.

1 Introduction

Transfer learning (TL) has become an indispensable technique for deploying deep-learning-assisted computer vision systems to new domains. The basic approach is to take existing neural network architectures that were trained on large natural image datasets such as ImageNet [1] or CIFAR [2] and fine-tune all or some of their weights for new applications [3–7]. This approach has been successfully applied in medical applications such as radiology [8] and ophthalmology [9, 10]. A recent study by Raghu et al. investigated the effects of TL for medical imaging and concluded that TL from natural image datasets to the medical domain offers limited performance gains, with meaningful feature reuse concentrated at the lowest layers of the networks [11].
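The distinction between feature extraction (fine-tuning only a new classifier head on top of frozen pretrained layers) and fine-tuning "all or some" of the weights comes down to which parameters are updated during training. The following is a minimal sketch that counts trainable parameters under each regime. The `Layer` record, the `trainable_parameters` helper, and the toy layer sizes are all hypothetical and are not taken from the paper's models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    num_params: int
    is_head: bool = False  # True for the newly added, randomly initialized classifier


def trainable_parameters(layers, mode, unfreeze_last=None):
    """Count parameters that receive gradient updates during transfer learning.

    mode="feature_extraction": every pretrained layer is frozen; only the head trains.
    mode="fine_tuning": the head plus the last `unfreeze_last` pretrained layers train;
                        unfreeze_last=None unfreezes every pretrained layer.
    """
    pretrained = [layer for layer in layers if not layer.is_head]
    head = [layer for layer in layers if layer.is_head]
    if mode == "feature_extraction":
        trainable = head
    elif mode == "fine_tuning":
        if unfreeze_last is None:
            trainable = pretrained + head
        else:
            # Keep the lowest layers frozen and unfreeze only the topmost ones.
            start = max(len(pretrained) - unfreeze_last, 0)
            trainable = pretrained[start:] + head
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sum(layer.num_params for layer in trainable)


# Hypothetical toy network: three bias-free 3x3 conv layers and a 2-class head
# on 128-dimensional pooled features (128 * 2 weights + 2 biases = 258).
TOY_LAYERS = [
    Layer("conv1", 3 * 64 * 3 * 3),
    Layer("conv2", 64 * 64 * 3 * 3),
    Layer("conv3", 64 * 128 * 3 * 3),
    Layer("head", 128 * 2 + 2, is_head=True),
]

if __name__ == "__main__":
    print(trainable_parameters(TOY_LAYERS, "feature_extraction"))
    print(trainable_parameters(TOY_LAYERS, "fine_tuning", unfreeze_last=1))
    print(trainable_parameters(TOY_LAYERS, "fine_tuning"))
```

Leaving the lowest layers frozen (`unfreeze_last` smaller than the number of pretrained layers) mirrors the observation that reusable features tend to concentrate in the lowest layers. Full fine-tuning maximizes the number of trainable parameters.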
Related to our operational setting was the observation that the benefit of TL from ImageNet-based models to medical models in very small data regimes (which the authors define as datasets with 5,000 data points or fewer) was largely due to architecture size. It follows that overparameterization¹ was the source of the performance gain.

¹ An overparameterization regime refers to a setting where the number of model parameters exceeds the number of training examples [12, 13].

In operational settings, we are often required to create image classifiers for novel classes of objects that are not represented in natural image datasets. Because of the dynamic nature of our applications, our data-starved regime allows for only roughly 100 data points per class (i.e., more than an order of magnitude fewer than what Raghu et al. consider very small data regimes [11]).

In this paper, we evaluate the effectiveness of transfer learning and overparameterization of ImageNet-based models on extremely small operational datasets. Our hypothesis is that transfer learning with overparameterized models enables building useful image classifiers in operational data-starved environments, and that feature reuse is a significant enabling mechanism of this transfer. This is in slight contrast to the analysis of Raghu et al., who reported model overparameterization as the main factor for transfer, with minimal feature reuse.

To examine the relationship between feature reuse and overparameterization, we trained several image classifiers by fine-tuning a number of pretrained architectures to recognize a novel class of images. Our experiments show that overparameterization aids useful transfer, but the performance gain from increasing model size (measured as the number of trainable parameters) levels off as models grow larger.
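Under the footnoted definition, whether a model is overparameterized is a direct comparison between its parameter count and the number of training examples. The sketch below makes that comparison explicit. The helper names are hypothetical. The example figures assume a binary linear head on 512-dimensional features (512 × 2 weights + 2 biases = 1,026 parameters) trained on 60 images, which is roughly the training portion of a 100-image dataset. These figures are illustrative assumptions, not numbers reported in the paper.

```python
def overparameterization_ratio(num_params: int, num_train_examples: int) -> float:
    """Return trainable parameters per training example.

    A ratio above 1 means the model has more parameters than training examples,
    i.e. it sits in the overparameterization regime as defined in the footnote.
    """
    if num_train_examples <= 0:
        raise ValueError("need at least one training example")
    return num_params / num_train_examples


def is_overparameterized(num_params: int, num_train_examples: int) -> bool:
    """True when the parameter count exceeds the number of training examples."""
    return num_params > num_train_examples


if __name__ == "__main__":
    head_params = 512 * 2 + 2  # hypothetical binary linear head on 512-d features
    train_images = 60          # assumed training split of a 100-image dataset
    print(is_overparameterized(head_params, train_images))
    print(overparameterization_ratio(head_params, train_images))
```

Even this small classifier head has about 17 parameters per training image. A full pretrained backbone raises the ratio by several orders of magnitude, so every architecture fine-tuned in this regime is overparameterized by this definition.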
We also show that feature reuse provides significant benefits to learner performance: random initialization of the same architectures in data-starved regimes results in poor generalization compared with models that leveraged the feature reuse of transfer learning during fine-tuning.

2 Application Domain and its Data Regime

Our application domain pertains to the classification of sensor images that may contain specific types of military hardware. For this analysis, we used a specialized piece of hardware called a transporter erector launcher (TEL) as the target class; our goal was to train classifiers to detect the presence of TELs in images. Our labeled dataset contained 100 images of TELs, which we split into train/val/test sets using a 60/20/20 ratio. We also

Page 1 arXiv:2003.04117v1 [cs.CV] 29 Feb 2020