Densely Connected Convolutional Networks Gao Huang * Cornell University gh349@cornell.edu Zhuang Liu * Tsinghua University liuzhuang13@mails.tsinghua.edu.cn Laurens van der Maaten Facebook AI Research lvdmaaten@fb.com Kilian Q. Weinberger Cornell University kqw4@cornell.edu Abstract Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convo- lutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections—one between each layer and its subsequent layer—our network has L(L+1) 2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several com- pelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage fea- ture reuse, and substantially reduce the number of parame- ters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain sig- nificant improvements over the state-of-the-art on most of them, whilst requiring less computation to achieve high per- formance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet. 1. Introduction Convolutional neural networks (CNNs) have become the dominant machine learning approach for visual object recognition. Although they were originally introduced over 20 years ago [18], improvements in computer hardware and network structure have enabled the training of truly deep CNNs only recently. The original LeNet5 [19] consisted of 5 layers, VGG featured 19 [29], and only last year Highway * Authors contributed equally x0 x1 H1 x2 H2 H3 H4 x3 x4 Figure 1: A 5-layer dense block with a growth rate of k =4. Each layer takes all preceding feature-maps as input. Networks [34] and Residual Networks (ResNets) [11] have surpassed the 100-layer barrier. As CNNs become increasingly deep, a new research problem emerges: as information about the input or gra- dient passes through many layers, it can vanish and “wash out” by the time it reaches the end (or beginning) of the network. Many recent publications address this or related problems. ResNets [11] and Highway Networks [34] by- pass signal from one layer to the next via identity connec- tions. Stochastic depth [13] shortens ResNets by randomly dropping layers during training to allow better information and gradient flow. FractalNets [17] repeatedly combine sev- eral parallel layer sequences with different number of con- volutional blocks to obtain a large nominal depth, while maintaining many short paths in the network. Although these different approaches vary in network topology and training procedure, they all share a key characteristic: they create short paths from early layers to later layers. 1 arXiv:1608.06993v5 [cs.CV] 28 Jan 2018