IEEE SIGNAL PROCESSING LETTERS, VOL. 25, NO. 1, JANUARY 2018 85
Training-Free, Single-Image Super-Resolution Using
a Dynamic Convolutional Network
Aritra Bhowmik, Suprosanna Shit, and Chandra Sekhar Seelamantula, Senior Member, IEEE
Abstract—The typical approach for solving the problem of
single-image super-resolution (SR) is to learn a nonlinear mapping
between the low-resolution (LR) and high-resolution (HR) repre-
sentations of images in a training set. Training-based approaches
can be tuned to give high accuracy on a given class of images, but
they call for retraining if the HR → LR generative model deviates
or if the test images belong to a different class, which limits their
applicability. In contrast, we propose a solution that does
not require a training dataset. Our method relies on constructing
a dynamic convolutional network (DCN) to learn the relation be-
tween the consecutive scales of Gaussian and Laplacian pyramids.
The relation is in turn used to predict the detail at a finer scale,
thus leading to SR. Comparisons with state-of-the-art techniques
on standard datasets show that the proposed DCN approach re-
sults in about 0.8 and 0.3 dB gain in peak signal-to-noise ratio for
2× and 3× SR, respectively. The structural similarity index is on
par with the competing techniques.
Index Terms—Convolutional neural network (CNN), deep learn-
ing, dynamic convolutional network (DCN), Gaussian/Laplacian
pyramids, super-resolution (SR).
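The Gaussian/Laplacian pyramid relation at the heart of the proposed method can be sketched in a few lines of numpy. This is an illustrative sketch only: a 5-tap binomial kernel stands in for the exact Gaussian filter, and nearest-neighbour upsampling stands in for whatever interpolation the method actually uses.

```python
import numpy as np

def blur(img):
    # Separable 5-tap binomial kernel as a stand-in Gaussian filter.
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    img = np.apply_along_axis(np.convolve, 0, img, k, 'same')
    return np.apply_along_axis(np.convolve, 1, img, k, 'same')

def upsample(img, shape):
    # Nearest-neighbour 2x upsampling, cropped to the target shape.
    up = img.repeat(2, axis=0).repeat(2, axis=1)
    return up[:shape[0], :shape[1]]

def pyramids(img, levels):
    # Gaussian pyramid: blur and decimate. Each Laplacian level holds the
    # detail lost between consecutive scales: L_k = G_k - upsample(G_{k+1}).
    gauss = [img]
    for _ in range(levels - 1):
        gauss.append(blur(gauss[-1])[::2, ::2])
    lap = [g - upsample(gn, g.shape) for g, gn in zip(gauss[:-1], gauss[1:])]
    lap.append(gauss[-1])  # coarsest residual closes the pyramid
    return gauss, lap
```

The premise of the letter is that the relation between consecutive (Gaussian, Laplacian) scales can be learned from the image itself and extrapolated one scale finer to predict the missing detail.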
I. INTRODUCTION
SINGLE-IMAGE super-resolution (SR) is an important tool
in applications such as biomedical imaging [1] and face
hallucination [2]. In single-image SR [3], [4], one infers local
image properties from the low-resolution (LR) image or
learns them over a collection of given high-resolution (HR)–LR
pairs. The learning approaches employ dictionaries or neural
networks, which capture the LR–HR association. In this letter,
we develop a technique that infers HR image features starting
from the LR image without going through the standard process
of training, obviating the need for a training dataset. Before
proceeding further, we review recent techniques that specifi-
cally tackle the single-image SR problem, and highlight their
strengths and weaknesses. Some of these techniques will be used for making performance comparisons. A thorough review of single-image SR techniques is available in [5].
Manuscript received June 28, 2017; revised August 17, 2017; accepted September 4, 2017. Date of publication September 15, 2017; date of current version November 29, 2017. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. S. Channappayya. (Aritra Bhowmik and Suprosanna Shit contributed equally to this work and must be treated as joint first authors.) (Corresponding author: Chandra Sekhar Seelamantula.)
The authors are with the Department of Electrical Engineering, Indian Institute of Science, Bangalore 560012, India (e-mail: aritra0593@gmail.com; suprosanna93@gmail.com; chandra.sekhar@ieee.org).
This letter has supplementary downloadable material available at http://ieeexplore.ieee.org.
Color versions of one or more of the figures in this letter are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/LSP.2017.2752806
A. Related Literature
A landmark contribution was made recently by Yang et al.
[6], [7], who trained dictionaries for LR and HR image patches.
The key assumption is that the LR and HR patches have the same
sparse representation in respective dictionaries. The sparse rep-
resentation corresponding to an LR patch from an unseen image
is used to synthesize the corresponding HR patch, thus lead-
ing to SR. With this as the central idea, Yang et al. [8] also
developed a coupled LR/HR dictionary model optimized us-
ing a joint cost function. Kim and Kwon used kernel ridge
regression and incorporated image priors to suppress ringing
artifacts [9]. Timofte et al. proposed anchored neighborhood
regression [10] in which they solve a ridge regression prob-
lem with neighborhood constraints, leading to a closed-form
solution for the regression coefficients. This approach is fast
and leads to qualitatively the same results as the compet-
ing techniques. In [11], they developed an advanced version
of the algorithm, which learns from the patches in the local
neighborhood of the anchor patch from the training dataset
and not from the dictionary. These were by far the best per-
forming techniques before the advent of neural network SR
approaches.
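The closed-form solution behind anchored neighborhood regression [10] can be sketched directly. The variable names and shapes below are illustrative, not from the authors' code: for one anchor, the columns of n_lr and n_hr are the coupled LR/HR neighborhood atoms, and the projection P = n_hr (n_lr^T n_lr + λI)^{-1} n_lr^T is precomputed offline, so each test patch is super-resolved with a single matrix multiply.

```python
import numpy as np

def anr_projection(n_lr, n_hr, lam=0.1):
    # Ridge regression with neighborhood constraints, per anchor:
    #   alpha = argmin ||y - n_lr @ alpha||^2 + lam * ||alpha||^2
    # which has the closed form
    #   P = n_hr @ (n_lr.T @ n_lr + lam * I)^{-1} @ n_lr.T
    k = n_lr.shape[1]
    gram = n_lr.T @ n_lr + lam * np.eye(k)
    return n_hr @ np.linalg.solve(gram, n_lr.T)

# At test time an LR feature y maps to its HR patch in one multiply:
#   x_hr = anr_projection(n_lr, n_hr) @ y
```

Precomputing P per anchor is what makes the approach fast: all regression cost is paid offline.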
Within the learning paradigm, the SR problem is essentially
posed as one of discovering the nonlinear association between
the LR and HR patches. Dong et al. used a convolutional neural
network (CNN) [12], [13] to learn the end-to-end mapping be-
tween LR and HR pairs [14], [15]. The training is data-intensive
and time-consuming, whereas the run-time complexity is low,
leading to fast SR. Recently, Dong et al. proposed a threefold
acceleration strategy:
1) introducing a deconvolution layer at the end of the CNN;
2) reducing the dimensionality of the input features; and
3) employing smaller filter sizes.
These changes together resulted in a 40× speed-up.
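For context, the SRCNN-style end-to-end mapping of [14], [15] — patch extraction, nonlinear mapping, and reconstruction — can be mimicked with plain numpy correlations. The filter counts and sizes below are illustrative, and the weights are random rather than trained.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def corr2d(x, k):
    # 'valid' 2-D cross-correlation via a sliding-window view.
    return np.einsum('ijkl,kl->ij', sliding_window_view(x, k.shape), k)

def conv_layer(maps, weights, biases, relu=True):
    # maps: (n_in, H, W); weights: (n_out, n_in, f, f) -> (n_out, H', W')
    out = []
    for w, b in zip(weights, biases):
        acc = sum(corr2d(m, k) for m, k in zip(maps, w)) + b
        out.append(np.maximum(acc, 0.0) if relu else acc)
    return np.stack(out)

def srcnn(y, params):
    # Three stages in the SRCNN spirit: 9x9 patch extraction, 1x1
    # nonlinear mapping, 5x5 linear reconstruction (no ReLU at the end).
    (w1, b1), (w2, b2), (w3, b3) = params
    h = conv_layer(y[None], w1, b1)
    h = conv_layer(h, w2, b2)
    return conv_layer(h, w3, b3, relu=False)[0]
```

All the cost of such a network is in training; once the filters are fixed, inference is a handful of small convolutions, which is why the run-time complexity is low.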
This method is referred to as fast SRCNN [16]. Shi et al.
developed an efficient subpixel CNN to perform SR [17]. The
early layers operate on the LR image, whereas the final layer
relies on subpixel convolution to upscale the image. Kim et al.
proposed a very deep CNN architecture that learns the residual
images instead of the HR image [18]. They showed that the
very deep SR technique overcomes the limitations of SRCNN.
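The residual formulation can be summarized in two lines: the network predicts only the high-frequency detail, and the HR estimate is the interpolated LR image plus that detail. Nearest-neighbour upscaling below is a crude stand-in for the bicubic interpolation such methods actually use.

```python
import numpy as np

def upscale(lr, r=2):
    # Crude nearest-neighbour stand-in for bicubic interpolation.
    return lr.repeat(r, axis=0).repeat(r, axis=1)

def residual_reconstruct(lr, predicted_residual, r=2):
    # Residual learning: the network outputs the detail image, which is
    # added back to the interpolated LR to form the HR estimate.
    return upscale(lr, r) + predicted_residual
```

Learning the residual rather than the full HR image leaves the network with a sparser, easier target, which is what allows very deep architectures to train effectively.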
They also proposed a method called deep recursive CNN, which
incorporates skip connections between each hidden layer and
1070-9908 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.