Remote Sensing
Communication
Model Specialization for the Use of ESRGAN on Satellite and Airborne Imagery
Étienne Clabaut 1,*, Myriam Lemelin 1, Mickaël Germain 1, Yacine Bouroubi 1 and Tony St-Pierre 2
Citation: Clabaut, É.; Lemelin, M.; Germain, M.; Bouroubi, Y.; St-Pierre, T. Model Specialization for the Use of ESRGAN on Satellite and Airborne Imagery. Remote Sens. 2021, 13, 4044. https://doi.org/10.3390/rs13204044
Academic Editor: Tania Stathaki
Received: 3 September 2021
Accepted: 5 October 2021
Published: 10 October 2021
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1 Département de Géomatique Appliquée, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada; myriam.lemelin@usherbrooke.ca (M.L.); mickael.germain@usherbrooke.ca (M.G.); yacine.bouroubi@usherbrooke.ca (Y.B.)
2 XEOS Imaging Inc., Québec, QC G1P 4P5, Canada; tony.stpierre@xeosimaging.com
* Correspondence: etienne.clabaut@usherbrooke.ca
Abstract: Training a deep learning model requires highly variable data to permit reasonable generalization. If the variability in the data about to be processed is low, the interest in obtaining this generalization seems limited. Yet, it could prove interesting to specialize the model with respect to a particular theme. The use of enhanced super-resolution generative adversarial networks (ESRGAN), a specific type of deep learning architecture, allows the spatial resolution of remote sensing images to be increased by “hallucinating” non-existent details. In this study, we show that ESRGAN creates better-quality images when trained on thematically classified images than when trained on a wide variety of examples. All other things being equal, we further show that the algorithm performs better on some themes than on others. Texture analysis shows that these performances are correlated with the inverse difference moment and entropy of the images.
Keywords: super-resolution; ESRGAN; generative adversarial networks; Haralick
1. Introduction
Images of high (HR, ~1–5 m per pixel) and very high (VHR, <1 m per pixel) spatial
resolution are of particular importance for several Earth observation (EO) applications,
such as visual and automatic information extraction [1–3]. However, currently,
most high-resolution and all very high-resolution images acquired by orbital sensors need
to be purchased at a high price. On the other hand, there is abundant medium-resolution
imagery currently available for free (e.g., the multispectral instrument onboard Sentinel-2
and the operational land imager onboard Landsat-8). Improving the spatial resolution of
medium-resolution imagery to the spatial resolution of high- and very high-resolution
imagery would thus be highly useful in a variety of applications.
Image resolution enhancement is called super-resolution (SR) and is currently a very
active research topic in EO image analysis [4–6] and computer vision in general, as shown
in [7]. However, SR is inherently an ill-posed problem [8]. Multi-frame super-resolution
(MFSR) uses multiple low-resolution (LR) images to constrain the reconstruction of a
high-resolution (HR) image. However, this approach cannot be used when a single image
is available. Single image super-resolution (SISR) is a particular type of SR that involves
increasing the resolution of a low-resolution (LR) image to create a high-resolution (HR)
image. SISR can be achieved by (1) the “external example-based” approach, where the
algorithm learns from dictionaries [9], or by using (2) convolutional neural networks
(CNNs), where the algorithm “learns” the relevant features of the image that would be
useful for improving its resolution [10,11]. SISR can also be achieved by using (3) generative
adversarial neural networks (GANs) [12]. GANs pit two networks, a generator and a
discriminator, against each other in an adversarial way. The generator is trained to
produce new images that fool the discriminator, which in turn is trained to distinguish
real images from generated (“fake”) ones. In this type of network, the generator and the discriminator act
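As an illustration of this adversarial objective, the following is a minimal NumPy sketch of the standard (non-saturating) GAN losses computed from discriminator probability outputs. Note that ESRGAN itself uses a relativistic average discriminator rather than these vanilla losses; the function name and interface here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def adversarial_losses(d_real, d_fake, eps=1e-8):
    """Vanilla GAN losses from discriminator probabilities.

    d_real: discriminator outputs on real images, values in (0, 1)
    d_fake: discriminator outputs on generated ("fake") images
    Returns (discriminator_loss, generator_loss).
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    # Discriminator: push outputs toward 1 on real images, 0 on fakes.
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    # Generator: push the discriminator's output on fakes toward 1,
    # i.e., trick it into labelling generated images as real.
    g_loss = -np.mean(np.log(d_fake + eps))
    return d_loss, g_loss
```

When the discriminator is confident (e.g., d_real ≈ 0.9, d_fake ≈ 0.1), its own loss is small while the generator's loss is large, which is the gradient signal that drives the generator to produce more realistic images.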