Multimedia Tools and Applications
https://doi.org/10.1007/s11042-020-08883-w
Data augmentation for handwritten digit recognition
using generative adversarial networks
Ganesh Jha¹ · Hubert Cecotti¹
Received: 6 May 2019 / Revised: 17 March 2020 / Accepted: 27 March 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Supervised learning techniques require labeled examples that can be time-consuming to
obtain. In particular, deep learning approaches, where all the feature extraction stages are
learned within the artificial neural network, require a large number of labeled examples to
train the model. Various data augmentation techniques can be applied to overcome this
issue by taking advantage of known variations that have no impact on the label of an
example. Typical solutions in computer vision and document analysis and recognition are based
on geometric transformations (e.g. shift and rotation) and random elastic deformations of
the original training examples. In this paper, we consider Generative Adversarial Networks
(GANs), a technique that creates novel artificial examples without requiring prior knowledge
of the possible variabilities that exist across examples. When a training dataset contains
few labeled examples described in a high-dimensional space, the classifier may generalize
poorly. Therefore, we aim to enrich databases of images or signals, and thereby improve
classifier performance, by designing a GAN that creates artificial images. While adding
more images through a GAN can help, the extent to which it helps is unknown, and
performance may degrade if too many artificial images are added.
The approach is tested on four datasets of handwritten digits (Latin, Bangla, Devanagari, and
Oriya). The results on each dataset show that adding GAN-generated images to
the training dataset improves accuracy. However, they also suggest
that adding too many GAN-generated images degrades performance.
Keywords Machine learning · Neural networks · Classification ·
Generative adversarial networks
Hubert Cecotti
hcecotti@csufresno.edu
¹ Department of Computer Science, College of Science and Mathematics, California State
University, Fresno (Fresno State), 2576 E. San Ramon MS ST 109,
Fresno, CA 93740-8039, USA
1 Introduction
Multimedia applications use various media sources such as text, graphics, and images.
These media can be combined and extracted from multiple sources: text can be