Car plate character recognition via
semi-supervised learning
João Pedro Kerr Catunda
Center of Technological Sciences
University of Santa Catarina State
Joinville, Brazil
jpkcbr@gmail.com
André Tavares da Silva
Center of Technological Sciences
University of Santa Catarina State
Joinville, Brazil
andre.silva@udesc.br
Lilian Berton
Institute of Science and Technology
Federal University of São Paulo
São José dos Campos, Brazil
lberton@unifesp.br
Abstract—Car plate character recognition can be difficult since
sample images might suffer perspective distortion, motion blur
and poor lighting. Moreover, there are many character pattern
variations such as size, font and color. Some related works are
based on supervised approaches and require extensive training and
large amounts of labeled data to build an effective classifier.
Motivated by the capacity of graphs to model the underlying manifold
and by the ability of semi-supervised learning (SSL) to use few labeled data,
this paper employs a graph-based SSL approach for character
classification. We use four datasets: the first one was artificially
generated by symbol rotation, blurring and contrast shifting. The
second corresponds to Chars74K computer characters with font
variations. The third and fourth are digits obtained from real car
plates from the USA and Brazil. Classification experiments, based
on artificial and real data, show that the accuracy of SSL with only
10% of the dataset labeled is statistically comparable to that of
classical supervised algorithms such as k-NN, Naive Bayes, Decision
Tree, Neural Networks and SVM.
Index Terms—Image classification, car plate character recog-
nition, semi-supervised learning, graph-based methods
I. INTRODUCTION
Automatic license plate recognition (ALPR) has been attracting
considerable attention from researchers. The increasing number
of vehicles makes it difficult to enforce the law and ensure
security without automated support. Related applications include
parking, tolling, traffic control, secured-area access control,
stolen car spotting and speed control, among others. A typical ALPR
system is divided into three steps: i) plate detection, which
determines where the license plate lies in the image; ii) character
segmentation, which locates each alphanumeric character on the plate
individually; and iii) character recognition, which translates each
character image into text.
Many ALPR systems achieve good results under controlled
conditions or with expensive hardware. However, most cameras in
open environments are not as sophisticated and cannot be
placed in privileged spots, so they produce images with motion
blur, perspective distortion, angular skew, poor lighting or
environmental interference. Detecting a license plate and
recognizing its characters accurately in an open environment
remains a challenging task.
This work focuses on the third step, car plate character
recognition. Most related works are supervised and use neural
networks [1]–[6], Multi-Layer Perceptrons (MLP) [7], [8] or
Support-Vector Machines (SVM) [9], [10], which demand extensive
training and large training datasets of images labeled
by humans. Moreover, determining the appropriate number
of neurons and layers is not an intuitive task and can be
computationally complex. We aim to identify the characters
within the plate using a strategy that needs little training and
few labeled examples yet achieves good accuracy in natural
scenes, since obtaining large amounts of labeled samples can
be difficult, expensive or time-consuming. For those reasons,
we apply graph-based Semi-Supervised Learning (SSL) to
car plate character recognition. As far as we know, no previous
work has employed SSL for this task.
Car plate character recognition is an application that could
benefit from the SSL approach. Indeed, given the
universe of possible perspectives, lighting conditions and car plate
configurations (e.g., in the USA all states have different plates, and
many states allow multiple or customized configurations),
automating this task can be quite difficult, requiring
large amounts of labeled samples that are hard to obtain.
This is where SSL is most attractive for this problem: labeling
only 10% of the training data is much less expensive and makes the
learning task more cost-effective.
The goal of supervised learning is to induce concepts from
previously labeled examples. The inducer seeks a
function that maps the data to their labels, which is called the
hypothesis or model [11]. After training, given
a new example (not seen during training), the induced model
must be able to classify it. SSL [12], on the other hand,
seeks to learn from both labeled and unlabeled data.
This approach has attracted the scientific community's attention
because, in general, only a few labeled samples are
available. This is especially true for the massive volumes of
stored digital data, which are largely unlabeled.
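The semi-supervised setting described above can be sketched in a few lines of Python. This is only an illustration, not the experimental protocol of this paper: the helper name `mask_labels`, the use of -1 to mark unlabeled samples, the per-class sampling and the toy label vector are all assumptions made for the example.

```python
import numpy as np

def mask_labels(y, labeled_fraction=0.1, seed=0):
    """Return a copy of y in which only a fraction of the labels of each
    class is kept; every other entry is set to -1 (unlabeled)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    y_ssl = np.full_like(y, -1)
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        # Keep at least one labeled example per class.
        n_keep = max(1, int(round(labeled_fraction * idx.size)))
        keep = rng.choice(idx, size=n_keep, replace=False)
        y_ssl[keep] = c
    return y_ssl

# Toy label vector (hypothetical): 10 digit classes, 50 samples each.
y = np.repeat(np.arange(10), 50)
y_ssl = mask_labels(y, labeled_fraction=0.1)
```

A supervised learner would only see the entries where `y_ssl != -1`, whereas an SSL method also exploits the feature vectors of the entries marked -1.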
Graph-based SSL has many applications, such as
text classification [13] and image annotation [14], among others. In
graph-based methods, each data instance is represented by a
vertex and some pairs of vertices are connected by a weighted
edge. In the majority of learning tasks, the data instances
are assumed to be independent and identically distributed (i.i.d.).
The usual approach, in this case, is to first create a graph from
the independent data instances and then apply a graph-based
learning algorithm to the constructed graph [13].
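The two-stage procedure described above (build a graph, then learn on it) can be illustrated with a minimal Python sketch. It assumes a symmetric k-nearest-neighbor graph with Gaussian edge weights and a simple iterative label propagation with clamped labels; these are common choices in graph-based SSL and are not necessarily the construction or algorithm used in this paper. The function names, parameters and toy data are hypothetical.

```python
import numpy as np

def knn_graph(X, k=2, sigma=1.0):
    """Symmetric k-NN graph with Gaussian edge weights (dense matrix)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # exclude self-loops
    nn = np.argsort(d2, axis=1)[:, :k]  # k nearest neighbors of each vertex
    rows = np.repeat(np.arange(n), k)
    cols = nn.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / (2.0 * sigma ** 2))
    return np.maximum(W, W.T)  # keep an edge if either endpoint selected it

def propagate_labels(W, y, n_classes, n_iter=200):
    """Iterative label propagation; y uses -1 for unlabeled vertices."""
    labeled = y >= 0
    Y0 = np.zeros((len(y), n_classes))
    Y0[labeled, y[labeled]] = 1.0
    # Row-normalize the weights into transition probabilities.
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    F = Y0.copy()
    for _ in range(n_iter):
        F = P @ F                   # spread label scores to neighbors
        F[labeled] = Y0[labeled]    # clamp the known labels
    return F.argmax(axis=1)

# Toy data (hypothetical): two well-separated 1-D chains, one label each.
X = np.concatenate([np.arange(20), 100 + np.arange(20)]).astype(float)[:, None]
y_true = np.repeat([0, 1], 20)
y = np.full(40, -1)
y[0], y[20] = 0, 1

W = knn_graph(X, k=2, sigma=1.0)
pred = propagate_labels(W, y, n_classes=2)
```

Here each vertex is one data instance, and the two labeled vertices spread their labels along the weighted edges to the 38 unlabeled ones. The dense distance matrix keeps the sketch short; a larger dataset would call for a sparse neighbor search.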
2019 8th Brazilian Conference on Intelligent Systems (BRACIS)
2643-6264/19/$31.00 ©2019 IEEE
DOI 10.1109/BRACIS.2019.00132