Car plate character recognition via semi-supervised learning

João Pedro Kerr Catunda
Center of Technological Sciences
University of Santa Catarina State
Joinville, Brazil
jpkcbr@gmail.com

André Tavares da Silva
Center of Technological Sciences
University of Santa Catarina State
Joinville, Brazil
andre.silva@udesc.br

Lilian Berton
Institute of Science and Technology
Federal University of São Paulo
São José dos Campos, Brazil
lberton@unifesp.br

Abstract—Car plate character recognition can be difficult since sample images might suffer from perspective distortion, motion blur and poor lighting. Moreover, character patterns vary widely in size, font and color. Some related works take a supervised approach and need extensive training and large amounts of labeled data to build an effective classifier. Motivated by the capacity of graphs to model the underlying manifold and by the ability of semi-supervised learning (SSL) to work with few labeled data, this paper employs a graph-based SSL approach for character classification. We use four datasets: the first was artificially generated by symbol rotation, blurring and contrast shifting; the second corresponds to Chars74K computer characters with font variations; the third and fourth are digits obtained from real car plates from the USA and Brazil. Classification experiments, based on artificial and real data, show that SSL's accuracy on a 10% labeled dataset is statistically comparable to that of classical supervised algorithms such as k-NN, Naive Bayes, Decision Tree, Neural Networks and SVM.

Index Terms—Image classification, car plate character recognition, semi-supervised learning, graph-based methods

I. INTRODUCTION

Automatic license plate recognition (ALPR) has been attracting considerable attention from researchers. The increasing number of vehicles makes it difficult to enforce the law and ensure security without automated support.
There are many related applications, such as parking, tolling, traffic control, secured area access control, stolen car spotting and speed control, among others.

A typical ALPR system is divided into three steps: i) plate detection, which determines where the license plate lies in the image; ii) character segmentation, which locates each alphanumeric character of the plate individually; iii) character recognition, which translates the alphanumeric image into text information.

Many ALPR systems achieve good results under controlled conditions or with expensive hardware. However, most cameras in open environments are not as sophisticated and cannot be placed in privileged spots, so they generate images with motion blur, perspective distortion, angular skew, poor lighting or environmental interference. Detecting a license plate and recognizing its characters accurately in an open environment is still a challenging task.

This work focuses on the third step, car plate character recognition. Most related works are supervised and use Neural Networks [1]–[6], Multi-Layer Perceptron (MLP) [7], [8] or Support-Vector Machines (SVM) [9], [10], which demand extensive training and large training datasets of images labeled by humans. Moreover, determining the appropriate number of neurons and layers is not an intuitive task and can be computationally complex. We aim to identify the characters within the plate using a strategy that does not need extensive training or many examples and that achieves good accuracy in natural scenes, since obtaining large amounts of labeled samples can be difficult, expensive or time-consuming. For those reasons, we applied graph-based Semi-Supervised Learning (SSL) to car plate character recognition. As far as we know, no previous work has employed SSL in this task.

Car plate character recognition is an application that could benefit from the SSL approach. Indeed, given the universe of possible perspective, lighting, and car plate configurations (e.g.
in the USA all states have different plates, and many states allow multiple or customized configurations), the automation of this task can be quite difficult, requiring large amounts of labeled samples that are difficult to obtain. This is the most interesting feature of SSL when applied to this problem, since relying on labels for only 10% of the training data is much less expensive and makes the learning task more cost-effective.

The goal of supervised learning is to induce concepts from previously labeled examples. The inducer finds a function that maps the data to their labels; this function is called the hypothesis or model [11]. After the training process, given a new example (not seen during training), the induced model must be able to classify it.

SSL [12], on the other hand, seeks to learn from both labeled and unlabeled data. This approach has attracted the scientific community's attention because, in general, only a few labeled samples are available. This is especially true of the massive amounts of digital data that are stored without labels. There are many applications of graph-based SSL, such as text classification [13], image annotation [14] and others.

In graph-based methods, each data instance is represented by a vertex, and some pairs of vertices are connected by a weighted edge. In the majority of learning tasks, the data instances are assumed to be independent and identically distributed (i.i.d.). The usual approach in this case is to first create a graph from the independent data instances and then apply a graph-based learning algorithm to the constructed graph [13].

2019 8th Brazilian Conference on Intelligent Systems (BRACIS) 2643-6264/19/$31.00 ©2019 IEEE DOI 10.1109/BRACIS.2019.00132
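The two-stage graph-based procedure described in the Introduction (build a graph from i.i.d. instances, then run a graph-based learner on it) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not necessarily the construction or the algorithm evaluated in this paper: the graph is a symmetrized k-nearest-neighbor graph with Gaussian edge weights, the learner is a simple iterative label propagation that clamps the labeled vertices, and the names `build_knn_graph`, `propagate_labels`, `k`, `sigma` and `n_iter`, as well as the toy 2-D points, are hypothetical.

```python
import math


def build_knn_graph(points, k=2, sigma=1.0):
    """Return a symmetric weighted adjacency dict {i: {j: w}} (k-NN graph)."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Squared Euclidean distance from instance i to every other instance.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n)
            if j != i
        )
        for d2, j in dists[:k]:
            w = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian edge weight (assumed)
            graph[i][j] = w
            graph[j][i] = w  # symmetrize: keep the edge if either end chose it
    return graph


def propagate_labels(graph, seeds, n_classes, n_iter=100):
    """Average neighbor scores repeatedly while clamping labeled vertices."""
    n = len(graph)
    scores = [[0.0] * n_classes for _ in range(n)]
    for i, c in seeds.items():
        scores[i][c] = 1.0
    for _ in range(n_iter):
        new_scores = []
        for i in range(n):
            if i in seeds:
                new_scores.append(scores[i][:])  # labeled vertex: keep its label
                continue
            total = sum(graph[i].values())
            new_scores.append([
                sum(w * scores[j][c] for j, w in graph[i].items()) / total
                for c in range(n_classes)
            ])
        scores = new_scores
    # Each vertex takes the class with the highest propagated score.
    return [max(range(n_classes), key=lambda c: row[c]) for row in scores]


# Toy data (hypothetical): two well-separated groups of 10 instances each,
# with a single labeled instance per group, i.e. 10% of the data is labeled.
points = [(0.1 * i, 0.05 * i) for i in range(10)]
points += [(5.0 + 0.1 * i, 5.0 + 0.05 * i) for i in range(10)]
seeds = {0: 0, 10: 1}  # vertex index -> class label

graph = build_knn_graph(points, k=2)
labels = propagate_labels(graph, seeds, n_classes=2)
print(labels)
```

In this toy setting the two groups form separate connected components, so each unlabeled vertex can only receive label mass from the seed in its own group. Real character images would first need a feature representation and a more careful choice of `k` and `sigma`.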