Color K-means, Gaussian Filter and Aperture Concept
for Text Localization in Images
K. J. Dayananda¹ and D. Puttegowda²
¹⁻²Department of Computer Science and Engineering, ATME College of Engineering, Mysuru, India
Email: dayananda.kem@gmail.com, puttegowda.77@gmail.com
Abstract— Text extraction from images plays an essential role in machine learning and computer
vision. The text localization process determines the presence and location of text in a given
input. The challenges that arise during text localization include varying orientation, low
resolution, complex backgrounds, and uneven illumination, along with variation in font size and
color. In this paper, color k-means is applied to separate the colors present in the text image
into groups. Histogram equalization is applied to make the text pixels stand out from the
background pixels. A Gaussian filter is applied to group the text pixels together. The aperture
concept is implemented to locate the actual text regions. The standard Hua's and NUS datasets
were used to evaluate the proposed model, with precision, recall, and F-measure as the
performance metrics.
Index Terms— Color K-means, Histogram Equalization, Gaussian filter, Aperture, Text
localization.
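As an illustration of the color k-means step named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes RGB pixels given as tuples, squared Euclidean distance in RGB space, and initial centers sampled at random from the distinct colors in the image; the function names `color_kmeans` and `quantize` are hypothetical.

```python
import random


def color_kmeans(pixels, k, iters=10, seed=0):
    """Group RGB pixels into k color clusters (Lloyd's algorithm).

    Returns (centers, labels): centers is a list of k RGB tuples and
    labels[i] is the index of the center assigned to pixels[i].
    Raises ValueError if the image has fewer than k distinct colors.
    """
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct colors drawn from the image.
    centers = rng.sample(sorted(set(pixels)), k)
    labels = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: nearest center by squared distance in RGB space.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in pixels
        ]
        # Update step: move each center to the mean color of its members.
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


def quantize(pixels, centers, labels):
    """Replace every pixel with the (rounded) color of its cluster center."""
    return [tuple(round(c) for c in centers[lab]) for lab in labels]


if __name__ == "__main__":
    # Toy "image": three dark pixels and two bright pixels.
    image = [(10, 10, 10), (12, 11, 9), (8, 9, 10),
             (250, 250, 250), (248, 251, 249)]
    centers, labels = color_kmeans(image, k=2)
    print(quantize(image, centers, labels))
```

The value of k controls how many color groups the image is split into; choosing it is left open here, since the text only states that k is supplied as a parameter.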
I. INTRODUCTION
Text conveys meaning that a person can read and understand easily, and text in videos carries a rich set of
information. Text extraction from images is a challenging research field that attracts a large number of
scholars. Although various optical character recognition systems have been developed, the problem of text
localization is still not thoroughly solved. Text localization determines whether text exists in an image and
identifies the region of text pixels. The main challenges in text localization are complex backgrounds, low
resolution, character alignment, lighting, and varied shapes, sizes, and colors of fonts. The motivation for this
work is a step towards arbitrarily oriented multilingual text localization. Previous models are able to locate
text on a clear background at low resolution, but most of them fail to locate text under varying illumination.
This work locates text under varying illumination while also considering the challenges listed above.
In this work, we observe that text regions have distinctive features compared to non-text regions and that their
connectivity differs from that of the background. In the first step, a color k-means algorithm divides the input
image into a set of colors, with the number of colors given by the parameter k. Then histogram equalization and a
Gaussian filter are used to detect the textured regions of the image with the help of a texture filter function.
The Gaussian filter makes the edges of the image perceptible. The aperture concept is applied to the filtered
image to extract the true text content. False positives are then eliminated by examining the bounding-box regions
of text and non-text components. Using k-means, the Gaussian filter, and the aperture feature, we present a text
identification model for images and videos. Experimental results show that the presented model efficiently locates the region
Grenze ID: 01.GIJET.9.1.614
© Grenze Scientific Society, 2023
Grenze International Journal of Engineering and Technology, Jan Issue