International Journal of Scientific & Engineering Research, Volume 2, Issue 12, December 2011
ISSN 2229-5518
IJSER © 2011
http://www.ijser.org
Character Localization From Natural Images
Using Nearest Neighbours Approach
Shaila Chugh, Yogendra Kumar Jain
Abstract— Scene text carries significant and useful information, and its extraction and localization are used in many
applications. In this paper, we propose a connected component based method to extract text from natural images. The proposed method
uses color space processing. Histogram analysis and geometrical properties are used for edge detection. Character recognition is done
through OCR, which takes as input the text boxes generated by the text detection and localization stages. The proposed
method is robust with respect to font size, color, orientation, and style. Results on real scenes, including indoor and
outdoor images, show that the method efficiently extracts and localizes scene text.
Index Terms— Character Localization, Scene Text, Nearest Neighbours, Edge Detection, OCR, Histogram, Filters.
1 INTRODUCTION
Text detection and localization in natural scenes is an
active research area in computer vision. Scene text
appears as part of a scene, such as text on vehicle number
plates, hoardings, books, CD covers, etc.
Variations in font size and style, orientation, and alignment,
the effects of uncontrolled illumination, reflections, and shadows,
the distortion due to perspective projection, and the complexity
of image backgrounds make automatic text localization and
extraction from scenes a challenging problem. Localization
of characters in images is used in many applications; for example,
text detection can be used for page segmentation, document
retrieval, and address block location. Different approaches to
text extraction have been suggested, based on the text
characteristics.
The method proposed by Xiaoqing Liu et al. [2] is based
on the fact that edges are a reliable feature of text, regardless
of color/intensity, layout, orientation, etc. Edge strength,
density, and orientation variance are three distinguishing
characteristics of text embedded in images, which can be used
as the main features for detecting scene text. Their method
consists of three stages: target text area detection, text
area localization, and character extraction.
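The three edge features named above can be illustrated with a minimal sketch. This is not the implementation of [2]: the central-difference gradient, the `strength_threshold` value, the circular-variance measure of orientation spread, and the function name `edge_features` are all assumptions made for illustration.

```python
import math

def edge_features(img, strength_threshold=50.0):
    """Return (mean edge strength, edge density, orientation variance)
    for a grayscale image given as a list of rows of intensities.

    Hypothetical helper: the gradient operator and the threshold are
    illustrative choices, not taken from the cited paper.
    """
    h, w = len(img), len(img[0])
    magnitudes, orientations = [], []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences approximate the intensity gradient.
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            mag = math.hypot(gx, gy)
            magnitudes.append(mag)
            if mag >= strength_threshold:
                # Edge orientation is defined modulo pi.
                orientations.append(math.atan2(gy, gx) % math.pi)
    n = len(magnitudes)
    mean_strength = sum(magnitudes) / n
    density = len(orientations) / n
    if orientations:
        # Circular variance on doubled angles (orientation has period pi):
        # 0 means one dominant direction, values near 1 mean many directions.
        c = sum(math.cos(2 * t) for t in orientations) / len(orientations)
        s = sum(math.sin(2 * t) for t in orientations) / len(orientations)
        variance = 1.0 - math.hypot(c, s)
    else:
        variance = 0.0
    return mean_strength, density, variance
```

Under these assumptions, a single straight edge yields a high edge strength but an orientation variance near zero, whereas a closed shape such as a character stroke outline produces edges in several directions and therefore a larger variance.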
Wang et al. [3] proposed a connected-component based
method which combines color clustering, a black adjacency
graph (BAG), an aligning-and-merging-analysis scheme, and a
set of heuristic rules to detect text for sign recognition
applications such as street indicators and billboards.
The authors note that uneven reflections resulted in
incomplete character segmentation, which increased the false
alarm rate. Kim et al. [4] used a hierarchical feature
combination method to perform text detection in real
scenes. However, the authors admit that their method
could not handle large text very well, because it relies on local
features that represent only local variations of image blocks.
Gao et al. [5] developed a three-layer hierarchical adaptive text
detection algorithm for natural scenes. It has been applied in
a prototype Chinese sign translation system, where the text mostly
has a horizontal and/or vertical alignment.
Cai et al. [7] have proposed a method that detects both low
and high contrast text without being affected by language
or font size. Their algorithm first converts the video image
into an edge map using a color edge detector [9] and uses a low
global threshold to filter out points that are definitely not
edges. Selective local thresholding is then performed to simplify
the complex background. Finally, an edge-strength smoothing
operator and an edge-clustering power operator highlight those
areas with high edge strength or edge density, i.e. the text
candidates.
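The general idea of discarding weak edge responses and then highlighting regions of high edge density can be sketched as follows. This is a simplified illustration and not the operators of [7]: the selective local thresholding step is omitted, a plain windowed density replaces the smoothing and clustering operators, and the name `text_candidate_mask` and all parameter values are assumptions.

```python
def text_candidate_mask(edge_map, global_threshold=20, window=3,
                        density_threshold=0.5):
    """Mark pixels whose neighbourhood contains a high fraction of edges.

    edge_map is a list of rows of edge strengths. All parameters are
    hypothetical defaults chosen for illustration only.
    """
    h, w = len(edge_map), len(edge_map[0])
    # Step 1: a low global threshold removes points that are clearly not edges.
    edges = [[1 if v >= global_threshold else 0 for v in row]
             for row in edge_map]
    # Step 2: compute the local edge density in a window around each pixel.
    r = window // 2
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            count = sum(edges[yy][xx]
                        for yy in range(y0, y1)
                        for xx in range(x0, x1))
            area = (y1 - y0) * (x1 - x0)
            # Dense clusters of edges are kept as text candidates.
            mask[y][x] = count / area >= density_threshold
    return mask
```

In this sketch an isolated edge pixel is rejected because its neighbourhood is mostly empty, while a compact cluster of edge pixels, as produced by closely spaced character strokes, is retained.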
Garcia et al. [10] proposed a connected component based
method in which potential areas of text are detected by
enhancement and clustering processes, considering most of the
constraints related to the texture of words. Classification
and binarization of potential text areas are then achieved in a
single scheme performing color quantization and character
periodicity analysis. Lienhart et al. [11] and Agnihotri et al. [8]
have also proposed connected component based approaches. Alain
Trémeau et al. [14] have proposed a method for detection and
segmentation of text layers in complex images. It uses a
geodesic transform based on a morphological reconstruction
technique to remove dark/light structures connected to the
borders of the image and to emphasize objects in the center of
the image. To find a correct threshold for binarization, it uses
a method based on the difference of gamma functions approximated
by the Generalized Extreme Value Distribution (GEVD).
Jianqiang Yan et al. [15] have used Gabor filters with
varied scale and direction to describe the strokes of Chinese
characters for target text area extraction. Four sub-neural
networks are trained to learn the texture of text areas, and the
learnt classifiers are used to detect target text areas.
Existing methods experience difficulties in handling text
with varying contrast or text embedded in a complex background.
In this paper, we propose a connected component based text
extraction algorithm, a general-purpose method which can
quickly and effectively localize and extract text from both
document images and indoor/outdoor scene images.
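As background for readers unfamiliar with connected component analysis, the basic operation can be sketched as below. This is a generic illustration and not the algorithm proposed in this paper: the 4-connectivity, the breadth-first traversal, and the function name `component_boxes` are assumptions.

```python
from collections import deque

def component_boxes(binary):
    """Return bounding boxes (top, left, bottom, right) of the 4-connected
    foreground components of a binary image given as a list of rows.

    Generic sketch; a text localization method would go on to filter and
    merge these boxes using its own rules.
    """
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            # Start a breadth-first traversal from an unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Each returned box encloses one group of touching foreground pixels, which for a well-binarized image roughly corresponds to one character or character fragment.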
2 PROPOSED METHOD
In our proposed method we consider that text present in images
is in the horizontal direction with uniform spacing be-