A Novel Image Text Extraction Method Based on K-means Clustering
Yan Song¹·², Anan Liu¹, Lin Pang¹·², Shouxun Lin¹, Yongdong Zhang¹, Sheng Tang¹
¹ Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, 100080
² Graduate University of the Chinese Academy of Sciences, Beijing, China, 100080
{songyan,liuanan,panglin,sxlin,zhyd,ts}@ict.ac.cn
Abstract
Texts in web pages, images and videos contain important clues for information indexing and retrieval. Most existing text extraction methods depend on the language type and the text appearance. In this paper, a novel and universal image text extraction method is proposed. Text location follows a coarse-to-fine scheme: first, a multi-scale approach locates texts of different font sizes; second, projection profiles refine the located regions. Color-based k-means clustering is adopted for text segmentation. Compared with the grayscale images used in most existing methods, color images are better suited to clustering-based segmentation. Because the clustering treats corner points, edge points and other points equally, the method can handle multilingual text. Experimental results show that the best performance is obtained when k is 3, and comparative experiments on a large number of images show that our method is accurate and robust under various conditions.
1. Introduction
Nowadays, we are deluged by information delivered through all kinds of media, such as the Internet and television. How to organize and manage these multimedia data so that indexing and querying become convenient has become an urgent issue. In recent years, much research [1] has addressed this problem. Among multi-modal information, text in images is an important source because it carries far more high-level semantic content than visual and audio features. For example, texts superimposed on news videos usually summarize the content of the news reports. Moreover, optical character recognition (OCR) software is mature and more robust than automatic speech recognition (ASR) and visual analysis techniques.
Image text recognition generally comprises three steps: text location, text segmentation and text recognition. Text location methods can be roughly divided into three kinds: connected-component based [2], texture based [3] and edge based [4]. The first kind locates text quickly but tends to fail when the background is complex. Texture-based methods suffer from the large computational cost of the texture classification stage and may be confused by text-like regions. Edge-based methods usually have trouble handling large-size text. Text segmentation methods fall into two kinds. The first is color based and separates text from background by thresholding; commonly adopted thresholding methods include Otsu's [5], Niblack's [6] and Bernsen's [7]. However, no single thresholding method adapts well to all situations. The other kind is stroke based and employs filters to pick out the pixels on strokes [8], but the pixels at the intersections of strokes are usually missed.
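To illustrate the thresholding family discussed above, the following is a minimal sketch of Otsu's global method [5] applied to a synthetic text patch; the NumPy implementation and the synthetic intensities are our own illustrative assumptions, not details taken from the cited work.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level that maximizes the between-class variance (Otsu)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0.0
    for t in range(256):
        w0 += hist[t]                 # pixels in class 0 (<= t)
        if w0 == 0:
            continue
        w1 = total - w0               # pixels in class 1 (> t)
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Synthetic patch: dark strokes (mean 40) on a bright background (mean 200).
rng = np.random.default_rng(0)
img = rng.normal(200, 10, (32, 32))
img[8:24, 8:24] = rng.normal(40, 10, (16, 16))
img = np.clip(img, 0, 255)
t = otsu_threshold(img)
print(t)  # a threshold lying between the two intensity modes
```

On such a clearly bimodal patch the global threshold separates text from background well; the difficulty noted above arises when lighting or background complexity makes the histogram unimodal or multi-modal.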
Considering these existing problems, a novel and universal image text extraction method that obtains satisfying results is presented. A coarse-to-fine process is used for text location; it consists of multi-scale text location and text region refinement. Multi-scale images solve the problem of handling texts with different font sizes. The refinement step uses the horizontal and vertical projection profiles to reject falsely located text regions and to tighten text region boundaries. For text segmentation, traditional methods usually use grayscale images, which discard color information that is important for differentiating text from background. Our method adopts k-means clustering, for which color images are better suited. Each pixel is classified by its color features without any constraint from its location; the method thereby avoids the negative influence of neighboring pixels. It is a universal method for text segmentation which makes the process
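The color-clustering step described above can be sketched as follows. This is an illustrative implementation, not the paper's code: the synthetic pixel populations, the farthest-point initialization, and the heuristic of taking the brightest cluster as text are all our assumptions.

```python
import numpy as np

def kmeans(points, k, iters=20):
    """Lloyd's k-means with farthest-point initialization on (N, 3) RGB pixels."""
    centers = [points[0]]
    for _ in range(k - 1):
        # Next center: the point farthest from all chosen centers.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers

# Synthetic text region: bright strokes, blue background, anti-aliased transition.
rng = np.random.default_rng(1)
n = 300
text = rng.normal([240, 240, 240], 5, (n, 3))
bg = rng.normal([30, 60, 150], 5, (2 * n, 3))
edge = rng.normal([130, 150, 190], 5, (n, 3))
pixels = np.vstack([text, bg, edge])

labels, centers = kmeans(pixels, k=3)
# Heuristic (our assumption): take the brightest cluster as the text layer,
# which suits light-on-dark captions.
text_cluster = centers.sum(axis=1).argmax()
mask = labels == text_cluster
```

With k = 3, one cluster absorbs the anti-aliased transition pixels, so the text cluster stays clean; each pixel is labeled from its color alone, with no constraint from its spatial neighbors.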
Seventh IEEE/ACIS International Conference on Computer and Information Science
978-0-7695-3131-1/08 $25.00 © 2008 IEEE
DOI 10.1109/ICIS.2008.31