A Novel Image Text Extraction Method Based on K-means Clustering

Yan Song 1,2, Anan Liu 1, Lin Pang 1,2, Shouxun Lin 1, Yongdong Zhang 1, Sheng Tang 1
1 Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, 100080
2 Graduate University of the Chinese Academy of Sciences, Beijing, China, 100080
{songyan,liuanan,panglin,sxlin,zhyd,ts}@ict.ac.cn

Abstract

Texts in web pages, images and videos contain important clues for information indexing and retrieval. Most existing text extraction methods depend on the language type and text appearance. In this paper, a novel and universal method of image text extraction is proposed. A coarse-to-fine text location scheme is implemented: first, a multi-scale approach locates texts with different font sizes; second, projection profiles are used in a location refinement step. Color-based k-means clustering is adopted for text segmentation. Compared to the grayscale images used in most existing methods, color images are more suitable for clustering-based segmentation. The method treats corner points, edge points and other points equally, so it handles multilingual text well. Experimental results demonstrate that the best performance is obtained when k is 3. Comparative experiments on a large number of images show that our method is accurate and robust under various conditions.

1. Introduction

Nowadays, we are deluged by information delivered through all kinds of media such as the Internet and television. How to organize and manage these multimedia data so that indexing and querying become convenient has become an urgent issue, and much research [1] has been devoted to the problem in recent years. Among multi-modal information, text in images is an important source because it carries rich high-level semantics compared to visual and audio information.
For example, texts superimposed on news videos usually summarize the content of the news reports. Besides, optical character recognition (OCR) software is mature and more robust than automatic speech recognition (ASR) and visual analysis techniques.

Image text recognition generally comprises the following steps: text location, text segmentation and text recognition. Text location methods can be roughly divided into three kinds: connected-component based [2], texture based [3] and edge based [4]. The first kind locates text quickly but tends to fail when the background is complex. Texture-based methods suffer from the large computational cost of the texture classification stage and may be confused by text-like regions. Edge-based methods usually have trouble handling large-size text.

Text segmentation methods fall into two kinds. The first is color based and separates text from background by thresholding; commonly adopted thresholding methods include Otsu's [5], Niblack's [6] and Bernsen's [7]. Thresholding, however, is difficult to adapt to all kinds of situations. The second is stroke based and employs filters to pick out the pixels on strokes [8], but the pixels at the intersections of strokes are usually missed.

Considering these problems, a novel and universal method of image text extraction is presented which obtains satisfying results. A coarse-to-fine process is used for text location, consisting of multi-scale text location and text region refinement. Multi-scale images solve the problem of handling texts with different font sizes. The refinement step utilizes horizontal and vertical projection profiles to reject falsely located text regions and to contract the remaining ones. For text segmentation, traditional methods usually operate on grayscale images, but this discards color information that is important for differentiating text from background.
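The projection-profile refinement described above can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the function name, the use of a binary candidate map (e.g. an edge map), and the threshold values are assumptions. Rows and columns whose normalized projection falls below a threshold are trimmed from the region borders, and regions with no dense rows or columns are rejected.

```python
import numpy as np

def refine_text_region(binary_region, row_thresh=0.05, col_thresh=0.05):
    """Tighten a candidate text region using projection profiles.

    binary_region: 2-D array where nonzero pixels are text candidates.
    Returns a contracted bounding box (top, bottom, left, right),
    inclusive, or None if the region is rejected as a false detection.
    """
    h, w = binary_region.shape
    # Horizontal profile: fraction of candidate pixels in each row.
    row_profile = (binary_region > 0).sum(axis=1) / float(w)
    # Vertical profile: fraction of candidate pixels in each column.
    col_profile = (binary_region > 0).sum(axis=0) / float(h)

    rows = np.where(row_profile >= row_thresh)[0]
    cols = np.where(col_profile >= col_thresh)[0]
    if rows.size == 0 or cols.size == 0:
        return None  # reject: too few candidate pixels anywhere
    return rows[0], rows[-1], cols[0], cols[-1]
```

In practice the thresholds would be tuned to the edge detector and image resolution; the paper itself does not specify the values.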
K-means clustering is adopted in our method, and a color image is more suitable for clustering-based segmentation. Each pixel is classified by its color features without any constraint from its location; the method thereby avoids the negative influence of neighboring pixels. It is a universal method for text segmentation which makes the process

Seventh IEEE/ACIS International Conference on Computer and Information Science. 978-0-7695-3131-1/08 $25.00 © 2008 IEEE. DOI 10.1109/ICIS.2008.31
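The color-based segmentation step described above can be sketched as a k-means clustering over RGB pixel values with k = 3, the value the paper reports as best. This is a minimal sketch, not the authors' exact implementation: the Lloyd-style iteration, the initialization from distinct pixel colors, and the function name are illustrative assumptions.

```python
import numpy as np

def kmeans_segment(image, k=3, iters=20, seed=0):
    """Cluster the pixels of an H x W x 3 color image into k groups by RGB value.

    Each pixel is classified by its color alone, independent of its
    position, so stroke, corner and edge pixels are treated uniformly.
    Returns an H x W label map.
    """
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(np.float64)
    rng = np.random.default_rng(seed)
    # Simple initialization (an assumption): pick k distinct pixel colors.
    uniq = np.unique(pixels, axis=0)
    centers = uniq[rng.choice(len(uniq), size=min(k, len(uniq)), replace=False)]
    for _ in range(iters):
        # Assign each pixel to its nearest center (Euclidean distance in RGB).
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean color of its assigned pixels.
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels.reshape(h, w)
```

After clustering, one of the k clusters must still be identified as the text layer; a common heuristic (not specified here) is to pick the cluster dominating the interior of the located text box.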