Pattern Recognition 42 (2009) 1419--1444
A multi-plane approach for text segmentation of complex document images
Yen-Lin Chen a, Bing-Fei Wu b,∗
a Department of Computer Science and Information Engineering, Asia University, 500 Liufeng Road, Wufeng, Taichung 41354, Taiwan
b Department of Electrical and Control Engineering, National Chiao Tung University, 1001 Ta-Hsueh Road, Hsinchu 30010, Taiwan
Article history: Received 19 January 2008; received in revised form 1 September 2008; accepted 19 October 2008

Keywords: Document image processing; Text extraction; Image segmentation; Multilevel thresholding; Region segmentation; Complex document images

Abstract
This study presents a new method, namely the multi-plane segmentation approach, for segmenting and
extracting textual objects from various real-life complex document images. The proposed multi-plane
segmentation approach first decomposes the document image into distinct object planes to extract and
separate homogeneous objects including textual regions of interest, non-text objects such as graphics and
pictures, and background textures. This process consists of two stages: localized histogram multilevel
thresholding and multi-plane region matching and assembling. Then a text extraction procedure is applied
on the resultant planes to detect and extract textual objects with different characteristics in the respective
planes. The proposed approach processes document images regionally and adaptively according to their respective local features. Hence, detailed characteristics of the extracted textual objects, particularly small characters with thin strokes, as well as gradational illuminations of characters, are well preserved. Moreover, the approach readily handles background objects with uneven, gradational, and sharp variations in contrast, illumination, and texture. Experimental results on real-life complex
document images demonstrate that the proposed approach is effective in extracting textual objects with
various illuminations, sizes, and font styles from various types of complex document images.
© 2008 Elsevier Ltd. All rights reserved.
1. Introduction
Extraction of textual information from document images provides
many useful applications in document analysis and understanding,
such as optical character recognition, document retrieval, and com-
pression [1,2]. To date, many techniques have been presented for extracting textual objects from monochromatic document images [3–6]. In
recent years, advances in multimedia publishing and printing tech-
nology have led to an increasing number of real-life documents
in which stylistic character strings are printed with pictorial, textured, and decorated objects and colorful, varied background components. However, most current approaches cannot work well for extracting textual objects from real-life complex document images.
Compared to monochromatic document images, text extraction from complex document images involves many difficulties: the complexity of the background, the variety and shading of character illuminations, the superimposition of characters on illustrations and pictures, and other decorative background components.
As a result, there is an increasing demand for a system that can read and extract the textual information printed on pictorial and textured regions as well as in monochromatic main text regions.

∗ Corresponding author. Tel.: +886 3 5131538; fax: +886 3 5712385.
E-mail addresses: ylchen@asia.edu.tw (Y.-L. Chen), bwu@cssp.cn.nctu.edu.tw (B.-F. Wu).
doi:10.1016/j.patcog.2008.10.032
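The thresholding techniques surveyed in the next paragraph all reduce to choosing a cut point on the gray-level histogram. As a concrete point of reference, the sketch below implements Otsu's optimal criterion [9] (cited below), which selects the threshold maximizing the between-class variance. This is an illustrative Python/NumPy implementation, not the authors' code, and the function name is ours:

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the gray level that maximizes between-class variance (Otsu).

    `gray` is a 2-D array of 8-bit intensities; illustrative sketch only.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # P(background) up to each level
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean intensity
    mu_total = mu[-1]
    # Between-class variance for every candidate threshold t.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)       # degenerate thresholds -> 0
    return int(np.argmax(sigma_b))

# Pixels above the selected level become foreground:
# binary = gray > otsu_threshold(gray)
```

A single global cut point like this is exactly what fails under the uneven and gradational illuminations discussed above, which motivates the recursive, multi-class, and local variants reviewed in this section.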
Several newly developed global thresholding methods are useful
in separating textual objects from non-uniform illuminated doc-
ument images. Liu and Srihari [7] proposed a method based on
texture features of character patterns, while Cheriet et al. [8] pre-
sented a recursive thresholding algorithm extended from Otsu's
optimal criterion [9]. These methods classify pixels in the original image as foreground objects (particularly textual objects of interest) or background according to their gray intensities viewed globally, and are attractive for their computational simplicity. However, binary images obtained by
global thresholding techniques are subject to noise and distortion,
especially because of uneven illumination and the spreading effect caused by the image scanner. To address these issues, Solihin and Leedham's integral ratio approaches [10] provide a new class of histogram-based thresholding techniques that classify pixels into three classes: foreground, background, and a fuzzy region between the two. In Ref. [11], Parker
proposed a local gray intensity gradient thresholding technique
which is effective for extracting textual objects in badly illuminated document images. Because this method assumes binary document images, its application is limited to extracting character objects from backgrounds no more complex
than monotonically changing illuminations. A local and adaptive