Pattern Recognition 42 (2009) 1419--1444
A multi-plane approach for text segmentation of complex document images
Yen-Lin Chen a, Bing-Fei Wu b,∗
a Department of Computer Science and Information Engineering, Asia University, 500 Liufeng Road, Wufeng, Taichung 41354, Taiwan
b Department of Electrical and Control Engineering, National Chiao Tung University, 1001 Ta-Hsueh Road, Hsinchu 30010, Taiwan
Article history: Received 19 January 2008; received in revised form 1 September 2008; accepted 19 October 2008

Keywords: Document image processing; Text extraction; Image segmentation; Multilevel thresholding; Region segmentation; Complex document images

Abstract
This study presents a new method, namely the multi-plane segmentation approach, for segmenting and
extracting textual objects from various real-life complex document images. The proposed multi-plane
segmentation approach first decomposes the document image into distinct object planes to extract and
separate homogeneous objects including textual regions of interest, non-text objects such as graphics and
pictures, and background textures. This process consists of two stages: localized histogram multilevel
thresholding and multi-plane region matching and assembling. Then a text extraction procedure is applied
on the resultant planes to detect and extract textual objects with different characteristics in the respective
planes. The proposed approach processes document images regionally and adaptively according to their respective local features. Hence, detailed characteristics of the extracted textual objects, particularly small characters with thin strokes, as well as gradational illuminations of characters, are well preserved. Moreover, the approach readily handles background objects with uneven, gradational, and sharp variations in contrast, illumination, and texture. Experimental results on real-life complex
document images demonstrate that the proposed approach is effective in extracting textual objects with
various illuminations, sizes, and font styles from various types of complex document images.
© 2008 Elsevier Ltd. All rights reserved.
1. Introduction
Extraction of textual information from document images provides
many useful applications in document analysis and understanding,
such as optical character recognition, document retrieval, and com-
pression [1,2]. To date, many techniques have been presented for extracting textual objects from monochromatic document images [3–6]. In
recent years, advances in multimedia publishing and printing tech-
nology have led to an increasing number of real-life documents
in which stylistic character strings are printed with pictorial, textured, and decorated objects and colorful, varied background components. However, most current approaches cannot work well for extracting textual objects from real-life complex document images.
Compared to monochromatic document images, text extraction from complex document images involves many difficulties: the complexity of the background, the variety and shading of character illuminations, the superimposition of characters on illustrations and pictures, and other decorative background components.
As a result, there is an increasing demand for a system that can read and extract the textual information printed on pictorial and textured regions as well as in monochromatic main text regions.

∗ Corresponding author. Tel.: +886 3 5131538; fax: +886 3 5712385.
E-mail addresses: ylchen@asia.edu.tw (Y.-L. Chen), bwu@cssp.cn.nctu.edu.tw (B.-F. Wu).
doi:10.1016/j.patcog.2008.10.032
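The thresholding techniques surveyed in the next paragraph all reduce to choosing a cut point on the gray-level histogram. As a concrete point of reference, the sketch below implements Otsu's optimal criterion [9] (cited below), which selects the threshold maximizing the between-class variance. This is an illustrative Python/NumPy implementation, not the authors' code, and the function name is ours:

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the gray level that maximizes between-class variance (Otsu).

    `gray` is a 2-D array of 8-bit intensities; illustrative sketch only.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # P(background) up to each level
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean intensity
    mu_total = mu[-1]
    # Between-class variance for every candidate threshold t.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)       # degenerate thresholds -> 0
    return int(np.argmax(sigma_b))

# Pixels above the selected level become foreground:
# binary = gray > otsu_threshold(gray)
```

A single global cut point like this is exactly what fails under the uneven and gradational illuminations discussed above, which motivates the recursive, multi-class, and local variants reviewed in this section.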
Several newly developed global thresholding methods are useful
in separating textual objects from non-uniform illuminated doc-
ument images. Liu and Srihari [7] proposed a method based on
texture features of character patterns, while Cheriet et al. [8] pre-
sented a recursive thresholding algorithm extended from Otsu's
optimal criterion [9]. These methods classify pixels in the original image as foreground objects (particularly textual objects of interest) or background according to their gray intensities viewed globally, and are attractive for their computational simplicity. However, binary images obtained by
global thresholding techniques are subject to noise and distortion,
especially because of uneven illumination and the spreading effect caused by the image scanner. To address these issues, Solihin and Leedham's integral ratio approaches [10] provide a new class of histogram-based thresholding techniques that classify pixels into three classes: foreground, background, and a fuzzy region between the two. In Ref. [11], Parker
proposed a local gray intensity gradient thresholding technique
which is effective for extracting textual objects in badly illuminated document images. Because this method assumes binary document images, its application is limited to extracting character objects from backgrounds no more complex
than monotonically changing illuminations. A local and adaptive