F.J. Ferri et al. (Eds.): SSPR&SPR 2000, LNCS 1876, pp. 757-766, 2000.
© Springer-Verlag Berlin Heidelberg 2000
Segmentation of Text and Graphics/Images Using the
Gray-Level Histogram Fourier Transform
M.A. Patricio¹, D. Maravall²
¹ Centro de Cálculo
² Departamento de Inteligencia Artificial
Universidad Politécnica de Madrid
dmaravall@fi.upm.es
Abstract. One crucial issue in automatic document analysis is the
discrimination between text and graphics/images. This paper presents a novel,
robust method for segmenting text and graphics/images in digitized
documents. The method represents window-like portions of a document by
their gray-level histograms. Empirical evidence shows that text regions and
graphics/image regions have different gray-level histograms. Unlike the usual
approach, which characterizes histograms by statistical parameters, the
approach introduced here works with the Fourier transform of the histogram,
since it retains all the information contained in the histogram pattern. The
next logical step is to automatically select the spectral components that are
most discriminant for the text and graphics/images segmentation goal. A fully
automated procedure for the optimal selection of these discriminant features
is also presented. Finally, empirical results obtained for text and
graphics/images segmentation using a simple three-layer perceptron-like
neural network are discussed.
Keywords: Feature extraction and selection; Image analysis; Applications:
automatic document analysis.
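
The feature extraction outlined in the abstract (gray-level histogram of a window, followed by its Fourier transform) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names window_histogram and histogram_spectrum, the normalization of the histogram, the use of coefficient magnitudes, and the choice of keeping the first few components are all assumptions made here. The paper itself selects the discriminant components automatically.

```python
import cmath
import math


def window_histogram(image, top, left, size, levels=256):
    """Normalized gray-level histogram of a size x size window.

    `image` is a list of rows, each row a list of integer gray levels
    in the range [0, levels). (Hypothetical helper, for illustration.)
    """
    counts = [0] * levels
    for row in image[top:top + size]:
        for pixel in row[left:left + size]:
            counts[pixel] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("window lies outside the image")
    return [c / total for c in counts]


def histogram_spectrum(hist, n_components):
    """Magnitudes of the first n_components DFT coefficients of hist.

    A direct O(n * n_components) discrete Fourier transform; keeping the
    leading components is an assumption, not the paper's selection rule.
    """
    n = len(hist)
    magnitudes = []
    for k in range(n_components):
        coeff = sum(h * cmath.exp(-2j * math.pi * k * i / n)
                    for i, h in enumerate(hist))
        magnitudes.append(abs(coeff))
    return magnitudes
```

A window would be slid over the digitized page, and the resulting spectral magnitudes for each window position would serve as the candidate feature vector passed to a classifier.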
1. Introduction – The Gray Level Histogram as a Discriminant
Tool for Text and Graphics/Images Segmentation
Document image analysis is an active research and development field [1] in which
pattern recognition techniques are of the greatest interest. One critical issue in the
automatic analysis of digitized documents is the separation of text and
graphics/images. The text regions of the document are usually analyzed by means of
well-known OCR techniques, whereas the graphics and images are simply encoded in
order to obtain optimal storage and retrieval of such information. This paper
describes a novel method for the segmentation of text and graphics/images. The
method exploits the empirical evidence that regions of text and regions of
graphics/images have very different gray-level histograms. As an illustration, Figure 1
shows two examples. Note that a window is applied to the original digitized
document so that the brightness histogram is computed over small portions of the whole
document. The practical issues concerning the window size and the scanning process