Segmentation of Postal Envelopes for Address Block Location: an approach
based on feature selection in wavelet space
David Menoti, Díbio Leandro Borges, Jacques Facon, Alceu de Souza Britto Jr
Pontifical Catholic University of Parana (PUCPR)
Postgraduate Program in Applied Informatics (PPGIA)
Image Science Group
Rua Imaculada Conceição, 1155 – Prado Velho, Curitiba, PR, Brazil
E-mail: {menoti, dibio, facon, alceu}@ppgia.pucpr.br
Abstract
This paper presents a segmentation algorithm based
on feature selection in wavelet space. The aim is to
automatically separate, in postal envelope images, the regions
related to background, stamps, rubber stamps, and the
address blocks. First, a typical image of a postal
envelope is decomposed using the Mallat algorithm with a Haar
basis. The high-frequency channel outputs are analyzed to
locate salient points in order to separate the background.
A statistical hypothesis test is then applied to select the more
consistent regions and remove residual noise.
The selected points are projected back to the original
gray level image, where the evidence from the wavelet
space is used to start a growing process to include the
pixels more likely to belong to the regions of stamps,
rubber stamps, and written area. Experiments are run
using original postal envelopes from the Brazilian Post
Office Agency, and here we report results on 440 images
with many different layouts and backgrounds.
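The first two steps described above (one level of Haar decomposition, followed by locating salient points in the high-frequency outputs) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the max-magnitude saliency rule, and the fixed threshold are assumptions made for illustration only, and the statistical hypothesis test and the growing process are omitted.

```python
def haar_level(image):
    """One level of a 2-D orthonormal Haar decomposition.

    `image` is a list of rows of gray levels with even height and width.
    Returns (approx, row_detail, col_detail, diag_detail), each half-size.
    """
    approx, row_detail, col_detail, diag_detail = [], [], [], []
    for r in range(0, len(image), 2):
        a_row, r_row, c_row, d_row = [], [], [], []
        for c in range(0, len(image[0]), 2):
            # The 2x2 block of pixels covered by one coefficient.
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((a + b + d + e) / 2.0)  # low-pass in both directions
            r_row.append((a + b - d - e) / 2.0)  # difference between the two rows
            c_row.append((a - b + d - e) / 2.0)  # difference between the two columns
            d_row.append((a - b - d + e) / 2.0)  # diagonal difference
        approx.append(a_row)
        row_detail.append(r_row)
        col_detail.append(c_row)
        diag_detail.append(d_row)
    return approx, row_detail, col_detail, diag_detail


def salient_points(image, threshold):
    """Return original-image coordinates whose detail response exceeds `threshold`.

    Assumption: a point is 'salient' when the largest absolute detail
    coefficient at that position is above a fixed, hypothetical threshold.
    """
    _, row_detail, col_detail, diag_detail = haar_level(image)
    points = []
    for i in range(len(row_detail)):
        for j in range(len(row_detail[0])):
            response = max(abs(row_detail[i][j]),
                           abs(col_detail[i][j]),
                           abs(diag_detail[i][j]))
            if response > threshold:
                # Project the coefficient back to the top-left pixel of its block.
                points.append((2 * i, 2 * j))
    return points


# Tiny synthetic example: a flat background with one bright pixel.
demo = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 200, 10],
    [10, 10, 10, 10],
]
print(salient_points(demo, 50.0))  # prints [(2, 2)]
```

Flat background blocks yield zero detail coefficients, so only the block containing the bright pixel survives the threshold; this is the sense in which the high-frequency outputs help separate the background.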
1. Introduction
Postal automation has recently been integrated into the
research agenda of the pattern recognition and computer
vision communities, since acquisition and storage of
images of envelopes and parcels have become easier and
cheaper than a decade ago. However, segmentation of a
typical image of a mail piece into background, stamps,
and address blocks is still a challenging problem, owing
to the large variety of stamps, backgrounds, and address
text (e.g. handwritten or printed, at varying locations).
Other works in the literature have tackled different
aspects of that problem. A survey of document image
understanding up to 1994 can be found in [3]. In [1], a
texture segmentation technique that organizes the
wavelet coefficients of an image into a probabilistic graph
is presented. Fusion of that information by hidden
Markov modeling is used to refine segmentation
hypotheses. A page layout segmentation method is presented in
[2]; it is based on local feature extraction by wavelet
packets, followed by a soft integration process that votes for
layout border detection. One of the few works we found
with results on envelopes is [4], which presents a
method to identify regions in envelope images that are candidates
for the destination address. The technique is a
texture segmentation based on Gabor filters. In [5] a
method to locate text areas against different backgrounds
is shown, based on a pseudo-motion technique that
identifies oscillations in the wavelet coefficients. An
integrated system that performs fast identification of
zip codes, given that the address block is provided, is
reported in [7]. A parsing algorithm works intensively on
separating words and symbols represented as contours in the
form of chain codes. In [9] a fast segmentation approach
for address block location on oversized flat envelopes
is presented. The approach is based on measuring
the homogeneity of gray-level blocks and on adaptive
threshold values obtained from the gradients of the blocks. A
split-and-merge-like algorithm is then presented in [10],
which relies on special geometric layouts for the
address block. Corners and orientations of the lines are
extracted, and that information is used to separate
background features from the address block. An
interesting work on text detection in document images
such as newspapers, photographs, and magazines is
shown in [11], where a texture segmentation module uses
Gaussian derivative filters followed by a non-linear
transformation to produce the feature vectors. A method
to locate address blocks on images where an arbitrary
layout of printed text is known a priori is presented in
[12].
We present here a novel approach for segmenting
an image of a postal envelope: a general and robust
segmentation method not restricted to a particular layout.
Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR 2003)
0-7695-1960-1/03 $17.00 © 2003 IEEE