Combining Force Histogram and Discrete Lines to extract Dashed Lines
Isabelle Debled-Rennesson¹ and Laurent Wendling²
¹ LORIA Campus Scientifique, Université Henri Poincaré, 54506 Vandœuvre-lès-Nancy, France
² LIPADE, Université Paris Descartes, 45, rue des Saints-Prés, 75270 Paris Cedex 06
debled@loria.fr, wendling@parisdescartes.fr
Abstract
This paper proposes a new method to extract dashed lines
in technical documents by combining force histograms and
discrete lines. The aim is to study the spatial location
of pairs of connected components using force histograms
and to refine the recognition by considering the
surrounding discrete lines. This new model is fast and
allows a good extraction of occluded patterns in the
presence of noise. Common efficient methods require
several thresholds to process technical documents. The
proposed method requires only a few thresholds, which can
be set automatically from the data.
1. Introduction
It is well known that dashed lines carry valuable
information for understanding a document (separating
parts, associated text boxes, etc.) [1, 2]. Many
recognition systems therefore need an accurate and
robust operator for retrieving such lines. Most
algorithms rely on characteristics common to all dashed
lines. Three main hypotheses are generally made: there is
a minimum number of dashes having approximately the same
length, the dashes are regularly spaced, and they follow a
virtual line. Extraction can be carried out either directly
on the pixel image, using directional mathematical
morphology operators [3], or on the set of vectors
produced by raster-to-vector conversion [4]. A powerful
approach was proposed by Dori et al. [5] in 1995. This
method offers a satisfactory solution to the problem in
most cases. Its underlying mechanism is a sequential,
stepwise recovery of components that meet certain
continuity conditions relating to the common
characteristics of dashed lines. The method starts by
extracting keys, that is, segments that are shorter than a
given threshold and have at least one free extremity. The main
loop consists in choosing a key as the start of a new
dashed-line hypothesis and trying to extend this
hypothesis in both directions by adding other segments
belonging to the same virtual line. The search is carried
out in an area whose width is twice the width of the
current key and whose length is the maximal distance
allowed between two segments belonging to the same dashed
line. Dosch et al. [2] proposed some improvements to the
basic method by studying connection points and by merging
dashed segments, propagating them according to a distance
threshold. Even if results
are satisfactory in many cases, these methods inherit the
well-known drawbacks of raster-to-vector conversion,
especially in the presence of noise. Furthermore,
distortions make it difficult to locate the patterns to
be found. As a consequence, numerous thresholds are
generally set manually, depending both on the scale of the
documents and on the structure of the patterns to be
handled. Finally, it is not easy to assess the accuracy of
the extracted primitives from the data without human
parameter setting. Here the problem is tackled by
computing force histograms between pairs of labeled
components; these provide kernel patterns for the
computation of discrete lines, which are used to propagate
the dashed lines and to process occluded components
efficiently. This new promising generic method is fast and
relies on few thresholds, which can be set directly from
an analysis of the document.
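To make the key-extension search recalled above for the method of Dori et al. [5] more concrete, the following Python sketch grows a dashed-line hypothesis from a key. It is only an illustration under simplifying assumptions, not the implementation of [5]: the `Segment` class, the functions `find_keys` and `grow_dashed_line`, and the parameters `max_key_length` and `max_gap` are hypothetical names, the free-extremity test on keys is omitted, and the search area is modeled as a strip aligned with the key whose total width is twice the key width.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A vectorized stroke (hypothetical structure, not taken from [5])."""
    x1: float
    y1: float
    x2: float
    y2: float
    width: float = 1.0

    @property
    def length(self) -> float:
        return math.hypot(self.x2 - self.x1, self.y2 - self.y1)


def find_keys(segments, max_key_length):
    """Keep non-degenerate segments shorter than the threshold.
    Assumption: the 'free extremity' condition is not checked here."""
    return [s for s in segments if 0 < s.length < max_key_length]


def _project(origin, direction, point):
    """Return (distance along direction, absolute lateral offset)."""
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    ux, uy = direction
    return dx * ux + dy * uy, abs(dx * uy - dy * ux)


def _next_dash(tip, direction, half_width, max_gap, candidates):
    """Nearest candidate lying inside the search strip ahead of `tip`."""
    best = None
    for seg in candidates:
        ends = [(seg.x1, seg.y1), (seg.x2, seg.y2)]
        for near, far in (ends, ends[::-1]):
            n_along, n_across = _project(tip, direction, near)
            f_along, f_across = _project(tip, direction, far)
            inside = (0 < n_along <= max_gap and n_across <= half_width
                      and f_across <= half_width and f_along > n_along)
            if inside and (best is None or n_along < best[0]):
                best = (n_along, seg, far)
    return None if best is None else (best[1], best[2])


def grow_dashed_line(key, segments, max_gap):
    """Extend a hypothesis from `key` in both directions (key.length > 0)."""
    ux = (key.x2 - key.x1) / key.length
    uy = (key.y2 - key.y1) / key.length
    remaining = [s for s in segments if s != key]
    chain = [key]
    for sign, tip in ((1, (key.x2, key.y2)), (-1, (key.x1, key.y1))):
        direction = (sign * ux, sign * uy)
        while True:
            # Strip half-width = key width, i.e. total width = 2 * key width.
            hit = _next_dash(tip, direction, key.width, max_gap, remaining)
            if hit is None:
                break
            seg, tip = hit
            remaining.remove(seg)
            if sign == 1:
                chain.append(seg)
            else:
                chain.insert(0, seg)
    return chain
```

Called on three collinear dashes separated by gaps no larger than `max_gap`, with the middle dash as key, `grow_dashed_line` returns the three dashes in order and ignores segments that fall outside the strip or beyond the allowed gap. The number of manually chosen quantities (`max_key_length`, `max_gap`, the strip width) illustrates the threshold dependence discussed above.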
2. Force Histogram
In this section, the computation of a histogram of
forces [6] is recalled. Let $\varphi_r$ be the map from $\mathbb{R}$ into $\mathbb{R}_+$, null on $\mathbb{R}_-$, such that:
$$\forall d \in \mathbb{R}_+^{*}, \quad \varphi_r(d) = 1/d^{r}$$
with $r$ the kind of force. The method is further based on
the handling of segments to decrease the computation
time, that is, on the calculation of the attraction force $f_r$ of
one segment with regard to another. Let $I$ and $J$ be two
segments borne by the same line of angle $\theta$, $|I|$ and $|J|$
2010 International Conference on Pattern Recognition
1051-4651/10 $26.00 © 2010 IEEE
DOI 10.1109/ICPR.2010.389
1578