Combining Force Histogram and Discrete Lines to Extract Dashed Lines

Isabelle Debled-Rennesson (1) and Laurent Wendling (2)
(1) LORIA, Campus Scientifique, Université Henri Poincaré, 54506 Vandœuvre-lès-Nancy, France
(2) LIPADE, Université Paris Descartes, 45, rue des Saints-Pères, 75270 Paris Cedex 06
debled@loria.fr, wendling@parisdescartes.fr

Abstract

A new method to extract dashed lines in technical documents is proposed in this paper, combining force histograms and discrete lines. The aim is to study the spatial location of pairs of connected components using the force histogram and to refine the recognition by considering surrounding discrete lines. This new model is fast and allows a good extraction of occluded patterns in the presence of noise. Efficient common methods require several thresholds to process technical documents. The proposed method requires only a few thresholds, which can be set automatically from the data.

1. Introduction

It is well known that dashed lines carry valuable information for understanding a document (separating parts, associated text boxes, etc.) [1, 2]. In many recognition systems, it is important to have an accurate and robust operator for retrieving such lines. Most algorithms rely on characteristics common to all dashed lines. Three main hypotheses are generally taken into account: there exists a minimum number of dashes of approximately the same length, the dashes are regularly spaced, and they follow a virtual line. Extraction can be carried out either directly on the pixel image, using directional mathematical morphology operators [3], or on the set of vectors produced by raster-to-vector conversion [4].

A powerful approach was proposed by Dori et al. [5] in 1995. This method offers a satisfactory solution to the problem in most cases.
The underlying mechanism is a sequential, stepwise recovery of components that meet certain continuity conditions relating to the common characteristics of dashed lines. The method starts by extracting keys, that is, segments which are shorter than a given threshold and which have at least one free extremity. The main loop consists in choosing a key as the start of a new dashed-line hypothesis and in trying to extend this hypothesis in both directions by adding other segments belonging to the same virtual line. This search is performed in an area whose width is twice the current key width and whose length is the maximal distance allowed between two segments belonging to the same dashed line.

Dosch et al. [2] have proposed some improvements to the basic method by studying connection points and by merging dashed segments, propagating them according to a distance threshold. Even though the results are satisfactory in many cases, these methods inherit the well-known drawbacks of raster-to-vector conversion, especially in the presence of noise. Furthermore, distortions make the patterns to be found difficult to locate. As a consequence, numerous thresholds are generally set manually, depending both on the scale of the documents and on the structure of the patterns to be handled. Finally, it is not easy to assess the accuracy of the extracted primitives without manual parameter setting.

Here the problem is tackled by calculating the force histogram between pairs of labeled components, which provides a kernel pattern for the computation of discrete lines; these are then used to propagate the dashed lines and to efficiently process occluded components. This new, promising generic method is fast and relies on few thresholds, which can be set directly from document analysis.

2. Force Histogram

In this section, the computation of a histogram of forces [6] is recalled. Let $\varphi_r$ be the map from $\mathbb{R}$ into $\mathbb{R}_+$, null on $\mathbb{R}_-$, such that: $\forall d \in \mathbb{R}_+^{*}$, $\varphi_r(d) = 1/d^r$, with $r$ the kind of force.
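As a minimal illustration of the definition of the map above, it can be written in a few lines of Python. The function name `phi` and the sample values are assumptions made for this sketch; they are not taken from the paper.

```python
def phi(d: float, r: float) -> float:
    """Sketch of the map phi_r: 1 / d**r for d > 0, and 0 on the non-positive reals."""
    if d <= 0:
        # phi_r is null on R-; d = 0 is also mapped to 0 here to avoid dividing by zero
        return 0.0
    return 1.0 / d**r


# Hypothetical sample values: the same distance under two kinds of force
print(phi(2.0, 0))   # r = 0: the value does not depend on the distance
print(phi(2.0, 2))   # r = 2: the value decreases with the squared distance
print(phi(-1.0, 2))  # negative argument: null
```

In the force-histogram literature, r = 0 is usually read as a constant force and r = 2 as a gravitational force, so the choice of r controls how strongly distant points are down-weighted.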
The method further relies on the handling of segments to decrease the computation time, that is, on the calculation of the attraction force $f_r$ of one segment with regard to another.

2010 International Conference on Pattern Recognition 1051-4651/10 $26.00 © 2010 IEEE DOI 10.1109/ICPR.2010.389

Let $I$ and $J$ be two segments borne by the same line of angle $\theta$, $|I|$ and $|J|$