Simultaneous detection of vertical and horizontal text lines based on perceptual organisation

Claudie Faure (a), Nicole Vincent (b)

(a) CNRS-LTCI, TELECOM-ParisTech, 46 rue Barrault, 75634 Paris, Cedex 13, France
(b) CRIP5 - Université Paris Descartes, 45, rue des Saints-Pères, 75270 Paris Cedex 06, France

ABSTRACT

A page of a document is a set of small components that a human reader groups into higher-level components such as lines and text blocks. Document image analysis aims to detect these components in document images. We propose to encode local information by considering the properties that determine perceptual grouping. Each connected component is labelled according to the location of its nearest neighbouring connected component. These labelled components constitute the input to a rule-based incremental grouping process. Vertical and horizontal text lines are detected without any prior assumption about their direction. Touching characters belonging to different lines are detected early and discarded from the grouping process to avoid merging lines. The tolerance for grouping components increases over the course of the process until the final decision is made. After each step of the grouping process, conflict resolution rules are activated. This work was motivated by the automatic detection of Figure&Caption pairs in the documents of the historical collection of the BIUM digital library (Bibliothèque InterUniversitaire Médicale). The images used in this study belong to this collection.

Keywords: Historic documents, Figure-Caption pairs, Perceptual grouping, Text line detection

1. INTRODUCTION

Typography and layout are designed to facilitate the perceptual organisation that makes the page components salient. The spatial organisation helps the reader to detect the textual components at several levels (alphabetical symbols, words, lines, etc.).
It is also responsible for the implicit links between components that lead the reader to group a figure with its caption or a text area with its title.

Human perception has inspired several methods in document image analysis. Banks of Gabor filters are used to mimic human filtering mechanisms in order to discriminate textures; this approach is mainly used to detect text areas [2, 10]. For text line detection, low-resolution images are used in [6] to reduce text lines to linear segments, as they are perceived when we squint. The Gestalt laws of perceptual organisation have been taken as a model to automatically detect meaningful components and spatial relationships in documents or drawings [5, 12, 13]. In [7], alignments of connected components are detected with the Hough transform and then sorted according to the law of proximity, which accounts for the salience of some alignments. Besides the approaches centred on grouping, other methods aim to detect separators between components: projection profiles and more elaborate methods for detecting white separators have been proposed [1, 3, 11].

Text lines are the typical components of written language, lying between the word level and the text block level. When they are well segmented, columns and margins are also well segmented. The RLSA (Run-Length Smoothing Algorithm) is arguably the most popular method for extracting text lines. Nevertheless, it cannot address all the problems encountered, and several methods have been proposed to improve text line extraction, among them recent methods for printed text [4, 8].

The detection of text lines is a challenging problem for simulating how human perception derives a global organisation from local information. The proposed method starts from the local physical components of a page image and groups them according to the properties that enable a human reader to detect alignments of symbols.
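The nearest-neighbour labelling mentioned in the abstract can be illustrated with a minimal sketch. It assumes the connected components have already been extracted as bounding boxes, measures proximity as the length of the white gap between two boxes, and labels each component with one of four hypothetical locations (`left`, `right`, `above`, `below`) of its nearest neighbour. The actual label set, distance measure and tie-breaking used by the method are not given here, so `Box`, `box_distance`, `direction` and `label_by_nearest_neighbour` are illustrative names only.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """Bounding box of a connected component, inclusive pixel coordinates.

    Image convention: x grows to the right, y grows downward.
    """
    x0: int
    y0: int
    x1: int
    y1: int


def box_distance(a: Box, b: Box) -> float:
    """Euclidean length of the white gap between two boxes (0 if they touch or overlap)."""
    dx = max(0, max(a.x0, b.x0) - min(a.x1, b.x1) - 1)
    dy = max(0, max(a.y0, b.y0) - min(a.y1, b.y1) - 1)
    return math.hypot(dx, dy)


def direction(a: Box, b: Box) -> str:
    """Where b lies relative to a, judged from the offset between box centres.

    Hypothetical four-way labelling; the dominant axis of the offset wins.
    """
    # Doubled centre coordinates keep the arithmetic in integers.
    dcx = (b.x0 + b.x1) - (a.x0 + a.x1)
    dcy = (b.y0 + b.y1) - (a.y0 + a.y1)
    if abs(dcx) >= abs(dcy):
        return "right" if dcx > 0 else "left"
    return "below" if dcy > 0 else "above"


def label_by_nearest_neighbour(boxes: List[Box]) -> List[Optional[str]]:
    """Label each component with the location of its nearest neighbour.

    Brute-force O(n^2) search for clarity; ties go to the lower index.
    A component without any neighbour (single-box input) gets None.
    """
    labels: List[Optional[str]] = []
    for i, a in enumerate(boxes):
        best_j: Optional[int] = None
        best_d = math.inf
        for j, b in enumerate(boxes):
            if j == i:
                continue
            d = box_distance(a, b)
            if d < best_d:
                best_j, best_d = j, d
        labels.append(None if best_j is None else direction(a, boxes[best_j]))
    return labels


if __name__ == "__main__":
    # Three glyphs on a horizontal line, then three glyphs stacked vertically.
    boxes = [
        Box(0, 0, 9, 9), Box(12, 0, 21, 9), Box(24, 0, 33, 9),
        Box(100, 0, 109, 9), Box(100, 12, 109, 21), Box(100, 24, 109, 33),
    ]
    print(label_by_nearest_neighbour(boxes))
    # → ['right', 'left', 'left', 'below', 'above', 'above']
```

In this toy page, the glyphs of the horizontal line receive `left`/`right` labels and those of the vertical column receive `above`/`below` labels, which is the kind of local cue that lets a later grouping stage follow both orientations without assuming a line direction in advance.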
The main properties involved are proximity, similarity and continuity of direction. The connected components (CCs) of the binarised image are discriminated at an early stage. A size criterion is used to interpret the largest CCs as Graphics; these are labelled CCG and discarded from the grouping process that leads to text lines. The borders of the CCG bounding boxes act as separators that text lines cannot straddle. The remaining CCs are labelled according to the position of their nearest neighbour in order to capture the proximity and alignment properties. They form the input of a rule-based incremental grouping process. This stepwise