Data categorization for a context return applied to logical document structure recognition

Yves Rangoni
Loria Research Center - Read Group
Vandœuvre-lès-Nancy, France
rangoni@loria.fr

Abdel Belaïd
Loria Research Center - Read Group
Vandœuvre-lès-Nancy, France
abelaid@loria.fr

Abstract

The purpose of this work is to develop a pattern recognition system simulating human vision. A transparent neural network with context returns is used. The context returns consist in using global vision to correct local vision (i.e. input data are corrected according to the neural network outputs). In order not to compute all the input features during these context returns, a filter-based method was designed to organize the features into clusters. This makes it possible to find a good subset of input features during each cycle, which reduces the computations. The interest of the method is shown in the case of logical document structure retrieval.

1. Introduction

Computers carry out increasingly complex calculations but still cannot handle tasks that seem simple for a human. This is particularly true in pattern recognition. Indeed, the human brain seems to be more skilful, for example, in processing a document (reading and understanding it) because it uses its knowledge and its capacity to adapt to new situations. The use of context is very helpful for humans in solving ambiguous problems. Combining “primitives” that are immediately identified with the general shape of the pattern helps a lot in recognizing an object. This duality between local and global vision is explored in the remainder of this article, together with its use for the automatic recognition of the logical structure of documents.

To carry out this task, we start from the physical layout (the way a document appears) and deduce from it the logical structure (the hierarchical organization of the content).
For example, a document has a title and is divided into several chapters, themselves split up into sections, sub-sections, etc.

There seems to be a certain relation between these two structures, but it remains limited to an implication that can be summarized as follows: “there are some elements of the logical structure (mainly at the macroscopic level) which result graphically in a particular form in the physical structure”. The literature shows that connecting these two structures is a challenge, and only a few systems have managed to extract the logical structure for a consistent set of heterogeneous documents.

This task remains difficult even for documents that are already digital, such as PDF files (Portable Document Format). Some works, such as [1, 5, 8, 17], have tried to find as many logical elements as possible while working directly on the PDF instruction stream (the inner file representation [10]). However, although the results seem correct on some documents, these techniques generally do not succeed in finding even the raw text. This approach is as hard as starting the process from a digitized image.

We propose in this article a document analysis method modelling human perception and simulating the duality between global and local vision. The article is organized as follows: first, we present a brief description of a neural network technique and its interest compared to other methods with no training phase. Then, we explain an adaptation of an existing model used to carry out the recognition task more finely. Finally, we introduce a data categorization method, which is essential in our cognitive approach.

2. Cognitive approach and transparent neural network

Looking at the methods in the literature [13], we notice that authors have tried to use expert systems [3, 9], rule-based systems [6, 11] and other formalisms using grammars [4] to find the logical structure.
These works rest on the postulate that this structure is built from rules and should therefore be easy to deduce. However, in practice, it is really hard to extract, as there are many logical constructions and extremely diversified physical appearances. That is the reason why most of the existing systems often need the DTD (Document Type Definition) of the document or are limited to a few quite precise document classes in which the samples have physi-