Chapter VII
Discovering Spatio-Textual Association Rules in Document Images
Donato Malerba
Università degli Studi di Bari
Margherita Berardi
Università degli Studi di Bari
Michelangelo Ceci
Università degli Studi di Bari
Copyright © 2008, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Abstract
This chapter introduces a data mining method for discovering association rules from images of
scanned paper documents. It argues that a document image is a multi-modal unit of analysis whose
semantics is deduced from the combination of its textual content, its layout structure, and its
logical structure. Accordingly, it proposes a method that simultaneously considers three kinds of
information to generate interesting patterns: the spatial information derived from a complex document
image analysis process (layout analysis), the information extracted from the logical structure of the
document (document image classification and understanding), and the textual information extracted by
means of OCR. The proposed method is based on an inductive logic programming approach, which is argued
to be the most appropriate for analyzing data available in more than one modality. It thus illustrates
a possible evolution of the unimodal knowledge discovery scheme, in which the different types of data
describing the units of analysis are handled by applying some preprocessing technique that transforms
them into a single double-entry table.
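The contrast between a single table and a relational, multi-modal representation is easier to see with a small example. What follows is a minimal Python sketch and does not reproduce the chapter's actual method. It makes several assumptions. Each document image is stored as its own set of ground facts. The predicate names (`type_of` for logical labels, `on_top` for a layout relation, `contains_term` for OCR text), the block identifiers and the toy documents are all hypothetical. A conjunctive pattern is evaluated by searching for a consistent binding of its variables, which are written with an initial uppercase letter as in Prolog. Support counts the documents in which a pattern matches. Confidence is the ratio of the counts for body plus head and for body alone.

```python
from typing import Dict, List, Set, Tuple

# An atom is (predicate, arg1, arg2, ...). Arguments starting with an uppercase
# letter are variables (Prolog convention); all others are constants.
# All predicate names below are hypothetical, chosen for illustration only.
Atom = Tuple[str, ...]


def is_var(term: str) -> bool:
    return term[:1].isupper()


def matches(pattern: List[Atom], facts: Set[Atom], binding: Dict[str, str]) -> bool:
    """True if one consistent variable binding maps every atom of `pattern` onto `facts`."""
    if not pattern:
        return True
    first, rest = pattern[0], pattern[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        extended = dict(binding)
        consistent = True
        for p_arg, f_arg in zip(first[1:], fact[1:]):
            if is_var(p_arg):
                # Bind a fresh variable, or check that an existing binding agrees.
                if extended.setdefault(p_arg, f_arg) != f_arg:
                    consistent = False
                    break
            elif p_arg != f_arg:
                consistent = False
                break
        # Backtrack into the remaining atoms with the extended binding.
        if consistent and matches(rest, facts, extended):
            return True
    return False


def count(pattern: List[Atom], docs: List[Set[Atom]]) -> int:
    # Each document image is one unit of analysis: count documents, not matches.
    return sum(1 for facts in docs if matches(pattern, facts, {}))


def support(pattern: List[Atom], docs: List[Set[Atom]]) -> float:
    return count(pattern, docs) / len(docs)


def confidence(body: List[Atom], head: List[Atom], docs: List[Set[Atom]]) -> float:
    n_body = count(body, docs)
    return count(body + head, docs) / n_body if n_body else 0.0


# Hypothetical toy data mixing layout (on_top), logical labels (type_of)
# and OCR text (contains_term) for three one-page documents.
docs = [
    {("type_of", "b1", "title"), ("contains_term", "b1", "learning"),
     ("on_top", "b1", "b2"), ("type_of", "b2", "author")},
    {("type_of", "b1", "title"), ("contains_term", "b1", "mining"),
     ("on_top", "b1", "b2"), ("type_of", "b2", "author")},
    {("type_of", "b1", "title"), ("contains_term", "b1", "learning"),
     ("on_top", "b1", "b2"), ("type_of", "b2", "abstract")},
]

# Spatio-textual rule: a block X containing "learning" lies above a block Y
# => Y is an author block.
body = [("contains_term", "X", "learning"), ("on_top", "X", "Y")]
head = [("type_of", "Y", "author")]
print(support(body + head, docs), confidence(body, head, docs))  # 0.3333333333333333 0.5
```

The rule links the text of one block to the logical label of a spatially adjacent block through the shared variables `X` and `Y`. A one-row-per-document table struggles to express this without first fixing the number of blocks and their order. That preprocessing step is the one the multi-modal approach tries to avoid.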