A. Petrosino (Ed.): ICIAP 2013, Part I, LNCS 8156, pp. 61–70, 2013. © Springer-Verlag Berlin Heidelberg 2013

Layout-Based Document-Retrieval System by Radon Transform Using Dynamic Time Warping

Giuseppe Pirlo 1,*, Michela Chimienti 2, Michele Dassisti 3, Donato Impedovo 4, and Angelo Galiano 4

1 Dipartimento di Informatica, Università degli Studi di Bari "A. Moro", via Orabona 4, 70125-Bari, Italy
2 Laboratorio Kad3, C.da Baione, 70043 Monopoli (BA), Italy
3 Dip. Meccanica, Management e Matematica, Politecnico di Bari, viale Japigia 182, 70126 - Bari, Italy
4 Dyrecta Lab, Via V. Simplicio 45, 70014 Conversano (BA), Italy
giuseppe.pirlo@uniba.it

Abstract. In the context of sustainability of document management technologies, this paper presents a new system for layout-based document retrieval, specifically designed for commercial form retrieval. The system first uses a technique based on mathematical morphology to extract grid-based structural components from the document image. Subsequently, the Radon Transform is used to describe the document layout. Finally, a document matching technique based on dynamic time warping is adopted. Experimental results on real and simulated data sets demonstrate the effectiveness of the approach with respect to different classes of commercial forms.

Keywords: Document Management, Document Image Retrieval, Sustainability, Mathematical Morphology, Radon Transform, Dynamic Time Warping.

1 Introduction

Information Retrieval (IR) is a critical task of document management systems, as the number of documents available in databases and digital libraries grows exponentially. Quite often, reprinting that would otherwise be unnecessary is required when a document is lost or unavailable. This is also due to standard document retrieval systems, which use text data: they require a document to be available in text form, and queries are based on specific textual content in the document.
* Corresponding author.

Several advanced techniques have been proposed, based on set-theoretic, algebraic and probabilistic models [1, 2, 3]. Whatever the model used, one of the main drawbacks of text-based document retrieval systems is that they require a document in text form, since the search for similar documents is based on comparing textual contents. As a consequence, a preliminary stage of image-to-text conversion by an Optical Character Recognizer (OCR) is required when a document is in image form. OCR is a time-consuming