International Journal of Digital Library Systems, 2(2), 27-54, April-June 2011
DOI: 10.4018/jdls.2011040103
Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

A Unified Algorithm for Identification of Various Tabular Structures from Document Images

Sekhar Mandal, Bengal Engineering and Science University, Shibpur, India
Amit K. Das, Bengal Engineering and Science University, Shibpur, India
Partha Bhowmick, Indian Institute of Technology Kharagpur, India
Bhabatosh Chanda, Indian Statistical Institute, Kolkata, India

ABSTRACT

This paper presents a unified algorithm for segmentation and identification of various tabular structures from document page images. Such tabular structures include conventional tables and displayed math-zones, as well as Table of Contents (TOC) and Index pages. After analyzing the page composition, the algorithm first classifies the input set of document pages into tabular and non-tabular pages. A tabular page contains at least one of the tabular structures, whereas a non-tabular page contains none. The approach is unified in the sense that it is able to identify all tabular structures from a tabular page, which leads to a considerable simplification of document image segmentation in a novel manner. Such unification also speeds up the segmentation process, because existing methodologies treat different tabular structures as separate physical entities and therefore produce time-consuming solutions. Distinguishing features of the different kinds of tabular structures are used in stages to keep the algorithm simple and efficient, as demonstrated by exhaustive experimental results.

Keywords: Document Image Segmentation, Index Page Detection, Math-Zone Detection, Table Detection, Tabular Structures, TOC Detection

INTRODUCTION

Billions of pages are to be scanned and analyzed to create document image libraries targeted to real-world applications. The task is daunting; however, there is a pressing need for these libraries, as we have recently witnessed a spurt of activity in industry as well as in academia. Creation of a document image library involves a chain of thorough and intense activities such as scanning, pre-processing, segmentation, layout analysis, storage, and retrieval. Hence, it is still constrained with the requirement