Abstract - Representing information as tables is a compact and concise method that eases searching, indexing, and storage. Extracting and cloning tables from parsable documents is comparatively easy and widely practiced; however, industry still faces challenges in detecting and extracting tables from OCR documents or images. This paper proposes an algorithm that detects and extracts multiple tables from an OCR document. The algorithm uses a combination of image processing techniques, text recognition, and procedural coding to identify distinct tables in the same image and map the text to the corresponding cell in a dataframe, which can be stored as comma-separated values, in a database, as an Excel file, or in other usable formats.

Keywords - Table Extraction, Optical Character Recognition, Image processing, Text Extraction, Morphological transformation.

I. INTRODUCTION

OCR documents, or images of typed text and tables, pose a major challenge for parsing tasks that support editing, searching, indexing, and compact storage of documents. The table is a common and important form of representing and storing information. Widely used for documentation and information storage, tables serve the banking sector, the insurance domain, computerized receipts, and various other domains. Extracting multiple tables from OCR documents and images is a widespread and challenging task because of its implementation and algorithmic complexity. Even simple tables are not easily indexed by systems. We propose a three-stage algorithm that extracts multiple tables from images and separates them using image processing, text recognition, and procedural coding. The algorithm can efficiently extract all the tables from an image, then digitize and separate them to ease parsing.

The remainder of this paper is organized as follows: Section 2 discusses the related literature and its shortcomings. Section 3 describes the image processing method used in the algorithm presented in this paper.
Section 4 describes the proposed algorithm, its flowchart, and the pseudocode of its various stages. Section 5 presents the output analysis and the algorithm's efficiency. Finally, we conclude with a summary and directions for further research.

*Equal contribution

II. RELATED LITERATURE REVIEW

Various methods to detect tables and identify the topological structure of images have been formulated in the recent past. B. Freisleben et al. (2004) [1] proposed an efficient method to localize, binarize, and segment text in digital images. Gatos B. et al. (2005) [2] worked on table detection using horizontal and vertical lines. Several other methods for table detection and segmentation in heterogeneous documents have been proposed [3-5]. Yefeng Zheng, Changsong Liu, Xiaoqing Ding and Shiyan Pan (2001) [6] proposed one of the most efficient form frame line detection algorithms, which uses a single chain connected method. A. Rehman, F. Kurniawan and T. Saba (2011) [7] removed the smashing of characters that occurs during line detection and removal in OCR images. Thotreingam Kasar and Philippine Barlas (2013) [8] detected table regions by identifying column and row separators, applying a run-length approach to identify vertical and horizontal lines. Manolis Vasileiadis, Nikolaos Kaklanis, Konstantinos Votis and Dimitrios Tzovaras (2017) [19] proposed a method for automatic detection and extraction of tabular data that uses page segmentation techniques to obtain text data and groups it using a bottom-up technique.

In [9], R. W. Smith detects table regions in heterogeneous document images using the layout analysis module of Tesseract with tab-stop detection. The table detection algorithm consists of identifying the table partitions, detecting page column splits, locating table columns, marking the table regions, and removing false alarms. The paper presents a method to detect tables in documents of varying layout, but it fails to identify and spot graphical images containing text.
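
As an illustration of the run-length idea mentioned above for [8], the following is a minimal sketch and not the implementation from that paper. It assumes a binarized image given as a nested list in which 1 marks ink and 0 marks background. The function names `horizontal_runs` and `vertical_runs` and the `min_len` threshold are hypothetical names chosen for this example.

```python
from typing import List, Tuple

# (line index, start, end_exclusive) of one run of ink pixels
Run = Tuple[int, int, int]


def horizontal_runs(binary: List[List[int]], min_len: int) -> List[Run]:
    """Return runs of consecutive ink pixels of at least min_len per row."""
    runs: List[Run] = []
    for y, row in enumerate(binary):
        start = None  # column where the current run began, if any
        for x, pixel in enumerate(row):
            if pixel and start is None:
                start = x
            elif not pixel and start is not None:
                if x - start >= min_len:
                    runs.append((y, start, x))
                start = None
        # Close a run that reaches the right edge of the image.
        if start is not None and len(row) - start >= min_len:
            runs.append((y, start, len(row)))
    return runs


def vertical_runs(binary: List[List[int]], min_len: int) -> List[Run]:
    """Same scan on the transposed image; returns (column, y_start, y_end)."""
    transposed = [list(column) for column in zip(*binary)]
    return horizontal_runs(transposed, min_len)


# A tiny 4x5 "table": two horizontal rules and three vertical rules.
image = [
    [1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
row_lines = horizontal_runs(image, min_len=4)
col_lines = vertical_runs(image, min_len=3)
```

On a real scan, `min_len` would likely need to be chosen relative to the page size and text height so that character strokes are not mistaken for table rules; small gaps in printed lines are also not handled in this sketch.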
Cesarini et al. [10] proposed a methodology that searches for parallel lines in the MXY tree of a page. The detection is cross-checked by locating white spaces and perpendicular lines in the regions between the detected parallel lines. Located tables are merged on the basis of proximity and similarity criteria. This classifies whether an image is tabular or not. The paper claims to provide a table detection algorithm and an index for evaluating table location; however, it fails to structure the table contents.

In [8], Thotreingam Kasar et al. proposed a method that detects horizontal and vertical lines using a run-length approach. From each group of horizontal and vertical lines, 26 low-level features are extracted. An SVM classifier then predicts whether the group belongs to a table or not. The authors used machine learning methods to classify table regions in heterogeneous documents without resorting to heuristic rules.

A Conglomerate of Multiple OCR Table Detection and Extraction

Smita Pallavi, Birla Institute of Technology, Patna, smita.pallavi@bitmesra.ac.in
Raj Ratn Pranesh*, Birla Institute of Technology, Mesra, raj.ratn18@gmail.com
Sumit Kumar*, Birla Institute of Technology, Mesra, sumit.atlancey@gmail.com