International Journal of Computer Applications (0975 – 8887)
Volume 90 – No 13, March 2014

Estimation of Tilt in Characters and Correction for better Readability by OCR Systems

C. S. Vijayashree
P.E.T Research Centre, PES College of Engg., Mandya, India-571401

Vishwanath C. Kagawade
Basaveswara College of Engg., Bagalkot, India

T. Vasudev
Maharaja Research Foundation, MIT Campus, Belawadi, S R Patna, India-571438

ABSTRACT
Existing Optical Character Readers (OCRs) are capable of reading linear text but have limitations in reading artistic and non-linear text. Tilt in characters contributes a major share in reducing the efficiency of recognition algorithms. This paper presents a technique to estimate and correct the vertical tilt in printed English characters so that an OCR can read the text more efficiently. The input characters are assumed to be segmented from the document image and free from noise. First, the direction of tilt of the character is detected using a heuristically constructed knowledge base. Next, the inclination of the character to its base is estimated using a line drawing algorithm. Finally, the estimated tilt is corrected by rotating the character in the direction opposite to the tilt. The method has been tested with sufficient samples and readability analysis is performed with an OCR. Experimental results show an average improvement in readability by OCR from 20% before tilt correction to 82% after tilt correction.

Keywords
Linear text, Artistic text, Tilt in characters, Tilt correction, OCR.

1. INTRODUCTION
Document Image Analysis (DIA) is a significant area in the field of Digital Image Processing. DIA is very important in applications such as document identification/recognition, language identification and automatic reading from documents. Many researchers are working on different problems on document images, ranging from image acquisition to image understanding [1,2].
Processing activities in DIA can be divided into Pre-processing, Segmentation, Script Identification, Page Layout Analysis (PLA) and Classification, Character Recognition, etc. [3], and these have led to many vibrant research problems [2]. The results of research on these problems are gradually converging towards generic solutions to major issues in DIA. In spite of considerable research work in the area of DIA, a major issue that has not been sufficiently addressed is the detection and correction of skew or tilt in characters. Tilt is the angular slant of a character with respect to the baseline. Tilted characters are mainly noticed in artistic text. Fig. 1 shows a few samples of text with tilted characters. Characters extracted from such artistic text exhibit considerable inherent tilt. Fig. 2 shows a few examples of tilted characters segmented from artistic text. Such tilted characters hinder the investigation of generic recognition methods, and the efficiency of recognition drops. Hence tilt in characters contributes a major share in affecting the efficiency of recognition algorithms.

Fig 1: Samples of Artistic Text with Tilt in Characters

Fig 2: Samples of Tilted Characters

A literature survey reveals that most character recognition algorithms assume that the input is tilt corrected. When tilted characters are subjected to recognition through existing algorithms, the recognition rate becomes low. The rate of character recognition is inversely related to the degree of tilt in characters, i.e. the higher the degree of tilt, the lower the recognition rate [4-6].

A considerable amount of research is reported in the literature on skew detection [7-13] and correction of document images. The document skew detection and correction algorithms cannot be extended to detect and correct tilt in characters.
The characteristics considered for detection of skew at document level are different from the characteristics considered for skew detection at character level. Generally, global characteristics of the document such as line orientations found using Hough transforms [9], the slope between nearest-neighbor chains (NNC) obtained in the document [13], horizontal