Volume 3, No. 1, January 2012
Journal of Global Research in Computer Science
RESEARCH PAPER
Available Online at www.jgrcs.info
© JGRCS 2010, All Rights Reserved

LINE SEGMENTATION USING CONTOUR TRACING

Ashu Kumar*¹, Simpel Rani Jindal², Galaxy Singla³

¹ Department of CSE, Yadwindra College of Engineering, Talwandi Sabo, Punjab, India
ashu.software.engineer@gmail.com
² Department of CSE, Yadwindra College of Engineering, Talwandi Sabo, Punjab, India
simpel_jindal@rediffmail.com
³ Department of CSE, Bhai Maha Singh College of Engineering, Sri Muktsar Sahib, Punjab, India
galaxy_bansal@yahoo.co.in

Abstract: Text line segmentation is an important step because inaccurately segmented text lines cause errors in the recognition stage. Text line segmentation of handwritten documents is still one of the most complicated problems in developing a reliable OCR system, and the nature of handwriting makes it very challenging. Text characteristics can vary in font, size, orientation, alignment, color, contrast, and background information. These variations make word detection complex and difficult, since handwritten text can vary greatly depending on the writer's skill, disposition and cultural background. This paper presents a technique that uses piece-wise projection along with contour tracing to segment a handwritten document into distinct lines of text. The proposed method is robust to line fluctuation.

Keywords: OCR, Line Segmentation, Histograms, chunks, Piece-wise separating lines, Potential PSLs.

INTRODUCTION

A lot of research has been carried out on character recognition of Gurmukhi script. For an optical character recognition (OCR) system, segmentation is an important phase, and the accuracy of any OCR system depends heavily on it. Incorrect segmentation leads to incorrect recognition. The segmentation phase includes line, word and character segmentation.
Before word and character segmentation, line segmentation is performed to find the number of lines and the boundaries of each line in the input document image. Incorrect line segmentation may reduce recognition accuracy. Survey papers on the segmentation of lines from handwritten text are available [1,2]. A considerable amount of work has been carried out on segmenting lines of handwritten Roman script, and a variety of techniques exist, some of them well developed [3-7]. However, very little work has been carried out for Indic scripts such as Devnagri, Bengali and Gurmukhi; only a few papers are available on segmentation of handwritten Indic scripts [8-11].

The simplest and most widely used method of segmenting lines is to use the inter-line gaps in the horizontal projection as line boundaries. This method does not work well on images with skewed, fluctuating or closely spaced lines. Here, we modify this histogram-projection-based method to segment text lines. Figures 1, 2 and 3 show three kinds of sample documents on which line segmentation is performed.

The rest of the paper is organized as follows. Section 2 describes the problems associated with line segmentation. Section 3 describes the proposed method. Experiments and results are discussed in Section 4, followed by the conclusion in Section 5.

SEGMENTATION CHALLENGES

When dealing with handwritten text, line segmentation must overcome some obstacles that are uncommon in modern printed text. Among the most predominant are:

Skewed lines: Lines of text in general are not straight.
Figure 1: Skewed lines

Fluctuating lines:
Figure 2: Fluctuating lines

Line proximity: Small gaps between neighbouring text lines cause components, usually words or letters, to touch and overlap between lines, and lead to irregularity in the geometrical properties of a line, such as line width, height, distance between words and lines, and leftmost position.
Figure 3: Line proximity
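
The inter-line gap method described in the introduction, and its weakness on skewed lines, can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the method proposed in this paper: the page is assumed to be a binarized NumPy array in which 1 marks ink, the function names `ink_runs`, `global_lines` and `chunk_lines` are hypothetical, and splitting the page into vertical chunks is just one plausible reading of piece-wise projection. Contour tracing is not shown.

```python
import numpy as np

def ink_runs(profile):
    """Return (top, bottom) pairs for each run of rows whose ink count is non-zero.
    'bottom' is exclusive, so a run covers rows top .. bottom - 1."""
    runs, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                      # a text line begins
        elif count == 0 and start is not None:
            runs.append((start, y))        # an inter-line gap ends the line
            start = None
    if start is not None:                  # text reaches the last row
        runs.append((start, len(profile)))
    return runs

def global_lines(binary):
    """Baseline: horizontal projection over the full page width."""
    return ink_runs(binary.sum(axis=1))

def chunk_lines(binary, n_chunks):
    """Assumed piece-wise variant: one horizontal projection per vertical chunk."""
    chunks = np.array_split(binary, n_chunks, axis=1)
    return [ink_runs(chunk.sum(axis=1)) for chunk in chunks]

# Toy page with two text lines that both slope downward from left to right.
page = np.zeros((10, 8), dtype=np.uint8)
page[1:3, 0:4] = 1   # line A, left half
page[3:5, 4:8] = 1   # line A, right half (lower because of skew)
page[4:6, 0:4] = 1   # line B, left half
page[6:8, 4:8] = 1   # line B, right half

print(global_lines(page))    # the full-width profile has no gap between the lines
print(chunk_lines(page, 2))  # each chunk has its own gap between the lines
```

On this toy page the full-width projection merges both lines into a single run, because the skew fills every row between them, while each of the two chunks still shows a clear gap. How the per-chunk boundaries are joined into complete separating lines is outside the scope of this sketch.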