Volume 3, No. 1, January 2012
Journal of Global Research in Computer Science
RESEARCH PAPER
Available Online at www.jgrcs.info
© JGRCS 2010, All Rights Reserved
LINE SEGMENTATION USING CONTOUR TRACING
Ashu Kumar*¹, Simpel Rani Jindal², Galaxy Singla³
¹Department of CSE, Yadwindra College of Engineering, Talwandi Sabo, Punjab, India
ashu.software.engineer@gmail.com
²Department of CSE, Yadwindra College of Engineering, Talwandi Sabo, Punjab, India
simpel_jindal@rediffmail.com
³Department of CSE, Bhai Maha Singh College of Engineering, Sri Muktsar Sahib, Punjab, India
galaxy_bansal@yahoo.co.in
Abstract: Text line segmentation is an important step because inaccurately segmented text lines cause errors in the recognition stage. Text
line segmentation of handwritten documents is still one of the most complicated problems in developing a reliable OCR system. The nature of
handwriting makes text line segmentation very challenging. Text characteristics can vary in font, size, orientation, alignment,
color, contrast, and background information. These variations make word detection complex and difficult, since handwritten text
can vary greatly depending on the writer's skill, disposition and cultural background. A technique that combines piece-wise projection with
contour tracing to segment a handwritten document into distinct lines of text is presented. The proposed method is robust to line fluctuation.
Keywords: OCR, Line Segmentation, Histograms, chunks, Piece-wise separating lines, Potential PSLs.
INTRODUCTION
A considerable amount of research has been carried out on
character recognition of Gurmukhi script. In an optical
character recognition (OCR) system, segmentation is an
important phase, and the accuracy of any OCR system depends
heavily on it. Incorrect segmentation leads to incorrect
recognition. The segmentation phase includes line, word
and character segmentation. Before word and character
segmentation, line segmentation is performed to find the
number of lines and the boundaries of each line in the input
document image. Incorrect line segmentation may therefore
decrease recognition accuracy.
Survey papers are available on the segmentation of lines
from handwritten text [1,2]. A considerable amount of work
has been carried out to segment lines of handwritten Roman
script, and there are varied, and in some cases well-developed,
techniques [3-7]. But very little work has been carried out
for Indic scripts such as Devnagri, Bengali and Gurmukhi. Only
a few papers are available on segmentation of handwritten
Indic scripts [8-11].
The simplest and most widely used method to segment
lines is to use the inter-line gaps in the horizontal projection
as line boundaries. This method does not work well on images
with skewed, fluctuating or closely spaced lines. Here, we
modify this method of segmenting text lines based on histogram
projection. Figures 1, 2 and 3 show three kinds of sample
documents on which line segmentation is performed. The rest
of the paper is organized as follows. Section 2 describes problems
associated with line segmentation. Section 3 describes the
proposed method. Experiments and results are
discussed in Section 4, which is followed by the conclusion in
Section 5.
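As an illustration of the baseline described above (inter-line gaps in the horizontal projection used as line boundaries), the following is a minimal sketch and not the authors' implementation. It assumes a binarized page given as a list of rows with 1 for ink and 0 for background; the names `horizontal_projection`, `segment_lines` and the `min_ink` threshold are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def horizontal_projection(image: List[List[int]]) -> List[int]:
    """Count the ink pixels (value 1) in every row of a binary image."""
    return [sum(row) for row in image]


def segment_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row indices, inclusive, of each text line.

    A text line is taken to be a maximal run of rows whose projection
    value is at least min_ink; rows below that value are inter-line gaps.
    """
    profile = horizontal_projection(image)
    lines: List[Tuple[int, int]] = []
    start = None  # first row of the run currently being tracked
    for y, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = y                      # a text line begins
        elif count < min_ink and start is not None:
            lines.append((start, y - 1))   # a gap closes the line
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(profile) - 1))
    return lines


# Toy page (assumed data): two text lines separated by two empty rows.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(segment_lines(page))  # [(1, 2), (5, 6)]
```

The sketch relies on at least one empty row between neighbouring lines, which is the assumption that breaks down on skewed, fluctuating or closely spaced lines.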
SEGMENTATION CHALLENGES
When dealing with handwritten text, line segmentation has
to overcome some obstacles that are uncommon in modern
printed text. The most prominent are:
Skewed lines: Lines of text are generally not straight.
Figure 1: Skewed lines
Fluctuating lines:
Figure 2: Fluctuating lines
Line proximity: Small gaps between neighbouring text lines
cause components, usually words or letters, to touch and
overlap between lines, and lead to irregularity in geometrical
properties of the lines, such as line width, height, distance
between words and lines, leftmost position etc.
Figure 3: Line proximity
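To make the effect of skew concrete, the sketch below compares a single projection over the full page width with projections taken over separate vertical strips, in the spirit of the "chunks" and piece-wise projection named in the abstract and keywords. It is an illustrative assumption only, not the procedure proposed in this paper: the functions `chunk_profiles` and `count_runs`, the number of chunks and the toy image are all hypothetical.

```python
from typing import List


def chunk_profiles(image: List[List[int]], n_chunks: int) -> List[List[int]]:
    """Horizontal projection computed separately for n_chunks vertical strips."""
    width = len(image[0])
    # Column boundaries of the strips, e.g. width 4 and 2 chunks -> [0, 2, 4].
    bounds = [width * i // n_chunks for i in range(n_chunks + 1)]
    return [
        [sum(row[bounds[i]:bounds[i + 1]]) for row in image]
        for i in range(n_chunks)
    ]


def count_runs(profile: List[int]) -> int:
    """Number of maximal runs of rows that contain ink."""
    runs = 0
    previous_has_ink = False
    for count in profile:
        has_ink = count > 0
        if has_ink and not previous_has_ink:
            runs += 1
        previous_has_ink = has_ink
    return runs


# Toy image (assumed data): two skewed text lines, each one row lower
# in the right half of the page than in the left half.
skewed = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]

full_width = chunk_profiles(skewed, 1)[0]
print(full_width, count_runs(full_width))    # [2, 2, 2, 2, 0] 1
for profile in chunk_profiles(skewed, 2):
    print(profile, count_runs(profile))      # each strip shows 2 runs
```

Over the full width the two lines merge into one block because no row is empty, while each strip still shows a gap between them.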