[Mistry* et al., 5(6): June, 2016] ISSN: 2277-9655
IC™ Value: 3.00 Impact Factor: 3.785
http://www.ijesrt.com © International Journal of Engineering Sciences & Research Technology
IJESRT
INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH
TECHNOLOGY
A REVIEW ON SEGMENTATION TECHNIQUES OF LINES, WORDS AND
CHARACTERS ON GUJARATI HANDWRITTEN DOCUMENT USING OCR
Nilam Mistry*, Sameer Vashi, Vidhi Patel, Kunal Shah, Denish Rixawapla,
Foram Rakholiya, Rakesh Savant
*Babu Madhav Institute of Information Technology, Uka Tarsadia University
Maliba Campus, Gopal Vidhyanagar, Bardoli, Gujarat, India
DOI: 10.5281/zenodo.54779
ABSTRACT
Optical character recognition (OCR) is a technique for converting a scanned handwritten or printed document into a
digital format that a computer can process. OCR is an important and challenging task in many computer vision
applications. Segmentation is generally the first stage in any attempt to analyse or interpret an image automatically.
It separates the document into lines, lines into words, and words into characters, and it has been one of the major
difficulties in handwritten text recognition. Segmentation plays a crucial role in most tasks requiring image
analysis: the success or failure of a task is often a direct consequence of the success or failure of segmentation.
Handwritten documents contain text written in a free-flowing manner; writing style differs between writers, and even
the same writer's handwriting can differ from one time to another. That is why segmentation is difficult for
handwritten text documents. This paper focuses on the Gujarati script, which contains many curves, overlapping
characters, and slopes, so segmentation is particularly difficult. In this paper we apply several segmentation
techniques to handwritten Gujarati documents and draw some conclusions.
KEYWORDS: OCR, Connected Components, Gujarati Script, Segmentation.
INTRODUCTION
OCR stands for optical character recognition. It is a popular technique in digital image processing. Within document
processing, image processing, and pattern recognition, OCR is among the most challenging research fields. In the
computerization of any language, one of the vital tasks is to develop an efficient and effective OCR system for the
respective language.
Figure 1. Block diagram of OCR
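To make the line, word, and character hierarchy described in the abstract concrete, the following is a minimal sketch of one common approach, projection-profile segmentation. It is an illustration only and is not claimed to be the method evaluated in this paper. It assumes a binarized page stored as a NumPy array in which 1 marks ink and 0 marks background. The function names (ink_runs, segment_lines, segment_words) and the min_gap parameter are hypothetical names introduced for this example.

```python
import numpy as np


def ink_runs(profile, min_gap=1):
    """Return half-open (start, end) index ranges that contain ink.

    Two inked positions belong to the same run unless they are separated
    by at least `min_gap` consecutive empty positions.
    """
    runs = []
    start = None      # first index of the run being built
    last_ink = None   # most recent index that contained ink
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            elif i - last_ink - 1 >= min_gap:
                # The blank gap is wide enough: close the current run.
                runs.append((start, last_ink + 1))
                start = i
            last_ink = i
    if start is not None:
        runs.append((start, last_ink + 1))
    return runs


def segment_lines(page):
    """Split a page into text lines using the horizontal projection profile."""
    row_profile = page.sum(axis=1)  # ink pixels in each row
    return ink_runs(row_profile, min_gap=1)


def segment_words(line_img, min_gap=2):
    """Split one line into words using the vertical projection profile.

    `min_gap` is an assumed threshold: gaps narrower than this are treated
    as spacing inside a word rather than between words.
    """
    col_profile = line_img.sum(axis=0)  # ink pixels in each column
    return ink_runs(col_profile, min_gap=min_gap)


# Tiny synthetic page: two text lines separated by blank rows.
page = np.zeros((7, 10), dtype=np.uint8)
page[1:3, 1:3] = 1   # line 1, first word
page[1:3, 6:8] = 1   # line 1, second word
page[5, 0:2] = 1     # line 2, first stroke
page[5, 3:5] = 1     # line 2, second stroke, one blank column away

lines = segment_lines(page)
words_per_line = [segment_words(page[top:bottom]) for top, bottom in lines]
```

Calling segment_words on a word image with min_gap=1 would split it further at every blank column, which is a rough stand-in for character segmentation. This simple scheme relies on blank rows and columns between units, so it is expected to break down on exactly the cases the abstract highlights for handwritten Gujarati: sloped lines and overlapping or touching characters.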