N. El Gayar et al. (Eds.): ANNPR 2014, LNAI 8774, pp. 228–239, 2014.
© Springer International Publishing Switzerland 2014
End-Shape Recognition for Arabic Handwritten Text Segmentation
Amani T. Jamal, Nicola Nobile, and Ching Y. Suen
CENPARMI (Centre for Pattern Recognition and Machine Intelligence)
Computer Science and Software Engineering Department, Concordia University
Montreal, Quebec, Canada
{am_jamal,nicola,suen}@cenparmi.concordia.ca
Abstract. Text segmentation is an essential pre-processing stage for many
systems, such as text recognition and word spotting. However, few methods have
been published for Arabic text segmentation. In Arabic handwritten documents,
separating text into words is challenging due to the enormous variety of Arabic
handwriting styles. In this paper, we present a new methodology for segmenting
an Arabic handwritten text line into words. Our proposed approach to text
segmentation utilizes knowledge of Arabic writing characteristics. This
method shows promising results.
Keywords: component, Arabic Handwritten Documents, segmentation,
End-Shape recognition.
1 Introduction
Extracting all the word images from a handwritten document is an essential
pre-processing step for two reasons [1]. First, text recognition methods, which can
be categorized into letter-based and word-based, need to work on pre-extracted
word images. Second, word-spotting and content-based image retrieval techniques
require all the word images in the documents to be properly pre-segmented.
Most techniques in handwritten document retrieval and recognition fail if
the text is wrongly segmented into words.
Few methods have been published for Arabic text segmentation. In Arabic
handwritten documents, separating text into words is challenging due to the enormous
variety of Arabic handwriting styles. In this paper, we present a new methodology
for segmenting an Arabic handwritten text line into words. Our proposed approach to
text segmentation utilizes knowledge of Arabic writing characteristics.
In this section, we provide some background on the characteristics of Arabic writing
and on previous work on segmenting text lines into words. In addition, the challenges
of Arabic handwritten text segmentation are given. Finally, the proposed approach is
summarized along with the rationale for applying it, and our overall methodology is explained.
The secondary component removal technique is briefly explained in Section 2. The
metric-based segmentation method used is explained in Section 3. The contribution of
this paper is described in Section 4. The experiment is explained in Section 5. Finally,
the conclusion is given in Section 6.