N. El Gayar et al. (Eds.): ANNPR 2014, LNAI 8774, pp. 228–239, 2014. © Springer International Publishing Switzerland 2014

End-Shape Recognition for Arabic Handwritten Text Segmentation

Amani T. Jamal, Nicola Nobile, and Ching Y. Suen

CENPARMI (Centre for Pattern Recognition and Machine Intelligence), Computer Science and Software Engineering Department, Concordia University, Montreal, Quebec, Canada
{am_jamal,nicola,suen}@cenparmi.concordia.ca

Abstract. Text segmentation is an essential pre-processing stage for many systems such as text recognition and word spotting. However, few methods have been published for Arabic text segmentation. In Arabic handwritten documents, separating text into words is challenging due to the enormous variety of Arabic handwriting styles. In this paper, we present a new methodology for segmenting an Arabic handwritten text line into words. Our proposed text segmentation approach uses knowledge of Arabic writing characteristics. This method shows promising results.

Keywords: component, Arabic Handwritten Documents, segmentation, End-Shape recognition.

1 Introduction

Extracting all the word images from a handwritten document is an essential pre-processing step for two reasons [1]. First, text recognition methods, which can be categorized as letter-based or word-based, need to work on pre-extracted word images. Second, word-spotting and content-based image retrieval techniques require all the word images in the documents to be properly pre-segmented. Most techniques for handwritten document retrieval and recognition fail if the text is wrongly segmented into words.

Few methods have been published for Arabic text segmentation. In Arabic handwritten documents, separating text into words is challenging due to the enormous variety of Arabic handwriting styles. In this paper, we present a new methodology for segmenting an Arabic handwritten text line into words.
Our proposed text segmentation approach uses knowledge of Arabic writing characteristics.

In this section, we provide some background on the characteristics of Arabic writing and on previous work on segmenting text lines into words. In addition, the challenges of Arabic handwritten text segmentation are given. Finally, the proposed approach is summarized together with the rationale for applying it, and our overall methodology is explained. The secondary component removal technique is briefly explained in Section 2. The metric-based segmentation method that we use is explained in Section 3. The contribution of this paper is described in Section 4. The experiment is explained in Section 5. Finally, the conclusion is given in Section 6.