ISSN 1054-6618, Pattern Recognition and Image Analysis, 2018, Vol. 28, No. 2, pp. 321–345. © Pleiades Publishing, Ltd., 2018.

Recognition of Handwritten Arabic Characters using Histograms of Oriented Gradient (HOG)¹

Noor A. Jebril a, *, Hussein R. Al-Zoubi b, and Qasem Abu Al-Haija c, **

a Computer Sciences Department, King Faisal University, Hasa, 31982 Saudi Arabia
b Computer Engineering Department, Yarmouk University, Irbid, 21163 Jordan
c Electrical Engineering Department, King Faisal University, Hasa, 31982 Saudi Arabia
*e-mail: njebril@kfu.edu.sa
**e-mail: qalhaija@kfu.edu.sa

Abstract—Optical Character Recognition (OCR) is the process of recognizing printed or handwritten text on paper documents. This paper proposes an OCR system for Arabic characters. In addition to the preprocessing phase, the proposed recognition system consists of three main phases. In the first phase, we employ word segmentation to extract characters. In the second phase, Histograms of Oriented Gradient (HOG) are used for feature extraction. The final phase employs a Support Vector Machine (SVM) to classify the characters. As a case study, we have applied the proposed method to the recognition of Jordanian city, town, and village names, together with many other words that supply the character shapes not covered by the Jordanian place names. The set has been carefully selected to include every Arabic character in all four of its forms. To this end, we have built our own dataset consisting of more than 43,000 handwritten Arabic words (30,000 used in the training stage and 13,000 used in the testing stage). Experimental results show that our recognition method compares favorably with state-of-the-art techniques, achieving very high recognition rates exceeding 99%.
Keywords: Arabic Handwritten Text Recognition, Optical Character Recognition (OCR), Gradient, Dominant Points (DPs), Feature Extraction, Histograms of Oriented Gradient (HOG), Support Vector Machine (SVM)
DOI: 10.1134/S1054661818020141

¹ The article is published in the original.

1. INTRODUCTION

Recently, text recognition has attracted substantial research interest, and many approaches have been proposed in the literature for Arabic Handwritten Recognition (AHR). Handwritten text recognition is an important technique for many applications. The Arabic language has many distinguishing characteristics that set it apart from English. For example, Arabic is written from the right-hand side of the line to the left-hand side. Arabic is also a cursive language, meaning that words are written using connected letters. As a result, each letter takes four different forms: one at the beginning of a word, another in the middle of a word, a third at the end of a word, and a fourth when it stands in isolation, as shown in Table 1 in the appendix of this paper. Consequently, the recognition of handwritten Arabic text is not a straightforward task. Limited success has been achieved so far in the recognition of Arabic text, and more effort is needed in this regard. Considering this, we propose a new method for automatic recognition of handwritten Arabic characters. Roughly, the process begins by scanning the handwritten documents with a scanner, which produces a color image. The scanned image is then pre-processed using various image-processing techniques. After that, segmentation is applied to divide the digital image into multiple slices using the Pixel by Pixel algorithm proposed in our research, which is based on our collected data.
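As a rough illustration of the pre-processing and segmentation stages outlined above, the following Python sketch binarizes a color image and cuts it into vertical slices wherever a column contains no ink. It is not the Pixel by Pixel algorithm of this paper, which is not specified at this point. The function names, the fixed threshold of 128, and the empty-column splitting rule are assumptions made only for illustration. A rule this simple would not separate connected cursive strokes.

```python
def to_binary(rgb_image, threshold=128):
    """Turn an RGB image (rows of (r, g, b) tuples) into a 0/1 ink mask.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    binary = []
    for row in rgb_image:
        # Weighted grayscale conversion (ITU-R BT.601 luma weights);
        # dark pixels are treated as ink (1), light pixels as background (0).
        binary.append(
            [1 if (0.299 * r + 0.587 * g + 0.114 * b) < threshold else 0
             for r, g, b in row]
        )
    return binary


def split_into_slices(binary):
    """Scan the columns one by one and return (start, end) column ranges
    that contain ink, ordered right to left to match Arabic reading order.

    Hypothetical stand-in for a segmentation step: it only splits at
    completely empty columns.
    """
    width = len(binary[0]) if binary else 0
    slices, start = [], None
    for x in range(width):
        has_ink = any(row[x] for row in binary)
        if has_ink and start is None:
            start = x                      # a new slice begins here
        elif not has_ink and start is not None:
            slices.append((start, x))      # slice ended at the previous column
            start = None
    if start is not None:
        slices.append((start, width))      # slice runs to the right edge
    return slices[::-1]


W, K = (255, 255, 255), (0, 0, 0)          # white background, black ink
image = [
    [W, K, K, W, W, K, W],
    [W, W, K, W, W, W, W],
]
slices = split_into_slices(to_binary(image))
print(slices)
```

Each returned pair is a half-open column range, so a slice can be cropped with `row[start:end]` before feature extraction.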
The purpose of segmentation is to simplify the representation and analysis of an image. The most important stage, however, is feature extraction, because every character has distinctive features that greatly help in recognizing it. Finally, a classification algorithm is applied to the pre-processed, segmented image to recognize the written characters [1].

APPLIED PROBLEMS
Received March 14, 2017

There are many recognition algorithms in the literature, such as recognition-based segmentation [2], raw pixel data [3], and sparse autoencoders [3]. More recently, histograms of oriented gradients (HOG) have been used for English character recognition [3, 4], where they have been shown to produce promising results and achieve high recognition rates. We therefore use HOG in our proposed approach; to the best of our knowledge, it has not been used before in Arabic character recognition. HOG is a feature descriptor used in computer vision and image processing for object detection. The technique counts occurrences of gradi-