ISSN 1054-6618, Pattern Recognition and Image Analysis, 2018, Vol. 28, No. 2, pp. 321–345. © Pleiades Publishing, Ltd., 2018.
Recognition of Handwritten Arabic Characters using Histograms of Oriented Gradient (HOG)¹
Noor A. Jebril a,*, Hussein R. Al-Zoubi b, and Qasem Abu Al-Haija c,**
a Computer Sciences Department, King Faisal University, Hasa, 31982 Saudi Arabia
b Computer Engineering Department, Yarmouk University, Irbid, 21163 Jordan
c Electrical Engineering Department, King Faisal University, Hasa, 31982 Saudi Arabia
*e-mail: njebril@kfu.edu.sa
**e-mail: qalhaija@kfu.edu.sa
Abstract—Optical Character Recognition (OCR) is the process of recognizing printed or handwritten text on
paper documents. This paper proposes an OCR system for handwritten Arabic characters. In addition to a
preprocessing phase, the proposed recognition system consists of three main phases. In the first phase, we
employ word segmentation to extract characters. In the second phase, Histograms of Oriented Gradient (HOG)
are used for feature extraction. The final phase employs a Support Vector Machine (SVM) to classify the
characters. As a case study, we applied the proposed method to the recognition of Jordanian city, town, and
village names, together with many other words that supply the character shapes not covered by the Jordanian
place names. The word set was carefully selected to include every Arabic character in all four of its forms.
To this end, we built our own dataset of more than 43,000 handwritten Arabic words (30,000 used in the
training stage and 13,000 used in the testing stage). In our experiments, the proposed method compared
favorably with state-of-the-art techniques, achieving recognition rates exceeding 99%.
Keywords: Arabic Handwritten Text Recognition, Optical Character Recognition (OCR), Gradient, Dominant
Points (DPs), Feature Extraction, Histograms of Oriented Gradient (HOG), Support Vector Machine (SVM)
DOI: 10.1134/S1054661818020141
1. INTRODUCTION
In recent years, text recognition has attracted substantial research interest, and many
approaches have been proposed in the literature for Arabic Handwritten Recognition (AHR).
Handwritten text recognition is an important technique for many applications. The Arabic
language has many distinguishing characteristics that set it apart from English. For
example, Arabic is written from right to left. Arabic script is also cursive, meaning that
the letters of a word are connected. As a result, each letter takes four different forms:
one at the beginning of a word, another in the middle of a word, a third at the end of a
word, and a fourth when it stands alone, as shown in Table 1 in the appendix of this paper.
Consequently, the recognition of handwritten Arabic text is not a straightforward task.
Therefore, not much success has been achieved so far in the recognition of Arabic text, and
more effort is needed in this regard. Considering this, we propose a new method for the
automatic recognition of handwritten Arabic characters. Roughly, the process begins by
scanning the handwritten documents with a scanner, which produces a color image. The
scanned image is then pre-processed using various image-processing techniques. After that,
segmentation is applied to divide the digital image into multiple slices using the Pixel by
Pixel algorithm, which we propose in this research based on our collected data. The purpose
of segmentation is to simplify the representation and analysis of an image. The most
important stage, though, is feature extraction, because every character has distinctive
features that help considerably in recognizing it. Finally, a classification algorithm is
applied to the pre-processed, segmented image to recognize the written characters [1].
¹ The article is published in the original.
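As a rough illustration of the stages outlined above (pre-processing, segmentation, and feature extraction), the following Python sketch processes a tiny synthetic image. It is not the implementation used in this paper. All function names, the threshold value, and the toy image are assumptions made for the example. The segmentation step is a generic split at empty columns, used here only as a stand-in for the Pixel by Pixel algorithm, and the descriptor is a single-cell simplification of HOG. The classification stage is omitted.

```python
import math

def binarize(gray, threshold=128):
    """Pre-processing stand-in: dark ink -> 1, light paper -> 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def segment_columns(binary):
    """Split the image at columns that contain no ink.
    Generic stand-in, NOT the paper's Pixel by Pixel algorithm."""
    width = len(binary[0])
    ink = [any(row[x] for row in binary) for x in range(width)]
    spans, start = [], None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, width))
    return [[row[a:b] for row in binary] for a, b in spans]

def pad(img):
    """Add a one-pixel background border so stroke edges yield gradients."""
    blank = [0] * (len(img[0]) + 2)
    return [blank] + [[0] + row + [0] for row in img] + [blank]

def orientation_histogram(img, bins=9):
    """Simplified single-cell HOG: gradient magnitudes accumulated into
    unsigned orientation bins over [0, 180) degrees, then L2-normalized."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle / (180.0 / bins)) % bins] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm else hist

# Toy "scan": a vertical stroke and a horizontal stroke on white paper.
page = [
    [255,   0, 255, 255, 255, 255, 255],
    [255,   0, 255, 255, 255, 255, 255],
    [255,   0, 255,   0,   0,   0, 255],
    [255,   0, 255, 255, 255, 255, 255],
    [255,   0, 255, 255, 255, 255, 255],
]

segments = segment_columns(binarize(page))
features = [orientation_histogram(pad(seg)) for seg in segments]
dominant = [f.index(max(f)) for f in features]
print(len(segments), dominant)  # prints: 2 [0, 4]
```

The vertical stroke concentrates its gradient energy in the bin around 0 degrees and the horizontal stroke in the bin around 90 degrees. This is the kind of shape information that a classifier could then use to tell the two segments apart.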
Many recognition approaches appear in the literature, such as recognition-based
segmentation [2], raw pixel data [3], sparse autoencoders [3], and, more recently,
histograms of oriented gradients (HOG) for English character recognition [3, 4], which
have been shown to produce promising results and achieve high recognition rates.
Therefore, we adopt HOG in our approach; to the best of our knowledge, it has not been
used before in Arabic character recognition. HOG is a feature descriptor used in
computer vision and image processing for object
detection. The technique counts occurrences of gradi-
APPLIED PROBLEMS
Received March 14, 2017