Research Article
Isolated Handwritten Pashto Character Recognition Using a K-NN
Classification Tool based on Zoning and HOG Feature
Extraction Techniques
Juanjuan Huang,¹ Ihtisham Ul Haq,² Chaolan Dai,¹ Sulaiman Khan,² Shah Nazir,² and Muhammad Imtiaz²
¹Hunan Police Academy, Changsha 410138, China
²Department of Computer Science, University of Swabi, Khyber Pakhtunkhwa, Pakistan
Correspondence should be addressed to Juanjuan Huang; huangjj@hnpa.edu.cn and Sulaiman Khan; engr.sulaiman88@gmail.com
Received 3 February 2021; Revised 22 February 2021; Accepted 16 March 2021; Published 24 March 2021
Academic Editor: Dr Shahzad Sarfraz
Copyright © 2021 Juanjuan Huang et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
Handwritten text recognition is considered one of the most challenging tasks for the research community because the shapes of
different characters differ only slightly in handwritten documents. The unavailability of a standard dataset makes the problem
even harder for researchers to work on. To address these problems, this paper presents an optical character recognition system
for offline Pashto characters. The unavailability of a standard handwritten Pashto character database is addressed by
developing a medium-sized database of offline Pashto characters. This database consists of 11352 character images (258 samples
for each of the 44 characters in the Pashto script). Two feature extraction techniques, histogram of oriented gradients and
zoning-based density features, are applied to the carved Pashto characters. K-nearest neighbors is used as the classification
tool on the proposed feature sets. Using 10-fold cross validation, an accuracy of 80.34% is achieved with the histogram of
oriented gradients features and 76.42% with the zoning-based density features.
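As a rough illustration of the zoning-plus-k-NN idea summarized above, the following minimal Python sketch computes zoning-based density features (the fraction of foreground pixels in each zone of a binary image) and classifies a query image by majority vote among its nearest neighbors. It is not the authors' implementation: the 2 × 2 zone grid, the Euclidean distance, the value of k, the toy 4 × 4 images, and the function names `zoning_density` and `knn_predict` are all assumptions made for this example.

```python
import math
from collections import Counter


def zoning_density(image, zones_per_side):
    """Return the foreground-pixel density of each zone, row by row.

    `image` is a square list of lists holding 0 (background) or 1
    (foreground); its side must be divisible by `zones_per_side`.
    """
    step = len(image) // zones_per_side
    features = []
    for zone_row in range(zones_per_side):
        for zone_col in range(zones_per_side):
            # Count foreground pixels inside this zone.
            total = sum(
                image[r][c]
                for r in range(zone_row * step, (zone_row + 1) * step)
                for c in range(zone_col * step, (zone_col + 1) * step)
            )
            features.append(total / (step * step))
    return features


def knn_predict(train_features, train_labels, query, k=1):
    """Majority vote among the k training vectors closest to `query`."""
    # Euclidean distance is an assumption; the abstract does not name a metric.
    ranked = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy 4 x 4 "characters" (hypothetical data, not from the Pashto database).
vertical_bar = [[0, 1, 0, 0] for _ in range(4)]
horizontal_bar = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
noisy_vertical = [[0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]

train_x = [zoning_density(vertical_bar, 2), zoning_density(horizontal_bar, 2)]
train_y = ["vertical", "horizontal"]

prediction = knn_predict(train_x, train_y, zoning_density(noisy_vertical, 2))
print(prediction)  # prints "vertical"

# Sanity check on the dataset size stated in the abstract.
assert 44 * 258 == 11352
```

In a full pipeline, the same `knn_predict` step could be run on histogram of oriented gradients features instead, and the accuracy would be averaged over the 10 cross-validation folds.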
1. Introduction
In this modern digital age of ever-growing computer
technology, machine learning algorithms play a key role
in many fields, especially in the areas of text recognition
[1], network security [2, 3], privacy [4], traffic flow
predictions [5], object detection [6], and many others. One of the
major applications of machine learning algorithms is the development
of Optical Character Recognition (OCR) systems. An
OCR system reads the text from an image and converts it
into a computer-readable form. Several research works have
addressed the automatic recognition of multiple
languages such as Arabic, English, Persian, Chinese, and
Urdu [7, 8]. The main problems associated with these
languages are cursive writing styles, writers' handwriting
habits, and secondary components (diacritics). The Pashto
language has incorporated most of the Arabic, Urdu, and
Persian letters with some minor modifications. Because of
this incorporation of letters, the Pashto script is
cursive in nature. The Pashto language has a large
character set (44 characters), greater than Urdu (38 characters),
Arabic (28 characters), and Persian (32 characters).
This large character set and the minor differences in character
shapes make the recognition process more complex for
Pashto script.
Pashto is the mother tongue of a large community of
residents in the northern areas of Pakistan and an official language
of Afghanistan. Ahmad et al. [9] used k-nearest neighbors
(k-NN) as a classification tool for printed Pashto character
recognition by using high-level feature extraction
techniques. Boulid et al. [7] suggested the use of a neural network
with spatial distribution of pixels (SDPs) and local binary
patterns (LBPs) for the recognition of handwritten Arabic
characters. Boufenar et al. [10, 11] presented an artificial
Hindawi
Complexity
Volume 2021, Article ID 5558373, 8 pages
https://doi.org/10.1155/2021/5558373