A Fast Algorithm for Persian Handwritten Number Recognition with Computational Geometry Techniques

Keivan Borna 1, Vahid Haji Hashemi 2
1 Faculty of Mathematics and Computer Science, Kharazmi University, Tehran, Iran
borna@khu.ac.ir
2 Faculty of Engineering, Kharazmi University, Tehran, Iran
Hajihashemi.vahid@yahoo.com

Abstract
This paper aims to improve feature extraction in Persian handwritten number recognition systems. We introduce nine new features for the detection and recognition of Persian handwritten digits, using the computational geometry technique of finding the smallest enclosing disc. All of these features are based on the geometric form of the digits, and they are considerably more accurate than features such as the gradient, which the literature describes as the strongest feature. Because Persian digits have a circular form, our features are highly resistant to resizing and rotation. The features were tested on the Hoda database, and the results confirmed their efficiency. Given the improved recognition rate and the acceptable speed, our features are better than other common features in digit recognition.

Keywords: Persian handwritten numbers, OCR, feature extraction, computational geometry, smallest enclosing disk, KNN.

1. Introduction
Recognizing handwritten documents, including letters and numbers, and converting them to typed text has always been a popular topic in image processing. For English text, many different methods have been implemented, and commercial versions are available as OCR software, usually supplied with equipment such as scanners. The accuracy of letter and number recognition in English is very high, as many previous works report, and these methods usually have very small error rates.
On the other hand, much work has also been done on the detection of Persian and Arabic letters and numbers [11]. Work on Persian digits is difficult because handwritten Persian digits resemble one another (for example, the shapes of the digits 1, 2 and 3 in Persian differ only slightly) and because handwriting varies widely. This research aims to provide simple and highly accurate features for the recognition of Persian digits. These features are described in detail following the literature review in the next section. The structure of this article is as follows: Section 2 provides a literature review. In Section 3, our algorithm is presented and the extracted features are used to train a KNN classifier. The implementation and the test results on the Hoda binary image database are provided in Section 4.

2. Literature Review
Several works have already addressed the recognition of handwritten documents. In general, they can be divided into three categories:

2.1 Methods based on geometric features
These methods use the geometry of figures to extract features; statistical invariants, histograms, directional properties and even fractals can be used to identify digits [2]. For example, in [3] the authors used the correspondence between figures and digits. First, they extracted the features of a set of handwritten digits; the rate at which a new digit corresponded to these previous digits then served as the recognition criterion. The disadvantage of this feature extraction method was its complexity, and selecting a proper correspondence algorithm was also a challenge. In [4], gradient and histogram functions, which represent the energy at pixels and at different points of the image, are used. This idea is better than the previous methods in terms of feature extraction rate and accuracy. An advantage of the gradient method is that it can be extended to Latin numbers [7].

ACSIJ Advances in Computer Science: an International Journal, Vol. 3, Issue 3, No. 9, May 2014 ISSN: 2322-5157 www.ACSIJ.org 93
Copyright (c) 2014 Advances in Computer Science: an International Journal. All Rights Reserved.
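To make the gradient and histogram idea discussed for [4] in Section 2.1 more concrete, the following is a minimal sketch of a gradient-direction histogram computed on a small pixel grid. It is not the method of [4]: the function name `gradient_histogram`, the central-difference gradient, the choice of 8 direction bins and the magnitude weighting are all assumptions made for illustration.

```python
import math


def gradient_histogram(image, bins=8):
    """Magnitude-weighted histogram of gradient directions (illustrative only).

    `image` is a list of equal-length rows of numbers, e.g. 0/1 pixels.
    Returns `bins` values that sum to 1.0, or all zeros for a flat image.
    """
    rows, cols = len(image), len(image[0])
    hist = [0.0] * bins
    # Border pixels are skipped because central differences need both neighbours.
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Central differences approximate the horizontal and vertical gradient.
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Map the direction from (-pi, pi] onto [0, 2*pi) and pick its bin.
            angle = math.atan2(gy, gx) % (2 * math.pi)
            index = int(angle / (2 * math.pi) * bins) % bins
            hist[index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# A 4x4 image with a vertical edge: every interior gradient points along +x.
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
features = gradient_histogram(vertical_edge)
```

For the vertical edge above, all of the gradient energy falls into the first direction bin, so `features` is `[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]`. A vector of this kind could be passed to a classifier such as KNN, although the binning and normalization would need to be tuned for real digit images.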