IJSRD || National Conference on Advances in Computer Science Engineering & Technology || May 2017 || ISSN: 2321-0613
©IJSRD 2017 Published by IJSRD
A Hybrid Approach for Optical Character Verification
Hiral Modi¹, M. C. Parikh²
¹P.G. Scholar, ²Associate Professor
¹,²Department of Computer Science and Engineering
¹,²GTU, Ahmedabad, India
Abstract— There is a growing demand for software systems that recognize characters when information is scanned from paper
documents. This paper presents a detailed review of the field of Optical Character Recognition (OCR) and examines the
techniques that have been proposed for the core character recognition stage of an OCR system. OCR translates images of
typewritten or handwritten characters into an electronically editable format while preserving font properties, whereas
Optical Character Verification (OCV) is a hybrid approach that combines OCR with pattern matching. Different techniques for
pre-processing and segmentation are surveyed and discussed. The proposed methodology and the dataset are presented, and
intermediate results are shown.
Key words: Character, Pattern Matching, Character Recognition System, Image Segmentation, OCR, Preprocessing, Skew
correction.
I. INTRODUCTION
OCR (Optical Character Recognition) translates images of typewritten or handwritten characters into a machine-editable
format. OCR reads damaged or low-quality codes and returns its best guess at what the code is. It is widely used for data
entry from printed paper records such as passport documents, invoices, bank statements, computerized receipts, business
cards, mail, and printouts of static data. OCR does not, however, assess the quality and sharpness of the characters. OCV
(Optical Character Verification) is an approach introduced to overcome this limitation.
Projection profile-based methods make it easy to segment the text in a document image into lines, words, and characters,
independent of the language of the text. Different methods are used at each intermediate stage of OCR. In [1], text
segmentation is performed using the projection profile method, and an algorithm is proposed for correcting the skew angle
of the text document. Blur is an important factor that degrades OCR accuracy. In [2], a prediction method based on local
blur estimation is proposed. The relation between blur effect and character size, which is useful for the classifier, is
investigated. The classifier separates the given documents into three classes: readable, intermediate, and non-readable
[2].
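The projection profile idea described above can be illustrated with a minimal sketch. It assumes a binarized image stored as a list of rows with 1 for ink and 0 for background; the function names (`horizontal_profile`, `find_runs`, `segment_lines`, `segment_columns`) and the tiny test page are hypothetical and are not taken from [1]. Rows whose ink count is above a threshold are grouped into text lines, and the same run-finding step applied to the column sums inside a line separates words or characters.

```python
def horizontal_profile(image):
    """Count foreground (1) pixels in each row of a binary image."""
    return [sum(row) for row in image]


def find_runs(profile, threshold=0):
    """Return (start, end) pairs, end exclusive, where profile > threshold."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a run of ink begins
        elif value <= threshold and start is not None:
            runs.append((start, i))        # the run ends at a blank row/column
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(image):
    """Split a page into text lines using the horizontal projection profile."""
    return find_runs(horizontal_profile(image))


def segment_columns(image, top, bottom):
    """Split one text line (rows top..bottom-1) using the vertical profile."""
    width = len(image[0])
    profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return find_runs(profile)


# Hypothetical 6x7 page with two text lines.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

lines = segment_lines(page)
glyphs = [segment_columns(page, top, bottom) for top, bottom in lines]
```

Because the method only counts ink pixels, it does not depend on the script, which is consistent with the language independence noted above. A real document would typically need a non-zero threshold to tolerate noise.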
In [3], a grading system is used to evaluate the quality of printed text using various quality measures. The system
achieved a recognition rate of 98.69%, along with a precision of 0.9857 and a sensitivity of 1 [3]. A complete OCR system
for camera-captured documents with embedded images and graphics, intended for handheld devices, is presented in [4]. Paper
[5] describes skew detection and correction for scanned document images written in the Assamese language using horizontal
and vertical projection profile analysis.
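For readers unfamiliar with the precision and sensitivity figures quoted above for [3], the standard definitions can be sketched as follows. The counts used here are invented purely for illustration and are not taken from [3]; they are chosen only because they happen to reproduce values of the same magnitude.

```python
def precision(true_pos, false_pos):
    """Fraction of accepted characters that were actually correct."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos, false_neg):
    """Fraction of correct characters that were accepted (also called recall)."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts, NOT data from [3].
tp, fp, fn = 69, 1, 0

p = round(precision(tp, fp), 4)
s = sensitivity(tp, fn)
```

A sensitivity of 1 means that there were no false negatives, while a precision slightly below 1 means that a small number of false positives occurred.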
II. RELATED WORK
Skew detection and correction is one of the most important steps in an offline character recognition system; it is applied
to scanned documents as a pre-processing stage in almost all document analysis and recognition systems. Paper [5] describes
skew detection and correction of scanned document images written in the Assamese language using horizontal and vertical
projection profile analysis.
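A minimal sketch of skew estimation by projection profile analysis is given below. It is a generic illustration, not the algorithm of [5]: it assumes the foreground pixels are available as (x, y) coordinates, tries a small set of candidate angles, and keeps the angle whose de-rotated horizontal profile is sharpest. The sharpness score (sum of squared row counts), the candidate range, and the synthetic three-line page are all assumptions made for this example.

```python
import math
from collections import Counter


def rotate(points, degrees):
    """Rotate (x, y) points about the origin by the given angle in degrees."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


def profile_sharpness(points):
    """Sum of squared row counts of the horizontal projection profile.

    The score is large when the ink is concentrated in a few rows,
    which happens when the text lines are horizontal.
    """
    rows = Counter(round(y) for _, y in points)
    return sum(n * n for n in rows.values())


def estimate_skew(points, candidates=range(-5, 6)):
    """Return the candidate angle whose de-rotation gives the sharpest profile."""
    return max(candidates, key=lambda a: profile_sharpness(rotate(points, -a)))


# Synthetic page: three 200-pixel-wide text lines, skewed by 3 degrees
# and snapped back to integer pixel positions.
straight = [(x, y) for y in (10, 30, 50) for x in range(200)]
skewed = [(round(x), round(y)) for x, y in rotate(straight, 3)]

angle = estimate_skew(skewed)       # estimated skew in degrees
deskewed = rotate(skewed, -angle)   # corrected pixel coordinates
```

An exhaustive search over integer angles keeps the sketch short; a practical system would likely use a finer angular step or a coarse-to-fine search.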
Background images in documents cause OCR errors. In [6], a non-linear transformation is used to enhance the contrast of
each channel image. The method was tested using Tesseract (an open-source OCR engine) and compared with two commercial OCR
packages, ABBYY FineReader and HANWANG (OCR software for Chinese characters). The experimental results show that
recognition accuracy improves significantly after the background images are removed [6]. For pre-processing, the Fourier
transform is used, which decomposes an image into sine and cosine components of increasing frequency. The Fourier transform
converts the image from the spatial domain to the frequency domain, which is convenient for further processing [1].
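The spatial-to-frequency conversion mentioned above can be illustrated with a direct two-dimensional discrete Fourier transform. This is a minimal sketch written for clarity rather than speed (a real system would use an FFT routine); the function name `dft2` and the striped test image are hypothetical and are not taken from [1].

```python
import cmath


def dft2(image):
    """Direct 2-D discrete Fourier transform of a small grayscale image.

    result[u][v] is the complex weight of the component with vertical
    frequency u and horizontal frequency v.
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(
                image[y][x] * cmath.exp(-2j * cmath.pi * (u * y / rows + v * x / cols))
                for y in range(rows)
                for x in range(cols)
            )
            for v in range(cols)
        ]
        for u in range(rows)
    ]


# Hypothetical 4x4 image of vertical stripes (period of two pixels).
stripes = [[1, 0, 1, 0] for _ in range(4)]

spectrum = dft2(stripes)
dc_term = abs(spectrum[0][0])      # overall brightness
stripe_term = abs(spectrum[0][2])  # horizontal frequency of the stripes
```

For this image the energy is concentrated in the zero-frequency term and in the single horizontal frequency that matches the stripe period, which shows how a regular spatial pattern becomes a few isolated coefficients in the frequency domain.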
In the past few years, research has been carried out on recognizing machine-printed Chinese/English characters. Paper [7]
describes search and fast match techniques. A high-performance Chinese/English OCR engine is used to construct a large
vocabulary. The authors collected 1862 text lines from varied sources such as newspapers, magazines, journals, and books [7].
H. Wang and J. Kangas [8] proposed a method for identifying character-like regions in order to extract and recognize
characters in natural color scene images automatically. Connected component extraction is used to check the block
candidates. Priority adaptive segmentation (PAS) is implemented to obtain accurate foreground pixels of the character in
each block. Paper [9] presents a system for text extraction based on an open-source OCR algorithm. The system is used for
functional verification of TV sets.
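The connected component extraction step mentioned above for [8] can be sketched as follows. This is a generic breadth-first labeling on a binary image, assuming 4-connectivity; it does not reproduce the block-candidate checks or the PAS stage of [8], and the function name and test image are hypothetical.

```python
from collections import deque


def connected_components(image):
    """Find 4-connected foreground regions in a binary image.

    Returns one bounding box (top, left, bottom, right), inclusive,
    per region, in row-major order of discovery.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if image[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first search from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical binary scene with three separate blobs.
scene = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0],
]

boxes = connected_components(scene)
```

Each bounding box is a candidate block; a system such as the one in [8] would then filter these candidates by size or shape before attempting recognition.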