IJSRD || National Conference on Advances in Computer Science Engineering & Technology || May 2017 || ISSN: 2321-0613
©IJSRD 2017 Published by IJSRD

A Hybrid Approach for Optical Character Verification
Hiral Modi¹ M. C. Parikh²
¹P.G. Scholar ²Associate Professor
¹,²Department of Computer Science and Engineering
¹,²GTU, Ahmedabad, India

Abstract: There is a growing demand for software that recognizes characters when information is scanned from paper documents. This paper presents a detailed review of the field of Optical Character Recognition (OCR) and identifies the techniques that have been proposed to realize the core of character recognition in an OCR system. OCR translates images of typewritten or handwritten characters into an electronically editable format and preserves font properties, whereas OCV (Optical Character Verification) is a hybrid approach that combines OCR and pattern matching. Different techniques for pre-processing and segmentation are surveyed and discussed. The proposed methodology and the dataset are presented, and intermediate results are shown.

Key words: Character, Pattern Matching, Character Recognition System, Image Segmentation, OCR, Preprocessing, Skew correction.

I. INTRODUCTION
OCR (Optical Character Recognition) translates images of typewritten or handwritten characters into a machine-editable format. OCR reads damaged or low-quality codes and returns its best guess at what the code is. It is widely used as a form of data entry from printed paper records such as passport documents, invoices, bank statements, computerized receipts, business cards, mail, printouts of static data, or any other suitable documentation. OCR does not, however, assess the quality and sharpness of characters. OCV is a newer approach intended to overcome this limitation of OCR.
Projection-profile-based methods make it easy to segment the text in a document image into lines, words, and characters, independent of the language of the text. Different methods are used at each intermediate stage of OCR. In [1], text segmentation is performed with the projection profile method, and an algorithm is proposed for correcting the skew angle of the text document.

Blur is an important factor that degrades OCR accuracy. In [2], a prediction method based on local blur estimation is proposed. The relation between the blur effect and character size is investigated, which is useful for the classifier. The classifier separates the given document into three classes: readable, intermediate, and non-readable.

In [3], a grading system is used to evaluate printed text using various quality measures. The system achieved a recognition rate of 98.69%, with a precision of 0.9857 and a sensitivity of 1.

Paper [4] presents a complete OCR system for camera-captured image/graphics-embedded textual documents on handheld devices.

II. RELATED WORK
One of the most important steps of an offline character recognition system is skew detection and correction, which is applied to scanned documents as a pre-processing stage in almost all document analysis and recognition systems. Paper [5] describes skew detection and correction of scanned document images written in the Assamese language using horizontal and vertical projection profile analysis.

Background images in documents cause OCR errors. A non-linear transformation is used to enhance the contrast of each channel image.
The method was tested using Tesseract (an open-source OCR engine) and compared with two commercial OCR packages, ABBYY FineReader and HANWANG (OCR software for Chinese characters). The experimental results show that recognition accuracy improves significantly after the background images are removed [6].

For pre-processing, the Fourier transform is used, which decomposes an image into sine and cosine components of increasing frequency. The Fourier transform converts the image from the spatial domain to the frequency domain, which is convenient for further processing [1].

Over the past few years, research has been carried out on the recognition of machine-printed Chinese/English characters. The authors of [7] describe search and fast-match techniques. A high-performance Chinese/English OCR engine is used to construct a large vocabulary. They collected 1862 text lines from varied sources such as newspapers, magazines, journals, and books.

H. Wang and J. Kangas [8] proposed a method for identifying character-like regions in order to extract and recognize characters in natural color scene images automatically. Connected component extraction is used to check the block candidates. Priority adaptive segmentation (PAS) is implemented to obtain accurate foreground pixels of the character in each block.

Paper [9] presents a system for text extraction based on an open-source OCR algorithm. The system is used for functional verification of TV sets.
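
The projection-profile segmentation discussed above for [1] and [5] can be illustrated with a minimal sketch. This is not the implementation from either paper. It assumes an already binarized and deskewed image stored as a list of rows (1 = text pixel, 0 = background), and the function names `horizontal_profile`, `vertical_profile`, `runs`, and `segment` are chosen here purely for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = text pixel, 0 = background


def horizontal_profile(img: Image) -> List[int]:
    """Count the text pixels in each row."""
    return [sum(row) for row in img]


def vertical_profile(img: Image) -> List[int]:
    """Count the text pixels in each column."""
    return [sum(col) for col in zip(*img)]


def runs(profile: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return (start, end) spans, end exclusive, of consecutive profile
    entries whose value exceeds the threshold."""
    spans = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i  # a new run of text begins
        elif value <= threshold and start is not None:
            spans.append((start, i))  # a gap ends the current run
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(img: Image) -> List[List[Tuple[int, int, int, int]]]:
    """Split a page into text lines (horizontal profile), then split each
    line into characters (vertical profile of that line only).
    Returns, per line, a list of (top, bottom, left, right) boxes."""
    result = []
    for top, bottom in runs(horizontal_profile(img)):
        line = img[top:bottom]
        boxes = [(top, bottom, left, right)
                 for left, right in runs(vertical_profile(line))]
        result.append(boxes)
    return result


# Toy page: two text lines separated by blank rows.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 1, 1],
    [1, 1, 0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
]
boxes = segment(page)
```

For the toy page, `segment` finds two text lines containing three and two character boxes respectively. Because the profiles rely on blank rows between text lines, skew correction has to precede this step; on a skewed page the gaps in the horizontal profile blur together.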