International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 05 | May-2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 3968

Optical Character Recognition for Hindi

Prasanta Pratim Bairagi
Assistant Professor, Department of CSE, Assam down town University, Assam, India

Abstract - Optical Character Recognition (OCR) is a system that translates images of handwritten or printed text into machine-editable text. The Devanagari script is used by many Indian languages, such as Hindi, Nepali, Marathi and Sindhi. It is the foundation of Hindi, an official language of India and the most widely spoken language in the country. There is currently a strong demand for storing the information held in paper documents in digital form so that it can later be reused through search. In this paper we propose a new method for recognizing printed Hindi characters in the Devanagari script. To design a sophisticated OCR system for Hindi based on the Devanagari script, the stages of pre-processing, feature extraction, segmentation and classification have been studied and implemented, and related research papers on existing OCR systems have been reviewed. The main emphasis of this work is on recognizing the individual consonants and vowels; this can later be extended to recognize complex derived letters and words.

Key Words: Optical Character Recognition, Feature Extraction, Segmentation, Hindi Character, Devanagari Script

1. INTRODUCTION
The introduction is divided into two parts.
The first part describes OCR, its types and its uses, and the second part describes the Devanagari script, the foundation of the Hindi language.

1.1 About OCR
Optical Character Recognition has been a major research area since 1950. It is the mechanical or electronic translation of images of handwritten or printed text into machine-editable text [1]. The images are usually captured by a scanner. Throughout this paper, however, OCR refers to the recognition of printed text. Data entry through OCR is faster, more accurate and generally more efficient than manual keyboard entry. An OCR system makes it possible to store a book or a magazine article directly in digital form and to make it editable. Developing OCR for Indian scripts is an active area of research, and it is challenging because of the large number of letters in the alphabet, the sophisticated ways in which they combine, and the complicated graphemes that result. In Devanagari text, the characters of a word are usually not separated from one another; they are joined by a horizontal headline (the shirorekha) drawn along their tops. In this work, pre-processing operations such as conversion of grayscale images to binary images, image rectification and segmentation are considered in the design of the system.

1.2 Types of OCR
There are three basic types of OCR, briefly discussed below.

Offline Handwritten Text
Offline handwritten text is written by a person with a pen or pencil on paper; the document is then scanned to digitize it.

Online Handwritten Text
Online handwritten text is written directly on a digital surface using a digital device such as a tablet. The output is a sequence of x-y coordinates that express the pen position, together with other information such as pressure and writing speed.

Machine Printed Text
Machine-printed text is found in printed documents and is produced by printing processes such as offset printing.
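The pre-processing mentioned in Section 1.1, converting a grayscale image to a binary image and then segmenting a word into characters, can be illustrated with a short sketch. The paper does not state which binarization or segmentation algorithm it uses, so the code below assumes two common choices: Otsu's global thresholding for binarization, and, for segmentation, removing the Devanagari headline (shirorekha) found from the horizontal projection profile and then splitting at empty columns of the vertical projection profile. The function names, the `ratio` parameter and the list-of-lists image model (0 = black ink, 255 = white paper) are hypothetical illustrations, not the author's implementation.

```python
from typing import List, Tuple

# Hypothetical image model: a grayscale image is a list of rows of 8-bit
# pixel values, where 0 is black ink and 255 is white paper.
Image = List[List[int]]


def otsu_threshold(gray: Image) -> int:
    """Return the threshold t that maximizes between-class variance (Otsu).

    Pixels <= t form the dark (ink) class; pixels > t form the light class.
    """
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w_dark, sum_dark = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break  # every pixel is dark; no further split is possible
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray: Image) -> Image:
    """Convert grayscale to binary: 1 = ink, 0 = background."""
    t = otsu_threshold(gray)
    return [[1 if p <= t else 0 for p in row] for row in gray]


def find_headline_rows(binary: Image, ratio: float = 0.8) -> List[int]:
    """Rows whose ink count is at least `ratio` times the peak of the
    horizontal projection profile; treated here as the headline."""
    profile = [sum(row) for row in binary]
    peak = max(profile, default=0)
    if peak == 0:
        return []
    return [r for r, count in enumerate(profile) if count >= ratio * peak]


def segment_characters(binary: Image, ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Return (first_col, last_col) spans of characters in one word image.

    Headline rows are ignored; runs of columns that still contain ink in the
    vertical projection profile are reported as character segments.
    """
    if not binary:
        return []
    headline = set(find_headline_rows(binary, ratio))
    width = len(binary[0])
    profile = [
        sum(row[c] for r, row in enumerate(binary) if r not in headline)
        for c in range(width)
    ]
    segments: List[Tuple[int, int]] = []
    start = None
    for c, count in enumerate(profile):
        if count > 0 and start is None:
            start = c  # a character begins
        elif count == 0 and start is not None:
            segments.append((start, c - 1))  # an empty column ends it
            start = None
    if start is not None:
        segments.append((start, width - 1))
    return segments


if __name__ == "__main__":
    W, B = 255, 0  # white paper, black ink
    # A toy "word": row 1 is a headline joining two character shapes below it.
    word = [
        [W, W, W, W, W, W, W, W, W],
        [B, B, B, B, B, B, B, B, B],
        [W, B, B, W, W, B, B, B, W],
        [W, B, W, W, W, W, B, W, W],
        [W, B, B, W, W, W, B, W, W],
        [W, W, W, W, W, W, W, W, W],
    ]
    binary = binarize(word)
    print(find_headline_rows(binary))  # [1]
    print(segment_characters(binary))  # [(1, 2), (5, 7)]
```

In the toy word every column touches the headline, so if no row is treated as the headline (for example with `ratio` set above 1) the whole word comes back as the single span (0, 8). Removing the headline first is what exposes the gaps between characters. A fixed `ratio` is a simplification; on real scans, skew would first have to be corrected by the image rectification step mentioned above before projection profiles become reliable.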
1.3 Uses of OCR
Optical Character Recognition is used to scan different types of documents, such as PDF files or images, and convert them into editable files. OCR systems are used for the following purposes:
- Processing bank cheques.
- Converting library materials into digital format.
- Storing documents in digital form, searching text and extracting data.

1.4 About Devanagari Script
The Devanagari script is the foundation of many Indian languages, such as Hindi, Nepali, Marathi and Sindhi, and is used by more than 300 million people around the world. Devanagari therefore plays a major role in the development of literature and manuscripts. A large body of literature exists in old manuscripts, the Vedas and other scriptures, and because these texts are so old they are not easily accessible to everyone. The wish to read these ancient scriptures has led to their digital conversion by scanning the books. For scanning and converting the documents into editable