International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056 | p-ISSN: 2395-0072 | Volume: 07 Issue: 08 | Aug 2020 | www.irjet.net
© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal

TEXT DETECTION AND RECOGNITION METHODS IN DIGITAL IMAGES - A REVIEW

S. Keerthana¹, Dr. A. Suphalakshmi², and M. Revathi³
¹⁻³ Department of Computer Science and Engineering, Paavai Engineering College, Namakkal, India

Abstract - The emergence of machine learning and deep learning models in computer vision and pattern recognition has improved the ability of Optical Character Recognition (OCR) systems to recognize text of arbitrary shape and multiple orientations against complex backgrounds in natural scene images or video. This paper describes the steps involved in OCR and the evaluation protocols, and summarizes recent research in text detection and text recognition.

Key Words: OCR, Text Detection, Text Recognition.

1. INTRODUCTION

Optical Character Recognition (OCR) is a challenging field in computer vision and pattern recognition that is used to digitize handwritten or printed text against any background [1]. Without OCR, digitizing text would require reading it manually and typing it out, which is time-consuming. Although recognition of text in a controlled environment (fixed layout, even illumination, simple background and format) has achieved high accuracy, recognition of text with complex layouts and backgrounds or uneven illumination in natural scene images or video remains an open problem [2]. Applications of OCR include automatic number plate recognition, passport verification, processing of bank cheques [1], and digitizing historical manuscripts, books, and handwritten or typed documents.
OCR systems are also used in text-to-speech systems. An OCR system is typically customized to its field of application. The factors that influence OCR performance are:
1) Type of text to be recognized (typed, printed or handwritten)
2) Text background (simple, complex or natural scene)
3) Format of the text (image or video)
4) Language of the text (unilingual, bilingual or multilingual)
5) Mode of recognition (offline, online or real-time)

The paper is organized as follows: Section 2 describes the steps involved in OCR, Section 3 defines the evaluation protocols used to evaluate a method, Section 4 covers recent advances in text detection, text recognition and end-to-end recognition methods, and Section 5 concludes the paper.

2. STEPS INVOLVED IN OCR

The steps involved in OCR are:
1) Data acquisition
2) Pre-processing
3) Text detection and extraction
4) Text enhancement
5) Text segmentation
6) Text recognition
7) Post-processing

These steps are not strictly separate; several of them can be integrated and accomplished with a single technique.

2.1 Data Acquisition

The data is an image of handwritten or printed text, with a simple or complex layout or background, in a natural scene or a document. The image of the text can be obtained from:
1) Digital camera: The portability, usability and small size of a digital camera make it flexible for acquiring real-world text images. However, the captured images tend to be low in resolution and unevenly illuminated, and are prone to blur [2].
2) Flatbed or handheld scanner: Scanners offer high resolution, even and adequate illumination, minimal blur and fast batch scanning. However, their limited usability, size and lack of portability make it difficult to capture text in natural scenes [2].
3) Datasets: A wide variety of text image datasets is available for research. They are used to set benchmarks in terms of accuracy, processing speed and storage.
Examples include the ICDAR datasets for text recognition in video [3] and multi-lingual scene text [4], and COCO-Text [5]. The IIIT5K dataset [6] contains cropped word images, the Synth90k dataset [7] contains synthetic text, CTW-1500 [8] contains curved text, and Total-Text [9] contains text of arbitrary shape. SVT (Street View Text) [10] is a further example.
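
As a rough illustration of how some of the steps listed in Section 2 can be chained, the following Python sketch runs pre-processing (binarization), text detection (a bounding box around dark pixels) and text segmentation (splitting at empty columns) on a tiny grayscale image. The function names, the fixed threshold of 128 and the toy image are assumptions made for this example only and are not taken from any of the reviewed methods. Text recognition is omitted because it would require a trained classifier.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]          # grayscale rows: 0 = black, 255 = white
Box = Tuple[int, int, int, int]  # (top, bottom, left, right), inclusive


def binarize(img: Image, threshold: int = 128) -> Image:
    """Pre-processing: mark dark (ink) pixels as 1 and background as 0."""
    return [[1 if px < threshold else 0 for px in row] for row in img]


def detect_text_region(binary: Image) -> Optional[Box]:
    """Text detection: tight bounding box around all ink pixels, or None."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    return rows[0], rows[-1], cols[0], cols[-1]


def segment_columns(binary: Image, box: Box) -> List[Tuple[int, int]]:
    """Text segmentation: split the region into column spans at empty columns."""
    top, bottom, left, right = box
    spans: List[Tuple[int, int]] = []
    start = None
    for c in range(left, right + 1):
        has_ink = any(binary[r][c] for r in range(top, bottom + 1))
        if has_ink and start is None:
            start = c                      # a new character span begins
        elif not has_ink and start is not None:
            spans.append((start, c - 1))   # the span ended at the previous column
            start = None
    if start is not None:
        spans.append((start, right))       # close a span that reaches the box edge
    return spans


# Toy 4x7 image (assumed data): two dark blobs separated by one blank column.
toy_image: Image = [
    [255, 255, 255, 255, 255, 255, 255],
    [255,   0,   0, 255,   0, 255, 255],
    [255,   0, 255, 255,   0, 255, 255],
    [255, 255, 255, 255, 255, 255, 255],
]

binary = binarize(toy_image)
box = detect_text_region(binary)
print(box)                           # (1, 2, 1, 4)
print(segment_columns(binary, box))  # [(1, 2), (4, 4)]
```

In this sketch each stage is a separate function, but as noted in Section 2 the stages need not be separate in practice: a single learned model may perform detection, segmentation and recognition together.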