INTERNATIONAL RESEARCH JOURNAL OF ENGINEERING AND TECHNOLOGY (IRJET) E-ISSN: 2395-0056 VOLUME: 06 ISSUE: 05 | MAY 2019 WWW.IRJET.NET P-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 8143

Real-Time Text Reader for English Language

Dr. Jayshree R. Pansare 1, Aditi Gaikwad 2, Vaishnavi Ankam 3, Shikha Sharma 4, Priyanka Karne 5
1,2,3,4,5 MES College of Engineering, SPPU, Pune, India

Abstract—Text classification is an interesting task when the text comes from diverse sources such as images, videos and handwritten documents. Handwritten text varies from writer to writer. Hence, it is hard to find the best technique for categorizing such text, owing to the absence of a standard dataset and evaluation measures. Our system presents a standard method for recognizing and classifying text from all of the above input sources using Optical Character Recognition (OCR) and a Support Vector Machine (SVM) classifier. It first recognizes the text in an image and then classifies the words into predefined part-of-speech classes of the English language.

Key Words: Real-Time Character Recognition, Optical Character Recognition (OCR), Support Vector Machine (SVM), Deep Learning algorithms.

1. Introduction

Text classification is an informative tool, especially for learners of diverse languages such as English, Chinese, Kannada, French and German. Images, videos and handwritten texts are the sources for text recognition. Extracting text from running video slides using detection, localization and extraction techniques gives 90.8% accuracy [1]. Segmentation of multimedia documents reduces text annotation time by 22% [2].
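The classification step named in the abstract, where an SVM assigns recognized words to part-of-speech classes, can be illustrated with a minimal sketch. It assumes OCR has already produced the word tokens and narrows the task to a binary noun/verb decision. The suffix-indicator features (`SUFFIXES`, `features`), the toy training words and the hyperparameters are all hypothetical and do not come from the paper. The training routine (`train_linear_svm`, full-batch subgradient descent on the primal soft-margin objective) was chosen for brevity and is not the authors' solver.

```python
# Minimal sketch: a binary linear SVM that labels OCR'd words as "verb" or
# "noun". Features, training words and hyperparameters are hypothetical.

SUFFIXES = ("ing", "ed", "tion", "ness")  # assumed suffix cues


def features(word):
    """One 0/1 indicator per suffix, plus a constant bias feature."""
    lowered = word.lower()
    return [1.0 if lowered.endswith(s) else 0.0 for s in SUFFIXES] + [1.0]


def train_linear_svm(samples, labels, lam=0.01, lr=0.5, epochs=50):
    """Soft-margin linear SVM in primal form: minimise
    lam/2 * ||w||^2 + mean(max(0, 1 - y * <w, x>))
    by full-batch subgradient descent. Labels must be +1 or -1."""
    dim, n = len(samples[0]), len(samples)
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [lam * wi for wi in w]  # gradient of the L2 regulariser
        for x, y in zip(samples, labels):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin < 1:  # hinge loss active: add its subgradient
                for k in range(dim):
                    grad[k] -= y * x[k] / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def predict(w, word):
    """Sign of the decision function: positive -> verb, otherwise noun."""
    score = sum(wi * xi for wi, xi in zip(w, features(word)))
    return "verb" if score > 0 else "noun"


# Toy training set (+1 = verb, -1 = noun); words assumed to come from OCR.
words = ["walking", "singing", "played", "talked",
         "nation", "station", "kindness", "happiness"]
labels = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm([features(t) for t in words], labels)

print(predict(w, "swimming"), predict(w, "creation"))  # verb noun
```

Covering the full set of English part-of-speech classes would need a multi-class scheme such as one-vs-rest over several such classifiers, and in practice a library SVM solver. The toy data here only demonstrates the hinge-loss training and the sign-based decision.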
Recognizing handwritten documents requires a huge amount of data, such as city names, state names and ZIP codes. The work discussed in [3] uses about 3,000 classes for city names and 42,000 for ZIP codes. In [4], the multi-writer and writer-independent tasks achieve 49.1% accuracy on handwritten documents, and adding syntax analysis based on a stochastic context-free grammar (SCFG) improves the accuracy from 49.1% to 54.4%. English text recognition based on a combination of structural and statistical features reaches a recognition rate of 90% for many letters, and the method proposed in [5] for English character recognition can reduce external noise. The SVM classifier is analyzed on diverse languages in [6], achieving recognition accuracies of 73.33% for Kannada and 96.13% for lowercase English letters. Recognizing text is not difficult for humans but is quite difficult for machines. Machine text recognition requires the following steps: text acquisition, text identification, image-to-text transformation and character recognition [7]. In [8], a convolutional neural network and a genetic algorithm are used for text feature extraction, and a feed-forward network is used for classification. Named Entity Recognition (NER) is the process of identifying locations, nouns and pronouns in a sentence; a deep learning algorithm achieved 70% accuracy for NER [9]. Text extraction from comic images differs from extraction from ordinary images: the dialogue balloon is identified first, and the text is then extracted from that balloon [10]. The paper is organized as follows. Section I reviews existing techniques for text recognition in images, videos and handwritten documents. Section II describes how to detect text in comic images. Section III explains the overall structure of the proposed method.
Section IV compares the performance of the Real-Time Text Reader for English Language with that of specific existing systems, and Section V concludes the paper.

2. Detection of Text from Image

Text recognition in images is typically divided into four stages: detection, localization, extraction and recognition. The detection stage identifies text regions. The localization stage finds the boundaries of text strings. The extraction stage filters out the background of the image. Fig. 2.1 shows how an image is transformed into a binary image using the detection, localization and extraction methods. Finally, the recognition stage recognizes the text.

2.1 Text Detection: Text detection is the first stage of text recognition and is divided into two stages. In the first, texture-based stage, the entire image is divided into blocks; numerous methods can be used for this purpose, e.g. wavelet transforms, spatial variance or Gabor filters. In the second stage, blocks are classified as text or non-text, which may be done using a neural network or a support vector machine. To classify text and non-text blocks, the image background should be clean, so a background-complexity-adaptive thresholding algorithm is used to remove the background.
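The two detection stages described above can be illustrated with a rough sketch, assuming a grayscale image given as a list of pixel rows. It divides the image into square blocks and uses spatial variance (one of the texture measures listed in the text) to label a block as text when its variance exceeds a fixed threshold. This threshold rule is a plain stand-in for the neural network or SVM classifier, and the block size and `var_threshold` values are assumptions. The per-block mean threshold in `binarize_block` is likewise a simple substitute and not the background-complexity-adaptive thresholding algorithm.

```python
from statistics import pvariance


def block_variances(image, block):
    """Return {(row, col): variance} for each block x block tile of a
    grayscale image (list of equal-length rows). Tiles that would run past
    the image border are ignored."""
    h, w = len(image), len(image[0])
    result = {}
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            pixels = [image[i][j] for i in range(r, r + block)
                      for j in range(c, c + block)]
            result[(r, c)] = pvariance(pixels)
    return result


def detect_text_blocks(image, block=4, var_threshold=1000.0):
    """Texture stage + classification stage: keep tiles whose intensity
    variance exceeds var_threshold (stand-in for an NN/SVM classifier)."""
    variances = block_variances(image, block)
    return sorted(pos for pos, v in variances.items() if v > var_threshold)


def binarize_block(image, r, c, block=4):
    """Mark pixels darker than the tile's mean intensity as ink (1) and the
    rest as background (0). A simple mean threshold, not the
    background-complexity-adaptive algorithm."""
    rows = [image[i][c:c + block] for i in range(r, r + block)]
    mean = sum(sum(row) for row in rows) / (block * block)
    return [[1 if p < mean else 0 for p in row] for row in rows]


# Toy 8x8 image: light background (200) with one dark vertical stroke
# in the top-right 4x4 tile.
image = [[200] * 8 for _ in range(8)]
for i in range(4):
    image[i][5] = 0

text_blocks = detect_text_blocks(image)
print(text_blocks)                     # [(0, 4)]
print(binarize_block(image, 0, 4)[0])  # [0, 1, 0, 0]
```

Uniform background tiles have zero variance and are rejected. The tile containing the stroke has variance 7500 and is kept. A trained classifier would replace the single threshold when backgrounds are more complex than in this toy example.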