Ashima Godha et al., International Journal of Information Systems and Computer Sciences, 8(2), March - April 2019, 1-6

Comparative Study on Text Detection and Recognition from Lecture Videos

Ashima Godha 1, Rahul Sharma 2
1 M.Tech (CTA), RKDF School of Engineering, Indore, ashimagodha@maill.com
2 Assistant Professor (CSE), RKDF School of Engineering, Indore, Sharma.rahul5656@gmail.com

ABSTRACT

Lecture videos are rich in textual information, and the ability to read this text is useful for broader video understanding and analysis applications. Textual information extracted from these sources can be used for automatic image and video indexing and for image structuring. However, variations in text style, size, alignment, and orientation, together with low image contrast and complex backgrounds, make text extraction challenging. In recent years, many text extraction methods have been proposed. This paper analyzes and compares the performance of various methods for extracting textual information from images, and it summarizes both the methods themselves and the factors that affect their performance.

Key words: Word recognition, Lecture video, Text extraction, Text localization, Text segmentation, Connected component, Edge-based approach.

1. INTRODUCTION

Text in images carries valuable information and is exploited in many image and video applications, such as content-based web image search, video information retrieval, and mobile text analysis and recognition [1-5]. Because of complex backgrounds and variations in font, size, color, and orientation, text in natural scene images must be robustly detected before it can be recognized and retrieved. With increasing interest in e-learning in the form of OpenCourseWare (OCW) lectures and Massive Open Online Courses (MOOCs), freely available lecture videos are abundant.
Understanding lecture videos is critical for educational research, particularly in the context of MOOCs, which have become synonymous with distance learning. For example, a lecture video can be analyzed to understand a teacher's engagement with the learners, or to determine which frames viewers pay the most attention to [1]. The figures, images, and text in lecture videos are vital cues for understanding any lecture video. Text is present almost everywhere in a lecture video, particularly in lectures on science, mathematics, and engineering. Text alone can support a variety of tasks, such as keyword generation, video indexing, enabling search, and extracting class notes [2]-[5].

Text in lecture videos comprises handwritten text written on a blackboard or on paper, text written with a stylus on a tablet and displayed on a screen, and font-rendered text appearing in presentation slides (digital text). Lectures are recorded using one or more cameras, typically positioned to directly face the blackboard or the presentation slides. Text recognition from presentation slides is usually less challenging because the text is more legible, varies little in style, and has higher contrast. Text on a blackboard, by contrast, is handwritten and often not very legible due to poor lighting, small size, or poor contrast. On a blackboard or on paper the lecturer may also write over figures and equations, which clutters the scene and makes the text harder to detect. Figure 1 shows a few samples.

Figure 1: Visualization of text localization and recognition results on frames from the lecture videos.

ISSN 2319 - 7595, Volume 8, No. 2, March - April 2019, International Journal of Information Systems and Computer Sciences. Available online at http://warse.org/IJISCS/static/pdf/file/ijiscs01822019.pdf, https://doi.org/10.30534/ijiscs/2019/01822019

2. STEPS OF TEXT EXTRACTION

The text extraction problem is divided into the following steps [1]:
i. Text detection
ii. Text localization
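To make the detection and localization steps concrete, the sketch below illustrates one simple edge-based, connected-component pipeline of the kind this survey covers: threshold gradient magnitudes to obtain an edge map, group edge pixels into 4-connected components, and report the bounding box of each component as a candidate text region. This is a minimal illustrative sketch, not the method of any particular surveyed paper; the function names (`edge_map`, `detect_text_boxes`) and the thresholds are invented for this example, and real systems add further filtering by size, aspect ratio, stroke width, and alignment.

```python
# Minimal edge-based text detection sketch (illustrative only).
import numpy as np

def edge_map(gray, thresh=40):
    """Approximate the gradient magnitude with forward differences,
    then threshold it to obtain a binary edge map."""
    g = gray.astype(float)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, :-1] = np.diff(g, axis=1)   # horizontal gradient
    gy[:-1, :] = np.diff(g, axis=0)   # vertical gradient
    return np.hypot(gx, gy) > thresh

def connected_components(mask):
    """Group edge pixels into 4-connected components and return
    one bounding box (x0, y0, x1, y1) per component."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y, x] and not seen[y, x]:
                stack, xs, ys = [(y, x)], [], []
                seen[y, x] = True
                while stack:
                    cy, cx = stack.pop()
                    ys.append(cy)
                    xs.append(cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes

def detect_text_boxes(gray, min_area=4):
    """Edge-based detection: edge map -> components -> drop tiny blobs."""
    return [b for b in connected_components(edge_map(gray))
            if (b[2] - b[0] + 1) * (b[3] - b[1] + 1) >= min_area]

# Demo on a synthetic 20x40 frame with one high-contrast "text line".
frame = np.zeros((20, 40), dtype=np.uint8)
frame[8:12, 5:30] = 255
boxes = detect_text_boxes(frame)
print(boxes)  # a single bounding box around the bright stripe
```

On real lecture frames the edge map would be far noisier; the survey's later sections discuss the heuristics different methods use to separate text components from figures and clutter.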