International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 08 | Aug 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 2156
TEXT DETECTION AND RECOGNITION METHODS IN DIGITAL
IMAGES - A REVIEW
S. Keerthana¹, Dr. A. Suphalakshmi², and M. Revathi³
¹⁻³ Department of Computer Science and Engineering, Paavai Engineering College, Namakkal, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - The emergence of machine learning and deep
learning models in computer vision and pattern
recognition has improved the ability of Optical
Character Recognition (OCR) systems to recognize text of
arbitrary shape and orientation against complex
backgrounds in natural scene images and video. This paper
describes the steps involved in OCR and the evaluation
protocols, and summarizes recent research in
text detection and text recognition.
Key Words: OCR, Text Detection, Text Recognition.
1. INTRODUCTION
Optical Character Recognition is a challenging field in
computer vision and pattern recognition that is used for
digitizing handwritten or printed text against
any background [1]. Without OCR, digitizing
text would require reading it manually and
typing it out, which is time-consuming.
Although recognition of text in a controlled environment
(fixed layout and format, even illumination, simple
background) has achieved high accuracy, the
recognition of text with complex layouts and backgrounds or
uneven illumination in natural scene images or video
is still an open problem [2].
Applications of OCR include automatic number plate
recognition, passport verification, processing of bank
cheques [1], and digitizing historical manuscripts, books,
and handwritten or typed documents. OCR is also used
in text-to-speech systems. OCR systems are typically
customized to the field of application.
Factors that influence the performance of OCR are
1) Type of the text to be recognized (typed, printed
or handwritten text),
2) Text background (simple, complex or natural
scene),
3) Format of text (image or video),
4) Language of the text (unilingual, bilingual or
multilingual),
5) Mode of recognition (offline, online or real-time
recognition).
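Because OCR systems are customized to the application, the five factors above can be read as a configuration profile. The sketch below is only an illustration of that idea: the class names, field names and the example profile for number plate recognition are assumptions made here, not definitions from the literature reviewed.

```python
from dataclasses import dataclass
from enum import Enum


class TextType(Enum):
    TYPED = "typed"
    PRINTED = "printed"
    HANDWRITTEN = "handwritten"


class Background(Enum):
    SIMPLE = "simple"
    COMPLEX = "complex"
    NATURAL_SCENE = "natural scene"


class TextFormat(Enum):
    IMAGE = "image"
    VIDEO = "video"


class LanguageScope(Enum):
    UNILINGUAL = "unilingual"
    BILINGUAL = "bilingual"
    MULTILINGUAL = "multilingual"


class Mode(Enum):
    OFFLINE = "offline"
    ONLINE = "online"
    REAL_TIME = "real-time"


@dataclass(frozen=True)
class OcrProfile:
    """Hypothetical container holding one value for each of the five factors."""
    text_type: TextType
    background: Background
    text_format: TextFormat
    language_scope: LanguageScope
    mode: Mode


# Illustrative profile only; the values chosen for number plate
# recognition are an assumption, not taken from the paper.
plate_profile = OcrProfile(
    text_type=TextType.PRINTED,
    background=Background.NATURAL_SCENE,
    text_format=TextFormat.VIDEO,
    language_scope=LanguageScope.UNILINGUAL,
    mode=Mode.REAL_TIME,
)
```

Making the profile explicit is one way to state which combination of factors a given method or benchmark actually targets.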
The paper is organized as follows: Section 2 describes the
steps involved in OCR, Section 3 defines the evaluation
protocols used to evaluate a method, Section 4 deals with
recent advances in text detection, text recognition and
end-to-end recognition methods, and Section 5 concludes
the paper.
2. STEPS INVOLVED IN OCR
The steps involved in OCR are
1) Data acquisition
2) Preprocessing
3) Text detection and extraction
4) Text enhancement
5) Text segmentation
6) Text recognition
7) Post-processing
These steps are not strictly separate. Several of them can be
integrated and accomplished by a single technique.
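The point that the seven steps may be merged can be sketched as a pipeline in which each stage declares which steps it covers. This is a minimal illustration, not an implementation of any method reviewed here: `Stage`, `run_pipeline`, `passthrough` and the stage names are hypothetical, and every stage body is a placeholder that returns its input unchanged.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# The seven steps listed above, in order.
STEPS = [
    "data_acquisition",
    "preprocessing",
    "text_detection_and_extraction",
    "text_enhancement",
    "text_segmentation",
    "text_recognition",
    "post_processing",
]


@dataclass
class Stage:
    name: str                    # hypothetical label for the technique
    covers: List[str]            # which of the seven steps it performs
    run: Callable[[Dict], Dict]  # placeholder transform on a shared state dict


def validate(stages: List[Stage]) -> None:
    """Check that the stages cover every step exactly once, in order."""
    covered = [step for stage in stages for step in stage.covers]
    if covered != STEPS:
        raise ValueError(f"stages cover {covered}, expected {STEPS}")


def run_pipeline(stages: List[Stage], state: Dict) -> Dict:
    """Run the stages in sequence and record which stage handled which steps."""
    validate(stages)
    state = dict(state, trace=[])
    for stage in stages:
        state = stage.run(state)
        state["trace"].append((stage.name, tuple(stage.covers)))
    return state


def passthrough(state: Dict) -> Dict:
    """Placeholder: a real stage would transform the image or text here."""
    return state


# One stage per step.
separate = [Stage(step, [step], passthrough) for step in STEPS]

# Detection through recognition handled by a single (hypothetical) technique.
integrated = [
    Stage("acquire", STEPS[:1], passthrough),
    Stage("preprocess", STEPS[1:2], passthrough),
    Stage("end_to_end_model", STEPS[2:6], passthrough),
    Stage("postprocess", STEPS[6:], passthrough),
]
```

Both `run_pipeline(separate, {})` and `run_pipeline(integrated, {})` pass validation; the first records seven stages in its trace and the second records four, which is the sense in which one technique can absorb several steps.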
2.1 Data Acquisition
The data is an image of handwritten or printed text with
a simple or complex layout or background, in a natural scene
or a document. The image of the text can be obtained from
1) Digital camera:
The portability, usability and small size of a digital camera
make it flexible for acquiring real-world text images.
However, the captured images tend to be low in resolution and
to suffer from uneven illumination and blur [2].
2) Flatbed or handheld scanner:
Scanners offer high resolution, even and adequate
illumination, minimal blur and fast batch processing.
However, their limited usability, size and lack of portability
make it difficult to capture text in natural scenes [2].
3) Datasets:
A wide variety of text image datasets is
available for research. They are used to set
benchmarks in terms of accuracy, processing speed and
storage. Examples include the ICDAR datasets for text
recognition in video [3] and multi-lingual scene text [4],
and COCO-Text [5]. The IIIT5K dataset [6] contains cropped
word images, the Synth90k dataset [7] contains synthetic text,
CTW-1500 [8] contains curved text and Total Text [9]
contains text in arbitrary shape. SVT (Street View Text)
[10] is a further example.
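The blur and uneven illumination mentioned for camera-captured images can be estimated numerically before an image enters the pipeline. The sketch below uses two common heuristics that are assumptions here rather than methods from the cited work: the variance of a 4-neighbour Laplacian as a sharpness score, and the spread of mean brightness across a coarse grid as an illumination score. The function names are hypothetical, and the image is assumed to be a grayscale list of rows at least 3 x 3 in size.

```python
from statistics import mean, pvariance


def laplacian_variance(gray):
    """Sharpness score: variance of the 4-neighbour Laplacian over interior pixels.

    Lower values suggest a blurrier image.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def illumination_spread(gray, blocks=2):
    """Illumination score: brightest minus darkest block mean on a blocks x blocks grid.

    Higher values suggest less even illumination.
    """
    h, w = len(gray), len(gray[0])
    block_means = []
    for by in range(blocks):
        for bx in range(blocks):
            rows = range(by * h // blocks, (by + 1) * h // blocks)
            cols = range(bx * w // blocks, (bx + 1) * w // blocks)
            block_means.append(mean(gray[y][x] for y in rows for x in cols))
    return max(block_means) - min(block_means)


# Synthetic 8 x 8 examples: a high-contrast checkerboard, a flat image,
# and a left-to-right brightness ramp.
checker = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
flat = [[128] * 8 for _ in range(8)]
ramp = [[x * 10 for x in range(8)] for _ in range(8)]
```

On these synthetic inputs the checkerboard scores far higher on sharpness than the flat image, and the ramp shows a non-zero illumination spread while the flat image shows none. Any thresholds for accepting or rejecting real images would have to be chosen per dataset.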