Bayesian Approach to Photo Time-Stamp Recognition
Asif Shahab, Faisal Shafait, Andreas Dengel
German Research Centre for Artificial Intelligence (DFKI)
Kaiserslautern, Germany
{Asif.Shahab,Faisal.Shafait,Andreas.Dengel}@dfki.de
Abstract—Time-stamps and URLs artificially overlaid on
images add useful meta-information that can be used for
automatic indexing of images and videos. In this paper, we
propose a method based on an attention-based model of visual
saliency to extract overlaid text and time-stamps rendered
on images. Our model of visual saliency is based on a
Bayesian framework and works very well for the task of time-
stamp detection and segmentation, as evidenced by an overall
object recall of 80% and precision of 70%. Our method produces
a clean, binarized text-segmented image, which can be used
directly for recognition by an OCR system. Furthermore, our
technique is robust against variations in font style and color
of time-stamps and overlaid text.
Keywords-overlaid text detection/recognition, photo time-
stamp detection/recognition, visual saliency, Bayesian model
for text detection
I. INTRODUCTION
Extraction of text occurring naturally (scene text) or
artificially (overlaid text) in an image has been a focus
of research for many years. Scene text occurs naturally in
an image and is difficult to separate because of illumination
problems, perspective distortions, and occlusion. Overlaid
or artificial text is usually added by cameras in the form of
time-stamps and URLs, or by image editing software on
top of the image. It is usually upright and is added
with readability in mind. Although time-stamps are usually
placed at a distinct location, in a distinct color and font, the
problem of extracting them from images is still
a complex one. Firstly, a time-stamp can appear on a highly
textured background, making it difficult to separate from the
background. Secondly, the high dimensionality of color
images and the variety of fonts, colors, and formats in which
time-stamps can occur make it a challenging
problem.
Several approaches have been reported in the literature to
solve the problem of text extraction from images. These
approaches can be classified into region-based and texture-
based methods [1].
Region-based methods use connected components or ver-
tical and horizontal edges and merge them based on
rules exploiting the geometric properties of text. These
methods work on the assumption that the color of text does
not change and differs considerably from the background
color [2] [3] [4]. They are generally faster and work well
for simple backgrounds, but they are sensitive to noise [5].
Texture-based methods use textural properties to sepa-
rate text from the background. To extract textural
features, they use a range of frequency-domain techniques
such as Gabor filters, FFT, DCT, and wavelets, as well as
spatial variance. Subsequently, they use machine learning
algorithms such as SVM, AdaBoost, and MLP to train a text
finder [6] [7]. Pan et al. [8] recently proposed a hybrid
system for scene text detection that combines a texture-based
and a connected-component-based method and uses a CRF
model to filter out non-text components. These algorithms are
generally slow because of their high computational complexity.
Li et al. [9] [10] and Chen et al. [11] proposed systems for
time-stamp detection and recognition based on template and
skeleton matching of time-stamp fonts. A set of templates is
created for a variety of time-stamps; Sobel operators
on the red and green color channels and a set of morphological
operators (close, open) are applied for a rough segmentation
of the image to reduce the search space for skeleton matching.
The major limitation of these systems is the number of
templates required to accommodate the full variety of fonts
and styles in which time-stamps can occur. Furthermore, the
restriction to the red and green channels used by Li et al. for
the extraction of Sobel edges limits the system's ability to
handle the range of colors time-stamps can take.
In this paper, we propose a system for overlaid text ex-
traction based on attention-based models of visual saliency.
Since overlaid text and time-stamps are usually added with
readability in mind, they respond well to attention-based
models. We apply a Bayesian framework, tuned by a
location-based prior for time-stamps learned independently
from training images, to calculate a saliency value for
each pixel.
We explain our technique for saliency evaluation and
time-stamp segmentation in Section 2. We report our ex-
perimental results in Section 3 and conclude the paper in
Section 4.
II. PROPOSED METHOD
Our probabilistic framework for time-stamp detection
is inspired by the visual saliency model for object search
and contextual guidance by Torralba et al. [12]. Such a
model gives, for each image location, the probability of
finding an object (in our case, a time-stamp) by integrating
global and local image information using task constraints.
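The computation just described — a bottom-up feature-rarity term modulated by a location prior — can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the inputs (`features`, `location_prior`), and the single-Gaussian approximation of the feature distribution are choices made for the sake of the example.

```python
import numpy as np

def bayesian_saliency(features, location_prior, eps=1e-8):
    """Per-pixel saliency = (feature rarity) x (location prior).

    features:       (H, W, D) array of local feature responses per pixel
                    (hypothetical input, e.g. color and gradient channels).
    location_prior: (H, W) array giving p(time-stamp at pixel), assumed to
                    be learned from training images.
    The bottom-up term, proportional to 1/p(L), is approximated by fitting
    one multivariate Gaussian to the feature distribution of the image.
    """
    H, W, D = features.shape
    X = features.reshape(-1, D)
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + eps * np.eye(D)
    inv = np.linalg.inv(cov)
    diff = X - mu
    # Log-likelihood of each pixel's features under the Gaussian
    # (up to an additive constant): -0.5 * (x - mu)^T inv(cov) (x - mu)
    log_p = -0.5 * np.einsum('nd,dk,nk->n', diff, inv, diff)
    rarity = -log_p.reshape(H, W)        # rare features -> high saliency
    saliency = rarity * location_prior   # modulate by the Bayesian prior
    return saliency / (saliency.max() + eps)
```

Pixels whose local features are unlikely under the image-wide distribution receive high bottom-up saliency, and the learned location prior then concentrates attention on regions where time-stamps typically occur.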
2011 International Conference on Document Analysis and Recognition
1520-5363/11 $26.00 © 2011 IEEE
DOI 10.1109/ICDAR.2011.210