Bayesian Approach to Photo Time-Stamp Recognition

Asif Shahab, Faisal Shafait, Andreas Dengel
German Research Centre for Artificial Intelligence (DFKI)
Kaiserslautern, Germany
{Asif.Shahab,Faisal.Shafait,Andreas.Dengel}@dfki.de

Abstract—Time-stamps and URLs overlaid artificially on images add useful meta-information that can be used for automatic indexing of images and videos. In this paper, we propose a method based on an attention-based model of visual saliency to extract overlaid text and time-stamps rendered on images. Our model of visual saliency builds on a Bayesian framework and works very well for the task of time-stamp detection and segmentation, as is evident from an overall object recall of 80% and precision of 70%. Our method produces a clean, text-segmented binarized image that can be used for recognition directly by an OCR system. Furthermore, our technique is robust against variations in the font style and color of time-stamps and overlaid text.

Keywords—overlaid text detection/recognition, photo time-stamp detection/recognition, visual saliency, Bayesian model for text detection

I. INTRODUCTION

Extraction of text occurring naturally (scene text) or artificially (overlaid text) in an image has been the focus of research for many years. Scene text occurs naturally in an image and is difficult to separate because of illumination problems, perspective distortions, and occlusion. Overlaid or artificial text is usually added by cameras, in the form of time-stamps and URLs, or by image editing software on top of the image. It is usually upright and added with readability in mind. Although time-stamps usually appear at a distinct location and in a distinct color and font, extracting them from images remains a complex problem. First, a time-stamp can appear on a highly textured background, making it difficult to separate from that background.
Second, the high dimensionality of color images and the variety of fonts, colors, and formats in which time-stamps can occur make this a challenging problem.

Several approaches have been reported in the literature to solve the problem of text extraction from images. These approaches can be classified into region-based and texture-based methods [1].

Region-based methods use connected components or vertical and horizontal edges and merge them together based on rules exploiting the geometric properties of text. These methods work on the assumption that the color of the text does not change and differs considerably from the background color [2] [3] [4]. They are generally faster and work well for simple backgrounds, but they are sensitive to noise [5].

Texture-based methods use textural properties to separate text from the background. To extract the textural features, they use a range of frequency-domain techniques such as Gabor filters, FFT, DCT, wavelets, and spatial variance. Subsequently, they use machine learning algorithms such as SVM, AdaBoost, and MLP to train a text finder [6] [7]. Pan et al. [8] recently proposed a hybrid system for scene text detection that combines texture-based and connected-component-based methods and uses a CRF model to filter non-text components. These algorithms are generally slow because of their high computational complexity.

Li et al. [9] [10] and Chen et al. [11] proposed systems for time-stamp detection and recognition based on template and skeleton matching of time-stamp fonts. A set of templates is created for a variety of time-stamps; Sobel operators on the red and green color channels and a set of morphological operators (close, open) are applied for a rough segmentation of the image, reducing the search space for skeleton matching. The major limitation of these systems is the number of templates required to accommodate the full variety of fonts and styles in which a time-stamp can occur.
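The rough-segmentation step attributed to Li et al. can be sketched as follows. This is a hedged illustration only, not their implementation: the edge threshold and the 3x3 structuring element are our assumptions, and `rough_timestamp_mask` is a hypothetical name.

```python
import numpy as np
from scipy import ndimage


def rough_timestamp_mask(rgb, edge_thresh=80):
    """Sketch of a rough segmentation in the style of Li et al.:
    Sobel edges on the red and green channels, thresholded, then
    cleaned with morphological close/open to narrow the search
    region before template/skeleton matching.

    rgb: (H, W, 3) uint8 image; returns a boolean (H, W) mask.
    """
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    # Sobel gradient magnitude per channel, combined by maximum.
    edges = np.maximum(
        np.hypot(ndimage.sobel(r, axis=0), ndimage.sobel(r, axis=1)),
        np.hypot(ndimage.sobel(g, axis=0), ndimage.sobel(g, axis=1)),
    )
    mask = edges > edge_thresh
    # Close to merge dense character strokes into blobs, then open
    # to drop isolated speckle responses.
    mask = ndimage.binary_closing(mask, structure=np.ones((3, 3)))
    mask = ndimage.binary_opening(mask, structure=np.ones((3, 3)))
    return mask
```

A mask like this only bounds the candidate region; the actual recognition in those systems still relies on template and skeleton matching inside it.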
Furthermore, the red and green channels used by Li et al. for the extraction of Sobel edges limit the system's ability to handle the full range of colors a time-stamp can take.

In this paper, we propose a system for overlaid text extraction based on attention-based models of visual saliency. Overlaid text and time-stamps are usually added with readability in mind and thus respond well to attention-based models. We apply a Bayesian framework, tuned by a location-based prior for time-stamps learned independently from training images, to calculate the saliency of each pixel. We explain our technique for saliency evaluation and time-stamp segmentation in Section 2. We report our experimental results in Section 3 and conclude the paper in Section 4.

II. PROPOSED METHOD

Our probabilistic framework for time-stamp detection is inspired by the visual saliency model for object search and contextual guidance by Torralba et al. [12]. Such a model gives, for each image location, the probability of finding an object (in our case, a time-stamp) by integrating global and local image information using task constraints.

2011 International Conference on Document Analysis and Recognition
1520-5363/11 $26.00 © 2011 IEEE  DOI 10.1109/ICDAR.2011.210
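The combination of a bottom-up rarity term with a top-down spatial prior in Torralba-style models can be illustrated with a minimal sketch. This is our simplification under stated assumptions, not the paper's actual implementation: we use a single scalar feature per pixel, estimate its global likelihood with a histogram, and take the location prior as given; the function name and the bin count are hypothetical.

```python
import numpy as np


def saliency_map(features, location_prior, eps=1e-8):
    """Per-pixel saliency in the spirit of Torralba et al.:
    rare local features (low likelihood under the image's global
    feature distribution) are salient, modulated by a learned
    spatial prior over where time-stamps tend to appear.

    features:       (H, W) array of a scalar local feature response
    location_prior: (H, W) array, p(time-stamp at pixel), from training
    """
    # Estimate the global feature distribution with a histogram.
    hist, edges = np.histogram(features, bins=32, density=True)
    bin_idx = np.digitize(features, edges[1:-1])
    p_feat = hist[bin_idx] * (edges[1] - edges[0])  # approx. probability mass
    # Bottom-up term 1/p(f) (self-information) times top-down prior.
    s = (1.0 / (p_feat + eps)) * location_prior
    return s / s.max()  # normalize to [0, 1] for display/thresholding
```

In this sketch, a pixel whose feature value is rare in the image as a whole, and which lies where the training prior expects time-stamps, receives the highest saliency.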