Scene Text Detection using Sparse Stroke Information and MLP

Aruni Roy Chowdhury¹, Ujjwal Bhattacharya², Swapan K. Parui²
¹ Department of Information Technology, Heritage Institute of Technology, Kolkata, India
² Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India
¹ arunirc@gmail.com, ² {ujjwal, swapan}@isical.ac.in

Abstract

In this article, we present a novel set of features for detection of text in images of natural scenes using a multi-layer perceptron (MLP) classifier. One of our features is an estimate of the uniformity in stroke thickness, which we obtain using only a subset of the distance transform values of the region concerned. Estimating the uniformity in stroke thickness from a sparse sampling of the distance transform values is a novel approach. Another feature is the distance between the foreground and background colors, computed in a perceptually uniform and illumination-invariant color space. The remaining features include two ratios of anti-parallel edge gradient orientations, a regularity measure between the skeletal representation and the Canny edgemap of the object, the average edge gradient magnitude, the variation in the foreground gray levels, and five others. We present the results of the proposed approach on the ICDAR 2003 database and on another database of scene images containing text in Indian scripts.

1. Introduction

Detection of text in natural scene images is a challenging problem given the wide range of variations possible. Research on this problem has recently gained momentum because of its significant application potential. Existing approaches to text detection can be broadly categorized into connected component (CC) based and texture-based algorithms. CC-based methods are relatively simple, but they often lack robustness. Texture-based algorithms, on the other hand, are robust but usually have higher computational complexity.
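The idea stated in the abstract, estimating stroke-thickness uniformity from only a subset of the distance transform (DT) values, can be illustrated with a minimal sketch. Several choices below are assumptions made purely for illustration and are not taken from the paper: the sparse subset is taken to be the local maxima (ridge pixels) of the DT, the uniformity measure is the coefficient of variation of those samples, and the function names `distance_transform`, `ridge_samples` and `stroke_uniformity` are hypothetical. The DT is computed by brute force in pure Python, which is only practical for small masks.

```python
import math
from statistics import mean, pstdev


def distance_transform(mask):
    """Euclidean distance from each foreground pixel to the nearest background pixel.

    Brute-force version for small binary masks (lists of 0/1 rows).
    """
    h, w = len(mask), len(mask[0])
    background = [(r, c) for r in range(h) for c in range(w) if not mask[r][c]]
    if not background:
        raise ValueError("mask needs at least one background pixel")
    dt = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                dt[r][c] = min(math.hypot(r - br, c - bc) for br, bc in background)
    return dt


def ridge_samples(dt):
    """Sparse subset of DT values: pixels that are local maxima in their 8-neighborhood.

    ASSUMPTION: this choice of subset is illustrative, not the paper's rule.
    """
    h, w = len(dt), len(dt[0])
    samples = []
    for r in range(h):
        for c in range(w):
            v = dt[r][c]
            if v <= 0:
                continue
            neighbours = [
                dt[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(v >= n for n in neighbours):
                samples.append(v)
    return samples


def stroke_uniformity(mask):
    """Coefficient of variation of the sparse DT samples (lower = more uniform).

    ASSUMPTION: the statistic is illustrative, not the paper's definition.
    """
    samples = ridge_samples(distance_transform(mask))
    if not samples:
        return None
    return pstdev(samples) / mean(samples)


# A horizontal bar of constant thickness 3 on a 9 x 20 grid.
bar = [[1 if 3 <= r <= 5 and 2 <= c <= 17 else 0 for c in range(20)]
       for r in range(9)]

# A wedge whose thickness grows from left to right on a 15 x 20 grid.
wedge = [[1 if 2 <= c <= 17 and abs(r - 7) <= (c - 2) // 3 else 0 for c in range(20)]
         for r in range(15)]

cv_bar = stroke_uniformity(bar)
cv_wedge = stroke_uniformity(wedge)
```

On these two toy shapes, the constant-thickness bar yields a lower coefficient of variation than the wedge, which is the behaviour one would want from a stroke-uniformity feature for separating text-like strokes from other components.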
The stroke width feature has been used in several earlier works [1,2,3,4,5] on detection of scene text. Epshtein et al. [4] estimated the stroke width by densely computing the “Stroke Width Transform” in a bottom-up approach starting from the pixel level. In [5], it was calculated using the distance transform (DT) of certain Maximally Stable Extremal Regions (MSER).

2. Proposed scheme

An extensive study of scene text shows that (i) strokes of such text are bounded curves having either uniform or nearly uniform thickness, (ii) the color of a piece of text is nearly uniform, and (iii) the color of text is perceptually distinct from the color of its background. In the proposed approach to detection of text in scene images, we consider these characteristics in designing the feature vector for the multi-layer perceptron (MLP) classifier.

2.1. Preprocessing

The input color image is converted into a grayscale image (I), which is then subjected to Gaussian smoothing with a 3 × 3 kernel. Next, its Canny edgemap (E) is obtained. We perform edge-linking on E as follows. If an object (edge) pixel of E has exactly one object pixel in its 8-neighborhood, that pixel is the end-point of a broken edge. With each such end-point as the center, we search within a 5 × 5 neighborhood, and if any other edge pixel is found, the two are joined. We obtain the connected components (CC) of E after edge-linking. The coordinates of the bounding box of each connected component of E are stored in a set <P>.

2.2. A set of geometrical and topological features

Existing approaches for detection of text in scene images usually employ a number of empirically designed criteria [2, 4, 5] to remove the majority of non-text components from such images. A similar set of criteria was used here, based on an extensive study of the training samples of the ICDAR 2003 database. A