New Approach Based on Texture and Geometric Features for Text Detection

Hinde Anoual 1, Sanaa El Fkihi 1,2, Abdelilah Jilbab 1,3, and Driss Aboutajdine 1

1 LRIT, unité associée au CNRST, FSR, Mohammed V University Agdal, Morocco
hindanoual@yahoo.fr, aboutaj@fsr.ac.ma
2 ENSIAS, Mohammed V University Soussi, BP 713, Rabat, Morocco
elfkihi@ensias.ma
3 ENSET, Madinat AL Irfane, B.P 6207 Rabat-Instituts, Rabat, Morocco
jilbab@enset-rabat.ac.ma

Abstract. Because images carry a large amount of data, it is important to detect and localize text regions as accurately as possible before performing any character recognition. In this paper we describe an algorithm for detecting text against complex backgrounds. It is based on texture analysis and connected component analysis. First, we extract the texture regions, which usually contain text. Second, we segment the texture regions into suitable objects; the image is segmented into three classes. Finally, we analyze all connected components present in the binary image corresponding to each of the three classes in order to remove non-text regions. Experiments on a benchmark database show the advantages of the proposed method over another method. In particular, our method is insensitive to complex backgrounds, font size, and color, and it offers high precision (83%) and recall (73%).

Keywords: Text detection, text localization, feature extraction, texture analysis, geometric analysis.

1 Introduction

Text detection is the task of localizing text in a complex background without recognizing individual characters. It remains an active research topic in many fields. The rationale is that the text embedded in an image is a reliable source of descriptive information, since it carries important information about the semantics of the image content. However, text detection faces three main challenges: complex backgrounds, character size, and multiple colors and font styles.
To face these challenges, extensive efforts have been made to extract text from images. The existing approaches are based on the main characteristics of text, which are:

– Texture characteristics: The text region is treated as a texture region to be isolated from the rest of the image. Text has many kinds of texture characteristics, such as contrast and color homogeneity. Indeed, text must be readable, which is why the contrast of text is important and is higher than that of other objects. Also, characters tend to have the same or similar

A. Elmoataz et al. (Eds.): ICISP 2010, LNCS 6134, pp. 157–164, 2010.
© Springer-Verlag Berlin Heidelberg 2010