A New Skew Estimation Technique for Cone Shaped Text Line

G. Hemantha Kumar, *P. Shivakumara, J. Vidyamba, H.S. Varsha, S. Rekha, M.R. Rashmi Nayaka
Department of Studies in Computer Science, University of Mysore, Mysore – 570006, Karnataka
*Email: hudempsk@yahoo.com

Abstract

There are printed documents in which the text lines on a single page are not parallel to each other. These text lines may have different orientations, or they may be arc-shaped, cone-shaped or curved. For optical character recognition of such documents, the skew angle must be detected properly. In this paper we present a new technique to estimate the skew angles of cone shaped text lines. The method divides the whole text line image into a number of divisions, each 4 columns wide. From each division of the text line, the method extracts the coordinates of the top pixel, the bottom pixel and the mid pixel. These three sets of coordinates are subjected to linear regression analysis separately to estimate skew angles. The touching point of two text lines is identified by studying the continuous and consistent skew angles obtained by the method, which are computed by progressively appending divisions from left to right. Based on the touching point, the method identifies two different skew angles for the two differently oriented touching lines. The cone shaped text line is then corrected into a single horizontal line using these two skew angles. The experimental results show that the method works satisfactorily.

Keywords: Cone shaped text lines, Number of divisions, Linear regression analysis, Skew estimation.

1. Introduction

In this chapter we present the preprocessing steps that are essential for document image analysis. The major preprocessing steps include noise removal, skew detection, image binarization, identifying and smoothing disconnected characters, and overcoming image degradations.
Among these preprocessing steps, skew detection is considered to be very important in document analysis. It is even more important in document image mosaicing [6-10], since skew is unavoidable when documents are scanned. Thus, we have taken up skew detection as a major work in this research.

2. Related Literature

Methods to extract multi-oriented text lines from a single document, when the text lines in the document are parallel to one another, are reported in the literature. A simple algorithm using the global projection profile is good enough to identify skewed lines. However, this approach fails when text lines with different orientations are present in a document. A modification of the global projection profile method is the Docstrum method [1,2], which is based on the nearest neighbor technique. It also extracts text lines with a single orientation, but it fails when the text lines are not parallel to each other.

(Goto and Asu, 1999) have proposed a technique [5] to identify text lines with different orientations, in which the document image is split into randomly chosen sub-regions of constant width and the local orientation is estimated in each sub-region. They have tried to discriminate between the number of possible orientations and the estimation of linking the extended linear segments in the sub-regions. This method does not handle documents with variable sized text.

(Fletcher and Kasturi, 1998) have proposed a method that extracts text lines with arbitrary orientations from regions containing both text and graphics [4]. The method uses bounding boxes around the characters together with the Hough transform. It does not handle text lines of arbitrary size, because it assumes that the average height of the largest character is not greater than five times the average height of the smallest character.

(Pal et
al., 2002) have proposed a robust technique to extract English text lines of arbitrary orientation, with characters of variable sizes and styles, and to detect the skew of individual text lines. Here, individual text lines are assumed to be straight with arbitrary orientation [3]. The method has certain drawbacks, namely: a) it fails to extract text lines in text other than English, b) it fails to extract arc, round and zigzag shaped text lines, c) it cannot handle mixed text and graphics documents automatically, and d) it cannot extract lines properly if the total number of components in the text line is two or less.
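
In contrast to the methods surveyed above, the technique summarized in the abstract works on fixed 4-column divisions of a single text line and fits regression lines through the top, mid and bottom pixels. The following Python fragment is only a minimal sketch of that idea under stated assumptions, not the authors' implementation. It assumes a binary image whose text pixels are non-zero, takes the mid pixel as the midpoint of the top and bottom pixels, uses the division centre as the x coordinate, skips empty divisions, and uses NumPy's `polyfit` for the least-squares fit. The function names `division_points`, `skew_angle`, `estimate_skew` and `prefix_angles` are hypothetical and chosen for illustration.

```python
import numpy as np


def division_points(img, width=4):
    """Return x, top, mid and bottom coordinates, one per non-empty division."""
    img = np.asarray(img) != 0          # assumption: text pixels are non-zero
    n_cols = img.shape[1]
    xs, tops, mids, bottoms = [], [], [], []
    for start in range(0, n_cols, width):
        stop = min(start + width, n_cols)
        # Rows that contain at least one text pixel inside this division.
        rows = np.flatnonzero(img[:, start:stop].any(axis=1))
        if rows.size == 0:              # assumption: empty divisions are skipped
            continue
        top, bottom = int(rows[0]), int(rows[-1])
        xs.append((start + stop - 1) / 2.0)   # centre column of the division
        tops.append(top)
        bottoms.append(bottom)
        mids.append((top + bottom) / 2.0)     # assumption: mid = midpoint
    return np.array(xs), np.array(tops), np.array(mids), np.array(bottoms)


def skew_angle(xs, ys):
    """Angle in degrees of the least-squares line through the points.

    Rows grow downward in image coordinates, so a positive angle means
    the line descends from left to right on screen.
    """
    slope, _intercept = np.polyfit(xs, ys, 1)
    return float(np.degrees(np.arctan(slope)))


def estimate_skew(img, width=4):
    """Separate regression angles for the top, mid and bottom pixels."""
    xs, tops, mids, bottoms = division_points(img, width)
    return {
        "top": skew_angle(xs, tops),
        "mid": skew_angle(xs, mids),
        "bottom": skew_angle(xs, bottoms),
    }


def prefix_angles(xs, ys):
    """Angles obtained by appending divisions one at a time, left to right."""
    return [skew_angle(xs[:k], ys[:k]) for k in range(2, len(xs) + 1)]
```

For a cone shaped line, the sequence returned by `prefix_angles` would be expected to stay roughly constant up to the touching point and then drift. How that drift is turned into a decision is not specified in the text above, so it is left out of the sketch.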