Abstract: In this paper we present a combined binarization technique for historical document images. Many binarization techniques have been proposed in the literature for different types of binarization problems, but the few simple thresholding methods available cannot be applied to many of them. In order to improve the quality of historical document images, we propose a combined approach based on global and local thresholding methods. The method was evaluated on the benchmarking dataset used in the Handwritten Document Image Binarization Contest (H-DIBCO 2012) and on an Arabic historical document from the National Library of Algeria. The evaluation, based on a word spotting system, showed the efficiency of our approach.

Index Terms: Historical document, binarization, global and local threshold, word spotting.

I. INTRODUCTION

Binarization is an important step in historical document image preprocessing: it eliminates background noise and improves the document quality. This process consists of converting the gray-level image into a binary image that can be used for further processing (optical character recognition 'OCR', intelligent character recognition 'ICR', word spotting…). Many thresholding algorithms have been proposed. However, these algorithms still show quality shortcomings in document image analysis systems. An early histogram-based global binarization algorithm, Otsu's method [1], is widely used. The Isodata method [2] is also used as a global method to calculate explicit thresholds. Niblack [3], Sauvola [4] and NICK [5] use local thresholding. All of these methods have been shown to be effective for certain types of document images. However, none has been shown to be effective for all examples of degraded document images.
Historical document images are particularly challenging for the thresholding, or information separation, problem (Document Image Binarization Contest: DIBCO 2009 [6], H-DIBCO 2010 [7], DIBCO 2011 [8] and H-DIBCO 2012 [9]). Many historical documents have become degraded and are difficult for a human to decipher because of long periods under poor storage conditions and inevitable differences in paper quality and ink. In order to improve the quality of the binarized images, we propose to enhance the document images before binarization. The proposed method uses global thresholding to enhance the document image and then applies a local thresholding method.

This paper is structured as follows. Section II reviews the state of the art of binarization techniques. Our proposed method is presented in Section III. Experimental results are reported in Section IV. Finally, the conclusion and future work are presented in Section V.

Manuscript received September 20, 2013; revised November 21, 2013.
ET-Tahir Zemouri is with the Speech Communication and Signal Processing Laboratory, University of Sciences and Technology Houari Boumediene, Algiers, Algeria (e-mail: tzemouri@usthb.dz).
Youcef Chibani and Youcef Brik are with the Speech Communication and Signal Processing Laboratory, University of Sciences and Technology Houari Boumediene, Algiers, Algeria (e-mail: ychibani@usthb.dz, ybrik@usthb.dz).

II. STATE OF THE ART

Thresholding a historical document image converts the gray-level image to binary format by separating the useful text information from the background. There are two main approaches to binarization: global and local thresholding. In the global method, a single threshold T is used for the whole image: if the value of an input pixel is greater than T, the pixel is set to background; otherwise, it is set to foreground. Otsu's method [1] assumes the presence of two distributions (one for the text and another for the background).
It calculates a threshold value that maximizes the variance between the two distributions. The Isodata method [2] calculates a threshold by iteratively separating the gray-level histogram into two classes. The main drawback of global methods is that they cannot adapt well to uneven illumination and noise. Hence, they do not perform well on low-quality document images.

Unlike global thresholding, a local threshold is calculated for each pixel in the image according to the properties of its neighborhood. This approach generally performs better for low-quality images. Niblack's method [3] calculates the threshold of each window over the image separately using the following formula:

$$T = m + k \cdot s \tag{1}$$

where m is the mean and s is the standard deviation of the pixels inside the window. The value of k is generally fixed to -0.2.

Sauvola's method [4] was developed from Niblack's method. It aims to reduce the black noise problem by scaling the contribution of the standard deviation according to a range of gray-level values in the image. The thresholding formula is:

Enhancement of Historical Document Images by Combining Global and Local Binarization Technique
E. Zemouri, Y. Chibani, and Y. Brik
International Journal of Information and Electronics Engineering, Vol. 4, No. 1, January 2014
DOI: 10.7763/IJIEE.2014.V4.397
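To make Niblack's rule in Eq. (1) concrete, the following is a minimal Python sketch, not the implementation used in this paper. The function name `niblack_threshold`, the 3x3 default window, the clipping of windows at the image border, and the 1 = background / 0 = foreground encoding are all assumptions made for illustration. Only the formula T = m + k·s, the value k = -0.2, and the rule that a pixel above T is set to background come from the text.

```python
from math import sqrt


def niblack_threshold(image, window=3, k=-0.2):
    """Binarize a gray-level image (list of rows) with T = m + k*s per pixel.

    Returns a list of rows with 1 for background and 0 for foreground
    (this encoding is an illustrative assumption).
    """
    half = window // 2
    rows, cols = len(image), len(image[0])
    binary = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Neighbourhood of the pixel, clipped at the image borders
            # (border handling is an assumption, not specified in the text).
            values = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            m = sum(values) / len(values)  # local mean
            s = sqrt(sum((v - m) ** 2 for v in values) / len(values))  # local std
            t = m + k * s  # Eq. (1)
            # Strict comparison, as in the text: above T is background.
            # A perfectly flat window (s = 0) gives T = m, so its pixel
            # is labelled foreground under this rule.
            out_row.append(1 if image[r][c] > t else 0)
        binary.append(out_row)
    return binary
```

For example, on a 5x5 image with a bright background (value 200) and a dark vertical stroke (value 50) in the middle column, the stroke pixels are labelled 0 and their bright neighbours are labelled 1. The pure-Python loops favour readability over speed; a practical implementation would normally compute the local mean and standard deviation with an array library.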