ICFHR 2010 Contest: Quantitative Evaluation of Binarization Algorithms

Roberto Paredes
PRHLT - Universidad Politecnica de Valencia, Spain
rparedes@dsic.upv.es

Ergina Kavallieratou
ICSD - University of the Aegean, Greece
kavallieratou@aegean.gr

Rafael Dueire Lins
Universidade Federal de Pernambuco, Brasil
rdl@ufpe.br

Abstract—This paper describes the ICFHR 2010 Contest for the quantitative evaluation of binarization algorithms. These algorithms are applied to synthetic images of modern PDF documents combined with noise from historical documents. Today, many scientists work on the binarization task and many algorithms have been proposed. However, selecting the most appropriate one is not a simple procedure, and evaluating these algorithms has proved to be another difficult task, since there is no objective way to compare the results. Four groups with six systems participated in the competition. The experimental setting is described in detail, followed by a short description of the participating groups, their systems, and the results achieved.

Keywords—Document Binarization

I. INTRODUCTION

Document binarization is a preprocessing task that is very useful to document analysis systems. It automatically converts document images into a bi-level form in such a way that the foreground information is represented by black pixels and the background by white ones. This apparently simple procedure has proved to be a very difficult task, especially in the case of historical documents, where very specialized problems have to be dealt with, such as variations in contrast and illumination, smearing and smudging of text, ink seeping through to the other side of the page, and general degradation of the paper and ink due to aging. On the other hand, such a task is necessary for the further stages of document analysis, whether we are interested in performing OCR, document segmentation, or simply presenting the document after some restoration stages.
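As a concrete illustration of the bi-level conversion described above (not one of the contest entries), a classical global thresholding scheme such as Otsu's method picks a single intensity threshold and maps dark ink to black and background to white. A minimal pure-Python sketch, assuming the image is given as a flat list of 8-bit grayscale values:

```python
def otsu_threshold(gray):
    """Threshold maximizing between-class variance (Otsu's method).

    `gray` is a flat sequence of 8-bit intensity values (0-255).
    """
    hist = [0] * 256
    for v in gray:
        hist[v] += 1
    total = len(gray)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = 0.0          # running intensity sum of the "background" class
    weight_bg = 0         # running pixel count of the "background" class
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # between-class variance (up to a constant factor)
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

def binarize(gray):
    """Map dark (ink) pixels to black (0) and light ones to white (255)."""
    t = otsu_threshold(gray)
    return [0 if v <= t else 255 for v in gray]
```

Global methods like this one fail precisely in the difficult historical-document conditions listed above (uneven illumination, bleed-through), which is why adaptive and more elaborate algorithms, such as those entered in this contest, are needed.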
The noise remaining after a bad binarization would reduce the performance of the subsequent processing steps and in many cases could even cause their failure.

Many algorithms have been proposed for the document binarization task. However, the selection of the most appropriate one is not a simple procedure, and the evaluation of these algorithms has proved to be another difficult task, since there is no objective way to compare the results. Weszka and Rosenfeld [13] defined several evaluation criteria. Palumbo et al. [10] addressed the issue of document binarization by comparing three methods. Sahoo et al. [11] surveyed nine thresholding algorithms and comparatively illustrated their performance. Lee et al. [4] conducted a comparative analysis of five global thresholding methods. Glasbey [2] pointed out the relationships and performance differences between histogram-based algorithms based on an extensive statistical study. Leedham et al. [5] compared five binarization algorithms using precision and recall analysis of the resultant words in the foreground. He et al. [3] compared six algorithms by evaluating their effect on end-to-end word recognition performance in a complete archive document recognition system using a commercial OCR engine. Sezgin and Sankur [12] described 40 thresholding algorithms, categorized them according to the information content used, and measured and ranked their performance comparatively on two different classes of images.

All the works mentioned above reached some very interesting conclusions. However, in every case they rely on results from ensuing tasks of the document processing hierarchy in order to assess algorithm performance. Although in many cases this is the ultimate goal, it is not always possible, and it is an indirect evaluation approach (through subsequent analysis stages).
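A direct alternative to such indirect evaluation is to compare each pixel of the binarization result against a pixel-accurate ground-truth image. A minimal sketch of such a pixel-level comparison, computing precision, recall, and F-measure over foreground pixels (an illustrative metric; the exact contest measure is defined in the experimental setting, not here):

```python
def pixel_scores(result, ground_truth, fg=0):
    """Pixel-level precision, recall and F-measure for foreground pixels.

    `result` and `ground_truth` are equal-length bi-level sequences;
    `fg` is the foreground value (black = 0 by the convention above).
    """
    tp = sum(1 for r, g in zip(result, ground_truth) if r == fg and g == fg)
    res_fg = sum(1 for r in result if r == fg)           # predicted foreground
    gt_fg = sum(1 for g in ground_truth if g == fg)      # true foreground
    precision = tp / res_fg if res_fg else 0.0
    recall = tp / gt_fg if gt_fg else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

Such a per-pixel comparison only becomes feasible when a reliable ground truth exists for every pixel, which motivates the synthetic-image methodology described next.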
In the case of historical documents, whose quality in many cases obstructs recognition, and sometimes even word segmentation, this way of evaluation can prove problematic. Instead, we need a different, more direct evaluation technique, able to evaluate just the binarization stage. The ideal evaluation should be able to decide, for each pixel, whether it has been assigned the correct color (black or white) after binarization. This is an easy task for a human observer but very difficult for a computer to perform automatically for all the pixels of several images.

The methodology used in the contest includes experimentation on document archives made by constructing