A Ground Truth Bleed-Through Document Image Database

Róisín Rowley-Brooke, François Pitié, and Anil Kokaram

Department of Electronic and Electrical Engineering, Trinity College Dublin, Ireland
{rowleybr,fpitie,anil.kokaram}@tcd.ie

Abstract. This paper introduces a new database of 25 recto/verso image pairs from documents suffering from bleed-through degradation, together with manually created foreground text masks. The structure and creation of the database are described, and three bleed-through restoration methods are compared in two ways: visually, and quantitatively using the ground truth masks.

Keywords: Document database, bleed-through, document restoration

1 Introduction

Bleed-through degradation poses one of the most difficult problems in document restoration. It occurs where ink has seeped through from one side of the page and interferes with text on the other side. Many solutions to the bleed-through problem have been proposed, and it is clear that researchers working in the area of bleed-through restoration face two main challenges. Firstly, it can be difficult to obtain access to high resolution degraded images unless connected with a specific library or digitisation project. Secondly, as with all document restoration techniques, problems arise when trying to analyse results quantitatively, as there is no actual ground truth available. This problem may be overcome either by creating synthetic degraded images with known ground truth [5], [16], or by creating synthetic ground truth data for given real degraded images [2]. Alternatively, performance may be evaluated without any ground truth by quantifying how the restoration affects a secondary step, such as the performance of an Optical Character Recognition (OCR) system on the document image [17], [16].
A further issue with quantitative evaluations for performance comparison is that the results of different methods are often in different formats, such as binary images [1], pseudo-binary images where the background is uniform with varying foreground intensities [10], [8], or a textured background medium with varying foreground and background intensities [17], [11]. We propose that a fair quantitative comparison between methods can only be achieved if their results are converted to the same format and then compared to a ground truth that is also of that format, and that the simplest way of achieving this is to binarise all the results and compare them to a binary ground truth. To our knowledge there