IMAGE REPLICA DETECTION USING R-TREES AND LINEAR DISCRIMINANT ANALYSIS

Spyros Nikolopoulos, Stefanos Zafeiriou, Panagiotis Sidiropoulos, Nikos Nikolaidis and Ioannis Pitas

Dept. of Informatics, Aristotle University of Thessaloniki, Box 451, 54124 Thessaloniki, Greece
e-mail: {nikolaid,pitas}@aiia.csd.auth.gr

ABSTRACT

In this paper a novel system for image replica detection is presented. The system uses color-based descriptors to extract robust features for image representation. These features are used to index the images of a database in an R-Tree. When a query is submitted asking whether a test image is a replica of an image in the database, the R-Tree is traversed and a set of candidate images is retrieved. Then, in order to obtain a single result and at the same time reduce the number of decision errors, the system is enhanced with Linear Discriminant Analysis (LDA). The conducted experiments show that the proposed approach is very promising.

1. INTRODUCTION

Recent technological advances in the area of multimedia content distribution have resulted in a major reorganization of this trade. Valuable digital artworks can be reproduced and distributed arbitrarily, without any control by the copyright holders. Thus, issues related to intellectual property rights protection arise.

Numerous systems addressing the issue of copyright protection can be found in the literature, the vast majority of them being based on watermarking. Watermarking is the technique of imperceptibly embedding information within the content of the original image [1]. Although watermarking has attracted considerable interest from both industry and academia, it has certain deficiencies. The requirement of embedding information inside a digital image before it is made publicly available automatically excludes digital images that are already in the public domain and need to be copyright protected.
In addition, watermarking cannot counter content leakage, where an unwatermarked copy of the original artwork is stolen.

In order to overcome these inherent watermarking deficiencies, the scientific community recently started to investigate image copyright protection and Digital Rights Management from another perspective. Specifically, the problem is treated as an image similarity problem, where the system decides whether a query image resembles a reference image (i.e. it is a replica of this image). These systems are referred to as image replica recognition/detection systems [2, 3, 4, 5]. Replica detection is the process of identifying all images that have been generated from the original version through intentional or unintentional manipulations. It is assumed that the modified images maintain sufficient visual quality to keep their commercial value and also maintain the semantic content of the original. Severely distorted copies are of no interest to a replica detection system. The major benefit of this approach derives from the fact that no additional information needs to be embedded within the image content, which makes the system applicable to images that are already in the public domain.

This work has been partly supported by EU and Greek national funds under Operational Programme in Education and Initial Vocational Training II through the Archimedes project "Retrieval and Intellectual Property rights protection of Multidimensional Digital Signals" (04-3-001/4).

In this paper we propose an image replica detection system that maintains a database of original images. The database can be queried with a suspect image, and the system decides whether this image is a replica of a stored original. Images are represented by a feature vector composed of color-based descriptors [6]. Then, we implement a multidimensional indexing structure based on R-Trees.
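The candidate retrieval performed by such an index can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that every original is stored as an axis-aligned box centred on its feature vector, with a symmetric per-dimension extent, and it replaces the R-Tree traversal with a brute-force scan that returns the same candidate set. The names `IndexedImage`, `make_entry` and `candidates`, as well as all feature values, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class IndexedImage:
    """An original image stored as an axis-aligned box in feature space."""
    name: str
    lower: Sequence[float]  # per-dimension lower bound of the neighborhood
    upper: Sequence[float]  # per-dimension upper bound of the neighborhood


def make_entry(name: str, features: Sequence[float],
               extent: Sequence[float]) -> IndexedImage:
    """Build the box [f - e, f + e] around an original's feature vector.

    The symmetric box is an assumption made for this illustration.
    """
    lower = [f - e for f, e in zip(features, extent)]
    upper = [f + e for f, e in zip(features, extent)]
    return IndexedImage(name, lower, upper)


def candidates(index: List[IndexedImage], query: Sequence[float]) -> List[str]:
    """Return the names of all entries whose box contains the query point.

    Linear scan for clarity; an R-Tree yields the same set but skips most
    boxes by descending only into nodes whose bounding box holds the query.
    """
    return [
        entry.name
        for entry in index
        if all(lo <= q <= hi
               for lo, q, hi in zip(entry.lower, query, entry.upper))
    ]


# Toy 3-dimensional feature vectors (made-up values).
index = [
    make_entry("original_A", [0.20, 0.50, 0.70], [0.05, 0.05, 0.05]),
    make_entry("original_B", [0.26, 0.52, 0.68], [0.05, 0.05, 0.05]),
    make_entry("original_C", [0.80, 0.10, 0.30], [0.05, 0.05, 0.05]),
]

unique_hit = candidates(index, [0.17, 0.50, 0.70])  # inside A's box only
ambiguous = candidates(index, [0.23, 0.51, 0.69])   # inside A's and B's boxes
no_hit = candidates(index, [0.50, 0.50, 0.50])      # outside every box
```

In this toy index the second query falls where the boxes of two originals overlap, so two candidates come back and the index alone cannot name a single original.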
Although substantially reduced, the probability that the R-Tree returns more than one image as a candidate original for the query is not zero, and in such cases the system cannot decide unambiguously. We introduce the use of image-class information to resolve the cases that the indexing structure cannot settle. Specifically, LDA (preceded by PCA) is applied in order to reformulate the solution space and yield more discriminant image representations.

2. REPLICA DETECTION SYSTEM

2.1. System Overview

The process of engineering the proposed image replica detection system can be separated into two independent phases. The first phase deals with the organization and construction of the database. Each time a new original, copyright-protected image is added to the database, the image is subjected to a series of predefined attacks (image manipulations) selected according to the system's design specifications. A feature vector is extracted from each attacked version, resulting in a feature table that contains samples from the feature space neighborhood of the original image. This feature table is used to calculate an extent vector that specifies the neighborhood extent for each dimension. Finally, the original image is indexed within

1424403677/06/$20.00 ©2006 IEEE ICME 2006