Content Based Copy Detection with Coarse Audio-Visual Fingerprints

Ahmet Saracoğlu 1,2, Ersin Esen 1,2, Tuğrul K. Ateş 1,2, Banu Oskay Acar 1, Ünal Zubari 1, Ezgi C. Ozan 1,2, Egemen Özalp 1, A. Aydın Alatan 2, Tolga Çiloğlu 2

1 TÜBİTAK Space Technologies Research Institute
2 Department of Electrical and Electronics Engineering, M.E.T.U.

{ahmet.saracoglu,ersin.esen,tugrul.ates,banu.oskay,unal.zubari,ezgican.ozan,egemen.ozalp}@uzay.tubitak.gov.tr
{alatan,ciloglu}@eee.metu.edu.tr

Abstract

Content Based Copy Detection (CBCD) has emerged as a viable alternative to watermarking, the active detection methodology. First, media already in circulation cannot be marked; second, CBCD can inherently endure various severe attacks that watermarking cannot. Although media content is generally handled independently as visual and audio, in this work both information sources are utilized in a unified framework that employs coarse representations of fundamental features. From the copy detection perspective, the number of attacks on audio content is limited compared with the visual case. Therefore audio, if present, is an indispensable part of a robust video copy detection system. In this study, the validity of this statement is demonstrated through various experiments on a large data set.

1. Introduction

Content Based Copy Detection (CBCD) is an emerging and active research area, driven by various improvements in multimedia and communication technologies, such as the adoption of more efficient multimedia coding standards and an astounding increase in data transfer rates. These improvements, and many more, have produced an even stronger catalyst: the “video hosting service”. YouTube, Google Video, Metacafe and similar services are part of our daily lives.
As the amount of digital media in these sources increases exponentially (in August 2006 YouTube was hosting about 6.1 million videos [1], and as of April 2008 a YouTube search returns about 83.4 million videos [2]), two crucial and unavoidable problems arise: management of copyrights and numerous duplicates. There are two main approaches to solving these problems: passive methods and active methods, i.e. watermarking. However, watermarking has two significant limitations. First, since watermarks must be introduced into the original content before copies/duplicates are made, it cannot be applied to content that is already in circulation. Second, its degree of robustness is not adequate for some of the attacks that are encountered frequently.

Passive detection methods, on the other hand, try to detect copyright infringements and duplicate videos directly by comparing the questioned data against a database. This approach can be thought of as a complementary technology to watermarking that provides a solution to the two problems mentioned above. Its primary idea can be interpreted as the media being the watermark itself. That is, the media (image, video, audio) contains enough unique information to make detection of copies possible.

The main difficulty for passive detection methods is that a copy cannot be assumed to be identical to its reference. Brightness or contrast enhancement, compression, noise, bandwidth limitation, mixing with unrelated audio, overlay text and geometric transformations can all be observed in videos, yielding highly modified duplicate video signals. As a result, a copy may be less similar to its reference video than videos that would be considered similar in Content-Based Video Retrieval (CBVR) applications; hence the distinct name Content-Based Copy Detection (CBCD).
Although video content is generally treated as an image sequence, when audio is available the correspondence between audio and visual content constitutes an indispensable information source. In some copy/duplication cases the audio content is preserved unaffected; in others it undergoes additional modifications such as bandwidth limitation, coding-related distortion or mixing with unrelated audio; and in still others it is replaced by an entirely different audio stream. That said, from the perspective of a practical application the audio component of a video is clearly an indispensable additional information source for the detection of duplicates. In this work the audio and visual content of a video are utilized jointly for a robust solution of the CBCD