Decision-theoretic consideration of robust hashing: link to practical algorithms

Oleksiy Koval, Sviatoslav Voloshynovskiy, Fokko Beekhof, and Thierry Pun

CUI-University of Geneva, Stochastic Image Processing Group, 24, rue du Général-Dufour, 1211 Genève 4, Switzerland
{Oleksiy.Koval, svolos, Fokko.Beekhof, Thierry.Pun}@cui.unige.ch
http://sip.unige.ch

Abstract. In this paper we propose to consider the problem of robust perceptual hashing of multimedia data as a composite hypothesis testing problem. This formulation is justified by the prior ambiguity about source statistics and channel parameters that is typical of many practical scenarios. Under specific constraints on the assumed source and geometric channel models, we propose an asymptotically universal test that approaches the performance of the classical maximum likelihood test performed with exact knowledge of these statistics. Finally, we consider the problem of practical hash construction under constraints on complexity, robustness to geometrical transformations, universality and security. The proposed solution is based on binary hypothesis testing for randomly or semantically selected blocks or regions in sequences or images.

1 Introduction

Recent advances in digital imaging and audio open new directions in imaging science, content management and secure communications. Evidently, this development is still underway, and further success in these and possibly new directions can easily be foreseen. At the same time, this rapid progress is inevitably accompanied by the risk of various malicious and illegal actions, including copyright violation; unauthorized usage, multiplication and distribution of digital media; and high-fidelity, efficient counterfeiting of digital and analog content as well as of goods and products. These risks justify an urgent need for reliable document, product and person identification.
The requirements that such techniques must satisfy include robustness and security, in order to withstand various attacks while preserving privacy, and universality, in order to provide asymptotic independence from a complete or partial lack of the prior information that defines the particularities of the protocol design. Historically, the suggested solution to the above set of problems was based on classical cryptographic hashes. Although it offers an excellent security level, an authentication mechanism based on cryptographic hash functions is not free from shortcomings, mostly concerning its robustness to the various representations of multimedia files. For example, an image can be represented in