On Reversibility of Random Binning Based Data-hiding Techniques: Security Perspectives

Sviatoslav Voloshynovskiy, CVML, CUI - University of Geneva, 24, rue General Dufour, 1211, Geneva, Switzerland, svolos@cui.unige.ch
Oleksiy Koval, CVML, CUI - University of Geneva, 24, rue General Dufour, 1211, Geneva, Switzerland, koval@cui.unige.ch
Emre Topak, CVML, CUI - University of Geneva, 24, rue General Dufour, 1211, Geneva, Switzerland, emre.topak@cui.unige.ch
José Emilio Vila-Forcén, CVML, CUI - University of Geneva, 24, rue General Dufour, 1211, Geneva, Switzerland, jose.vila@cui.unige.ch
Thierry Pun, CVML, CUI - University of Geneva, 24, rue General Dufour, 1211, Geneva, Switzerland, pun@cui.unige.ch

ABSTRACT
Reversibility of data-hiding refers to the reconstruction of the original host data at the decoder from the stego data. Previous work on the subject has concentrated on the reversibility of data-hiding techniques from a multimedia perspective. However, from the security point of view, which to our knowledge has not been explored in existing studies, reversibility could be used by an attacker to remove every trace of the watermark data from the stego data, which amounts to designing the worst-case attack. Thus, the aim of this paper is to analyze the reversibility of data-hiding techniques based on random binning from a security perspective.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous.

General Terms
performance, theory.

Keywords
reversibility, data-hiding, random binning, exhaustive search, security leakage analysis.

(For further information: contact S. Voloshynovskiy, e-mail: svolos@cui.unige.ch, http://sip.unige.ch)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM&Sec’06, September 26–27, 2006, Geneva, Switzerland. Copyright 2006 ACM 1-59593-493-6/06/0009 ...$5.00.

1. INTRODUCTION
Data-hiding refers to methods for the reliable communication of information embedded into host data for purposes such as copyright protection, content authentication, broadcast monitoring, and database indexing.

The data-hiding methods proposed initially treat the host data as a source of interference to the data-hiding communication. Within this class of methods, the information to be embedded, i.e., the watermark data, is generated independently of the host data using random coding-based techniques [4]. Later, random binning-based techniques were introduced, inspired by the Gel’fand-Pinsker approach [5] to communication over channels whose state is non-causally available at the encoder. These methods take the host data into account when generating the watermark, which leads to a substantial increase in the achievable embedding rate compared with random coding-based techniques.

Depending on the application area, specific requirements are placed on the data-hiding method to be applied. In medical and military applications, content authenticity and availability of the original host data at the decoder should be provided simultaneously. Moreover, in some cases, the original host data might need to be reproduced from the stego data at a later time because it is unavailable or subject to access restrictions. These needs motivate the study of recovering the original host data from the stego data, i.e., the reversibility of watermark embedding.

The reversibility of random binning-based data-hiding techniques is considered from a multimedia perspective in [12] for a multiuser communications scenario. The analysis is provided for the scenarios of authorized, i.e., with knowledge of the key, and unauthorized, i.e.,
without knowledge of the key, users. It is shown that, under the noisy communications assumption, the unauthorized user can estimate the host data using the optimal Minimum Mean Square Error (MMSE) estimator with the same estimation accuracy as the authorized user, provided that the compensation parameter is selected optimally. However, under the noiseless communications assumption, the authorized user can completely recover the host data that is