ON SECURITY OF GEOMETRICALLY-ROBUST DATA-HIDING

Emre Topak, Sviatoslav Voloshynovskiy, Oleksiy Koval, José Emilio Vila Forcén and Thierry Pun

CUI, University of Geneva
24, rue du General-Dufour, CH-1211 Geneve 4, Switzerland

ABSTRACT

In this paper we analyze the security of robust data-hiding in channels with geometrical transformations. We categorize possible decoding strategies for such channels within the information-theoretic framework for geometrically-robust data-hiding. Furthermore, considering template-based and redundant-based designs of geometrically robust data-hiding systems, we analyze general attacking strategies and particular attacking scenarios for each group of structured codebooks. Finally, the reversibility of data-hiding and the effect of security leakages on system performance are investigated.

1. INTRODUCTION

Digital data-hiding aims at communicating application-specific data reliably through a specified channel by embedding it into digital multimedia documents. This data should be reliably extractable even if intentional or unintentional attacks have been applied to the marked document.

In the general case, digital data-hiding can be considered as a game between the data-hider and the attacker. O’Sullivan, Moulin and Ettinger were among the first to recognize this game [1]. In the extended version of that paper [2], Moulin and O’Sullivan considered two possible set-ups. In the first, the so-called private game, they assumed that the host is available at both the encoder and the decoder. In the second, the public game, the host is available only at the encoder. Moulin and O’Sullivan considered these games with the capacity as the cost function. Moreover, they assumed that the decoder is informed of the attack channel and therefore applied maximum likelihood (ML) decoding.
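To illustrate decoding with an informed decoder in the private game, the following Python sketch may help. It is not taken from [1] or [2]: it assumes an additive white Gaussian noise attack, equiprobable messages, additive embedding and a random Gaussian codebook. Under these assumptions ML decoding reduces to minimum Euclidean distance decoding once the known host is subtracted. The names `ml_decode` and `simulate_private_game` and all parameter values are hypothetical.

```python
import numpy as np


def ml_decode(received, codebook, host):
    """Return the index of the most likely codeword.

    Assumption (not from the paper): the attack is additive white Gaussian
    noise and messages are equiprobable, so ML decoding is equivalent to
    picking the codeword closest in Euclidean distance.
    """
    # Private game: the host is known at the decoder and can be removed.
    residual = received - host
    # Squared Euclidean distance from the residual to every codeword.
    distances = np.sum((codebook - residual) ** 2, axis=1)
    return int(np.argmin(distances))


def simulate_private_game(num_messages=16, length=256, host_std=10.0,
                          wm_std=1.0, noise_std=0.5, seed=0):
    """Embed one random message, attack with AWGN, and decode it."""
    rng = np.random.default_rng(seed)
    # Hypothetical random Gaussian codebook, one row per message.
    codebook = rng.normal(0.0, wm_std, size=(num_messages, length))
    host = rng.normal(0.0, host_std, size=length)
    message = int(rng.integers(num_messages))
    # Additive embedding followed by the assumed AWGN attack.
    marked = host + codebook[message]
    received = marked + rng.normal(0.0, noise_std, size=length)
    decoded = ml_decode(received, codebook, host)
    return message, decoded


if __name__ == "__main__":
    sent, decoded = simulate_private_game()
    print("sent:", sent, "decoded:", decoded)
```

The sketch only covers the case in which the attack statistics are known to the decoder; it says nothing about decoding when the attack channel is unknown.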
Knowledge of the attack channel at the decoder is uncommon in most practical applications. A more realistic set-up was considered by Somekh-Baruch and Merhav [3],[4] under the assumption that the attacker's strategy is known to neither the encoder nor the decoder. Moreover, they supposed that any conditional pdf that satisfies a certain energy constraint might be a valid attacker choice. In the first paper [3], Somekh-Baruch and Merhav considered the private game, where both the capacity and the error exponent were analyzed as cost functions. The channel capacity is a good measure of performance if one is interested in the maximum rate of reliable communication. The error exponent characterizes the exponential decay, with codeword length, of the lowest achievable probability of error at a given information rate. From the practical point of view, error exponents seem to be more attractive since they bring out a clear and simple relationship between error probability, data rate, constraint length, and channel behaviour [5]. The result of [3] is remarkable because it was obtained with the attack channel unknown at the decoder, using maximum mutual information (MMI) decoding. This decoding strategy can be considered as universal decoding for this class of channels. Such a decoder can be regarded as a two-part system that consists of channel state estimation (CSE) and a decoder for the particular CSE output. These two procedures are iterated to guarantee reliable communication at rates below the channel capacity defined by the max-min game.

In [4], Somekh-Baruch and Merhav considered the capacity of the public game using the same MMI decoding set-up. Although theoretically justified, this approach meets some difficulties in practical applications dealing with geometrical channels. In such channels, the attacker applies to the watermarked data a desynchronization transform drawn from a set of parametric transforms with large cardinality.
From the data-hider's side, the applied transform can be regarded as random, with uniform probability of appearance over the set of chosen cardinality.

To simplify the task of the decoder, most data-hiding systems use certain simplifications that lead to suboptimal performance of the universal decoder. First, CSE and decoding are implemented in a sequential two-step manner rather than iteratively. Once the CSE is obtained, channel state compensation (CSC) is applied and the message decoding is based directly on the recovered data. Second, to simplify the task of CSE, most data-hiding techniques exploit specially structured codebooks instead of random coding. This is closely related to the special pilot or template signals often used in digital communications to facilitate the estimation problem. We will refer to these codebooks as geometrically structured codebooks. Depending on the particular codebook design, they can be classified into two main groups: template-based structured codebooks, in which a specially designed template or pilot is used to perform CSE and CSC [6]; and redundant-based structured codebooks, in which the codewords have a special construction or statistics to aid CSE and CSC [7].

A thorough theoretical analysis of this geometrical synchronization framework is given in [8]. This analysis can also be quite indicative when considering security leakages of robust data-hiding schemes based on structured codebooks.

The rest of the paper is organized as follows. In Section 2, the problem formulation is presented. In Section 3, possible decoding strategies are considered. Afterwards, in Section 4, the information-theoretic framework for data-hiding synchronization is provided. Section 5 contains the analysis of attacking strategies and particular attacking scenarios for each group of structured codebooks.
In Section 6, the reversibility of data-hiding and the effect of security leakages on the cardinality of the decoding space are investigated. Finally, Section 7 concludes the paper.

Notations: We use capital letters to denote scalar random variables $X$, bold capital letters to denote vector random variables $\mathbf{X}$, and the corresponding small letters $x$ and $\mathbf{x}$ to designate the realizations of scalar and vector random variables, respectively. The superscript $N$ is used to denote length-$N$ vectors $\mathbf{x} = x^N = \{x[1], x[2], \ldots, x[N]\}$ with $i$th element $x[i]$. We use $X \sim p_X(x)$ or simply $X \sim p(x)$ to indicate that a random variable $X$ is distributed according to $p_X(x)$. Calligraphic fonts $\mathcal{X}$ designate sets, $X \in \mathcal{X}$, and $|\mathcal{X}|$ denotes the cardinality of the set $\mathcal{X}$. $\mathbb{Z}$ and $\mathbb{R}$ stand for the set of integers and the set of real numbers, respectively. $H(X)$ denotes the entropy of a random variable $X$ and $I(X; Y)$ designates the mutual information