Guessing Revisited: A Large Deviations Approach

Manjesh Kumar and Rajesh Sundaresan
Department of Electrical Communication Engineering
Indian Institute of Science, Bangalore 560012, India
Email: {manjesh, rajeshs}@ece.iisc.ernet.in

Abstract—The problem of guessing a random string is revisited and some prior results on guessing exponents are re-derived using the theory of large deviations. It is shown that if the sequence of distributions of the information spectrum satisfies the large deviation property with a certain rate function, then the limiting guessing exponent exists and is a scalar multiple of the Legendre-Fenchel dual of the rate function. Example applications re-deriving prior results are also given.

I. INTRODUCTION

Let $X^n = (X_1, \cdots, X_n)$ denote $n$ letters of a process where each letter is drawn from a finite set $\mathcal{X}$ with joint probability mass function (pmf) $(P_n(x^n) : x^n \in \mathcal{X}^n)$. Let $x^n$ be a realisation and suppose that we wish to guess this realisation by asking questions of the form "Is $X^n = x^n$?", stepping through the elements of $\mathcal{X}^n$ until the answer is "Yes". We wish to do this using the minimum expected number of guesses. Several applications motivate this problem. Consider cipher systems employed in digital television or DVDs to block unauthorised access to special features. The ciphers used are amenable to such exhaustive guessing attacks, and it is of interest to quantify the effort needed by an attacker. Massey [1] observed that the expected number of guesses is minimised by guessing in decreasing order of $P_n$-probabilities. Define the guessing function $G^*_n : \mathcal{X}^n \to \{1, 2, \cdots, |\mathcal{X}|^n\}$ to be one such optimal guessing order^1. $G^*_n(x^n) = g$ means that $x^n$ is the $g$th guess. Massey's question was to characterise $E[G^*_n(X^n)]$.
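Massey's optimal ordering is straightforward to realise in a short program. The sketch below is an illustration of ours, not from the paper: it assumes a small iid source (so the joint pmf can be enumerated), builds $G^*_n$ by sorting strings in decreasing order of probability, and evaluates the expected number of guesses; all function names are our own.

```python
import itertools
import math

def iid_pmf(marginal, n):
    """Joint pmf P_n on X^n for an iid source with the given marginal pmf (a dict)."""
    return {xn: math.prod(marginal[x] for x in xn)
            for xn in itertools.product(marginal, repeat=n)}

def optimal_guessing_order(pmf):
    """Massey's optimal G*_n: guess strings in decreasing order of P_n-probability.

    Returns a dict mapping each string x^n to its (1-based) guess index g.
    """
    ranking = sorted(pmf, key=pmf.get, reverse=True)
    return {xn: g for g, xn in enumerate(ranking, start=1)}

def expected_guesses(pmf, guess_order):
    """E[G(X^n)] under P_n for an arbitrary guessing order G."""
    return sum(pmf[xn] * guess_order[xn] for xn in pmf)
```

Any other guessing order, for instance guessing in increasing order of probability, can only increase the expected number of guesses; ties may be broken arbitrarily, as footnote 1 notes.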
Arikan [2] considered the more general problem of identifying the growth of $E[G^*_n(X^n)^\rho]$ as a function of $n$ for an independent and identically distributed (iid) source with marginal pmf $P_1$ and $\rho > 0$. He showed that the growth is exponential in $n$; the limiting exponent

  $E(\rho) := \lim_{n \to \infty} \frac{1}{n} \log E[G^*_n(X^n)^\rho]$   (1)

exists and equals $\rho H_\alpha(P_1)$ with $\alpha = 1/(1+\rho)$, where $H_\alpha(P_n)$ is the Rényi entropy of order $\alpha$ for the pmf $P_n$, given by

  $H_\alpha(P_n) = \frac{1}{1-\alpha} \log \sum_{x^n \in \mathcal{X}^n} P_n(x^n)^\alpha, \quad \alpha \neq 1.$   (2)

Malone and Sullivan [3] showed that the limiting exponent $E(\rho)$ of an irreducible Markov chain exists and equals the logarithm of the Perron-Frobenius eigenvalue of the matrix formed by raising each element of the transition probability matrix to the power $\alpha$. From their proof, one obtains the more general result that the limiting exponent exists for any source if the Rényi entropy rate of order $\alpha$,

  $\lim_{n \to \infty} n^{-1} H_\alpha(P_n),$   (3)

exists for $\alpha = 1/(1+\rho)$. Pfister and Sullivan [4] showed the existence of (1) for a class of stationary probability measures where the probabilities of finite-length strings are approximately determined by letter combinations. For such a class, they showed that the guessing exponent has a variational characterisation (see (4) later). For unifilar sources, Sundaresan [5] obtained a simplification of this variational characterisation using a direct approach and the method of types. In this paper, we give a different proof of Malone and Sullivan's implicit result in [3] that the limiting exponent exists if and only if the limiting Rényi entropy rate exists. Our proof exploits a connection between guessing and compression highlighted by Sundaresan [5].

^1 If there are several sequences with the same probability of occurrence, they may be guessed in any order without affecting the expected number of guesses.
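Arikan's result can be checked numerically for a small iid source. The sketch below (our own illustration, assuming a binary marginal; the function names are ours) computes the finite-$n$ exponent $(1/n)\log E[G^*_n(X^n)^\rho]$ by brute-force enumeration and compares it against the limit $\rho H_\alpha(P_1)$ with $\alpha = 1/(1+\rho)$.

```python
import itertools
import math

def renyi_entropy(pmf_values, alpha):
    """Renyi entropy of order alpha != 1 (in nats), as in (2)."""
    return math.log(sum(p ** alpha for p in pmf_values)) / (1 - alpha)

def guessing_exponent(marginal, n, rho):
    """(1/n) log E[G*_n(X^n)^rho] for an iid source with the given marginal.

    Enumerates all |X|^n strings, sorts their probabilities in decreasing
    order (the optimal guessing order), and evaluates the rho-th moment.
    """
    probs = sorted((math.prod(c) for c in itertools.product(marginal, repeat=n)),
                   reverse=True)
    moment = sum(p * g ** rho for g, p in enumerate(probs, start=1))
    return math.log(moment) / n
```

For the marginal $(0.7, 0.3)$ and $\rho = 1$, the finite-$n$ exponent remains below $\rho H_{1/2}(P_1) \approx 0.65$ nats for every $n$ and approaches it as $n$ grows, consistent with (1) and (2).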
A simple argument then leads to the following useful result: if the sequence of distributions of the information spectrum $(1/n) \log (1/P_n(X^n))$ (see Han [6]) satisfies the large deviation property, then the limiting exponent exists. This is useful because several existing large deviations results can be readily applied. Our approach generalises all prior results on guessing (without side information and key-rate constraints).

II. MAIN RESULTS

We begin with some words on notation. Let $\mathcal{M}(\mathcal{X}^n)$ denote the set of pmfs on $\mathcal{X}^n$. The Shannon entropy of a $P_n \in \mathcal{M}(\mathcal{X}^n)$ is $H(P_n) = -\sum_{x^n \in \mathcal{X}^n} P_n(x^n) \log P_n(x^n)$, and the Rényi entropy of order $\alpha \neq 1$ is given by (2). The Kullback-Leibler divergence or relative entropy between two pmfs $Q_n$ and $P_n$ is

  $D(Q_n \| P_n) = \begin{cases} \sum_{x^n \in \mathcal{X}^n} Q_n(x^n) \log \frac{Q_n(x^n)}{P_n(x^n)}, & \text{if } Q_n \ll P_n, \\ \infty, & \text{otherwise}, \end{cases}$

where $Q_n \ll P_n$ means that $Q_n$ is absolutely continuous with respect to $P_n$. By a source, we mean a sequence of pmfs $(P_n : n \in \mathbb{N})$, where $P_n \in \mathcal{M}(\mathcal{X}^n)$ and $\mathbb{N}$ is the set of natural numbers. Recall the definitions of limiting guessing exponent

NCC 2009, January 16-18, IIT Guwahati
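The notational definitions of Section II can be made concrete in a few lines. The sketch below is our own illustration (pmfs represented as plain dictionaries is an assumption of ours), computing the Shannon entropy and the Kullback-Leibler divergence, with the absolute-continuity convention handled explicitly.

```python
import math

def shannon_entropy(pmf):
    """H(P) = -sum_x P(x) log P(x), in nats; terms with P(x) = 0 contribute 0."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)

def kl_divergence(Q, P):
    """D(Q||P), in nats; infinite unless Q is absolutely continuous w.r.t. P."""
    if any(q > 0 and P.get(x, 0) == 0 for x, q in Q.items()):
        return math.inf  # Q places mass outside the support of P
    return sum(q * math.log(q / P[x]) for x, q in Q.items() if q > 0)
```

As the definition requires, $D(Q_n \| P_n) \ge 0$ with equality iff $Q_n = P_n$, and the divergence is infinite whenever $Q_n$ places mass outside the support of $P_n$.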