Pattern Recognition 36 (2003) 2377–2394
www.elsevier.com/locate/patcog

On the concept of best achievable compression ratio for lossy image coding

J.A. Garcia a,∗, J. Fdez-Valdivia a, Xose R. Fdez-Vidal b, Rosa Rodriguez-Sánchez a

a Departamento de Ciencias de la Computación e I.A., E.T.S. de Ingeniería Informática, Universidad de Granada, 18071 Granada, Spain
b Departamento de Fisica Aplicada, Facultad de Fisica, Universidad de Santiago de Compostela, 15706 Santiago de Compostela, Spain

∗ Corresponding author. Tel.: +34-958-240592; fax: +34-958-243317. E-mail address: jags@decsai.ugr.es (J.A. Garcia). URL: http://decsai.ugr.es/~jags/

Received 8 August 2002; accepted 22 January 2003

Abstract

The trade-off between image fidelity and coding rate is reached with several techniques, but all of them require an ability to measure distortion. The problem is that finding a sufficiently general measure of perceptual quality has proven to be an elusive goal. Here, we propose a novel technique for deriving an optimal compression ratio for lossy coding based on the relationship between information theory and the problem of testing hypotheses. The best achievable compression ratio determines a boundary between achievable and non-achievable regions in the trade-off between source fidelity and coding rate. The resultant performance bound is operational in that it is directly achievable by a constructive procedure, as suggested in a theorem that states the relationship between the best achievable compression ratio and the Kullback–Leibler information gain. As an example of the proposed technique, we analyze the effects of lossy compression at the best achievable compression ratio on the identification of breast cancer microcalcifications.

© 2003 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Best achievable compression ratio; Kullback–Leibler information gain; Image fidelity; Lossy image coding; Digitized mammograms

0031-3203/03/$30.00. doi:10.1016/S0031-3203(03)00047-5

1. Introduction

Digital image transmission and storage are facing major challenges due to the growing size of image datasets and practical limitations in transmission bandwidth and storage space. This explains why image compression has become a popular necessity: applying coding techniques reduces the image transfer time, and therefore the cost; it also reduces storage requirements and network traffic, and therefore improves efficiency.

During the past two decades, various lossless and lossy image coding techniques have been developed (for a list of references see Ref. [1]). Typical lossless coders can attain compression ratios of only 2:1 or 3:1 for most images, so users often prefer lossy algorithms, which can achieve high compression ratios, e.g., 50:1 or more. The problem is that high compression ratios are possible only at the cost of imperfect source representation. Compression is lossy in that the decoded images are not exact copies of the originals but, if the properties of the human visual system are correctly exploited, original and decoded images will be almost indistinguishable. The trade-off between image distortion and coding rate may be stated as follows [2]: How much fidelity in the representation are we willing to give up in order to reduce the storage or the number of bits required to transmit the data?

As an example, consider compression of digitized mammograms (most authors digitize them at a spatial resolution of 0.1 or 0.05 mm, which produces a huge amount of data). Radiologists look for certain signs and characteristics indicative of cancer when evaluating a mammogram. Among these signs is the presence of clustered microcalcifications. Individual breast cancer microcalcifications appear