Category: High Performance Computing
Copyright © 2015, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Techniques for Specialized Data Compression
INTRODUCTION
The continuing improvement in data storage technologies makes enormous storage capacities available to users. For instance, between 1996 and 2010, the average storage capacity of a desktop personal computer drive increased 375-fold (Adams, 2012). Even so, the growth of available storage capacity is still outpaced by the growth of the information produced worldwide, especially as a gigabyte of stored content can generate a much higher volume of transient data that is not typically stored, but is often transmitted (Gantz & Reinsel, 2011). Hence the need to conserve not only data storage space but also data transmission bandwidth. This need is answered by data compression.
Data compression is “the process of converting an input data stream into another data stream that has a smaller size” (Solomon, 2007). Data compression is possible thanks to the redundancy of data: it exploits the fact that some portions of the input stream need not be stored, as they can be recreated from the remaining parts of the stream, and/or the fact that some portions of the data are either not relevant to the user at all or of negligible relevance. Pursuing the latter option usually yields much higher compression ratios, but results in a loss of information qualified as irrelevant during compression, which may prove relevant in the future, for another use, or to another user. As there is extensive literature devoted to lossy data compression (see, e.g., Sayood, 2012, and references therein), this article describes only lossless methods.
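As a minimal illustration of the first kind of redundancy removal — omitting portions of the stream that can be recreated from what remains — consider run-length encoding, sketched below. (This is the author's simplest possible example, not a method discussed in this article; the function names are illustrative.)

```python
def rle_encode(data: str) -> list[tuple[str, int]]:
    """Collapse runs of repeated symbols into (symbol, count) pairs."""
    runs: list[tuple[str, int]] = []
    for symbol in data:
        if runs and runs[-1][0] == symbol:
            runs[-1] = (symbol, runs[-1][1] + 1)
        else:
            runs.append((symbol, 1))
    return runs

def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Recreate the original stream exactly -- no information is lost."""
    return "".join(symbol * count for symbol, count in runs)

encoded = rle_encode("aaaabbbcca")
# The repeated symbols were not stored individually, yet the decoder
# recreates them exactly from the counts -- the compression is lossless.
assert rle_decode(encoded) == "aaaabbbcca"
```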
Lossless data compression methods can be classified into two types. General-purpose methods use a generic model that adapts to its input, and thus manage to compress various types of data. Specialized methods are designed to process only one type of data (defined more or less narrowly). Thus, they can not only start compression with a model prepared for data of that specific type, but also exploit redundancy specific to that type, which would be invisible to a general-purpose method.
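To make the distinction concrete, here is a hedged sketch of one type-specific transform (illustrative only; not a method named in this article): if the data are known to be a sorted sequence of integers, such as timestamps, storing successive differences produces many small values that a statistical coder can then encode cheaply — structure a general-purpose method, seeing only raw bytes, could not exploit directly.

```python
def delta_encode(sorted_values: list[int]) -> list[int]:
    # Assumes a non-empty, sorted list: keep the first value,
    # then store only the (typically small) gaps between neighbours.
    return [sorted_values[0]] + [
        b - a for a, b in zip(sorted_values, sorted_values[1:])
    ]

def delta_decode(deltas: list[int]) -> list[int]:
    # Rebuild the original values by accumulating the gaps.
    values = [deltas[0]]
    for gap in deltas[1:]:
        values.append(values[-1] + gap)
    return values

timestamps = [1700000000, 1700000003, 1700000003, 1700000010]
gaps = delta_encode(timestamps)  # [1700000000, 3, 0, 7]
assert delta_decode(gaps) == timestamps
```

The transform itself saves nothing; its point is to reshape the symbol statistics so that a subsequent statistical coder performs better — the kind of combination of component techniques this article reviews.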
Contemporary data compression methods typically combine a set of techniques to achieve superior compression ratios. The aim of this article is to review such component techniques, useful for specialized compression of various types of data. First, however, the Background section explains how the most popular general-purpose data compression methods work.
BACKGROUND
The base technique of data compression, included in most contemporary compression methods, is statistical coding. It uses the statistics of occurrence of respective symbols in the input stream to minimize the size of the output stream. The most widely used technique of this kind is Huffman coding, which assigns short codewords to frequent input symbols and long codewords to rare symbols (Huffman, 1952). Huffman coding is optimal in the sense that no other codeword assignment could produce a shorter output stream. Further improvement is still possible by assigning value ranges, instead of individual codewords, to input symbols. Such an approach is taken by arithmetic coding, where the entire input stream is encoded as a binary fraction representing its cumulative probability (Rissanen, 1976).
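A minimal sketch of Huffman codeword assignment, using Python's standard heapq to repeatedly merge the two least frequent subtrees (identifiers and structure are the present sketch's own, not taken from the cited sources):

```python
import heapq
from collections import Counter

def huffman_codes(stream: str) -> dict[str, str]:
    """Build a prefix code: frequent symbols receive short codewords."""
    freq = Counter(stream)
    # Each heap entry: (total frequency, tie-breaker, {symbol: codeword so far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # least frequent subtree
        f2, _, right = heapq.heappop(heap)   # second least frequent
        # Prepend one bit distinguishing the two merged subtrees.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_codes("aaaabbc")
# 'a', the most frequent symbol, gets the shortest codeword.
assert len(codes["a"]) <= len(codes["b"]) <= len(codes["c"])
```

Note the limitation the article points out next: every codeword here is a whole number of bits, which is exactly what arithmetic coding improves upon by encoding the whole stream as a single binary fraction.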
Most real-world data types exhibit some form of correlation between symbols, which cannot be exploited by merely counting the occurrences of individual symbols in the input stream. There are
Jakub Swacha
University of Szczecin, Poland
DOI: 10.4018/978-1-4666-5888-2.ch351