A general compression algorithm that supports fast searching

Kimmo Fredriksson (a, 1), Szymon Grabowski (b, *)

(a) Department of Computer Science, University of Joensuu, P.O. Box 111, 80101 Joensuu, Finland
(b) Technical University of Łódź, Computer Engineering Department, Al. Politechniki 11, 90-924 Łódź, Poland

(*) Corresponding author. Email address: sgrabow@kis.p.lodz.pl (Szymon Grabowski).
(1) Supported by the Academy of Finland, grant 202281.

Preprint submitted to Elsevier Science, 18 October 2006

Key words: algorithms, compression, searching in compressed text, q-grams

1 Introduction

The task of compressed pattern matching [2] is to report all occurrences of a given pattern P in a text T that is available in compressed form. Certain compression algorithms allow searching without prior decoding, which can be practical, especially if the search is faster than on the uncompressed representation. Most of the known schemes, however, either assume that the text consists of words, or are complex and rather theoretical.

The former option [11] is very practical whenever it is applicable: the mechanism is simple, the search is fast, the compressed text together with its word dictionary takes only about 30% of the original size, and more advanced queries can also be handled with relatively little difficulty. The problem, however, is that the assumption of a “text” made up of “words” separated by spaces, so natural and convenient for Western languages, is inappropriate for East Asian languages (e.g., Chinese, Korean), DNA and protein sequences, or structured music files (MIDI). Consequently, there are important applications for compression algorithms that allow searching directly in the compressed stream while assuming practically nothing about the data. The algorithm we present in this work belongs to this category.
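To illustrate why word-based schemes make searching simple and fast, and why they depend on word segmentation, consider the following toy sketch. It is not the coding used in [11] or the algorithm of this paper. The function names `compress_words`, `decompress` and `search_word` are hypothetical. Each distinct token (word or separator run) receives an integer code. A word query is answered by looking up its code in the dictionary and scanning the code sequence, so the text is never decoded. The codes are plain integers here for clarity; a real scheme would pack them into short byte or bit codewords.

```python
import re


def compress_words(text):
    """Toy word-based 'compression': map each distinct token to an integer code.

    Tokens alternate between runs of word characters and runs of other
    characters, so the original text can be rebuilt exactly.
    """
    tokens = re.findall(r"\w+|\W+", text)
    dictionary = {}  # token -> code, assigned in order of first appearance
    codes = []
    for tok in tokens:
        if tok not in dictionary:
            dictionary[tok] = len(dictionary)
        codes.append(dictionary[tok])
    return dictionary, codes


def decompress(dictionary, codes):
    # Dicts preserve insertion order, so list(dictionary)[k] is the token with code k.
    vocab = list(dictionary)
    return "".join(vocab[c] for c in codes)


def search_word(dictionary, codes, word):
    """Return token positions of `word`, scanning codes without decoding them."""
    code = dictionary.get(word)
    if code is None:
        return []  # the word never occurs as a whole token
    return [i for i, c in enumerate(codes) if c == code]


if __name__ == "__main__":
    d, c = compress_words("to be or not to be")
    print(search_word(d, c, "be"))  # [2, 10]  (token positions, not character offsets)

    # Unsegmented data such as DNA becomes a single "word", so a substring
    # query cannot be answered from the dictionary at all.
    d2, c2 = compress_words("ACGTTGCAACGT")
    print(search_word(d2, c2, "TTG"))  # []  although "TTG" occurs in the text
```

The second query shows the limitation discussed above. Once the data has no natural word boundaries, the dictionary holds one huge token, and an arbitrary pattern P no longer corresponds to any codeword.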