A Flexible Compressed Text Retrieval System Using a Modified LZW Algorithm

Nan Zhang¹, Tao Tao, Ravi VijayaSatya, Amar Mukherjee, Tim Bell², Donald Adjeroh³

¹ School of Computer Science, University of Central Florida, {nzhang,ttao,rvijaya,amar}@cs.ucf.edu
² Department of Computer Science, University of Canterbury, New Zealand, {Tim.Bell@canterbury.ac.nz}
³ Lane Department of Computer Science and Electrical Engineering, West Virginia University, {adjeroh}@csee.wvu.edu
⁴ By partial decoding, we refer to the process of decoding selected portions of the compressed file without having to decode the whole document.

ABSTRACT

With an increasing amount of text data being stored in compressed form, efficient information retrieval in the compressed domain has become a major challenge. The ability to randomly access and partially decode⁴ the compressed data is highly desirable for efficient retrieval and is required in many applications. For example, a digital library information retrieval system should retrieve only the records that are relevant to the query. Flexibility in accessing the text at different levels of detail is also desirable, since requirements vary between applications. The efficiency of these operations depends on the compression method used. In this paper, we present modified LZW algorithms that support efficient indexing and searching on compressed files. Rather than fully decompressing the text and selectively outputting the results, the proposed approach allows random access and partial decoding of the compressed text and retrieves only the relevant parts. In addition to supporting dynamic indexing at different levels of granularity in the text, the scheme also makes parallel processing of the compressed text possible. The proposed modified LZW algorithm also improves the compression ratio. Test results show that our public trie method has a compression ratio of 0.34 for the TREC corpus and 0.32 with text