PhD Thesis Proposal

Compressed Data Structures based on Adaptive Algorithms

Carlos E. Bedregal
cbedrega@dcc.uchile.cl

Advisors:
Gonzalo Navarro, gnavarro@dcc.uchile.cl
Jérémy Barbay, jbarbay@dcc.uchile.cl

Universidad de Chile, Chile
April, 2011

Abstract. Efficient access to large data collections is nowadays an interesting problem for many research areas and applications. A recent conception of the time-space relationship suggests a strong relation between data compression and algorithms in the comparison model. In this sense, efficient algorithms could be used to induce compressed representations of the data they process. Examples of this relationship include unbounded search algorithms and integer encodings, adaptive sorting algorithms and compressed representations of permutations, or union algorithms and encodings for bit vectors. In this thesis, we propose to study the time-space relationship on different data types. We aim to define new compression schemes and compressed data structures based on adaptive algorithms that work over these data types, and to evaluate their practicality in data compression applications.

1 Introduction

Nowadays computer applications are likely to process and manage large amounts of data: text, images, videos, biological sequences, signals, and so on. Although massive storage is easily available, the real problem is efficient access to the data. For example, CPU caches are many times faster than main memory, which in turn is many times faster than hard drives. Given this hierarchy of memories, it is preferable to work with compressed representations of the data that may fit in higher levels of memory and allow operations over the data without decompressing it. This approach has been successfully applied to large text collections through compressed full-text indexes [58].

The relation between algorithms and encodings was first suggested by Bentley and Yao [8], who showed how adaptive search algorithms in sorted arrays define a family of adaptive prefix encodings for integers. Bentley and Yao's representation of an integer is given by the binary outcomes of the sequence of comparisons performed by the algorithm to find that number in an unbounded context; the encodings obtained are isomorphic to the codes proposed by Elias [20].
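
The correspondence described above can be illustrated with a minimal sketch. It assumes the simplest unbounded search strategy (doubling followed by binary search), which is not necessarily the exact algorithm analyzed in [8]; the function names `unbounded_search_encode` and `unbounded_search_decode` are hypothetical. Each comparison outcome is recorded as one bit, and the resulting codeword matches the Elias gamma code up to complementing the bits of its unary prefix.

```python
def unbounded_search_encode(n: int) -> str:
    """Encode n >= 1 as the outcomes of a doubling search plus a binary search.

    Illustrative sketch only: "1" records a comparison "n >= pivot" that
    succeeded, "0" one that failed.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    bits = []
    k = 0
    # Doubling phase: compare n against 2, 4, 8, ... until n < 2^(k+1).
    while n >= 1 << (k + 1):
        bits.append("1")
        k += 1
    bits.append("0")  # the failed comparison that ends the doubling phase
    # Binary search phase inside [2^k, 2^(k+1) - 1]: exactly k comparisons.
    lo, hi = 1 << k, (1 << (k + 1)) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if n >= mid:
            bits.append("1")
            lo = mid
        else:
            bits.append("0")
            hi = mid - 1
    return "".join(bits)


def unbounded_search_decode(code: str) -> int:
    """Replay the recorded comparison outcomes to recover the integer."""
    k = code.index("0")  # number of successful doubling steps
    lo, hi = 1 << k, (1 << (k + 1)) - 1
    for bit in code[k + 1 : 2 * k + 1]:
        mid = (lo + hi + 1) // 2
        if bit == "1":
            lo = mid
        else:
            hi = mid - 1
    return lo


if __name__ == "__main__":
    for n in (1, 2, 5, 17):
        code = unbounded_search_encode(n)
        print(n, code, unbounded_search_decode(code))
```

Under these assumptions, the codeword for n has length 2⌊log2 n⌋ + 1, the same as the Elias gamma code: the search performs ⌊log2 n⌋ + 1 comparisons in the doubling phase and ⌊log2 n⌋ in the binary search phase. Because the doubling phase always ends at the first "0", no codeword is a prefix of another.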