Using Compressed B+-trees for Line-based Database Indexes
Hung-Yi Lin¹ and Chin-Ling Chen²
1. Department of Finance, Chaoyang University of Technology, No. 168, Jifong E. Rd., Wufong Township, Taichung
County 41349, Taiwan (R.O.C.), linhy@mail.cyut.edu.tw
2. Department of Computer Science and Information Engineering, Chaoyang University of Technology, No. 168, Jifong E.
Rd., Wufong Township, Taichung County 41349, Taiwan (R.O.C.), clc@mail.cyut.edu.tw
Abstract-In this paper, we propose a new indexing method called
the compressed B+-tree. The traditional B+-tree is the most common
dynamic index structure in database systems. However, in
practical applications, its performance still leaves considerable
room for improvement. The compressed B+-tree outperforms
traditional access methods in two respects. The first is a more
economical storage requirement for the index structure. The
second is better retrieval performance. In addition, a
compressed B+-tree can be used to implement line-based
database indexes.
Keywords-B+-trees, compressed B+-trees, index structure,
multi-dimensional databases, splitting policy.
1. INTRODUCTION
Balanced trees are an essential data structure for
implementing database indexes because they offer
rapid access to any record in the database. With an
increasing number of computer applications relying heavily
on multi-dimensional data [4, 5, 13, 15], the database
community has devoted considerable attention to the storage,
retrieval, and management of multi-dimensional data.
The performance of a database index depends on its
hierarchical structure, which in turn is shaped by three
factors. The first is the distribution of the indexed data in
the space. In most practical applications, the data are
skewed and non-uniform. The second factor is the data
insertion order: the same data inserted in different orders will
result in different hierarchical directories. The third is the
data insertion algorithm. A poor algorithm may not be capable
of arranging the indexed data into a well-organized
structure. Many data insertion algorithms in the literature [3,
11, 12, 14] suffer from their splitting policies. In fact, an
improper splitting policy degrades the hierarchical
directory of an index structure. We propose a better
insertion mechanism to reduce and even eliminate these
harmful impacts on the indexing structure.
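The effect of insertion order can be seen in a small simulation. The
sketch below is not the mechanism proposed in this paper. It assumes a
conventional leaf level in which an overflowing leaf is split into two
halves; the function name leaf_utilization, the key range, and the leaf
capacity are all hypothetical choices made for illustration. It inserts
the same keys in ascending and in shuffled order and reports the
percentage of leaf slots in use.

```python
import bisect
import random


def leaf_utilization(keys, capacity):
    """Insert keys one at a time into ordered leaves and return the
    percentage of leaf slots in use (assumes a plain half-split policy)."""
    leaves = [[]]  # leaves kept in ascending key order
    for key in keys:
        # The first key of every leaf except the first acts as a separator.
        separators = [leaf[0] for leaf in leaves[1:]]
        i = bisect.bisect_right(separators, key)
        bisect.insort(leaves[i], key)
        if len(leaves[i]) > capacity:  # overflow: split the leaf in half
            mid = len(leaves[i]) // 2
            leaves[i:i + 1] = [leaves[i][:mid], leaves[i][mid:]]
    return 100.0 * len(keys) / (len(leaves) * capacity)


keys = list(range(2000))  # hypothetical workload of distinct keys
ascending = leaf_utilization(keys, capacity=20)

random.seed(0)  # fixed seed so the run is repeatable
shuffled = keys[:]
random.shuffle(shuffled)
scattered = leaf_utilization(shuffled, capacity=20)

print(f"ascending order: {ascending:.1f}% of leaf slots used")
print(f"shuffled order:  {scattered:.1f}% of leaf slots used")
```

With ascending insertion, every split leaves behind a half-full leaf
that never receives another key, so utilization stays close to 50%.
The shuffled order typically ends with fuller leaves, even though the
data set and the splitting policy are identical.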
2. BACKGROUND
Considerable work has been devoted to the appropriate
organization of trees for indexes. B+-trees [1, 2, 3, 9] are
the most common dynamic index structures in database
systems. However, much less has been reported on the
appropriate organization of the keys within each tree node,
even though this organization can have a considerable
impact on the total cost. The key design issue in this
paper is to reconsider how entries are arranged between a full
target leaf and its siblings before a split is performed.
Since the amount of data in a multi-dimensional database
tends to be large, system performance is usually crucial.
Two criteria, space efficiency and time efficiency, are
generally used to evaluate system performance. Space efficiency
includes two parts: storage requirement and storage
utilization. System storage requirement (denoted by SSR)
measures the total amount of storage space needed to hold
the whole index structure. That is,
SSR = (total number of allocated nodes) × (maximum node size)
If one entry occupies k bytes and the maximum node size is
counted in entries, then the actual
memory demand for an index structure is k×SSR bytes.
System storage utilization (denoted by SSU) measures how
fully each tree node is occupied. Without loss of
generality, we define SSU as follows.
SSU = (total number of indexed entries / SSR) × 100%
The more compact the tree construction is, the higher the
SSU will be. Nevertheless, compactness doesn’t guarantee
economy. An index structure with space efficiency should
maximize its SSU and minimize its SSR.
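As a worked illustration of the two space measures, the following
sketch evaluates SSR and SSU for one made-up index. The function names
and all figures (50 nodes, 20 entries per node, 700 indexed entries,
8 bytes per entry) are assumptions chosen for the example and do not
come from the experiments in this paper.

```python
def ssr(allocated_nodes: int, max_node_size: int) -> int:
    """System storage requirement, counted in entry slots."""
    return allocated_nodes * max_node_size


def ssu(indexed_entries: int, allocated_nodes: int, max_node_size: int) -> float:
    """System storage utilization, as a percentage of SSR."""
    return 100.0 * indexed_entries / ssr(allocated_nodes, max_node_size)


# Hypothetical index: 50 allocated nodes, each holding at most 20 entries.
nodes, capacity = 50, 20
entries = 700          # indexed entries actually stored
k = 8                  # bytes per entry (assumed)

print(ssr(nodes, capacity))            # 1000 entry slots
print(k * ssr(nodes, capacity))        # 8000 bytes of memory demand
print(ssu(entries, nodes, capacity))   # 70.0 percent utilization
```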
Time efficiency includes two major parts: system
maintenance performance (insertion and deletion
operations) and data retrieval performance (query and
search operations). Whatever operation is applied, its
time cost is directly proportional to the depth of the index. A
deeper index implies that more nodes, and hence more disk
blocks, are accessed for data retrieval. As a result, a deeper
index has poorer time efficiency.
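The link between node occupancy and depth can be made concrete with a
rough estimate. The sketch below assumes, purely for illustration, that
every node at every level holds the same number of entries; the
function name estimated_levels and the sample figures are hypothetical.
It counts how many levels are needed until a single root node remains.

```python
def estimated_levels(entries: int, avg_fill: int) -> int:
    """Estimate the number of levels in an index whose nodes each hold
    avg_fill entries (a simplifying assumption; avg_fill must be >= 2)."""
    if avg_fill < 2:
        raise ValueError("avg_fill must be at least 2")
    nodes = (entries + avg_fill - 1) // avg_fill  # leaves needed (ceiling)
    levels = 1
    while nodes > 1:
        nodes = (nodes + avg_fill - 1) // avg_fill  # parents one level up
        levels += 1
    return levels


# One million entries, nodes of capacity 100 (hypothetical figures).
print(estimated_levels(1_000_000, 50))    # half-full nodes -> 4 levels
print(estimated_levels(1_000_000, 100))   # full nodes      -> 3 levels
```

Under these assumptions, filling the nodes more completely removes one
level, which means one fewer node visited on every root-to-leaf access.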
3. COMPRESSED B+-TREES
2006 IEEE International
Symposium on Signal Processing
and Information Technology
0-7803-9754-1/06/$20.00©2006 IEEE 258