A Comparative Study of Horizontal Object Clustering-based Fragmentation Techniques

Adrian Sergiu Darabant, Alina Campan

Abstract: The design of modern Distributed Object Oriented Databases (DOODs) requires class fragmentation techniques. Although research has been conducted in this area, most of the methods developed so far are inspired by relational fragmentation algorithms. In this paper we compare two new methods for horizontal class fragmentation in a DOOD. These methods rely on two AI clustering algorithms: the agglomerative hierarchical method and the k-means centroid-based method. To apply such algorithms, we model the class-partitioning problem in a vector space and use different object similarity measures. For the comparison, we provide quality and performance evaluations using a partition evaluator function.

Index Terms: Object oriented databases, Distributed database systems, Clustering methods, Database fragmentation.

I. INTRODUCTION

Fragmentation in Distributed Object Oriented Databases can be performed in two basic ways: horizontally and vertically. In an Object Oriented (OO) environment, horizontal fragmentation distributes class instances into fragments. The objects in every fragment share the same structure but differ in state or content. Thus, a horizontal fragment of a class contains a subset of the whole class extension.

Recently, fragmentation issues have been considered in ([1], [4], [5], [6], [7], [2]), either for the complex object-oriented data model or just for flat data models. Algorithms for horizontal fragmentation of object classes are proposed in [10], [2], [11]. Existing fragmentation techniques for OODBs usually extend and develop the relational fragmentation and allocation techniques ([8]). But OO data models are inherently more complex than the relational model.
Features like encapsulation, inheritance, class aggregation hierarchy and association relations complicate the definition of horizontal class fragmentation. It might therefore be more efficient to approach fragmentation in DOODs on a different basis.

Manuscript received July 10, 2004.
A. S. Darabant is a PhD student at Babes Bolyai University, Faculty of Mathematics and Computer Science, 1 Kogalniceanu, 3400 Cluj Napoca (e-mail: dadi@cs.ubbcluj.ro).
A. Campan is a PhD student at Babes Bolyai University, Faculty of Mathematics and Computer Science, 1 Kogalniceanu, 3400 Cluj Napoca (e-mail: alina@cs.ubbcluj.ro).

A. Contributions

In this paper we focus on comparing the quality of horizontal object-oriented fragmentations obtained by applying two alternative algorithms: the hierarchical fragmentation algorithm and the k-means clustering fragmentation algorithm. Both are based on clustering techniques and are presented in detail in [15], [16]. Although hierarchical and k-means centroid-based algorithms [3] are well-known techniques in clustering theory, to our knowledge they have not been used before in object database fragmentation. The comparative study is performed for object models with simple attributes and methods [2]. Essentially, the algorithms group objects by their similarity with respect to a set of user queries with conditions imposed on data. Similarity (dissimilarity) between objects is defined in a vector space model and is computed using different metrics. As a result, we cluster objects that are frequently used together by queries.

This paper is organized as follows. The next section briefly presents the object data model and the constructs used to define the object database and express queries. It also introduces the vector space model we use to compare objects, methods for constructing the object characteristic vectors, and similarity metrics over this vector space. Section 3 presents the fragmentation algorithms.
In Section 4 we evaluate the quality of our fragmentation schemes using an evaluator function.

II. DATA MODEL

We use an object-oriented model with the basic features described in the literature [9][13]. Object-oriented databases represent data entities as objects supporting features like inheritance, encapsulation, polymorphism, etc. Objects with common attributes and methods are grouped into classes. A class is an ordered tuple C=(K,A,M,I), where K is the class identifier, A is the set of object attributes, M is the set of methods, and I is the set of instances of class C. Every object in the database is uniquely identified by an OID. Classes are organized in an inheritance hierarchy, in which a subclass is a specialization of its superclass. An OODB is a set of classes from an inheritance hierarchy, with all their instances. There is a special class Root that is the ancestor of all classes in the database. We assume only single inheritance; thus, in our model, the inheritance graph is a tree. An entry point into a database is a metaclass instance bound