ICITNS 2003 International Conference on Information Technology and Natural Sciences

AN INTEGRATED STRATEGY FOR DATA FRAGMENTATION AND ALLOCATION IN A DISTRIBUTED DATABASE DESIGN

ISMAIL OMAR HABABEH
School of Computing, Leeds Metropolitan University
E-mail: ismail.hababeh@uaeu.ac.ae

NICHOLAS BOWRING
School of Computing, Leeds Metropolitan University
E-mail: N.Bowring@lmu.ac.uk

MUTHU RAMACHANDRAN
School of Computing, Leeds Metropolitan University
E-mail: M.Ramachandran@lmu.ac.uk

ABSTRACT

A distributed database is structured from global relations, fragmentation and data allocation. A global relation can be divided into fragments, and each fragment may itself contain a relation. The fragmentation describes how each fragment of the distributed database is derived from the global relations. The data allocation assigns discrete sets of fragments to the sites of the computer network supporting the distributed database. The objective of the present work is to develop a simple and useful strategy for distributed database design that achieves the objectives of data fragmentation, allocation, and replication. The strategy is designed to fragment and allocate data in a distributed relational database system running on a network of different types of computers.

KEY WORDS

Data partitioning, segments, clustering, fragments, benefit value, data allocation.

1. INTRODUCTION

A distributed system is a collection of independent computers that appears to its users as a single computer [1]. The trend in the computer field is toward decentralization. The driving force behind the movement away from centralized systems toward distributed ones is that distributed systems offer better performance than a single large centralized system [2]. Academic, industrial and governmental organizations have been using distributed databases to support their needs. This use has been accelerated by advances in telecommunication systems and has served the need to handle geographically dispersed information.
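As an illustration of the terms introduced in the abstract, the following minimal Python sketch shows a global relation split into fragments and a mapping of fragments to sites, with one fragment replicated. The relation EMPLOYEE, its attributes, the site names, and the helper functions are hypothetical and chosen only for illustration; this is not the fragmentation or allocation method developed in this paper.

```python
from collections import defaultdict

# Hypothetical global relation EMPLOYEE (illustrative rows only).
employee = [
    {"id": 1, "name": "Amal", "dept": "sales"},
    {"id": 2, "name": "Ben", "dept": "hr"},
    {"id": 3, "name": "Chen", "dept": "sales"},
    {"id": 4, "name": "Dana", "dept": "it"},
]


def horizontal_fragments(relation, attribute):
    """Split a relation into fragments, one per distinct value of `attribute`."""
    fragments = defaultdict(list)
    for row in relation:
        fragments[row[attribute]].append(row)
    return dict(fragments)


def pairwise_disjoint(fragments):
    """Return True if no row id appears in more than one fragment."""
    seen = set()
    for rows in fragments.values():
        for row in rows:
            if row["id"] in seen:
                return False
            seen.add(row["id"])
    return True


def reconstruct(fragments):
    """Rebuild the global relation as the union of its fragments."""
    rows = [row for frag in fragments.values() for row in frag]
    return sorted(rows, key=lambda row: row["id"])


fragments = horizontal_fragments(employee, "dept")

# Hypothetical allocation: each fragment is mapped to the sites holding a copy.
# The "sales" fragment is replicated on two sites.
allocation = {
    "sales": ["site1", "site2"],
    "hr": ["site2"],
    "it": ["site3"],
}

for name, sites in allocation.items():
    print(name, len(fragments[name]), "row(s) at", ", ".join(sites))
```

In this sketch the fragments are disjoint and their union reproduces the global relation, while replication is expressed only in the allocation mapping, which keeps the fragmentation and allocation decisions separate.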
This paper presents an approach for fragmentation and allocation of data in a distributed relational database and shows a way of grouping sites into clusters to which fragments are allocated. In this approach, the database relations are partitioned into pair-wise disjoint fragments, which are allocated to clusters and their respective sites according to an allocation algorithm. The approach minimizes the communication cost of transactions by distributing the database relations over the sites, and increases data availability and integrity by allocating multiple copies of the same database fragments over the sites.

2. BACKGROUND

Existing distributed database methodologies are limited in both their theoretical and implementation aspects. They do not deal with distributed database issues separately, do not optimize transaction response time, do not test their performance on different types of network connectivity, and have exponential time complexity. Various strategies have already been described that partition data effectively across distributed systems; naturally, all of these schemes have benefits and drawbacks.

Minyoung and Yang-sun [3] have proposed a methodology for partitioning and allocating data effectively over a network for PC-based distributed database design. The authors present a cost model and propose a heuristic procedure for merging mixed fragments (grid cells), based on the joint cost and the frequencies of the transactions accessing the cells. The purpose of merging cells is to minimize the global transaction processing cost. Because the sequence of attributes has no meaning in a relation, the number of possible combinations for horizontal merging can be reduced, giving a time complexity bounded by n(n-1)/2 instead of 2^n - 1.

Navathe, Karlapalem, and Minyoung [4] have presented a methodology for generating a mixed fragmentation scheme (horizontal and vertical) for the initial distributed database design phase.
They form a grid on a relation, which suggests all possible ways that