International Journal of Computer Applications (0975-8887), Volume 99, No. 4, August 2014

Slop based Partitioning for Vertical Fragmentation in Distributed Database System

Ashish Ranjan Mishra
Department of Computer Science and Engineering, Kamla Nehru Institute of Technology, Sultanpur-228118, Uttar Pradesh, India

Neelendra Badal
Department of Computer Science and Engineering, Kamla Nehru Institute of Technology, Sultanpur-228118, Uttar Pradesh, India

ABSTRACT
Vertical Partitioning is the process of dividing the attributes of a relation into fragments. A good Vertical Partitioning places the attributes of a relation that are frequently accessed together in the same fragment. Various researchers have proposed algorithms for Vertical Partitioning, but there is still scope for improvement. In this paper, a new algorithm for Vertical Partitioning in a Distributed Database System is proposed, named the Slop Based Partitioning Algorithm (SBPA). The algorithm uses the Clustered Affinity Matrix (CAM), which is calculated from the Attribute Usage Matrix (AUM) and the Frequency Matrix (FM).

Keywords
Vertical Partitioning, Clustered Affinity Matrix, Attribute Usage Matrix, Frequency Matrix, Distributed Database System, Slop Based Partitioning Algorithm.

1. INTRODUCTION
In a Distributed Database System, the fragments of a relation are scattered over a collection of independent sites. A query may therefore be unable to obtain its result from the local site alone and must communicate with other sites to retrieve it. Frequent communication with other sites can lead to a poor Query-Response-Time (QRT). Vertical Partitioning of a relation into fragments plays a crucial role in improving the QRT: a good Vertical Partitioning method can improve the QRT by dividing a large, complex relation into small fragments.
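As an illustration of how an affinity matrix can be derived from an Attribute Usage Matrix and query frequencies, the following minimal Python sketch assumes the commonly used attribute affinity definition, in which the affinity of two attributes is the summed frequency of the queries that use both of them. The function name `attribute_affinity`, the sample matrices, and the use of a single total frequency per query (the FM may instead be indexed by site) are assumptions made here for illustration. The clustering step that reorders this matrix into the CAM is not shown.

```python
def attribute_affinity(usage, freq):
    """Build an attribute affinity matrix (illustrative sketch).

    usage[q][a] is 1 if query q accesses attribute a, else 0 (the AUM).
    freq[q] is the assumed total access frequency of query q.
    """
    n_attrs = len(usage[0])
    aff = [[0] * n_attrs for _ in range(n_attrs)]
    for q, row in enumerate(usage):
        # Attributes that query q accesses.
        used = [a for a, flag in enumerate(row) if flag]
        # Every pair of attributes used by q gains q's frequency.
        for i in used:
            for j in used:
                aff[i][j] += freq[q]
    return aff


# Hypothetical example: 4 queries over 4 attributes.
usage = [
    [1, 0, 1, 0],  # q1 accesses A1, A3
    [0, 1, 1, 0],  # q2 accesses A2, A3
    [0, 1, 0, 1],  # q3 accesses A2, A4
    [0, 0, 1, 1],  # q4 accesses A3, A4
]
freq = [45, 5, 75, 3]  # made-up frequencies, one per query

aff = attribute_affinity(usage, freq)
for row in aff:
    print(row)
```

In this made-up workload, A2 and A4 end up with a high mutual affinity because the most frequent query uses both, which is the kind of signal a Vertical Partitioning algorithm relies on when grouping attributes into a fragment.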
The most frequently accessed fragment can be kept in main memory, which reduces the number of page accesses from secondary memory. In a Distributed Database System, a query can also be divided into sub-queries that operate on different fragments, and these sub-queries are executed concurrently.

There are two approaches to partitioning a relation: Horizontal Partitioning and Vertical Partitioning. Horizontal Partitioning divides a relation into smaller relations on the basis of rows; each smaller relation contains the same number of columns but fewer rows. Vertical Partitioning divides a relation on the basis of columns, producing multiple relations that each contain fewer columns. A query does not usually require all the attributes of a relation at the same time; only a few attributes are needed. For this reason, Vertical Partitioning is more effective than Horizontal Partitioning in improving the QRT.

In this paper, a new Vertical Partitioning algorithm, SBPA, is proposed. Its input is the Clustered Affinity Matrix (CAM), which is calculated from the Attribute Usage Matrix (AUM) and the Frequency Matrix (FM). Once the CAM has been calculated, SBPA creates the fragments of the relation by splitting the attributes at the points in the CAM where the slop diminishes very rapidly.

The rest of this paper is organized as follows. Section 2 critically reviews previous work on Vertical Partitioning. Section 3 describes the technique used in SBPA for Vertical Partitioning. Sections 4 and 5 describe the experimental setup and the experimental results, respectively, for the proposed Vertical Partitioning algorithm. Section 6 presents the conclusion and future scope.

2.
LITERATURE REVIEW
Since the early 1970s, minimizing disk I/O has been an important topic. Since then, algorithms have been developed to reduce I/O by clustering the attributes of a complex relation, which reduces the number of page accesses from secondary memory.

In 1972, the first clustering algorithm, the Bond Energy Algorithm (BEA), was developed by McCormick et al. [4]. The purpose of this algorithm is to identify the clusters in a complex relation. Its limitation is that it is hard to apply without human interpretation: blocks may sometimes overlap, and some elements may not belong to any block, so the clustering is not as efficient as the user expects.

In 1984, after the BEA, a new algorithm was proposed by Navathe et al. [5]. This clustering algorithm was the first to consider the frequency of queries, and it reflects that frequency in the attribute affinity matrix on which clustering is performed. The complexity of this algorithm is O(n^2), where n is the number of times the partitioning is repeated. The complexity increases if overlapping is allowed.

The Optimal Binary Vertical Partitioning algorithm [7] was proposed by Wesley W. Chu et al. It uses the branch and bound technique [3] to build a binary tree whose nodes represent the queries. This algorithm reduces the time complexity compared to that of Navathe et al. in [6], but it does not consider the impact of query frequency, and its run time still grows exponentially with the number of queries.

The Graph Traversal Vertical Partitioning algorithm [6] was proposed in 1989 by Navathe et al. This algorithm traverses the graph and divides it into several subgraphs, each of which represents a cluster. In this algorithm, frequent and infrequent queries are given the same priority, which may lead to inefficient partitioning results. The reason for this is that the attributes that are usually accessed together in infrequent