International Journal of Computer Applications (0975 – 8887) Volume 97– No.2, July 2014

Executing Joins Dynamically in Distributed Database System Query Optimizer

Sofia Gupta, Research Scholar, Department of Computer Science and Engineering, Guru Nanak Dev University, Amritsar, Punjab
Rajinder Singh, Asst. Professor, Department of Computer Science and Engineering, Guru Nanak Dev University, Amritsar, Punjab
Tirath Singh, Asst. Professor, Department of Computer Science and Engineering, Guru Nanak Dev University, Amritsar, Punjab

ABSTRACT
To join two subqueries that involve data from multiple sites, data has to be transmitted from one site to another. When data is transmitted over a network, the main factors in a distributed database are the communication cost and the amount of data transmitted. To minimize these factors, the join operation is the focus of optimization. Two cases are considered: query processing using join and query processing using semi join. The amount of data transferred with a join is greater than with a semi join. Hence, sub-operations are executed dynamically to reduce the communication cost.

General Terms
Distributed database, query processing, query optimization, data transmission.

Keywords
Distributed database, query optimization, data transfer, join, semi join.

1. INTRODUCTION
In recent years, there has been a high rate of research, development and implementation of distributed database management systems. A distributed database connects several databases that are physically spread across computers in multiple locations through a data communication network, but it is logically centralized. The benefits of distributed processing include increased reliability, availability and localization, and reduced communication costs.
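The abstract's claim that a join ships more data than a semi join can be made concrete with a small example. The following Python sketch is illustrative only and is not the authors' method: the relations R(emp_id, name, dept) at site 1 and S(dept, dname) at site 2, the sample tuples, and the helper names are all hypothetical, and transmission cost is approximated as the number of attribute values shipped, ignoring message overhead.

```python
"""Toy comparison of data shipped by a plain join vs. a semi join reduction.

Hypothetical assumptions: R(emp_id, name, dept) is stored at site 1,
S(dept, dname) is stored at site 2, the answer is needed at site 2, and
transmission cost is measured as the number of attribute values shipped.
"""

Relation = list[tuple]

# Hypothetical sample data.
R = [(1, "A", 10), (2, "B", 20), (3, "C", 30),
     (4, "D", 40), (5, "E", 10), (6, "F", 50)]  # site 1; dept is column 2
S = [(10, "Sales"), (20, "HR")]                  # site 2; dept is column 0


def values_in(rel: Relation) -> int:
    """Transmission cost proxy: number of attribute values in a relation."""
    return sum(len(t) for t in rel)


def local_join(r: Relation, s: Relation) -> Relation:
    """Equi-join on R.dept = S.dept, executed at a single site."""
    return [rt + st[1:] for rt in r for st in s if rt[2] == st[0]]


def join_strategy() -> tuple[Relation, int]:
    """Ship all of R from site 1 to site 2, then join there."""
    shipped = values_in(R)
    return local_join(R, S), shipped


def semijoin_strategy() -> tuple[Relation, int]:
    """Ship pi_dept(S) to site 1, reduce R there, ship the reduced R back."""
    join_keys = {st[0] for st in S}                     # pi_dept(S), site 2 -> 1
    shipped = len(join_keys)
    reduced_r = [rt for rt in R if rt[2] in join_keys]  # R semijoin S at site 1
    shipped += values_in(reduced_r)                     # reduced R, site 1 -> 2
    return local_join(reduced_r, S), shipped


if __name__ == "__main__":
    full, cost_join = join_strategy()
    semi, cost_semi = semijoin_strategy()
    assert full == semi  # both strategies must produce the same answer
    print("join:", cost_join, "values shipped")
    print("semi join:", cost_semi, "values shipped")
```

With this sample data the plain join ships all 18 values of R, whereas the semi join strategy ships 2 join-key values plus 9 values of the reduced R, 11 in total, and both produce the same three result tuples. The saving depends on how many tuples of R the semi join eliminates; if nearly every tuple of R matches, shipping the join keys is pure overhead and the plain join is cheaper.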
The two dominant approaches for storing and managing a database are the centralized database management system, in which data is placed at a central location, and the distributed database management system, in which data is distributed over several locations. Independent of the approach used, one of the important issues is the retrieval of data from multiple tables: from a central repository in a centralized database, and from a number of sites in a distributed database [1]. As data environments grow larger, it becomes difficult to store data at a single site [2]. One of the important problems in distributed systems is the efficient processing of queries in a relational DDBMS, where processing a query involves transmitting data among different sites. Factors such as high query response time and the number of sites that must be accessed to answer a query affect the performance of distributed queries. Database system performance also depends heavily on the join operator: because joins are expensive to evaluate, they are the primary target of the query optimizer. When the tables reside on different nodes of a computer network, data must be moved between nodes to join them [3]. The cost of a distributed query consists of processing cost and transmission cost, and every transmission of data increases the communication cost. The optimizer must therefore choose an order in which tables are joined that cuts down the communication overhead. Finding an efficient join order is hard because the query optimizer has to examine a large number of equivalent alternative query plans. Two approaches exist: one optimizes the join order directly, whereas the other replaces joins by combinations of semi joins in order to minimize communication cost [4].

2. QUERY PROCESSING AND QUERY OPTIMIZATION
The performance of a DDBMS depends critically on its query processing strategies.
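The join-ordering problem described in the introduction can be illustrated with a minimal Python sketch that enumerates every left-deep join order and keeps the cheapest one. Everything here is an assumption for illustration, not the authors' algorithm: each relation sits at its own site, the running intermediate result is shipped to the site of the next relation, the relation names, sizes and join selectivities are made up, and the cost is a hypothetical weighted sum of tuples shipped (transmission cost) and tuples read by local joins (processing cost).

```python
"""Exhaustive search for the cheapest left-deep join order (illustrative sketch).

Hypothetical assumptions: every relation is stored at its own site, the
running intermediate result is shipped to the site of the next relation,
and sizes, selectivities and per-tuple cost weights are made-up numbers.
"""
from itertools import permutations
from math import prod

SIZES = {"EMP": 1000, "DEPT": 50, "PROJ": 200}   # tuples per relation
SELECTIVITY = {                                   # join predicates that exist
    frozenset({"EMP", "DEPT"}): 1 / 50,
    frozenset({"EMP", "PROJ"}): 1 / 200,
}
C_TRANSFER = 10.0  # hypothetical cost per tuple shipped between sites
C_PROCESS = 1.0    # hypothetical cost per tuple read by a local join


def join_selectivity(joined: frozenset, rel: str) -> float:
    """Product of selectivities linking `joined` to `rel`; 1.0 means a cross product."""
    return prod(SELECTIVITY.get(frozenset({r, rel}), 1.0) for r in joined)


def plan_cost(order: tuple[str, ...]) -> float:
    """Transmission plus processing cost of one left-deep join order."""
    joined = frozenset({order[0]})
    size = float(SIZES[order[0]])
    cost = 0.0
    for rel in order[1:]:
        cost += C_TRANSFER * size                # ship intermediate to rel's site
        cost += C_PROCESS * (size + SIZES[rel])  # local join reads both inputs
        size = size * SIZES[rel] * join_selectivity(joined, rel)
        joined = joined | {rel}
    return cost


def best_order() -> tuple[tuple[str, ...], float]:
    """Enumerate all n! orders and return the cheapest with its cost."""
    return min(((o, plan_cost(o)) for o in permutations(SIZES)),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    for order in permutations(SIZES):
        print(" -> ".join(order), plan_cost(order))
    print("chosen:", best_order())
```

With these made-up numbers the cheapest order starts from the small DEPT relation (cost 12750), while the orders that join DEPT and PROJ first build a cross product and cost almost nine times as much. Exhaustive enumeration examines n! orders, so it only suits a handful of relations; practical optimizers prune the search, for example with dynamic programming.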
Retrieving data for queries from different sites in a DDB is called distributed query processing. Because the data is geographically distributed over multiple sites, the processing of a distributed query is composed of the following three phases:
• Local processing phase
• Reduction phase
• Final processing phase
The local processing phase performs local operations such as selections and projections. The reduction phase reduces the size of relations using semi joins and joins. The final processing phase sends all resulting relations to the final site, where the result of the query is computed [4]. In query processing, database users specify what data is required rather than the procedure for retrieving it. Query processing is more complex and difficult in a distributed environment than in a centralized one. Several factors impact the performance of distributed query processing: the selection of appropriate sites, the order of operations (such as select, project and join) and the choice of join method (such as semi join, natural join, equi-join, etc.). Because so many factors are involved, there can be multiple execution plans for a single query. Each plan has an associated cost, and the objective of a distributed query optimizer is to find the plan with the lowest possible cost (the optimal plan). The execution cost is expressed as the sum of I/O, CPU and communication costs. The goal of query processing is to execute queries efficiently so as to minimize response time and total communication cost. Query optimization is one of the most important and valuable stages in query execution [5]. Once a query is entered and transformed into algebraic form,