Designing a distributed database on a local area network: a methodology and decision support system H. Lee a, * , Y.-K. Park b , G. Jang c , S.-Y. Huh d a Corporate Information System Laboratory, Graduate School of Management, KAIST, 207-43, Cheongyangri-Dong, Dongdaemoon-Gu, Seoul 130-012, South Korea b LG LCD Ltd. LG Young-Dong Bldg. 891, Daechi-dong, Kangnam-gu, Seoul 135-280, South Korea c Oracle Korea Ltd. Jeong Woo Bldg 3F. 1319-15, Dal-dong, Nam-Gu, Ulsan 680-030, South Korea d Database Application Laboratory, Graduate School of Management, KAIST, 207-43, Cheongyangri-Dong, Dongdaemoon-Gu, Seoul 130-012, South Korea Received 23 March 1998; received in revised form 4 June 1999; accepted 23 June 1999 Abstract Local area networks (LANs) are important for an enterprise to hold a competitive edge. Many companies have therefore converted terminal-based computing systems to LAN-based distributed data processing systems. This paper proposes a design methodology for distributed databases connected by a LAN. Two primary objectives of the methodology are: (i) to allocate data files and workload among heterogeneous servers; and (ii) to determine the number of servers to satisfy the response time required for processing each transaction. The file and workload allocation decision is formulated as a nonlinear zero–one integer programming problem. This problem is proven to be NP-complete. A heuristic is developed to solve this problem effectively. A decision support system is implemented and an example is solved to illustrate the practical usefulness of the system. 2000 Elsevier Science B.V. All rights reserved. Keywords: Distributed database; Decision support system; Local area network; File and workload allocation; Nonlinear integer programming; Local proces- sing overhead; Server service capacity 1. Introduction A local area network (LAN) is an infrastructure which places the company on a secure footing in today’s highly competitive business environment. A LAN is a communica- tion network that is confined to a small area, such as a single building or a small cluster of buildings [14,15]. Nowadays most organizations have established LANs and used them in various areas of work. Indeed, LANs play an important role in information sharing and office works in business, acade- mia, and industry. Attached to the LANs would be personal computers, database servers providing various data, and communication servers for interacting with remote systems and downloading data to the local databases [1]. Fig. 1 shows the structure of distributed system on a LAN. An intermediate system (IS) is a device used to connect two subnetworks and permit communication between end systems attached to different subnetworks [14]. Transac- tions are generated from terminals and routed to DB server having required data by communication server. In a LAN establishment, the distributed database design and the process capacity design are very important issues. A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network [12]. In a distributed database design, two fundamental allo- cation design issues are: (i) file allocation, the optimum distribution of fragments; and (ii) workload allocation, the optimum distribution of transactions to be processed in a LAN [11]. Fragments, data files, are database partitions generated from fragmentation design. Allocation strategies affect: (i) the total operating cost which consists of commu- nication overhead, storage cost and local processing cost; and (ii) the response time, which consists of network delay, communication delay, local delay (I/O time, CPU proces- sing time) and local processing delay. Allocation of data files and workload across the nodes of a computer network has been studied extensively. This problem has been proven to be NP-complete [8,9,11]. Wah and Lien [18] studied the file allocation problem (FAP) in a distributed databases design on a LAN. They considered only communication overhead in a homoge- neous local broadcast network. In their model, redundant data file allocation policy was adopted, but join problem was not considered. Their model was formulated in the Information and Software Technology 42 (2000) 171–184 0950-5849/00/$ - see front matter 2000 Elsevier Science B.V. All rights reserved. PII: S0950-5849(99)00056-7 * Corresponding author. Tel.: + 82-2958-32146; fax: + 82-2958-3220. www.elsevier.nl/locate/infsof