Run-time Load Balancing System on SAN-connected PC Cluster for Dynamic Injection of CPU and Disk Resource — A Case Study of Data Mining Application —

GODA Kazuo¹, TAMURA Takayuki², OGUCHI Masato³, and KITSUREGAWA Masaru¹

¹ Institute of Industrial Science, The University of Tokyo
² Mitsubishi Electric Corporation
³ Research and Development Initiative, Chuo University

Abstract. The PC cluster is an attractive platform for data-intensive applications. However, the conventional shared-nothing system has limited load balancing performance, and it is difficult to change the number of nodes and disks dynamically during execution. In this paper, we develop dynamic resource injection, in which the system can inject CPU power and expand I/O bandwidth by adding nodes and disks dynamically to a SAN (Storage Area Network)-connected PC cluster. Our experiments with a data mining application confirm its effectiveness. We show the advantages of combining a PC cluster with a SAN.

1 Introduction

The PC cluster is regarded as one of the most promising platforms for data-intensive applications such as data mining and data warehousing because of its cost performance. However, the conventional shared-nothing system, where each node manages its own disk exclusively, has limited load balancing performance. In particular, it cannot handle the large skew that often arises, for example, when the number of nodes changes. The user has to configure the number of nodes and the data placement before execution. The available CPU power and I/O bandwidth are statically bounded by that initial configuration, which limits user convenience.

We propose the SAN (Storage Area Network)-connected PC cluster to handle such problems. In this configuration, the whole storage space is virtually shared by all the nodes, so we can achieve much higher load balancing performance. It is also possible to allocate resources, namely CPU power and I/O bandwidth, dynamically during execution.
We can inject resources only when they are necessary. This also frees the user from configuration tasks before execution.

In this paper, we take association rule mining as an example of a data-intensive application, and we design, implement, and evaluate dynamic resource injection. Association rule mining is explained in the next section. As the underlying mechanism for accessing shared disks on the SAN-connected PC cluster, we implement the Storage Virtualizer. Its details are given in Section 3, and its load balancing performance is evaluated in Section 4. Dynamic resource injection is explained and evaluated in Section 5, and Section 6 concludes the paper.
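
The contrast drawn in this introduction can be illustrated with a small toy model. It is only a sketch under strong simplifying assumptions, not the system developed in this paper: every node scans one block per tick, there is no I/O contention and no data migration, and the function names and block counts are hypothetical. In the shared-nothing case, a node injected during execution owns no data and so cannot shorten the run. In the shared-storage case, any node can read any block.

```python
def shared_nothing_ticks(blocks_per_node):
    """Ticks to finish when each node scans only its own disk.

    Nodes added later own no data in this toy model (no migration),
    so the run time is set by the most heavily loaded node.
    """
    return max(blocks_per_node)


def shared_storage_ticks(blocks_per_disk, initial_nodes,
                         injected_nodes=0, inject_tick=0):
    """Ticks to finish when every node can read every disk.

    All blocks form one pool; each active node takes one block per tick.
    From tick `inject_tick` onward, `injected_nodes` extra nodes help.
    """
    remaining = sum(blocks_per_disk)
    ticks = 0
    while remaining > 0:
        active = initial_nodes
        if ticks >= inject_tick:
            active += injected_nodes
        remaining -= min(active, remaining)
        ticks += 1
    return ticks


if __name__ == "__main__":
    skewed = [100, 20, 20, 20]  # hypothetical skewed placement, in blocks
    print(shared_nothing_ticks(skewed))                  # 100
    print(shared_storage_ticks(skewed, initial_nodes=4))  # 40
    # Four more nodes are injected at tick 10.
    print(shared_storage_ticks(skewed, 4, injected_nodes=4, inject_tick=10))  # 25
```

With the skewed placement above, the shared-nothing run is bounded by the 100-block node. The shared pool finishes in 40 ticks with four nodes, and in 25 ticks when four more nodes join at tick 10. Real systems also pay for contention on the storage network, which this model ignores.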