A WOS™-Based Solution for High Performance Computing

Nabil Abdennadher
University of Applied Sciences
Geneva, Switzerland
abdennad@eig.unige.ch

Gilbert Babin
Information Technologies
HEC Montréal
Montréal, Canada H3T 2A7
Gilbert.Babin@hec.ca

Peter Kropf
Dép. informatique et rech. op.
Université de Montréal
Montréal, Canada H3C 3J7
kropf@iro.umontreal.ca

Abstract

Most development environments for High Performance Parallel applications require that all the computing modules and resources be known in advance. The execution environment must know where the different program modules will be executed, and must properly configure each computer involved in the execution. In this paper, we describe how the Web Operating System (WOS™) environment may be used to dynamically adjust the granularity of parallel programs, to locate available computers to perform the computations, and to configure these computers dynamically.

The WOS [7] is a metacomputing environment suitable for supporting and managing distributed/parallel processing on wide and local networks. Communication between WOS nodes is realized through a generic service protocol (WOSP) and a discovery/location protocol (WOSRP). WOSP may be versioned to support specialized services. In this paper, we focus on the design of two such versions for Parallel/Distributed applications and High Performance computing. These versions support the location and setup of computational nodes for these applications.

1. Introduction

Advances in networking technology and computational infrastructure have changed the High Performance Computing (HPC) landscape. Tightly coupled, dedicated processors tend to be replaced by loosely coupled independent machines connected via standard local or wide area networks.
Centralized High Performance (HP) applications developed with proprietary, closed-source, hardware-dependent environments are increasingly replaced by distributed “components” sharing and managing resources spread over a networked environment. The new HP distributed platforms are accessed from the user’s desktop in a uniform and user-friendly manner, such as that provided by Web interfaces. The network environment combines multiple administration domains, heterogeneous computing platforms and security policies. Sharing and managing the resources spread over this network therefore becomes a cumbersome task. This problem is called the wide-area computing problem [10].

The wide-area computing problem can be solved in an ad hoc manner for each application: scripts and various network tools can serve this purpose. However, these solutions are very limited, lack scalability, and require specific knowledge of the architecture of the machines.

A more systematic way to solve this problem is to build a Network Operating System (NOS) for the management of distributed execution environments. This NOS would provide high-level means for sharing and managing complex resources distributed over the network. We think that metacomputing is one promising approach to reach that goal.

The purpose of metacomputing is to give the illusion of a single machine by transparently managing data movement, scheduling of application components on available resources, fault detection, and protection of the user’s data and physical resources.

However, requirements for HPC go far beyond transparent management and use of resources distributed over the network. In the context of HPC, the metacomputing environment must meet the performance requirements of the application from a computational and communication standpoint. To achieve this goal, several metacomputing environments support HPC by providing their own, closed-source, HP execution tools.
We argue that, although this approach favours transparency, it does so at the expense of portability and efficiency. It binds the user to the specific HP execution tools supported by the metacomputing environment selected.

The approach proposed in this article follows two primary objectives:

1. Satisfy the HP constraints of a given application by using parallel and distributed computation, and

2. Specify very little about implementation. This goal should be realized by using metacomputing tools during the configuration of the HP application (searching, reserving and assigning resources to the application).