ProActive: Using a Java Middleware for HPC Design, Implementation and Benchmarks

Brian Amedro, Denis Caromel, Fabrice Huet
INRIA Sophia-Antipolis, CNRS, I3S, UNSA
2004, Route des Lucioles, BP 93, 06902 Sophia-Antipolis Cedex, France
First.Last@inria.fr

Vladimir Bodnartchouk, Christian Delbé
ActiveEon
2004, Route des Lucioles, BP 93, 06902 Sophia-Antipolis Cedex, France
First.Last@activeeon.com

Guillermo L. Taboada
University of A Coruña, Faculty of Informatics, Spain
taboada@udc.es

Abstract—Although Java is among the most widely used programming languages, its use for HPC applications is still marginal. This article reports on the design, implementation and benchmarking of a Java version of the NAS Parallel Benchmarks, translated from their original Fortran/MPI implementation. Our version is based on ProActive, an open-source middleware designed for parallel and distributed computing. This paper describes the principles of the ProActive middleware and how we implemented the NAS Parallel Benchmarks on top of this Java library. We also give some basic rules for writing HPC code in Java. Finally, we compare the overall performance of the legacy version and the Java ProActive version. We show that performance varies with the type of computation but also with the Java Virtual Machine, no single one providing the best performance in all experiments. We also show that the performance of the Java version is close to that of the Fortran version on computationally intensive benchmarks. However, on some communication-intensive benchmarks, the Java version exhibits scalability issues, even when using a high-performance socket implementation (JFS).

I. INTRODUCTION

The Message Passing Interface (MPI) is the dominant programming model for scientific computing. This library provides many low-level primitives designed for pure performance.
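One of the basic Java HPC rules alluded to in the abstract concerns primitive versus boxed numerics. The following minimal sketch is ours, not NPB code: numeric kernels should operate on primitive arrays such as `double[]` rather than boxed collections, since every access to a `List<Double>` element unboxes and adds allocation and indirection costs.

```java
import java.util.Arrays;
import java.util.List;

public class DotProduct {
    // Preferred for HPC-style code: primitive arrays, no boxing.
    static double dotPrimitive(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Same computation on boxed Doubles: each get() unboxes an object.
    static double dotBoxed(List<Double> a, List<Double> b) {
        double sum = 0.0;
        for (int i = 0; i < a.size(); i++) {
            sum += a.get(i) * b.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {4.0, 5.0, 6.0};
        // Both variants compute the same value; only the primitive one
        // avoids per-element object overhead.
        System.out.println(dotPrimitive(a, b));
        System.out.println(dotBoxed(Arrays.asList(1.0, 2.0, 3.0),
                                    Arrays.asList(4.0, 5.0, 6.0)));
    }
}
```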
But for several years, the tendency has been to look for productivity [13] and to propose efficient high-level primitives like collective operations [9], object-oriented distributed computing [6], and facilities to ease the deployment of applications.

In order to evaluate the capabilities of Java for high performance computing, we have implemented the NAS (Numerical Aerodynamic Simulation) Parallel Benchmarks (NPB), which are a standard in distributed scientific computation. Many middleware comparisons and optimization techniques are based on them [17], [7], [10], [12], [8]. Their characteristic is to test a large set of aspects of a system, from pure computation performance to communication speed. By using a Java-based middleware instead of Fortran+MPI, we want to demonstrate the performance that can be obtained, comparing it to an equivalent native version. Our aim is to identify the areas where Java still lacks performance, in particular the network layer.

Our contributions are the following:
- An evaluation of the Java overhead for arithmetic computation and array manipulation
- A report on common performance pitfalls and how to avoid them
- A performance comparison of an implementation of the NPB in Java and Fortran/MPI (PGI) on Gigabit Ethernet and SCI

The rest of this paper is organized as follows. Section 2 gives some background: a short description of the benchmarks used in our experiments, the ProActive library (in particular the active object model), and the Java Fast Sockets [18]. Section 3 presents related work. In Section 4, we discuss the implementation and some performance issues. Section 5 presents the results obtained with the NAS Parallel Benchmarks on two network architectures. Finally, we discuss future work and conclude in Section 6.

II. BACKGROUND

A.
The NAS Parallel Benchmarks

The NAS Parallel Benchmarks (NPB) consist of a set of kernels derived from computational fluid dynamics (CFD) applications. They were designed by the NASA Ames Research Center and test different aspects of a system. Some test pure computation performance with different kinds of problems, like matrix computation or FFTs. Others involve high memory usage or network speed with large data communications. Finally, some problems evaluate the impact of irregular latencies between processors (short- or long-distance communications). Each of the five kernels was designed to test a particular subset of these aspects.

To follow the evolution of computer performance, the NPB were designed with several classes of problems, making kernels harder to compute by modifying the size of the data and/or the number of iterations. There are now six classes of problems: S, W, A, B, C and D. Class S is the easiest problem and is for testing purposes only. Class D is the hardest and usually requires a lot of memory. Here we use the IS, FT, EP, CG and MG kernels with problem class C.

INTERNATIONAL JOURNAL OF COMPUTERS AND COMMUNICATIONS, Issue 3, Volume 3, 2009
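Among the kernels listed above, EP (Embarrassingly Parallel) stresses pure computation with essentially no communication between processes. As a schematic illustration of that pattern (our sketch, not NPB code), the following splits a set of fully independent iterations across cores and reduces the partial results; `work` is a hypothetical stand-in for the per-sample computation the real kernel performs.

```java
import java.util.stream.IntStream;

public class EpSketch {
    // Independent per-sample work: a trivial placeholder for the
    // random-deviate computation the real EP kernel performs.
    static double work(int i) {
        return Math.sqrt(i) * 0.5;
    }

    // Each iteration is independent, so a parallel stream can distribute
    // them across cores and sum the partial results with no coordination
    // beyond the final reduction -- the "embarrassingly parallel" shape.
    static double run(int samples) {
        return IntStream.range(0, samples)
                        .parallel()
                        .mapToDouble(EpSketch::work)
                        .sum();
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000));
    }
}
```

Kernels such as CG or MG do not fit this shape: their iterations exchange data, which is where the communication layer measured later in the paper comes into play.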