Asynchronous Communication in Java over Infiniband and DECK*
Rodrigo da Rosa Righi, Philippe O. A. Navaux
Universidade Federal do Rio Grande do Sul
{rrrighi, navaux}@inf.ufrgs.br
Márcia Cristina Cera, Marcelo Pasin
Universidade Federal de Santa Maria
{cera, pasin}@inf.ufsm.br
Abstract
Java is becoming an attractive and easy-to-use programming language. It provides two mechanisms for distributed computing, RMI and sockets, both of which perform synchronous communication over TCP/IP. These core Java features may not be the best choice for cluster computing environments, since they do not deliver high performance, a critical factor in this scenario. This paper presents the development of the Aldeia system, a proposed library that provides asynchronous communication in Java for cluster programming. Aldeia has been deployed on SAN hardware through the Infiniband and DECK high-speed substrates. The paper describes the rationale for Aldeia’s creation, its structure, and encouraging results for both the synchronous and asynchronous approaches.
1. Introduction
Clusters have been widely used as an architecture for high performance computing[1]. They are controlled, dedicated environments with a low error rate, in which nodes exchange messages across the network in order to solve parallel applications. A cluster can therefore be assembled as a system area network (SAN)[9, 17], which focuses on low latency, low CPU overhead and high-bandwidth communication. To achieve these features, SAN software can use asynchronous message passing[4]. Additionally, SAN hardware can reach gigabit bandwidth and latencies on the order of microseconds.
* Partially supported by CAPES and CNPq (Brazil)
Besides the network infrastructure, another issue in cluster computing is the programming language adopted to develop parallel applications. In this respect, the use of Java in clusters is growing[10, 14]. Java provides better and easier mechanisms for writing multi-threaded applications, and it also offers remote method invocation (RMI) and sockets for distributed computing. RMI and sockets perform synchronous communication between Java processes over the TCP/IP protocol. Cluster applications commonly use the standard TCP/IP protocol for inter-node communication, even though TCP/IP imposes software penalties such as data copies and checksum computation. These overheads can make TCP/IP unsuitable for high performance environments[8].
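To make the synchronous behaviour concrete, the following minimal sketch exchanges one integer with a loopback echo server using only the standard `java.net` socket classes. The class name `BlockingEchoSketch` and the method `roundTrip` are illustrative names chosen here, not taken from the paper. The point to observe is that the calling thread is blocked in `readInt()` until the reply arrives.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Illustrative sketch only: a blocking request/reply over a loopback TCP socket.
final class BlockingEchoSketch {

    // Sends one integer to a local echo server and waits for the reply.
    static int roundTrip(int value) throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the operating system pick a free port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread echo = new Thread(() -> {
                try (Socket peer = server.accept();
                     DataInputStream in = new DataInputStream(peer.getInputStream());
                     DataOutputStream out = new DataOutputStream(peer.getOutputStream())) {
                    out.writeInt(in.readInt()); // echo the received value back
                    out.flush();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            echo.start();

            try (Socket client = new Socket(loopback, server.getLocalPort());
                 DataOutputStream out = new DataOutputStream(client.getOutputStream());
                 DataInputStream in = new DataInputStream(client.getInputStream())) {
                out.writeInt(value);
                out.flush();
                // readInt() blocks the calling thread until the reply arrives;
                // no other work proceeds on this thread in the meantime.
                int reply = in.readInt();
                echo.join();
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("reply: " + roundTrip(42));
    }
}
```

Every such exchange stalls the caller for a full network round trip, which is the cost that asynchronous communication tries to hide.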
In this context, the Aldeia system[6] was developed to overcome the limitations of Java’s built-in distributed computing mechanisms. Aldeia exploits SAN cluster features such as high-speed hardware and optimized communication techniques. Its facilities are exposed through an intuitive Java programming interface, which makes SAN applications easier to write. Aldeia’s main purpose is to take advantage of asynchronous communication. This approach exploits concurrency between computation and communication, increasing application flexibility and efficiency. In addition, asynchrony is a powerful mechanism for hiding network latency, especially when used over TCP/IP. Aldeia applications can execute on a SAN cluster and over TCP/IP networks. Currently, Aldeia is used over Infiniband[16] and over the platforms supported by the DECK[2] environment.
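The overlap of computation and communication described above can be sketched with the standard `java.util.concurrent` classes. This is not Aldeia’s interface, which is presented later in the paper. The names `OverlapSketch`, `simulatedSend` and `sendAndCompute` are hypothetical, and the network transfer is simulated with a sleep so that the example stays self-contained.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch only: overlapping local work with an in-flight transfer.
final class OverlapSketch {

    // Hypothetical stand-in for a network send: sleeps to simulate latency,
    // then reports how many bytes were "sent".
    static int simulatedSend(byte[] message, long latencyMillis) {
        try {
            Thread.sleep(latencyMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("send interrupted", e);
        }
        return message.length;
    }

    // Local work that does not depend on the transfer having completed:
    // the sum of squares 1^2 + 2^2 + ... + n^2.
    static long compute(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    // Starts the send, computes while it is in flight, then waits for completion.
    // Returns { bytes sent, computed value }.
    static long[] sendAndCompute(byte[] message, long latencyMillis, int n) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<Integer> pending = CompletableFuture.supplyAsync(
                    () -> simulatedSend(message, latencyMillis), pool);
            long result = compute(n);       // runs while the simulated send is pending
            int bytesSent = pending.join(); // block only when the outcome is needed
            return new long[] { bytesSent, result };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] out = sendAndCompute(new byte[1024], 50, 1_000);
        System.out.println("bytes sent: " + out[0] + ", computed: " + out[1]);
    }
}
```

In this sketch the caller waits only at `join()`, so the elapsed time approaches the longer of the two activities rather than their sum. That difference is the latency-hiding effect that motivates an asynchronous interface.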
This paper presents the Aldeia system and is organized as follows. Section 2 describes the evolution of cluster communication. Section 3 reviews some Java communication libraries for cluster programming. Section 4 presents Aldeia, explaining the rationale for its creation, its structure and its implementation. Section 5 presents Aldeia’s evaluation. Section 6 concludes with Aldeia’s main contributions.
2. Cluster Networking Evolution
Generally, the standard cluster communication system is composed of Ethernet (802.*) hardware and uses TCP/IP as its network protocol. Despite its wide usage and attractive cost, this combination imposes high communication latency. TCP/IP is a software protocol that consumes CPU cycles for its processing and was designed for wide and local area networks, where it provides reliable communication[8]. It relies on header checksums, message copying, flow control and interrupt handling to implement its communication facilities. Moreover, there are overheads related to user-kernel context switching for each network inter-
Proceedings of the 17th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD’05)
1550-6533/05 $20.00 © 2005 IEEE