J. Parallel Distrib. Comput. 66 (2006) 1181 – 1188
www.elsevier.com/locate/jpdc
The Trellis security infrastructure for overlay metacomputers and bridged
distributed file systems
Paul Lu∗, Michael Closson, Cam Macdonell, Paul Nalos, Danny Ngo, Morgan Kan, Mark Lee
Department of Computing Science, University of Alberta, Edmonton, Alta., Canada T6G 2E8
Received 17 December 2005; received in revised form 31 March 2006; accepted 10 April 2006
Available online 27 June 2006
Abstract
Researchers often have non-privileged access to a variety of high-performance computer (HPC) systems in different administrative domains,
possibly across a wide-area network. Consequently, the security infrastructure becomes an important component of an overlay metacomputer:
a user-level aggregation of HPC systems.
The Trellis security infrastructure (TSI) is layered on top of the widely deployed secure shell (SSH), and systems administrators only need
to provide unprivileged accounts to the users. The contribution of TSI is in demonstrating that a single sign-on (SSO) system, for a variety of
use-case scenarios, can be implemented without requiring a completely new security infrastructure. We describe the use of TSI for a Canada-
wide overlay metacomputer, for computational workloads (i.e., CISS-3) that spanned 22 administrative domains, at its peak had over 4000
concurrent jobs, and included a new distributed file system (i.e., Trellis NFS).
© 2006 Elsevier Inc. All rights reserved.
Keywords: Security; Single sign-on; Metacomputing; Computational science; Capacity computing; Global job scheduler; Distributed file system
1. Introduction
Some workloads and experiments in computational science
require large amounts of resources, both in terms of capability
and capacity. In capacity computing, where high throughput
is often the main goal, aggregating different high-performance
computing (HPC) systems is a common technique to provide
the needed capacity.
For example, Researcher A (Fig. 1) has access to his group’s
system, a departmental system, and a system at an HPC center.
Researcher B has access to her group’s server and (perhaps)
a couple of different HPC centers, including one center
in common with Researcher A. It would be ideal if all of the
systems could be part of one metacomputer. However, the
different systems may be controlled by different groups who may
not run the same security software or may not have negotiated
cross-domain security policies. Yet, Researchers A and B
would still like to be able to exploit the aggregate power of their
systems.
∗ Corresponding author. E-mail address: paullu@cs.ualberta.ca (P. Lu).
0743-7315/$ - see front matter © 2006 Elsevier Inc. All rights reserved.
doi:10.1016/j.jpdc.2006.04.005
Some of the main requirements of the security infrastructure
for a cross-administrative domain situation include:
(1) Single sign-on (SSO) across multiple administrative do-
mains: The user wishes to authenticate (i.e., prove his
identity) to the system only once, and not once per-
domain. The well-known secure shell (SSH) [1] system
can support SSO if the user properly sets up his private
and public keys and uses the ssh-agent for automated
authentication.
(2) SSO support for background jobs, servers, and multiple
users: Jobs or servers left in the background need SSO
(e.g., to get a unit of work, return a result, move data).
(3) Security and mitigation of attacks: SSH is already consid-
ered to be reasonably secure. The challenge for this work
is in maintaining that security while not opening up new,
significant avenues of attack.
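As a minimal sketch of requirement (1), and not TSI's actual implementation, the following Python fragment shows how a user-level tool might rely on ssh-agent for non-interactive authentication across several hosts. It assumes the standard OpenSSH client, which reads the agent socket from the SSH_AUTH_SOCK environment variable and accepts the BatchMode=yes option to fail rather than prompt. The function names and the example host names are hypothetical and chosen only for illustration.

```python
import os


def agent_available(env=None):
    """Return True if an ssh-agent socket is advertised in the environment."""
    env = os.environ if env is None else env
    return bool(env.get("SSH_AUTH_SOCK"))


def build_ssh_command(host, remote_command, user=None):
    """Build the argv list for one non-interactive SSH invocation.

    BatchMode=yes makes ssh fail instead of prompting for a password or
    passphrase, so authentication must succeed with keys held by the agent.
    """
    target = f"{user}@{host}" if user else host
    return ["ssh", "-o", "BatchMode=yes", target, remote_command]


def plan_commands(hosts, remote_command, env=None):
    """Return one ssh argv per host, or raise if no agent is reachable."""
    if not agent_available(env):
        raise RuntimeError("no ssh-agent found: SSH_AUTH_SOCK is not set")
    return [build_ssh_command(h, remote_command) for h in hosts]


if __name__ == "__main__":
    # Hypothetical hosts in different administrative domains.
    hosts = ["hpc1.example.org", "cluster.example.net"]
    try:
        for argv in plan_commands(hosts, "hostname"):
            print(" ".join(argv))
    except RuntimeError as err:
        print(err)
```

Each argv list could be passed to subprocess.run to execute the command. The sketch only constructs the commands, so it can be inspected without contacting any host.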
At a high level, the Trellis security infrastructure (TSI) ad-
dresses some of the main issues in security as follows:
(1) Basic authentication and authorization: TSI relies on the
existing ability to use ssh-agent for automatic, non-
interactive authentication. The problems are: How can all