A Resilient Telco Grid Middleware
C. Lac and S. Ramanathan
France Telecom RD/MAPS/AMS/VVT
2, avenue Pierre Marzin, 22307 Lannion Cedex, France
{chidung.lac, sakkaravarthi.ramanathan}@francetelecom.com
Abstract
Grid computing can exploit distributed resources,
whether underutilized or not, to provide massive
parallel CPU capacity. Load balancing, application
sharing, and geographically dispersed databases
are other aspects of the Grid that are of interest
to a telecommunications operator
(Telco). Building a Grid middleware in order to
implement Telco's services is thus a way to assess the
validity of this type of architecture for future
applications. To achieve a trustworthy platform, the
middleware needs to take into account accidental or
malicious faults which can impact different resilience
aspects. This paper describes a secure and highly
available architecture which, besides traditional
Grid middleware functionalities (resource broker,
job mapping, system monitoring, ...), makes use of
fault-tolerant mechanisms (process duplication,
failure handling, ...) to guarantee the QoS defined in the
service level agreement. Security is addressed by
analyzing each node's defense capability and
finding a suitable way to match it with the
appropriate user's job.
1. Grid usage for Telcos
Grid computing is distributed computing taken to
the next evolutionary level. The goal is to create the
illusion of a simple, yet large and powerful, self
managing virtual computer out of a large collection
of connected heterogeneous systems sharing various
resources (CPU, storage capacity, applications, etc.).
Since Telcos have a great deal of experience in
managing large, complex networks, they should
extend this skill set to the Grid by proposing to
take control of the nodal IT assets and to provide an
end-to-end Grid service to customers such as
residential clients, small and medium enterprises, and
large corporations. Communication networks and
Grid computing, if merged, have a great deal of
technological potential. This would allow, for
instance, users of mobile devices (cell phones,
laptops, PDAs, etc.) to submit jobs on the Grid, and
get access to its tremendous processing power at their
fingertips. It would also allow them to access data
being stored or generated by the Grid, and analyze it
on their handheld devices.
While classical clustering and distributed
computing techniques have mostly been neglected by
Telcos, being regarded as outside their core business,
the ambitious goal of the Grid, namely
spreading and managing huge amounts of data across
distributed (and distant) sites, is being seriously
considered by network providers. Because Grid computing
relies heavily on networks, it offers interesting
opportunities to Telcos: exploiting the Grid for
internal use can greatly improve operations and lower
expenses, while offering external services
through these networks could open a profitable new
market. The driving factor for a Telco Grid network
service offering will be to effectively use the assets it
already owns in order to realize a fast return on
investment.
2. Dependability and fault tolerance in Grid
A preliminary implementation of a "proxy" can
combine groups of identical Grid services
in various configurations (such as fallbacks and
parallel execution), giving the appearance of a single,
better service. The aim is to build on this proxy to
provide, according to user specifications, dependability
(reliability, availability, …) for arbitrary applications
[1].
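The two proxy configurations named above, fallback and parallel execution, can be illustrated with a minimal sketch. It is not the implementation of [1]: the names fallback_proxy, parallel_proxy, AllReplicasFailed, square and broken are hypothetical, and a Grid service is assumed here to be a plain Python callable that takes one integer and either returns a result or raises an exception.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Sequence

# Assumption: a "Grid service" is modeled as a callable int -> int.
Service = Callable[[int], int]


class AllReplicasFailed(Exception):
    """Raised when no replica in the group produced a result."""


def fallback_proxy(replicas: Sequence[Service]) -> Service:
    """Expose a group of identical services as one service (fallback mode)."""
    def call(x: int) -> int:
        errors = []
        for replica in replicas:
            try:
                return replica(x)        # first replica that succeeds wins
            except Exception as exc:     # a failed replica triggers the next one
                errors.append(exc)
        raise AllReplicasFailed(errors)
    return call


def parallel_proxy(replicas: Sequence[Service]) -> Service:
    """Expose a group of identical services as one service (parallel mode)."""
    def call(x: int) -> int:
        errors = []
        # Leaving the "with" block waits for the replicas still running.
        with ThreadPoolExecutor(max_workers=max(1, len(replicas))) as pool:
            futures = [pool.submit(replica, x) for replica in replicas]
            for future in as_completed(futures):
                try:
                    return future.result()   # first successful result wins
                except Exception as exc:
                    errors.append(exc)
        raise AllReplicasFailed(errors)
    return call


# Hypothetical replicas: one healthy service and one that always fails.
def square(x: int) -> int:
    return x * x


def broken(x: int) -> int:
    raise RuntimeError("node down")


service = fallback_proxy([broken, square])
result = service(4)   # the failure of "broken" is masked by "square"
```

In both modes the caller sees a single service. Fallback mode uses one replica at a time and pays extra latency only on failure, whereas parallel mode consumes resources on every replica in exchange for masking failures without retries.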
Adding dependability features to a service-based
Grid emphasizes service composition rather than
the sharing of low-quality resources. The idea is to build
applications out of computational services provided
by the different sites of the Grid [2].
Developing both an improved fault model for
Grid computing and a method for offering
fault-tolerant Grid applications that provide protection
and robustness against both malicious and erroneous
faults is a substantial task [3]. A fault model attempts to
Proceedings of the 11th IEEE Symposium on Computers and Communications (ISCC'06)
0-7695-2588-1/06 $20.00 © 2006 IEEE