IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, VOL. 12, NO. 2, MARCH 2008 205
A Semantic Grid Infrastructure Enabling Integrated
Access and Analysis of Multilevel Biomedical Data
in Support of Postgenomic Clinical Trials on Cancer
Manolis Tsiknakis, Member, IEEE, Mathias Brochhausen, Jarek Nabrzyski, Juliusz Pucacki, Stelios G. Sfakianakis,
George Potamias, Christine Desmedt, and Dimitris Kafetzopoulos
Abstract—This paper reports on original results of the Advanc-
ing Clinico-Genomic Trials on Cancer integrated project focusing
on the design and development of a European biomedical grid in-
frastructure in support of multicentric, postgenomic clinical trials
(CTs) on cancer. Postgenomic CTs use multilevel clinical and ge-
nomic data and advanced computational analysis and visualization
tools to test hypotheses in trying to identify the molecular reasons
for a disease and the stratification of patients in terms of treatment.
This paper presents the needs of users involved in postgenomic CTs
in the form of scenarios, which drive the requirements
engineering phase of the project.
Subsequently, the initial architecture specified by the project is
presented, and its services are classified and discussed. A key set of
such services comprises those used for wrapping heterogeneous clinical
trial management systems and other public biological databases.
Also, the main technological challenge, i.e., the design and development
of semantically rich grid services, is discussed. Achieving
such an objective requires extensive use of ontologies and metadata.
The Master Ontology on Cancer, developed by the project,
is presented, and our approach to developing the required metadata
registries, which provide semantically rich information about available
data and computational services, is described. Finally, a short
discussion of the work lying ahead is included.
Index Terms—Biomedical grid, cancer, metadata, ontology,
postgenomic clinical trials, semantic integration of heterogeneous
biomedical databases.
Manuscript received November 3, 2006; revised April 17, 2007. This
work was supported in part by the European Union cofunded Research and
Technological Development (RTD) Advancing Clinico-Genomic Trials on
Cancer: Open Grid Services for Improving Medical Knowledge Discovery
(ACGT) Project under Grant FP6–2005-IST-026996. The work of C. Desmedt
was supported by the Fonds National de la Recherche Scientifique.
M. Tsiknakis is with the Foundation for Research and Technology–
Hellas, Institute of Computer Science, GR-71110 Heraklion, Greece (e-mail:
tsiknaki@ics.forth.gr).
M. Brochhausen is with the Institute of Formal Ontologies and Medical
Information Science (IFOMIS), University of Saarland, 66041 Saarbrücken,
Germany (e-mail: mathias.brochhausen@ifomis.uni-saarland.de).
J. Nabrzyski and J. Pucacki are with the Poznan Supercomputing and
Networking Center, 60-967 Poznan, Poland (e-mail: naber@man.poznan.pl;
pucacki@man.poznan.pl).
S. G. Sfakianakis, G. Potamias, and D. Kafetzopoulos are with the Founda-
tion for Research and Technology–Hellas, Institute of Computer Science and
Institute of Molecular Biology and Biotechnology, Heraklion, Greece (e-mail:
ssfak@ics.forth.gr; potamias@ics.forth.gr; kafetzo@imbb.forth.gr).
C. Desmedt is with the Functional Genomics and Translational Research
Unit, Jules Bordet Institute, Brussels 1000, Belgium (e-mail:
christine.desmedt@bordet.be).
Digital Object Identifier 10.1109/TITB.2007.903519
I. INTRODUCTION
Recent advances in research methods and technologies
have resulted in an explosion of information and knowl-
edge about cancers and their treatment. Exciting new research
on the molecular mechanisms that control cell growth and dif-
ferentiation has resulted in a quantum leap in our understanding
of the fundamental nature of cancer cells, and has suggested
valuable new approaches to cancer diagnosis and treatment.
The ability to characterize and understand cancer is grow-
ing exponentially based on information from genetic and pro-
tein studies, clinical trials, and other research endeavors. The
breadth and depth of the information already available in the
research community at large present an enormous opportunity
for improving our ability to reduce mortality from cancer, im-
prove therapies, and meet the demanding individualization-of-
care needs [1], [2].
While these opportunities exist, the lack of a common infras-
tructure has prevented clinical research institutions from being
able to mine and analyze disparate data sources. As a result,
very few cross-site studies and multicentric clinical trials are
performed, and in most cases, it is not possible to seamlessly
integrate multilevel data (from the molecular to the organ and
individual levels). Moreover, clinical researchers and molecular
biologists often find it hard to exploit each other’s expertise because
they lack both a cooperative environment that enables the
sharing of data, resources, or tools for comparing results and
experiments, and a uniform platform that supports the seamless
integration and analysis of disease-related data at all levels [2].
This inability to share technologies and data developed by differ-
ent organizations is, therefore, severely hampering the research
process.
The vision of the Advancing Clinico-Genomic Trials on
Cancer (ACGT) Project (www.eu-acgt.org) [3] is to contribute
to the resolution of these problems by developing a semantically
rich grid infrastructure in support of multicentric, postgenomic
clinical trials (CTs), thus enabling discoveries in the labora-
tory to be quickly transferred to the clinical management and
treatment of patients (see Fig. 1).
This paper presents a short background section discussing
the urgent needs faced by the biomedical informatics research
community; it presents the clinical trials upon which the ACGT
project is based, both for gathering and eliciting requirements
and for validating the technological infrastructure designed.
It continues with a presentation of the initial ACGT architecture
1089-7771/$25.00 © 2008 IEEE