IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, VOL. 12, NO. 2, MARCH 2008

A Semantic Grid Infrastructure Enabling Integrated Access and Analysis of Multilevel Biomedical Data in Support of Postgenomic Clinical Trials on Cancer

Manolis Tsiknakis, Member, IEEE, Mathias Brochhausen, Jarek Nabrzyski, Juliusz Pucacki, Stelios G. Sfakianakis, George Potamias, Christine Desmedt, and Dimitris Kafetzopoulos

Abstract—This paper reports on original results of the Advancing Clinico-Genomic Trials on Cancer integrated project, focusing on the design and development of a European biomedical grid infrastructure in support of multicentric, postgenomic clinical trials (CTs) on cancer. Postgenomic CTs use multilevel clinical and genomic data and advanced computational analysis and visualization tools to test hypotheses aimed at identifying the molecular causes of a disease and at stratifying patients in terms of treatment. This paper presents the needs of users involved in postgenomic CTs in the form of scenarios, which drive the requirements engineering phase of the project. Subsequently, the initial architecture specified by the project is presented, and its services are classified and discussed. Key among these services are those used for wrapping heterogeneous clinical trial management systems and other public biological databases. The main technological challenge, i.e., the design and development of semantically rich grid services, is also discussed. Achieving this objective requires extensive use of ontologies and metadata. The Master Ontology on Cancer, developed by the project, is presented, and our approach to developing the required metadata registries, which provide semantically rich information about available data and computational services, is described. Finally, a short discussion of the work lying ahead is included.

Index Terms—Biomedical grid, cancer, metadata, ontology, postgenomic clinical trials, semantic integration of heterogeneous biomedical databases.

Manuscript received November 3, 2006; revised April 17, 2007. This work was supported in part by the European Union cofunded Research and Technological Development (RTD) Advancing Clinico-Genomic Trials on Cancer: Open Grid Services for Improving Medical Knowledge Discovery (ACGT) Project under Grant FP6–2005-IST-026996. The work of C. Desmedt was supported by the Fonds National de la Recherche Scientifique.
M. Tsiknakis is with the Foundation for Research and Technology–Hellas, Institute of Computer Science, GR-71110 Heraklion, Greece (e-mail: tsiknaki@ics.forth.gr).
M. Brochhausen is with the Institute of Formal Ontologies and Medical Information Science (IFOMIS), University of Saarland, 66041 Saarbrücken, Germany (e-mail: mathias.brochhausen@ifomis.uni-saarland.de).
J. Nabrzyski and J. Pucacki are with the Poznan Supercomputing and Networking Center, 60-967 Poznan, Poland (e-mail: naber@man.poznan.pl; pucacki@man.poznan.pl).
S. G. Sfakianakis, G. Potamias, and D. Kafetzopoulos are with the Foundation for Research and Technology–Hellas, Institute of Computer Science and Institute of Molecular Biology and Biotechnology, Heraklion, Greece (e-mail: ssfak@ics.forth.gr; potamias@ics.forth.gr; kafetzo@imbb.forth.gr).
C. Desmedt is with the Functional Genomics and Translational Research Unit, Jules Bordet Institute, Brussels 1000, Belgium (e-mail: christine.desmedt@bordet.be).
Digital Object Identifier 10.1109/TITB.2007.903519

I. INTRODUCTION

Recent advances in research methods and technologies have resulted in an explosion of information and knowledge about cancers and their treatment.
Exciting new research on the molecular mechanisms that control cell growth and differentiation has resulted in a quantum leap in our understanding of the fundamental nature of cancer cells, and has suggested valuable new approaches to cancer diagnosis and treatment.

The ability to characterize and understand cancer is growing exponentially based on information from genetic and protein studies, clinical trials, and other research endeavors. The breadth and depth of the information already available in the research community at large present an enormous opportunity for improving our ability to reduce mortality from cancer, improve therapies, and meet the demanding individualization-of-care needs [1], [2].

While these opportunities exist, the lack of a common infrastructure has prevented clinical research institutions from being able to mine and analyze disparate data sources. As a result, very few cross-site studies and multicentric clinical trials are performed, and in most cases, it is not possible to seamlessly integrate multilevel data (from the molecular to the organ and individual levels). Moreover, clinical researchers or molecular biologists often find it hard to exploit each other’s expertise due to the absence of a cooperative environment, which enables the sharing of data, resources, or tools for comparing results and experiments, and a uniform platform supporting the seamless integration and analysis of disease-related data at all levels [2]. This inability to share technologies and data developed by different organizations is, therefore, severely hampering the research process.

The vision of the Advancing Clinico-Genomic Trials on Cancer (ACGT) Project (www.eu-acgt.org) [3] is to contribute to the resolution of these problems by developing a semantically rich grid infrastructure in support of multicentric, postgenomic clinical trials (CTs), thus enabling discoveries in the laboratory to be quickly transferred to the clinical management and treatment of patients (see Fig. 1).

This paper opens with a short background section discussing the urgent needs faced by the biomedical informatics research community. It then presents the clinical trials upon which the ACGT project is based, both for gathering and eliciting requirements and for validating the technological infrastructure designed. It continues with a presentation of the initial ACGT architecture