PONTE: A Context-Aware Approach for Automated Clinical Trial Protocol Design

George Tsatsaronis¹, Konstantinos Mourtzoukos², Vassiliki Andronikou², Tassos Tagaris², Iraklis Varlamis³, Michael Schroeder¹, Theodora Varvarigou², Dimitris Koutsouris², Nikolaos Matskanis⁴

¹ Biotechnology Center, TU Dresden, 01307 Dresden, Germany
{george.tsatsaronis,ms}@biotec.tu-dresden.de
² National Technical University of Athens, 15780 Athens, Greece
{kmour,vandro}@mail.ntua.gr, tassos@biomed.ntua.gr, dora@telecom.ntua.gr, dkoutsou@biomed.ntua.gr
³ Harokopio University of Athens, 17671 Athens, Greece
varlamis@hua.gr
⁴ Centre of Excellence in Information and Communication Technologies (CETIC), B-6041 Charleroi, Belgium
nikolaos.matskanis@cetic.be

ABSTRACT

The rapidly increasing volume of published clinical and non-clinical data across a variety of sources, and the great effort researchers need to access these sources and mine them for information of interest, lead to clinical trials that are based on only a limited subset of the knowledge available in their domain. This restricted view of a trial’s context is quite often the reason behind unsuccessful trials, as well as successful ones that nevertheless underestimate drugs’ unwanted effects and whose results therefore have low external validity in the much more complicated environment of clinical healthcare. In this paper, we present a context-aware approach, developed in the PONTE project, that guides medical researchers during clinical trial protocol design and gives them more efficient and effective access to scientific literature. The approach incorporates intelligent services and advanced text mining mechanisms for querying and mining scientific literature during protocol design, taking into account both the study context (i.e., active substance, target and disease) and the domain context in the literature.

1. INTRODUCTION

Among the key aims of clinical research is the investigation of the therapeutic potential of substances, methodologies and devices, and their transformation and development into real-world therapies. Clinical trials constitute a critical step in this process, focusing on the investigation of the efficacy and safety of these candidate treatments, initially on animal models and eventually on humans. Given the great impact such research has on the world population and the high investment it requires (recently reported figures indicate that the average cost may even reach 9 billion [11] per drug approved), there has been much debate over the years about the effectiveness and efficiency of the processes followed in clinical research. Meanwhile, the therapy development timeline forces patients seeking therapies to wait about 11 years after the initial discovery of a potential therapy. Even so, drug development figures show that of every 5000 molecules tested pre-clinically, only 1 is eventually approved and enters the market [15].

A great challenge that researchers face within this strongly competitive and complicated environment is access to the continuously rising volume of data and information in the life sciences. The information sources that are important for research design and implementation are numerous and vary in format, structure and level of granularity. As a result, accessing these sources and mining them for information of interest quite often requires unmanageable effort and excessive time. Consequently, important pieces of scientific information remain hidden and are not taken into consideration, with minor or major consequences for the research conducted and its potential for real-world application.

This paper presents the architecture and functionality of the PONTE project platform¹.
PONTE is a knowledge-oriented platform that provides a set of mechanisms and services facilitating the specification of the “test of hypothesis” on a scientifically valid basis and the intelligent design of clinical trials. To this end, the platform incorporates a set of advanced data mining and semantic reasoning mechanisms, which are applied to a variety of web data sources containing clinical and non-clinical information. The two main components of the PONTE system that incorporate these mechanisms are the Decision Support (DS) component and the GoPONTE semantic search engine². Both are described in detail in the sections that follow. More specifically, Section 2 presents the overall architecture and the design of the individual components of the proposed approach. Section 3 demonstrates the data flow in the system through a real-world scenario, and Section 4 presents related work in the field.

¹ http://www.ponte-project.eu/
² Publicly available at: http://www.gopubmed.org/web/goponte/

2. A Context-Aware Architecture for Clinical Trial Protocol Design

2.1 Overall Architecture Description

In our approach, clinical trial design context has two different, yet related, aspects:

- study-driven: the specific trial’s main parameters (i.e., study disorder, investigational active substance, target), which are supplied by the researcher,
- domain-driven: the research environment within which the study is held, i.e. the scientific (non-clinical and